Video: Overlay Histogram in R (Normal, Density, Another Series)

This video explains how to overlay histogram plots in R for 3 common cases: overlaying a histogram with a normal curve, overlaying a histogram with a density curve, and overlaying a histogram with a second data series plotted on a secondary axis.

Note: Towards the end of the video (around minute 14), I misspeak when talking about the padj parameter in the mtext function. The setting doesn’t “left truncate” the label; I meant “right align”, “left align”, etc.
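To make that correction concrete, here is a small sketch (not from the video) that draws a right-side label with three different padj values. According to the mtext help page, padj adjusts a string perpendicular to its reading direction, while adj adjusts it along the reading direction. The plotted data, label text, colors and the names old_par and padj_values are placeholders chosen for illustration.

```r
pdf(NULL)  # null device so nothing is written to disk; remove to view the plot
old_par <- par(mar = c(4, 4, 2, 6))  # widen the right margin for the labels

plot(1:10, xlab = "x", ylab = "y")

# Same side and margin line, different padj: each label is shifted
# perpendicular to its reading direction
padj_values <- c(0, 0.5, 1)
label_cols  <- c("red", "darkgreen", "blue")
for (i in seq_along(padj_values)) {
  mtext(text = paste("padj =", padj_values[i]), side = 4, line = 2,
        padj = padj_values[i], col = label_cols[i])
}

# adj, by contrast, slides a label along its reading direction
mtext(text = "adj = 0", side = 3, adj = 0)
mtext(text = "adj = 1", side = 3, adj = 1)

par(old_par)  # restore the original margins
dev.off()
```

Running this on a screen device (without the pdf(NULL) line) shows the three right-side labels at slightly different horizontal offsets, which is the effect padj has in step 2C of the script.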

#Step 0:  load/prepare data

#Read in data
sample_data <- read.csv("~/Desktop/test_data.csv")

# "Explode" counts by age back to unsummarized "raw" data
age.exploded <- rep.int(sample_data$age, sample_data$count)


#1. Histogram with normal distribution or density curve overlaid


#1A.  Create histogram
hist(age.exploded, xlim= c(0,20), ylim= c(0,.2),
  breaks=seq(min(age.exploded), max(age.exploded), length.out=22),
  xlab = "Age", ylab= "Percentage of Accounts",
  main = "Age Distribution of Accounts\n (where 0 <= age <= 20)",
  prob= TRUE, col= "lightgray")

#1B.  Do one of the following, either put the normal distribution on the histogram
#     or put the smoothed density function

#Calculate normal distribution having mean/sd equal to data plotted in the
#histogram above, evaluated on a grid of 500 points across the data range
x_grid <- seq(min(age.exploded), max(age.exploded), length.out=500)
lines(x_grid, dnorm(x_grid, mean(age.exploded), sd(age.exploded)), col="red")

#Add smoothed density function to histogram; smoothness is controlled by the
#"adjust" parameter (larger values give a smoother curve)
lines(density(age.exploded, adjust = 2), col = "blue")

#2. Histogram with line plot overlaid

#2A.  Create histogram with extra border space on right-hand side

#Extra border space "2" on right  (bottom, left, top, right)
par(oma=c(0,0,0,2))

hist(age.exploded, xlim= c(0,20), ylim= c(0,.2),
     breaks=seq(min(age.exploded), max(age.exploded), length.out=22),
     xlab = "Age", ylab= "Percentage of Accounts",
     main = "Age Distribution of Accounts vs. Subscription Rate \n (where reported age <= 20)",
     prob= TRUE, col= "lightgray")

#2B.  Add overlaid line plot, create a right-side numeric axis
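#par(new=TRUE) makes the next plot() draw on top of the existing histogram
par(new=TRUE)
#subscribe_pct is plotted against row index here, so its x positions follow
#row order rather than the histogram's age axis
plot(sample_data$subscribe_pct, xlab= "", ylab="", type = "b", col = "red", axes=FALSE)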
axis(4)

#2C.  Add right-side axis label

mtext(text="Subscription Rate",side=4, outer=TRUE, padj=1)
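If you don't have the sample CSV at hand, the following is a minimal sketch for trying the steps above without it. The data frame is made up for illustration (the values are not from the video); only the column names age, count and subscribe_pct come from the script above, and the names h and old_par are introduced here. It also varies step 2B slightly: the second series is plotted against age with the same xlim as the histogram so the points line up with the bars, and the par() settings are restored at the end. Treat it as one possible variation, not what the video shows.

```r
# Hypothetical stand-in for test_data.csv: one row per age, with a count of
# accounts and a subscription rate. All values are invented for illustration.
set.seed(42)
sample_data <- data.frame(
  age           = 0:20,
  count         = rpois(21, lambda = 50),
  subscribe_pct = round(runif(21, min = 0.05, max = 0.30), 3)
)

# "Explode" the summarized counts back into one value per account
age.exploded <- rep.int(sample_data$age, sample_data$count)

# The exploded vector should have one element per counted account
stopifnot(length(age.exploded) == sum(sample_data$count))

pdf(NULL)  # null device so nothing is written to disk; remove to view the plot

# Add outer margin space on the right and remember the previous settings
old_par <- par(oma = c(0, 0, 0, 2))

# Histogram on the density scale; hist() also returns the bin information
h <- hist(age.exploded, breaks = seq(0, 20, length.out = 22),
          xlim = c(0, 20), freq = FALSE, col = "lightgray",
          xlab = "Age", ylab = "Density",
          main = "Age Distribution vs. Subscription Rate")

# Overlay the second series on its own y scale but on the same x scale
par(new = TRUE)
plot(sample_data$age, sample_data$subscribe_pct, xlim = c(0, 20),
     type = "b", col = "red", axes = FALSE, xlab = "", ylab = "")
axis(4)
mtext(text = "Subscription Rate", side = 4, outer = TRUE, padj = 1)

par(old_par)  # restore the outer margins for later plots
dev.off()
```

Because both plot calls use the same xlim and the same plot region, the red points sit at their actual ages. The right-hand axis still has its own scale, so the two series should be compared by shape rather than by height.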

File Download:

Histogram overlay in R code and sample data file
