This video explains how to overlay histogram plots in R for 3 common cases: overlaying a histogram with a normal curve, overlaying a histogram with a density curve, and overlaying a histogram with a second data series plotted on a secondary axis.
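Note: Towards the end of the video (around minute 14), I misspeak when describing the padj parameter of the mtext function. The setting doesn't "left truncate" the label; I meant that it controls how the label is aligned ("right align", "left align", etc.). More precisely, padj adjusts the label perpendicular to the reading direction, while adj adjusts it along the reading direction.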
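To make the difference between adj and padj concrete, here is a minimal sketch that places a few margin labels with different settings. The plotted data (1:10) and the label text are placeholders chosen for illustration, not taken from the video, and pdf(NULL) is used only so the sketch runs without opening a window.

```r
# Draw to a null PDF device so the sketch also runs non-interactively
pdf(NULL)
plot(1:10, xlab = "", ylab = "", main = "")

# adj: position along the reading direction (0 = left/bottom, 1 = right/top)
adj_values <- c(0, 0.5, 1)
for (a in adj_values) {
  mtext(text = paste("adj =", a), side = 3, line = 1, adj = a)
}

# padj: position perpendicular to the reading direction
mtext(text = "padj = 0", side = 4, line = 1, padj = 0, adj = 0)
mtext(text = "padj = 1", side = 4, line = 1, padj = 1, adj = 1)

dev.off()
```

Drop the pdf(NULL) and dev.off() lines to see the labels on screen in an interactive session.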
#Step 0: load/prepare data
#Read in data
sample_data <- read.csv("~/Desktop/test_data.csv")
# "Explode" counts by age back to unsummarized "raw" data
age.exploded <- rep.int(sample_data$age, sample_data$count)
#1. Histogram with normal distribution or density curve overlaid
#1A. Create histogram
hist(age.exploded, xlim = c(0, 20), ylim = c(0, 0.2),
     breaks = seq(min(age.exploded), max(age.exploded), length.out = 22),
     xlab = "Age", ylab = "Percentage of Accounts",
     main = "Age Distribution of Accounts\n (where 0 <= age <= 20)",
     prob = TRUE, col = "lightgray")
#1B. Do one of the following: overlay either the normal distribution
# or the smoothed density function on the histogram
#Calculate normal distribution having mean/sd equal to the data plotted in the
#histogram above
x.vals <- seq(min(age.exploded), max(age.exploded), length.out = 500)
lines(x.vals, dnorm(x.vals, mean = mean(age.exploded), sd = sd(age.exploded)),
      col = "red")
#Add smoothed density function to histogram, smoothness toggled using
#"adjust" parameter
lines(density(age.exploded, adjust = 2), col = "blue")
#2. Histogram with line plot overlaid
#2A. Create histogram with extra border space on right-hand side
#Extra border space "2" on right (bottom, left, top, right)
par(oma = c(0, 0, 0, 2))
hist(age.exploded, xlim = c(0, 20), ylim = c(0, 0.2),
     breaks = seq(min(age.exploded), max(age.exploded), length.out = 22),
     xlab = "Age", ylab = "Percentage of Accounts",
     main = "Age Distribution of Accounts vs. Subscription Rate \n (where reported age <= 20)",
     prob = TRUE, col = "lightgray")
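#2B. Add overlaid line plot, create a right-side numeric axis
#par(new=TRUE) draws the next plot on top of the existing histogram
par(new = TRUE)
#Plot against age with the same xlim as the histogram so both share an x-axis
#(assumes sample_data has one row per age, in the age column)
plot(sample_data$age, sample_data$subscribe_pct, xlim = c(0, 20),
     xlab = "", ylab = "", type = "b", col = "red", axes = FALSE)
axis(4)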
#2C. Add right-side axis label
mtext(text="Subscription Rate",side=4, outer=TRUE, padj=1)
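The "explode" step in Step 0 can be tried without the CSV file. Below is a minimal sketch, assuming a hypothetical toy data frame with the same age and count column names; the values are made up for illustration and are not from test_data.csv.

```r
# Hypothetical toy data standing in for test_data.csv (values are made up)
toy <- data.frame(age = c(0, 1, 2), count = c(2, 1, 3))

# rep.int repeats each age value as many times as its matching count
toy.exploded <- rep.int(toy$age, toy$count)
print(toy.exploded)  # 0 0 1 2 2 2

# The exploded vector has one element per original account
length(toy.exploded) == sum(toy$count)  # TRUE
```

Because hist expects one value per observation, this expansion is what lets pre-summarized counts be passed to it directly.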