Video: SQL Queries in R using sqldf

This video covers how to run SQL queries using the ‘sqldf’ package within R. This sqldf tutorial was part of a Keystone Solutions podcast discussion about data science and which skills beginning analysts should be learning.

The example files from this tutorial can be downloaded from this link:

Example Data files
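
As a minimal sketch of the kind of query the video covers, the snippet below runs a SQL aggregate against an ordinary R data frame. The `accounts` data frame and its columns are made up for illustration and are not the tutorial's example files; the sketch assumes the sqldf package is already installed.

```r
library(sqldf)

# Hypothetical data frame standing in for the tutorial's example files
accounts <- data.frame(
  age        = c(1, 1, 2, 3, 3, 3),
  subscribed = c(1, 0, 1, 1, 1, 0)
)

# sqldf looks up data frames by name, so "accounts" can be queried like a table
by_age <- sqldf("
  SELECT age,
         COUNT(*)        AS n_accounts,
         AVG(subscribed) AS subscribe_pct
  FROM accounts
  GROUP BY age
  ORDER BY age
")

print(by_age)
```

The result comes back as a regular data frame, so it can be passed straight to other R functions.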


Adding a "Back to Top" Link on WordPress

In a previous post, I discussed how to remove “Powered By WordPress” from the footer of the Scrappy theme.  You might also want to add a “Back to Top” link in the footer, especially if your blog has a lot of vertical distance from the top to the bottom.  Here’s how to do it…

Step 1:  Modifying the Scrappy header.php file

The first step in creating our ‘Back to Top’ link is to modify the header.php file in our WordPress child theme by adding an empty HTML link using the <a> tag.  Although you can place an “anchor” like this anywhere you want on your site, we’ll add this empty link to the very top of the page for this tutorial.

Find the line of code in your header.php file that says <div id="page" class="hfeed site">.  Directly below it, we’ll add our extra line of code with the <a> tag (line 5 of the code snippet):

</head>

<body <?php body_class(); ?>>
<div id="page" class="hfeed site">
<a name="TopOfPage"></a>
        <?php do_action( 'before' ); ?>
        <div class="wrapper">
                <header id="masthead" class="site-header" role="banner">

Normally when using an '<a>' tag, we would also use an href in order to create a link.  However, in this case we’re just defining an empty element in the page that we can refer to later using our ‘Back to Top’ link.

Step 2: Modifying the Scrappy footer.php file

With our anchor in place, we can now add our link.  For this tutorial, we’re going to place the link right above the widget area in the Scrappy footer.

Opening up our footer.php file, we need to look for the code <div class="footer-sidebars">.  Underneath this line, we’ll add another <a> tag, but this time we’ll include an href attribute so that the link sends the page back to the top (line 5 of the code snippet):

                       </div><!-- #main -->
       </div><!-- .wrapper -->
       <footer id="colophon" class="site-footer" role="contentinfo">
               <div class="footer-sidebars">
                       <a href="#TopOfPage">Your Text Goes Here</a>
                       <?php get_sidebar( 'footer1' );
                                 get_sidebar( 'footer2' );
                                 get_sidebar( 'footer3' ); ?>
                       <div class="stripes">&nbsp;</div>
               </div>

Notice that the link we have here uses the same “TopOfPage” reference as we did in Step 1, this time with a # sign in front of the word. This tells the browser that we want to jump to the “TopOfPage” anchor elsewhere on the same page. Note also that we don’t need to make any domain-specific references like we would with a “normal” http://www.-type of link.

Obviously, feel free to change the reference to “Your Text Goes Here” to be whatever message you’d like the link to say 🙂

Success!

Once you are done with these two changes, the bottom of your Scrappy WordPress theme should look similar to this:

'Back to Top' link added to the bottom of the Scrappy WordPress theme

The link should be right-aligned to your main article width, and its styling will be handled automatically by the rules set in your CSS file.


Video: Overlay Histogram in R (Normal, Density, Another Series)

This video explains how to overlay histogram plots in R for 3 common cases: overlaying a histogram with a normal curve, overlaying a histogram with a density curve, and overlaying a histogram with a second data series plotted on a secondary axis.

Note: Towards the end of the video (maybe minute 14 or so), I make a language error when talking about the padj parameter in the mtext function. The setting doesn’t “left truncate” the label; I meant “right align”, “left align”, etc.

#Step 0:  load/prepare data

#Read in data
sample_data <- read.csv("~/Desktop/test_data.csv")

# "Explode" counts by age back to unsummarized "raw" data
age.exploded <- rep.int(sample_data$age, sample_data$count)


#1. Histogram with normal distribution or density curve overlaid


#1A.  Create histogram
hist(age.exploded, xlim = c(0, 20), ylim = c(0, .2),
  breaks = seq(min(age.exploded), max(age.exploded), length.out = 22),
  xlab = "Age", ylab = "Percentage of Accounts",
  main = "Age Distribution of Accounts\n (where 0 <= age <= 20)",
  prob = TRUE, col = "lightgray")

#1B.  Do one of the following, either put the normal distribution on the histogram
#     or put the smoothed density function

#Calculate normal distribution having mean/sd equal to data plotted in the
#histogram above
points(seq(min(age.exploded), max(age.exploded), length.out=500),
       dnorm(seq(min(age.exploded), max(age.exploded), length.out=500),
             mean(age.exploded), sd(age.exploded)), type="l", col="red")

#Add smoothed density function to histogram, smoothness toggled using
#"adjust" parameter
lines(density(age.exploded, adjust = 2), col = "blue")

#2 Histogram with line plot overlaid

#2A.  Create histogram with extra border space on right-hand side

#Extra border space "2" on right  (bottom, left, top, right)
par(oma=c(0,0,0,2))

hist(age.exploded, xlim = c(0, 20), ylim = c(0, .2),
     breaks = seq(min(age.exploded), max(age.exploded), length.out = 22),
     xlab = "Age", ylab = "Percentage of Accounts",
     main = "Age Distribution of Accounts vs. Subscription Rate \n (where reported age <= 20)",
     prob = TRUE, col = "lightgray")

#2B.  Add overlaid line plot, create a right-side numeric axis
#par(new=TRUE) draws the next plot on top of the existing one
par(new=TRUE)
plot(sample_data$subscribe_pct, xlab= "", ylab="", type = "b", col = "red", axes=FALSE)  
axis(4)

#2C.  Add right-side axis label

mtext(text="Subscription Rate",side=4, outer=TRUE, padj=1)
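
To make the "explode" step in Step 0 more concrete, here is a small sketch on made-up data (the `toy` data frame is hypothetical and only mirrors the `age` and `count` columns used above). It also checks, without drawing anything, that the densities hist() computes integrate to 1, which is why a dnorm() or density() curve can be overlaid on the same scale when prob=TRUE is used.

```r
# Hypothetical summarized data: one row per age, with a count of accounts
toy <- data.frame(age = c(1, 2, 3), count = c(2, 1, 3))

# rep.int repeats each age value `count` times, recovering "raw" observations
toy_exploded <- rep.int(toy$age, toy$count)
print(toy_exploded)

# The mean of the exploded data matches the count-weighted mean of the summary
stopifnot(isTRUE(all.equal(mean(toy_exploded),
                           weighted.mean(toy$age, toy$count))))

# Compute the histogram without plotting and inspect the density scale
toy_breaks <- seq(min(toy_exploded), max(toy_exploded), length.out = 5)
h <- hist(toy_exploded, breaks = toy_breaks, plot = FALSE)

# Bar height times bar width, summed over all bars, equals 1
total_area <- sum(h$density * diff(h$breaks))
print(total_area)
```

If the counts in your own file are large, keep in mind that the exploded vector has one element per account, so its length is the sum of the count column.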

File Download:

Histogram overlay in R code and sample data file

