(not provided): Using R and the Google Analytics API

(not provided) terms from Google average 35%-60% of all Google organic search terms

For power users of Google Analytics, any decent analysis comes with a heavy dose of spreadsheet work. But even with Excel in tow, it’s often difficult to get the data just right without resorting to formula hacks and manual table formatting. This is where the Google Analytics API and R come in handy.

Connecting to the Google Analytics API using R

I’m not going to say that connecting to the Google Analytics API is easy per se, but with the rga package written by “skardhamar” on GitHub, it’s easier than developing the connection code yourself! However, before you can start making calls to the Google Analytics API, you need to register within the Google Analytics API console. There you define a new project, after which you’ll be able to make your API calls via R.

After you have your API access straightened out, the GitHub page for the rga package has all the details on how to authenticate using the rga.open function. I chose to use the “where” argument, which saves the authenticated instance to a file, so that I can hit the API across many sessions without having to do browser authentication each time.

rga.open(instance = "ga", where = "~/Documents/R/ga-api")

Analyzing (not provided) as a Google Analytics organic search term

Once connected to the Google Analytics API, it’s time to submit our API calls. I used two API calls to create the graph at the top of the post, which shows the percentage of all Google organic search terms that are listed as “(not provided)” for the entire history of this blog. One call downloads the total number of Google organic search visits by date; the other downloads the number of “(not provided)” visits by date, also from Google. Here’s the API call for the “(not provided)” data (replace XXXXXXXX with your profile ID):

visits_notprovided.df <- ga$getData(XXXXXXXX,
                                    start.date = "2011-01-01",
                                    end.date = "2013-01-10",
                                    metrics = "ga:visits",
                                    filters = "ga:keyword==(not provided);ga:source==google;ga:medium==organic",
                                    dimensions = "ga:date",
                                    max = 1500,
                                    sort = "ga:date")

The result of this API call provides an R data frame containing two columns: date and number of visits where the search term was “(not provided)”.
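The second call, for total Google organic visits, should differ from the one above only in its filter: drop the ga:keyword condition and keep the source and medium conditions. Here is a minimal sketch of that idea, not the exact code behind the graph. The helper names google_organic_filter and get_google_organic_visits are hypothetical, made up for illustration; the ga$getData arguments are the same ones used in the call above.

```r
# Hypothetical helper (not part of rga): builds the filter string.
# In Google Analytics API filters, ";" combines conditions with AND.
google_organic_filter <- function(keyword = NULL) {
  parts <- c("ga:source==google", "ga:medium==organic")
  if (!is.null(keyword)) {
    # Restrict to a single search term, e.g. "(not provided)"
    parts <- c(paste0("ga:keyword==", keyword), parts)
  }
  paste(parts, collapse = ";")
}

# Hypothetical wrapper: `ga` is the object created by rga.open(),
# `profile_id` is your Google Analytics profile ID.
get_google_organic_visits <- function(ga, profile_id, keyword = NULL,
                                      start.date = "2011-01-01",
                                      end.date = "2013-01-10") {
  ga$getData(profile_id,
             start.date = start.date,
             end.date = end.date,
             metrics = "ga:visits",
             filters = google_organic_filter(keyword),
             dimensions = "ga:date",
             max = 1500,
             sort = "ga:date")
}

# Usage (needs an authenticated session, so it is commented out here):
# visits_total.df       <- get_google_organic_visits(ga, XXXXXXXX)
# visits_notprovided.df <- get_google_organic_visits(ga, XXXXXXXX, "(not provided)")
```

Keeping the filter in one place makes it harder for the two calls to drift apart, which matters because the percentage is only meaningful if both calls cover the same source, medium, and date range.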

Munging the data using R

After pulling the data into R, all that’s left is to merge the data frames, do a few calculations, then make the boxplot. Because the rga package returns a data frame by default, it’s trivial to join the two results with R’s merge function, then add a few calculated columns to get the percentage of visits that are “(not provided)”.
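The merge-and-calculate step can be sketched in base R as follows. This is an illustration on synthetic data, not the original script: the column names (date, visits), the made-up visit counts, and the choice to group the boxplot by month are all assumptions, so check names() on your own API results before adapting it.

```r
# Synthetic stand-ins for the two data frames returned by the API calls.
# Column names and values are assumptions for illustration only.
set.seed(42)
dates <- seq(as.Date("2012-10-01"), as.Date("2012-12-31"), by = "day")
visits_total.df <- data.frame(date = dates,
                              visits = rpois(length(dates), 150) + 1)
visits_notprovided.df <- data.frame(date = dates,
                                    visits = rbinom(length(dates),
                                                    visits_total.df$visits,
                                                    0.45))
# Simulate days that are missing from the "(not provided)" result
visits_notprovided.df <- visits_notprovided.df[-c(3, 10), ]

# Left join on date; the suffixes disambiguate the two "visits" columns
merged.df <- merge(visits_total.df, visits_notprovided.df,
                   by = "date", all.x = TRUE,
                   suffixes = c("_total", "_np"))

# Days absent from the "(not provided)" data count as zero visits
merged.df$visits_np[is.na(merged.df$visits_np)] <- 0

# Calculated columns: daily share of "(not provided)" and a month label
merged.df$pct_np <- merged.df$visits_np / merged.df$visits_total
merged.df$month <- format(merged.df$date, "%Y-%m")

# One box per month showing the spread of the daily percentages
boxplot(pct_np ~ month, data = merged.df,
        xlab = "Month", ylab = "Share of visits that are (not provided)")
```

Using all.x = TRUE keeps every day from the total-visits data even when the “(not provided)” result has no row for that day; an inner join would silently drop those days from the plot.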

What was that, Google? Only 10% of searches are supposed to be (not provided)?

By now, it’s beating a dead horse to point out that the percentage of “(not provided)” search terms from Google FAR exceeds what they said it would be. This blog gets about 5,000 visits a month, and due to the technical nature of the blog, many of the users are on Chrome (which does secure search automatically) or iOS (which also does secure search). But at minimum, this graph illustrates the power of using the Google Analytics API via R; I can update this graph at my leisure by running my script, and I can create a graphic that’s not possible within Excel.

Full code:

  1. I’m having problems with the SSL certificate
    “SSL certificate problem, verify that the CA cert is OK”
    I was able to find some solutions using Curl by setting “ssl.verifypeer = FALSE”, but I could not find a way to do it using the rga package. Do you have any ideas?
    Anyway, nice post. I’m new to R and trying to learn.

  2. Well, I was able to make it work but I do not know if it is the most appropriate way.
    The SSL error happens in the rga.open command, but somehow I made it work using the solution proposed in the address below:
    However, after I used the solution above I was able to run your code without errors in the rga.open…
    …after testing I discovered that the line: options(RCurlOptions = list(capath = system.file("CurlSSL", "cacert.pem", package = "RCurl"), ssl.verifypeer = FALSE)) was kind of making the certificate work… I’ll run more tests and send you the results…
    Thank you

  3. randyzwitch: Sandor, I think the error has something to do with running OAuth on Windows in R. This post on StackOverflow has a suggestion that has worked for me with both twitteR and rga: http://stackoverflow.com/questions/8122879/roauth-on-windows-using-r
    Specifically, running this code before trying to authenticate:
    library(RCurl)
    options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))