Google Analytics SEO reports: Not Ready For Primetime?

On October 4th, Google announced that the Webmaster Tools/Google Analytics integration was now available to all users. The three new reports (Queries, Landing Pages, and Geographical Summary) are intended to allow site owners and content creators to monitor how well their content is indexed in Google for their keywords of interest across time, all within the Google Analytics interface.  However, based on my preliminary research from the first few days of data, I have to question the current algorithm’s accuracy.

Google Analytics SEO reports:  Impressions, Clicks,  Average Position, CTR

google-seo-query-report

Google Analytics SEO Report: Queries

All three reports have the same format, showing Impressions in Google search, Clicks, Average Link position (Organic) and Click-through Rate (CTR).  You can show this data by:

  • Search query: to understand how specific search terms are ranking
  • Landing page: to show how well individual pages (and their position) lead to clicks
  • Geography:  to understand how well your pages are ranking in your target market(s)

To avoid problems of false precision, these reports appear to round impressions to the nearest 10 for numbers less than 1000, and then to the nearest hundred when impressions are > 1000. Similarly, clicks aren’t reported when there are fewer than 10, although a CTR is still reported.  Which is it, Google: not enough data to report, or data measured precisely enough to calculate a rate?
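
To make the inconsistency concrete, here is a minimal sketch of the rounding scheme as it appears from the reports. The function names and sample counts are hypothetical, and the thresholds (including what happens at exactly 1000) are inferred from the interface rather than documented by Google.

```javascript
// Sketch of the display rounding described above (inferred, not documented by Google).
function displayImpressions(n) {
  // Nearest 10 below 1000; nearest 100 from 1000 up (the exact boundary is an assumption)
  const step = n < 1000 ? 10 : 100;
  return Math.round(n / step) * step;
}

function displayClicks(n) {
  // Small click counts are hidden behind a "<10" label
  return n < 10 ? '<10' : String(n);
}

function ctrPercent(clicks, impressions) {
  // CTR computed from the raw counts, rounded to two decimals for display
  return Math.round((clicks / impressions) * 10000) / 100;
}

// Hypothetical raw counts for a single query
const raw = { impressions: 1234, clicks: 7 };
console.log(displayImpressions(raw.impressions));      // 1200
console.log(displayClicks(raw.clicks));                // <10
console.log(ctrPercent(raw.clicks, raw.impressions));  // 0.57
```

Someone looking at 1200 impressions and "<10" clicks has no way to reproduce the 0.57% figure, which is exactly the complaint.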

I rank WHERE for a female body part?

google-seo-report-womens-body-part

Ranked 8th on average for these keywords Google? I think not.

In the web analytics world, if you aren’t comfortable with imprecision and incomplete data, you’ll get burned out pretty quickly.  My above example of the exact click-through rate calculated from an inexact display of clicks is a minor nitpick.  However, when I see data from the table above being written into my account, I have to wonder just how precisely Google is measuring their impressions data.

The table above is from my other blog about the Duke MBA; I’m QUITE certain it doesn’t rank, on average, 8th for anything to do with female body parts!  I’d be the most in-demand SEO in the world if I could pull that off without even having the words on my page.  I would’ve been comfortable chalking that result up to a weird bug had the page the query references been mangled.  It turns out the queries all link to the same exact blog post: the story of a current (female, naturally) student’s journey from a small town in India to attending business school.  From what I can tell, the error is persistent as well, showing a small number of impressions every day.

Web Analytics:  Again, it all comes down to the Analyst

The above example is somewhat tongue-in-cheek; obviously it’s a data error, and I’m not running a multi-million dollar e-commerce website.  Heck, I’m not even paying for Google Analytics.  But had I been part of the Beta test of the Google Analytics/Webmaster Tools integration, I think I would’ve provided the following comments:

  • Don’t show search terms where there is a low number of impressions: if you are getting 50 impressions per day and you rank 213th, you’re not really ranking for that term
  • Accuracy vs. Precision:  Either round the numbers or don’t.  Rounding impressions, putting <10 for clicks, then dividing the two to provide a CTR doesn’t provide confidence in the data
  • Provide the same reporting drill-down capabilities from Webmaster Tools within Google Analytics: To find out which page is ranking for this error term, I started in Google Analytics, but needed to go to Webmaster Tools.  Kinda defeats the purpose of having the data in the Google Analytics interface.
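
Until the reports filter themselves, the first suggestion is easy to apply to an exported copy of the Queries report. This is a minimal sketch that assumes rows shaped like the report columns; the field names, sample rows, and cutoffs are all hypothetical, so pick thresholds that suit your own traffic.

```javascript
// Keep only queries with enough impressions and a position a searcher could plausibly reach.
function filterQueries(rows, minImpressions, maxPosition) {
  return rows.filter(
    (row) => row.impressions >= minImpressions && row.avgPosition <= maxPosition
  );
}

// Made-up rows for illustration only
const rows = [
  { query: 'duke mba blog', impressions: 320, avgPosition: 4.2 },
  { query: 'unrelated term', impressions: 50, avgPosition: 213 },
  { query: 'business school essay', impressions: 20, avgPosition: 9.0 },
];

const kept = filterQueries(rows, 100, 50);
console.log(kept.map((row) => row.query)); // [ 'duke mba blog' ]
```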

Now that Google has decided to step into the paid web analytics arena, I’m confident that these reports will only improve over time.  For now, I’ll be taking a sharp eye to the results, manually typing the queries into Google where necessary to see if I’m truly ranking where it says I am.

(And yes, I verified I don’t rank 8th for pornography terms ;))


An Afternoon With Edward Tufte

tufte-data-visualization

There had to be 400+ people in the seminar!

Yesterday, I had the opportunity to attend the “Presenting Data and Information” seminar hosted by Edward Tufte in Philadelphia.  A world-renowned expert in the field of data presentation/visualization, Edward Tufte has written seven books outlining terrible and fantastic examples of data display (and how to make sure your charts and tables fall into the latter!)

Unfortunately, as great as each of these books is at explaining methods for data visualization, the seminar was little more than a topical discussion of his book material, rather than a concise summary of what pitfalls to avoid. However, for the relatively low cost of the seminar ($380) and receiving hardcover editions of 4 of 7 of Tufte’s works, there are many worse ways to spend your time and money if you are a data enthusiast!

Course materials

Each attendee received a copy of the following books:

These books are so dense with information that it will probably take me a month or more to read each book!

If your data are boring, you’ve got the wrong data

Of the many positives of this seminar, I appreciated how Tufte hammered on a few main topics, the most important of which is 'If your data is boring, you've got the wrong data'.  I think this is often overlooked when thinking about success in business; if your meetings are dull and people dread when you send out a meeting request, you need better content!  It's (usually) not a visualization problem, and in many ways, it's not a presentation style problem.  If you've got great data, people will overlook an annoying presenter.  But without content that speaks to what the audience is interested in knowing, you might as well not give a presentation at all.

Don't fall into the PowerPoint trap!

The other main point that Tufte really hammered on was that if you let the limitations of a tool like PowerPoint dictate how you perform and present analysis, then you've failed as an analyst.  Humans have an extraordinary ability to process dense amounts of information; by limiting yourself to presenting your analysis in 3 bullets and 10 words per page, you are just perpetuating the 'stupidity' (his words) of that 'authoritarian form of communication'.

As an alternative to PowerPoint style charts and graphs, the seminar really focused on hand-drawn illustrations from Galileo's Sunspot discovery and examples from cartographers about how to present multi-variate data structures.  Even though paper (or a PowerPoint slide on-screen) is limited to two dimensions, there are many ways to increase the information density to six or more dimensions.  Sparklines were also discussed in detail, as a way to keep the data in-line with text and to show data trends where the actual numbers aren't necessarily important (or are presented elsewhere).
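
For a rough feel of what "in-line with text" means, here is a small sketch that approximates a sparkline with Unicode block characters. This is a text stand-in, not the word-sized line graphic Tufte actually draws; the sparkline function and the sample numbers are hypothetical.

```javascript
// Map each value to one of eight Unicode block characters, scaled between min and max.
function sparkline(values) {
  const ticks = ['▁', '▂', '▃', '▄', '▅', '▆', '▇', '█'];
  const min = Math.min(...values);
  const max = Math.max(...values);
  const range = max - min || 1; // avoid dividing by zero for a flat series
  return values
    .map((v) => ticks[Math.round(((v - min) / range) * (ticks.length - 1))])
    .join('');
}

// Hypothetical daily visits for one week
console.log(sparkline([120, 135, 128, 160, 210, 190, 140])); // ▁▂▂▄█▆▃
```

The result can sit inside a sentence, showing the shape of the week without a single axis label.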

Simultaneously negative and pie-in-the-sky

I'm not going to focus too much on the negatives here, but one thing that really surprised me about this seminar was how negative in tone the presentation seemed.  I realize part of it was sarcasm (and possibly an affectation), but I would've preferred approaching the topic as what can/should be done to advance the cause, instead of what 'sucks'. Everyone in the room is acutely aware of what sucks in the PowerPoint culture of the business world; moving past that is what everyone was there to learn.

Simultaneously, when talking about improvements, most of them seemed to be unrealistic to actually implement in the real world (not all of us live in Ivory Tower academia, Dr. Tufte 🙂 ).  Suggestions like stripping slides of 'administrative overhead' like corporate logos and style sheets, bringing the level of a presentation WAY up as if everyone is as smart as the presenter, and writing long prose instead of highlighting comments are just unrealistic for most workers.  Most of the suggestions are corporate culture issues, and ones that a lowly data analyst isn't going to be able to change.

Summary: The right data should be able to 'sell' any presentation

In the end, a 5-hour seminar isn't going to change the business world or turn anyone into a super-analyst. But hearing Dr. Tufte speak about elegant design in data visualizations reminded me that I'm the one that controls the outcome of any presentation.  With the right data, shown properly, I should be able to 'sell' anyone on an idea without having to do any salesmanship at all.  The data is what sells an idea, not slick talking and 3 bullets per page.


Google Analytics Custom Variables: A Page-Level Example

Once you’ve implemented Google Analytics on your WordPress blog, you’ll likely find that the default reports aren’t providing the site-specific information you are looking for…or, maybe just not at the level of aggregation you’d prefer.   Google Analytics custom variables provide a method of capturing your site-specific information, depending on whether the information changes once per visitor, once per session, or once per page.  Examples of custom variable usage include:

  • Demographic information, such as Gender (Visitor-level, never changes)
  • Visitor logs in to your website (Session-level, may not log in during future visits)
  • Each section of the website a visitor “touches” (Page-level, changing multiple times during a session)

This tutorial will cover the Page-level custom variable type, capturing the WordPress Category for each blog post.  With this information, we’ll be able to see which categories of posts are most popular on your WordPress blog over time.

Setting a Google Analytics custom variable

To set a Google Analytics custom variable, we need to use the following syntax:

_setCustomVar(index, name, value, opt_scope)

The index section of the variable indicates which of the five allowable custom variables we want to use to record our information (slot 1-5).  name indicates what we want to call our variable.  value is going to be the actual value we are looking to save.  And finally, opt_scope represents whether we want the variable to be page-level, session-level, or visitor-level.
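
As a quick illustration of the three scopes, here is a sketch that sets one variable at each level using the asynchronous _gaq queue. The scope numbers (1 = visitor, 2 = session, 3 = page) are the ones ga.js uses; the SCOPE constant, the slot assignments for the first two calls, and the names and values are illustrative only.

```javascript
// Readable names for the opt_scope values accepted by _setCustomVar
const SCOPE = { VISITOR: 1, SESSION: 2, PAGE: 3 };

var _gaq = _gaq || [];

// Slot numbers, names, and values below are examples, not recommendations.
_gaq.push(['_setCustomVar', 1, 'Gender', 'Female', SCOPE.VISITOR]);       // set once per visitor
_gaq.push(['_setCustomVar', 3, 'Logged In', 'Yes', SCOPE.SESSION]);       // may differ next visit
_gaq.push(['_setCustomVar', 2, 'Category', 'Web Analytics', SCOPE.PAGE]); // changes page to page
```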

Recording WordPress category into a custom variable

In order to capture the WordPress category in a Google Analytics custom variable, we’re going to use a combination of PHP, WordPress functions, and Google Analytics code.  Here’s the code snippet we’re going to use:

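<?php
// Only single-post pages have a category to record
if ( is_single() ) {
    $category = get_the_category();
    if ( ! empty( $category ) ) {
        // esc_js() keeps apostrophes in a category name from breaking the JavaScript string
        echo "_gaq.push(['_setCustomVar', 2, 'Category', '" . esc_js( $category[0]->cat_name ) . "', 3]);";
    }
}
?>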

The is_single part of the code is a WordPress function, which evaluates whether or not a given page is a single post.  Since only single post pages have categories, we use this function to set the Google Analytics custom variable only when there is going to be a category value available on the page.  The $category part of the code is a PHP variable that stores the array of category objects returned by the get_the_category function. Finally, the part of the code that starts with echo is the PHP code needed to build the Google Analytics custom variable string we want to have.

Within this code, you can see the _setCustomVar code described in the first part of the tutorial; we’re setting the index value to 2, which means we’re using Google Analytics Custom Variable 2.  The name of the variable will be Category, the value to be set is the WordPress category name (from $category[0]->cat_name, the name of the first category attached to the post), and the opt_scope value is set to 3, which means page-level.
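
It can help to see the line of JavaScript that ends up in the page once the PHP has run. The sketch below reproduces the string-building step in JavaScript purely for illustration; buildCustomVarCall is a made-up helper (not part of WordPress or Google Analytics), the category names are hypothetical, and it quotes the value with JSON.stringify, which the tutorial's PHP does not do.

```javascript
// Mirrors the string the PHP echo statement builds, written in JavaScript for illustration.
function buildCustomVarCall(categoryName) {
  // JSON.stringify quotes and escapes the value so it stays a valid JavaScript string literal
  return "_gaq.push(['_setCustomVar', 2, 'Category', " + JSON.stringify(categoryName) + ", 3]);";
}

console.log(buildCustomVarCall('Web Analytics'));
// _gaq.push(['_setCustomVar', 2, 'Category', "Web Analytics", 3]);

console.log(buildCustomVarCall("Editor's Picks"));
// _gaq.push(['_setCustomVar', 2, 'Category', "Editor's Picks", 3]);
```

The second call shows why quoting matters: a category name containing an apostrophe would end a single-quoted string early and break the tracking code on that page.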

Incorporating custom variable code into Google Analytics tracking code

According to Justin Cutroni, who literally wrote the book on Google Analytics, we want to put our custom variable code BEFORE the _trackPageview portion of the Google Analytics tracking code whenever possible.  This is because on the last page of a visit, if your custom variable code is after the _trackPageview code, Google Analytics won’t “see” the custom variable code, since the data has to tag along with a _trackPageview call. Here’s what the final set of code will look like (place in your header.php file):

<script type="text/javascript">

var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-XXXXXXXX-X']);

<?php
// Set the custom variable before _trackPageview; only single posts have a category
if ( is_single() ) {
    $category = get_the_category();
    if ( ! empty( $category ) ) {
        // esc_js() keeps apostrophes in a category name from breaking the JavaScript string
        echo "_gaq.push(['_setCustomVar', 2, 'Category', '" . esc_js( $category[0]->cat_name ) . "', 3]);";
    }
}
?>

_gaq.push(['_trackPageview']);
_gaq.push(['_trackPageLoadTime']);

(function() {
  // Load ga.js asynchronously so it doesn't block page rendering
  var ga = document.createElement('script');
  ga.type = 'text/javascript';
  ga.async = true;
  ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
  var s = document.getElementsByTagName('script')[0];
  s.parentNode.insertBefore(ga, s);
})();

</script>
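
To see why the ordering matters, here is a toy model of the rule described above: a custom variable only leaves the browser when a later _trackPageview carries it. This is a simplified sketch and not how ga.js is implemented; the simulate function and its return shape are hypothetical.

```javascript
// Toy model of the ordering rule: custom variables wait for the next _trackPageview.
function simulate(commands) {
  const pending = []; // custom variables set but not yet sent
  const hits = [];    // what each pageview request would carry
  for (const [name, ...args] of commands) {
    if (name === '_setCustomVar') {
      pending.push(args);
    } else if (name === '_trackPageview') {
      // splice(0) empties the pending list and hands its contents to this hit
      hits.push({ customVars: pending.splice(0) });
    }
  }
  return { hits, unsent: pending };
}

const before = simulate([
  ['_setCustomVar', 2, 'Category', 'Web Analytics', 3],
  ['_trackPageview'],
]);
const after = simulate([
  ['_trackPageview'],
  ['_setCustomVar', 2, 'Category', 'Web Analytics', 3],
]);

console.log(before.hits[0].customVars.length, before.unsent.length); // 1 0
console.log(after.hits[0].customVars.length, after.unsent.length);   // 0 1
```

In the second case the category is still waiting for a pageview that, on the last page of a visit, never comes.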

Example of Custom Variable report

Here’s what the report will look like in Google Analytics.  To see the report, go to Visitors -> Demographics -> Custom Variables.

google-analytics-custom-variables

WordPress Categories in Google Analytics Custom Variable 2

