An Afternoon With Edward Tufte


There had to be 400+ people in the seminar!

Yesterday, I had the opportunity to attend the “Presenting Data and Information” seminar hosted by Edward Tufte in Philadelphia.  A world-renowned expert in the field of data presentation and visualization, Edward Tufte has written seven books outlining terrible and fantastic examples of data display (and how to make sure your charts and tables fall into the latter category!).

Unfortunately, as great as each of these books is at explaining methods for data visualization, the seminar was little more than a topical discussion of his book material, rather than a concise summary of which pitfalls to avoid. However, given the relatively low cost of the seminar ($380), which includes hardcover editions of 4 of Tufte’s 7 works, there are many worse ways to spend your time and money if you are a data enthusiast!

Course materials

Each attendee received a copy of the following books:

These books are so dense with information that it will probably take me a month or more to read each book!

If your data are boring, you’ve got the wrong data

Of the many positives of this seminar, I appreciated how Tufte hammered on a few main topics, the most important of which is 'If your data are boring, you've got the wrong data'.  I think this is often overlooked when thinking about success in business; if your meetings are dull and people dread your meeting requests, you need better content!  It's (usually) not a visualization problem, and in many ways, it's not a presentation style problem.  If you've got great data, people will overlook an annoying presenter.  But without content that speaks to what the audience is interested in knowing, you might as well not give a presentation at all.

Don't fall into the PowerPoint trap!

The other main point that Tufte really hammered on was that if you let the limitations of a tool like PowerPoint dictate how you perform and present analysis, then you've failed as an analyst.  Humans have an extraordinary ability to process dense amounts of information; by limiting yourself to presenting your analysis in 3 bullets and 10 words per page, you are just perpetuating the 'stupidity' (his words) of that 'authoritarian form of communication'.

As an alternative to PowerPoint-style charts and graphs, the seminar really focused on hand-drawn illustrations from Galileo's sunspot discovery and examples from cartographers about how to present multivariate data structures.  Even though paper (or a PowerPoint slide on-screen) is limited to two dimensions, there are many ways to increase the information density to six or more dimensions.  Sparklines were also discussed in detail, as a way to keep the data in-line with text and to show data trends where the actual numbers aren't necessarily important (or are presented elsewhere).

Simultaneously negative and pie-in-the-sky

I'm not going to focus too much on the negatives here, but one thing that really surprised me about this seminar was how negative in tone the presentation seemed.  I realize part of it was sarcasm (and possibly an affectation), but I would've preferred approaching the topic as what can/should be done to advance the cause, instead of what 'sucks'. Everyone in the room was acutely aware of what sucks in the PowerPoint culture of the business world; moving past that is what everyone was there to learn.

Simultaneously, when talking about improvements, most of them seemed to be unrealistic to actually implement in the real world (not all of us live in Ivory Tower academia, Dr. Tufte 🙂 ).  Suggestions like stripping slides of 'administrative overhead' like corporate logos and style sheets, bringing the level of a presentation WAY up as if everyone is as smart as the presenter, and writing long prose instead of highlighting comments are just unrealistic for most workers.  Most of the suggestions are corporate culture issues, and ones that a lowly data analyst isn't going to be able to change.

Summary: The right data should be able to 'sell' any presentation

In the end, a 5-hour seminar isn't going to change the business world or turn anyone into a super-analyst. But hearing Dr. Tufte speak about elegant design in data visualizations reminded me that I'm the one who controls the outcome of any presentation.  With the right data, shown properly, I should be able to 'sell' anyone on an idea without having to do any salesmanship at all.  The data is what sells an idea, not slick talking and 3 bullets per page.


Google Analytics Custom Variables: A Page-Level Example

Once you’ve implemented Google Analytics on your WordPress blog, you’ll likely find that the default reports aren’t providing the site-specific information you are looking for, or maybe just not at the level of aggregation you’d prefer.  Google Analytics custom variables provide a method of capturing your site-specific information, depending on whether the information changes once per visitor, once per session, or once per page.  Examples of custom variable usage include:

  • Demographic information, such as Gender (Visitor-level, never changes)
  • Visitor logs in to your website (Session-level, may not log in during future visits)
  • Each section of the website a visitor “touches” (Page-level, changing multiple times during a session)

This tutorial will cover the Page-level custom variable type, capturing the WordPress Category for each blog post.  With this information, we’ll be able to see which categories of posts are most popular on your WordPress blog over time.

Setting a Google Analytics custom variable

To set a Google Analytics custom variable, we need to use the following syntax:

_setCustomVar(index, name, value, opt_scope)

The index section of the variable indicates which of the five allowable custom variables we want to use to record our information (slot 1-5).  name indicates what we want to call our variable.  value is going to be the actual value we are looking to save.  And finally, opt_scope represents whether we want the variable to be page-level, session-level, or visitor-level.
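
As a rough illustration of how those four arguments fit together, here is a minimal JavaScript sketch. The SCOPE constant and the buildCustomVar helper are hypothetical names invented for this example (they are not part of Google Analytics); only the _setCustomVar command and the _gaq queue come from the tracking code itself. The scope numbers follow the Google Analytics convention of 1 for visitor-level, 2 for session-level, and 3 for page-level.

```javascript
// Hypothetical names for the opt_scope values (not defined by Google Analytics itself)
var SCOPE = { VISITOR: 1, SESSION: 2, PAGE: 3 };

// Hypothetical helper: checks the arguments and returns the command array
// in the shape that _gaq.push() expects for _setCustomVar.
function buildCustomVar(index, name, value, scope) {
  if (!Number.isInteger(index) || index < 1 || index > 5) {
    throw new RangeError('index must be a slot number from 1 to 5');
  }
  if (scope !== SCOPE.VISITOR && scope !== SCOPE.SESSION && scope !== SCOPE.PAGE) {
    throw new RangeError('scope must be 1 (visitor), 2 (session) or 3 (page)');
  }
  return ['_setCustomVar', index, String(name), String(value), scope];
}

// Same queue pattern as the standard asynchronous tracking snippet
var _gaq = _gaq || [];
_gaq.push(buildCustomVar(2, 'Category', 'Web Analytics', SCOPE.PAGE));
```

The helper is only there to make the slot and scope rules explicit; pushing the array literal directly onto _gaq works just as well. 'Web Analytics' is a made-up category value.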

Recording WordPress category into a custom variable

In order to capture the WordPress category in a Google Analytics custom variable, we’re going to use a combination of PHP, WordPress functions, and Google Analytics code.  Here’s the code snippet we’re going to use:

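<?php
// Only single post pages have a category to record
if ( is_single() ) {
    $category = get_the_category();
    if ( ! empty( $category ) ) {
        // esc_js() keeps quotes in a category name from breaking the JavaScript string
        echo "_gaq.push(['_setCustomVar', 2, 'Category', '" . esc_js( $category[0]->cat_name ) . "', 3]);";
    }
}
?>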

The is_single part of the code is a WordPress function, which evaluates whether or not a given page is a single post.  Since only single post pages have categories, we use this function to set the Google Analytics custom variable only when there is going to be a category value available on the page.  The $category part of the code is a PHP variable that stores the entire array of WordPress info that goes along with the get_the_category function. Finally, the part of the code that starts echo is the PHP code needed to build the Google Analytics custom variable string we want to have. 

Within this code, you can see the _setCustomVar code described in the first part of the tutorial; we’re setting the index value to 2, which means we’re using Google Analytics Custom Variable 2.  The name of the variable will be Category, the value to be set is the WordPress category value (from the $category[0]->cat_name variable), and the opt_scope value is set to 3, which means page-level.

Incorporating custom variable code into Google Analytics tracking code

According to Justin Cutroni, who literally wrote the book on Google Analytics, we want to put our custom variable code BEFORE the _trackPageview portion of the Google Analytics tracking code whenever possible.  This is because on the last page of a visit, if your custom variable code is after the _trackPageview code, Google Analytics won’t “see” the custom variable code, since the data has to tag along with a _trackPageview call. Here’s what the final set of code will look like (place in your header.php file):

<script type="text/javascript">

var _gaq = _gaq || [];
 _gaq.push(['_setAccount', 'UA-XXXXXXXX-X']);

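<?php
// Only single post pages have a category to record
if ( is_single() ) {
    $category = get_the_category();
    if ( ! empty( $category ) ) {
        // esc_js() keeps quotes in a category name from breaking the JavaScript string
        echo "_gaq.push(['_setCustomVar', 2, 'Category', '" . esc_js( $category[0]->cat_name ) . "', 3]);";
    }
}
?>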

_gaq.push(['_trackPageview']);
_gaq.push(['_trackPageLoadTime']);

(function() {

 var ga = document.createElement('script');
 ga.type = 'text/javascript';
 ga.async = true;

 ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
 var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga,s);
})();

</script>
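
To make the ordering rule above concrete, here is a toy JavaScript model of the command queue. It is a simplified sketch and not how ga.js is actually implemented: the replay function is a hypothetical name, and it assumes only that each _trackPageview hit carries whatever custom variables were set before it. The category value is made up.

```javascript
// Toy model of the command queue (not the real ga.js): each _trackPageview
// "sends" a hit carrying the custom variables that were set before it.
function replay(commands) {
  var pending = {};  // custom variables set so far, keyed by slot number
  var hits = [];     // one entry per simulated pageview hit
  commands.forEach(function (command) {
    var method = command[0];
    if (method === '_setCustomVar') {
      // command = [method, index, name, value, scope]
      pending[command[1]] = command[2] + '=' + command[3];
    } else if (method === '_trackPageview') {
      // Snapshot the variables known at the moment the hit is sent
      hits.push(Object.assign({}, pending));
    }
  });
  return hits;
}

// Custom variable BEFORE the pageview: the hit includes the category
var before = replay([
  ['_setCustomVar', 2, 'Category', 'Web Analytics', 3],
  ['_trackPageview']
]);

// Custom variable AFTER the pageview: the hit goes out without it
var after = replay([
  ['_trackPageview'],
  ['_setCustomVar', 2, 'Category', 'Web Analytics', 3]
]);

console.log(before[0], after[0]);
```

In this model the second ordering only works if another _trackPageview follows later in the visit, which is exactly why the last page of a visit loses its custom variable.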

Example of Custom Variable report

Here’s what the report will look like in Google Analytics.  To see the report, go to Visitors -> Demographics -> Custom Variables.


WordPress Categories in Google Analytics Custom Variable 2


How To Remove "This entry was posted in" on WordPress single posts

In prior posts, I’ve commented that I’m a fan of clean, sleek design when it comes to WordPress themes.  I’ve added the “breadcrumb” style navigation to the top of my posts, which makes the “This Entry was Posted in” and “Bookmark the Permalink” text at the bottom of each post redundant.

Here’s how to remove/modify both of these messages through a simple change to the content-single.php file.

Removing all text at the bottom of the single post

To make all of the text disappear at the bottom of each post, all we need to do is comment out a few lines of code. Open the content-single.php file from your Twenty Eleven child theme and find the following lines of code:

<footer class="entry-meta">
<?php
            /* translators: used between list items, there is a space after the comma */

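We’ll wrap the PHP block that starts on the line below the opening footer tag in an HTML comment, opening the comment just before <?php and closing it just after the ?> that ends the block.  Keep in mind that the PHP still runs; the HTML comment only keeps the browser from displaying its output, so the text remains visible in the page source.  When done correctly, the code will look like this: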

<!--     <?php
            /* translators: used between list items, there is a space after the comma */
            $categories_list = get_the_category_list( __( ', ', 'twentyeleven' ) );
            /* translators: used between list items, there is a space after the comma */
            $tag_list = get_the_tag_list( '', __( ', ', 'twentyeleven' ) );
            if ( '' != $tag_list ) {
                $utility_text = __( 'This entry was posted in %1$s and tagged %2$s by <a href="%6$s">%5$s</a>. Bookmark the <a href="%3$s" title="Permalink to %4$s" rel="bookmark">permalink</a>.', 'twentyeleven' );
            } elseif ( '' != $categories_list ) {
                $utility_text = __( 'This entry was posted in %1$s by <a href="%6$s">%5$s</a>. Bookmark the <a href="%3$s" title="Permalink to %4$s" rel="bookmark">permalink</a>.', 'twentyeleven' );
            } else {
                $utility_text = __( 'This entry was posted by <a href="%6$s">%5$s</a>. Bookmark the <a href="%3$s" title="Permalink to %4$s" rel="bookmark">permalink</a>.', 'twentyeleven' );
            }
            printf(
                $utility_text,
                $categories_list,
                $tag_list,
                esc_url( get_permalink() ),
                the_title_attribute( 'echo=0' ),
                get_the_author(),
                esc_url( get_author_posts_url( get_the_author_meta( 'ID' ) ) )
            );
        ?> -->

Hit save and you’re done: no more “This Entry was Posted in” or “Bookmark the Permalink” verbiage at the end of your posts!

Modifying the text at the bottom of the post to just keep the Post Tags

Perhaps you don’t want to remove the text entirely from the bottom of the post, but just want to leave the tags behind (for SEO purposes or whatever).  To do this, we’ll modify the same piece of code, but instead of commenting out all of the PHP code, we’ll comment out a smaller piece of code, then redefine the $utility_text  PHP variable.

The piece of code we want to comment out is shown below.  Note that because this code is within a PHP code block, we need to comment the code out using a PHP block comment, which opens with slash-star (/*) and closes with star-slash (*/):

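/*          if ( '' != $tag_list ) {
                $utility_text = __( 'This entry was posted in %1$s and tagged %2$s by <a href="%6$s">%5$s</a>. Bookmark the <a href="%3$s" title="Permalink to %4$s" rel="bookmark">permalink</a>.', 'twentyeleven' );
            } elseif ( '' != $categories_list ) {
                $utility_text = __( 'This entry was posted in %1$s by <a href="%6$s">%5$s</a>. Bookmark the <a href="%3$s" title="Permalink to %4$s" rel="bookmark">permalink</a>.', 'twentyeleven' );
            } else {
                $utility_text = __( 'This entry was posted by <a href="%6$s">%5$s</a>. Bookmark the <a href="%3$s" title="Permalink to %4$s" rel="bookmark">permalink</a>.', 'twentyeleven' );
            }
*/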

With this code commented out, we can now define the $utility_text variable as we want.  To show just the text “Tagged:” followed by the list of tags, add the following code just below the commented code above:

$utility_text = __( 'Tagged: %2$s', 'twentyeleven' );

Once you hit save, the bottom of each of your single posts will show only the tags assigned to that post.
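
If it isn't obvious why a string containing only %2$s shows just the tags, the sketch below mimics printf-style positional placeholders in JavaScript. It is an illustration only, not WordPress or PHP code: formatPositional is a hypothetical helper, and the category and tag values are made up. The argument order (categories first, tags second) follows the printf call in content-single.php.

```javascript
// Illustration only: mimics how printf-style positional placeholders such as
// %2$s pick an argument by position. This is not WordPress or PHP code.
function formatPositional(template) {
  var args = Array.prototype.slice.call(arguments, 1);
  return template.replace(/%(\d+)\$s/g, function (match, position) {
    var value = args[Number(position) - 1];
    // Leave the placeholder untouched if no argument exists at that position
    return value === undefined ? match : String(value);
  });
}

// Argument order follows the printf call: 1 = categories, 2 = tags
var categories = 'Data Visualization';
var tags = 'tufte, charts';

var full = formatPositional('Posted in %1$s and tagged %2$s.', categories, tags);
var tagsOnly = formatPositional('Tagged: %2$s', categories, tags);

console.log(full);
console.log(tagsOnly);
```

Because the placeholder names a position rather than consuming arguments in order, the printf call can keep passing all six values while the new $utility_text string simply ignores the ones it doesn't reference.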

