RSiteCatalyst Version 1.4.3 Release Notes

It’s a new year, so…new version of RSiteCatalyst on CRAN! For the most part, this release fixes a handful of bugs that weren’t noticed with the prior 1.4.2 release (oops!), but there are also a few pieces of additional functionality.

New functionality: Data Feed monitoring

For those of you having hourly or daily data feeds delivered via FTP, you can now retrieve the details of a single data feed using GetFeed(), and list all of a company’s feeds, along with the processing status of each, using GetFeeds().

For example, calling GetFeed() with a specific feed number will return the following information as a data frame:

[Screenshot: GetFeed() feed details returned as a data frame]

Similarly, if you call GetFeeds(“report-suite”), you’ll get the following information as a data frame:

[Screenshot: GetFeeds() report suite feeds returned as a data frame]

I only have one feed set up for testing, but if there were more feeds delivered each day, they would show up as additional rows in the data frame. The interpretation here is that the daily feed for 1/5/15 was delivered (the 05:00:00 is GMT).
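For reference, here’s a minimal sketch of how these calls might look; the credentials, feed id and report suite id are placeholders:

    library(RSiteCatalyst)

    # Placeholder credentials: "user:company" key and shared secret
    SCAuth("user:company", "shared_secret")

    # Details for a single feed (feed id is a placeholder)
    feed_detail <- GetFeed("12345")

    # All feeds for a report suite, including processing status
    # (report suite id is a placeholder)
    feed_status <- GetFeeds("myreportsuite")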

Bug Fixes

RSiteCatalyst v1.4.2 attempted to fix an issue where QueueRanked would error if two SAINT classifications were used. Unfortunately, the fix meant that QueueRanked ONLY worked with SAINT classifications. This was only out in the wild for a month, so hopefully it didn’t really affect anyone.

Additionally, the segment.id and segment.name weren’t printing out to the data frame in the Queue* functions. This has also been fixed.

Test Suite Using Travis CI

To avoid future errors like the ones mentioned above, a full test suite using testthat has been added to RSiteCatalyst and is monitored via Travis CI. While there is coverage for every public function within the package, there are likely additional tests that could be added for functionality I didn’t cover. If anyone out there has particularly weird use cases that aren’t incorporated in the test suite, please feel free to file an issue or submit a pull request and I’ll figure out how to incorporate it into the test suite.
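For the curious, here’s a rough sketch of the style of test used; the specific function tested and the environment-variable names for the credentials are illustrative rather than taken from the actual test suite:

    library(testthat)
    library(RSiteCatalyst)

    # Credentials pulled from environment variables so they aren't committed
    # to the repository (variable names are illustrative)
    SCAuth(Sys.getenv("ADOBE_KEY"), Sys.getenv("ADOBE_SECRET"))

    test_that("GetReportSuites returns a populated data frame", {
      rs <- GetReportSuites()
      expect_is(rs, "data.frame")
      expect_true(nrow(rs) > 0)
    })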

DataWarehouse API

Finally, the last bit of changes to RSiteCatalyst in v1.4.3 are internal preparations for a new package I plan to release in the coming months: AdobeDW. Several folks have asked for the ability to control Data Warehouse reports via R; for various reasons, I thought it made sense to break this out from RSiteCatalyst into its own package. If there are any R-and-Adobe-Analytics enthusiasts out there that would like to help development, please let me know!

Feature Requests/Bugs

As always, if you come across bugs or have feature requests, please continue to use the RSiteCatalyst GitHub Issues page to submit issues. Don’t worry about cluttering up the page with tickets; please fill out a new issue for anything you encounter (with the code you’ve already tried that is failing), unless you are SURE that it is the same problem someone else is facing.

And finally, like I end every blog post about RSiteCatalyst, please note that I’m not an Adobe employee. This hasn’t been an issue for a few months, so maybe next time I won’t end the post with this boilerplate :)

RSiteCatalyst Version 1.4.2 Release Notes

RSiteCatalyst version 1.4.2 is now available on CRAN. This update primarily contains bug fixes, along with one additional feature.

  1. Fixed the QueueRanked function to allow multiple SAINT classifications to be specified. This allows for breaking down one SAINT classification by another, such as breaking down tracking codes by marketing channel and by campaign (see the first sketch after this list).
  2. Fixed a bug in an internal function to allow for using the same element multiple times in a QueueRanked function call. This was a necessary fix for allowing multiple SAINT classifications in #1.
  3. Exported the previously internal function SubmitJsonQueueReport to allow for submitting JSON requests directly to the Adobe Analytics API without all of the R function scaffolding. This approximates the functionality of the Adobe API Explorer (see the second sketch after this list).
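Here’s a sketch of the multiple-classification breakdown from item 1; the report suite id and classification names are placeholders, and I’m assuming the classification argument accepts one value per element:

    library(RSiteCatalyst)
    SCAuth("user:company", "shared_secret")   # placeholder credentials

    # Break down tracking codes classified by marketing channel, then by campaign
    channel_by_campaign <- QueueRanked(
      "myreportsuite",
      date.from = "2014-12-01",
      date.to   = "2014-12-31",
      metrics   = "orders",
      elements  = c("trackingcode", "trackingcode"),
      classification = c("Marketing Channel", "Campaign")
    )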
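And a sketch of item 3, submitting a raw JSON report description in the same format the API Explorer uses; the report definition is just an example, and I’m assuming SubmitJsonQueueReport() accepts the JSON as a character string:

    # Continuing from the previous snippet (already authenticated)
    report_json <- '{
      "reportDescription": {
        "reportSuiteID": "myreportsuite",
        "dateFrom": "2014-12-01",
        "dateTo": "2014-12-31",
        "metrics": [{"id": "pageviews"}],
        "elements": [{"id": "page", "top": 10}]
      }
    }'

    # Bypass the Queue* wrappers and submit the JSON directly
    report <- SubmitJsonQueueReport(report_json)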

For the most part, this isn’t a release where most people will notice any differences from version 1.4.1. That said, special thanks go out to Jason Morgan (@framingeinstein) for identifying the two bugs that were fixed AND submitting fixes.

Feature Requests/Bugs

As always, if you come across bugs or have feature requests, please continue to use the RSiteCatalyst GitHub Issues page to submit issues. Don’t worry about cluttering up the page with tickets; please fill out a new issue for anything you encounter (with the code you’ve already tried that is failing), unless you are SURE that it is the same problem someone else is facing.

And finally, like I end every blog post about RSiteCatalyst, please note that I’m not an Adobe employee. Please don’t send me your API credentials, expect immediate replies (especially for you e-commerce folks sweating the holiday season!) or ask to set up phone calls to troubleshoot your problems. This is open-source software…Willem Paling and I did the hard part writing it, you’re expected to support yourself as best as possible unless you believe you’re encountering a bug. Then use GitHub :)


RSiteCatalyst Version 1.4.1 Release Notes

Changes

Version 1.4.1 of RSiteCatalyst is now available on CRAN. There were a handful of bug fixes and new features added, including:

  • Fixed bug in the QueueRanked function where only 10 results were returned when requesting multiple element reports. The function now returns up to 50,000 results per breakdown, the API limit (see the sketch after this list)
  • Created a better error message informing users to log in with their API credentials before making function calls, rather than erroring out when a call is made without proper credentials
  • Added support for using SAINT classifications in QueueRanked/QueueTrended functions
  • Added more error checking to make functions fail more elegantly
  • Added remaining GET methods from Reporting/Administration API
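To illustrate the first bullet, here’s a sketch of a two-element breakdown; the report suite id and element names are placeholders:

    library(RSiteCatalyst)
    SCAuth("user:company", "shared_secret")   # placeholder credentials

    # Prior to this fix, only the top 10 rows of the breakdown were returned
    pages_by_section <- QueueRanked(
      "myreportsuite",
      date.from = "2014-09-01",
      date.to   = "2014-09-30",
      metrics   = "pageviews",
      elements  = c("sitesection", "page")
    )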

Additional GET methods

This version of RSiteCatalyst has roughly 20 new GET methods, mostly providing additional report suite information for those who might desire to generate their documentation programmatically rather than manually. New API methods include (but are not limited to):

  • GetMarketingChannelRules: Get a list of all criteria used to build the Marketing Channels report
  • GetReportDescription: For a given bookmark_id, get the report definition
  • GetListVariables: Get a list of the List Variables defined for a report suite
  • GetLogins: Get all logins for a given Company
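Continuing from the snippet above (already authenticated), here’s a sketch of a few of these calls; the report suite id and bookmark id are placeholders:

    # Criteria used to build the Marketing Channels report
    channel_rules <- GetMarketingChannelRules("myreportsuite")

    # List Variables defined for the same report suite
    list_vars <- GetListVariables("myreportsuite")

    # Report definition for a saved bookmark (bookmark id is a placeholder)
    report_def <- GetReportDescription(12345)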

If you were the type of person who enjoyed this blog post showing how to auto-generate Adobe Analytics documentation, I encourage you to take a look at these newly incorporated functions and use them to improve your documentation even further.

Feature Requests/Bugs

If you come across any bugs, or have any feature requests, please continue to use the RSiteCatalyst GitHub Issues page to make tickets. While I’ve responded to many of you via the maintainer email provided in the R package itself, it’s much more efficient (and you’re much more likely to get a response) if you use the GitHub Issues page. Don’t worry about cluttering up the page with tickets; please fill out a new issue for anything you encounter, unless you are SURE that it is the same problem someone else is facing.

And finally, like I end every blog post about RSiteCatalyst, please note that I’m not an Adobe employee. Please don’t send me your API credentials, expect immediate replies or ask to set up phone calls to troubleshoot your problems. This is open-source software…Willem Paling and I did the hard part writing it, you’re expected to support yourself as best as possible unless you believe you’re encountering a bug. Then use GitHub :)

Evaluating BreakoutDetection

A couple of weeks ago, Twitter open-sourced their BreakoutDetection package for R, a package designed to determine shifts in time-series data. The Twitter announcement does a great job of explaining the main technique for detection (E-Divisive with Medians), so I won’t rehash that material here. Rather, I wanted to see how this package works relative to the anomaly detection feature in the Adobe Analytics API, which I’ve written about previously.

Getting Time-Series Data Using RSiteCatalyst

To use a real-world dataset to evaluate this package, I’m going to use roughly ten months of daily pageviews generated from my blog. The hypothesis here is that if the BreakoutDetection package works well, it should be able to detect the boundaries around when I publish a blog post (the dates of which I know with certainty) and when articles of mine get shared on sites such as Reddit. From past experience, I get about a 3-day lift in pageviews post-publishing, as the article gets tweeted out, published on R-Bloggers or JuliaBloggers and shared accordingly.

Here’s the code to get daily pageviews using RSiteCatalyst (Adobe Analytics). One thing to notice here is that BreakoutDetection requires either a single R vector or a specifically formatted data frame. In this case, because I have a timestamp, a couple of lines at the end are needed to get the data into the required format.
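A minimal sketch of that step follows; the credentials, report suite id, date range and the column names on the returned data frame are assumptions on my part:

    library(RSiteCatalyst)
    library(BreakoutDetection)

    SCAuth("user:company", "shared_secret")     # placeholder credentials

    # Daily pageviews for roughly ten months (dates are placeholders)
    pageviews <- QueueOvertime(
      "myreportsuite",
      date.from = "2014-01-01",
      date.to   = "2014-10-31",
      metrics   = "pageviews",
      date.granularity = "day"
    )

    # BreakoutDetection expects either a numeric vector or a data frame
    # with 'timestamp' and 'count' columns (column names in `pageviews` assumed)
    pageviews_df <- data.frame(
      timestamp = as.POSIXct(pageviews$datetime),
      count     = pageviews$pageviews
    )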


BreakoutDetection – Default Example

In the Twitter announcement, they provide an example, so let’s evaluate those defaults first:
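Here’s a sketch of that call applied to the data frame built above; min.size = 24 matches the interval discussed below, while the remaining argument values are my assumptions about the announcement’s example:

    # Default-style call (argument values assumed)
    breakout_default <- breakout(pageviews_df, min.size = 24, method = "multi",
                                 beta = 0.001, degree = 1, plot = TRUE)
    breakout_default$plot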

[Plot: BreakoutDetection results with the default parameters]

In order to validate my hypothesis, the package would need to detect 12 ‘breakouts’ or so, as I’ve published 12 blog posts during the sample time period. Mentally drawing lines between the red boundaries, we can see three definitive upward mean shifts, but far fewer than the 12 I expected.

BreakoutDetection – Modifying The Parameters

Given that the chart above doesn’t fit how I think my data are generated, we can modify two main parameters: beta and min.size. From the documentation:

  • beta: A real numbered constant used to further control the amount of penalization. This is the default form of penalization, if neither (or both) beta or (and) percent are supplied this argument will be used. The default value is beta=0.008.
  • min.size: The minimum number of observations between change points

The first parameter I’m going to experiment with is min.size, because it requires no in-depth knowledge of the EDM technique! The value used in the first example was 24 (days) between intervals, which seems extreme in my case. It’s reasonable that I might publish a blog post per week, so let’s back that number down to 5 and see how the result changes:

[Plot: BreakoutDetection results with min.size = 5]
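For reference, the only change from the default-style call above is the min.size argument:

    breakout_5 <- breakout(pageviews_df, min.size = 5, method = "multi",
                           beta = 0.001, degree = 1, plot = TRUE)
    breakout_5$plot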

With 17 predicted intervals, we’ve somewhat overshot the number of blog posts. Not that the package is wrong per se; the boundaries surround many of the spikes in the data, but perhaps having this many breakpoints isn’t useful from a monitoring standpoint. Setting the min.size parameter somewhere between 5 and 24 points would give us more than 3 breakouts, but fewer than 17. There is also the beta parameter that can be played with, but I’ll leave that as an exercise for another day.

Anomaly Detection – Adobe Analytics

From my prior post about Anomaly Detection with the Adobe Analytics API, Adobe has chosen to use Holt-Winters/Exponential Smoothing as their technique. Here’s what that looks like for the same time-period (code as GitHub Gist):
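A sketch of that request follows; I’m assuming the anomaly.detection argument of QueueOvertime() toggles this behavior, and the column names mentioned in the comment are also assumptions:

    # Same daily pageviews, with Adobe's anomaly detection enabled
    pageviews_anomaly <- QueueOvertime(
      "myreportsuite",
      date.from = "2014-01-01",
      date.to   = "2014-10-31",
      metrics   = "pageviews",
      date.granularity = "day",
      anomaly.detection = TRUE
    )

    # The result should include forecast, upper-bound and lower-bound columns
    # that can be plotted against the actual pageviews (e.g. with ggplot2)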

[Plot: Adobe Analytics anomaly detection for the same time period]

Even though the ideas behind the two techniques are similar, it’s clear that they don’t quite represent the same thing. In the case of the Adobe Analytics Anomaly Detection, the evaluation is datapoint-by-datapoint, with a smoothing model built from the prior 35 points. If a point exceeds the upper or lower control limit, it’s flagged as an anomaly, but that isn’t necessarily indicative of a true level shift like the one the BreakoutDetection package is measuring.

Conclusion

The BreakoutDetection package is definitely cool, but it is a bit raw, especially the default graphics. That said, the package does work, as evidenced by how well it put boundaries around the traffic spikes when I set the min.size parameter to five.

Additionally, I tried to read more about the underlying methodology, but the only references that come up in Google seem to be references to the R package itself! I wish I had a better feeling for how the beta parameter influences the graph, but I guess that will come over time as I use the package more. But I’m definitely glad that Twitter open-sourced this package, as I’ve often wondered about how to detect level shifts in a more operational setting, and now I have a method to do so.

Visualizing Website Pathing With Sankey Charts

In my prior post on visualizing website structure using network graphs, I referenced that network graphs showed the pairwise relationships between two pages (in a bi-directional manner). However, if you want to analyze how your visitors are pathing through your site, you can visualize your data using a Sankey chart.

Visualizing Single Page-to-Next Page Pathing

Most digital analytics tools allow you to visualize the path between pages. In the case of Adobe Analytics, the Next Page Flow diagram is limited to 10 second-level branches in the visualization. However, the Adobe Analytics API has no such limitation, and as such we can use RSiteCatalyst to create the following visualization (GitHub Gist containing R code):

The data processing for this visualization is nearly identical to that for the network diagrams. We can use QueuePathing() from RSiteCatalyst to download our pathing data; in this case, I specified an exact page name as the first level of the pathing pattern instead of using the ::anything:: operator (a rough sketch of the processing follows this paragraph). In all Sankey charts created by d3Network, you can hover over the right-hand side nodes to see the values (you can also drag around the nodes on either side if you desire!). It’s pretty clear from this diagram that I need to do a better job retaining my visitors, as the most common path from this page is to leave. :(
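Here’s that rough sketch; the page name, report suite id and the column names returned by QueuePathing() are placeholders, and I’m assuming d3Network’s d3Sankey() interface of separate Links/Nodes data frames:

    library(RSiteCatalyst)
    library(d3Network)

    SCAuth("user:company", "shared_secret")        # placeholder credentials

    # Next-page paths starting from a specific page (page name is a placeholder)
    paths <- QueuePathing(
      "myreportsuite",
      date.from = "2014-01-01",
      date.to   = "2014-10-31",
      metric  = "pageviews",
      element = "page",
      pattern = c("http://www.example.com/homepage", "::anything::")
    )

    # Build node and link tables; d3 expects 0-based node indices
    # (step/count column names on the pathing data frame are assumed)
    nodes <- data.frame(name = unique(c(paths$step.1, paths$step.2)),
                        stringsAsFactors = FALSE)
    links <- data.frame(
      source = match(paths$step.1, nodes$name) - 1,
      target = match(paths$step.2, nodes$name) - 1,
      value  = paths$count
    )

    d3Sankey(Links = links, Nodes = nodes,
             Source = "source", Target = "target", Value = "value",
             NodeID = "name", file = "sankey_pathing.html")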
