Building JSON in R: Three Methods

When I set out to build RSiteCatalyst, I had a few major goals: learn R, build CRAN-worthy package and learn the Adobe Analytics API. As I reflect back on how the package has evolved over the past two years and what I’ve learned, I think my greatest learning was around how to deal with JSON (and strings in general).  

JSON is ubiquitous as a data-transfer mechanism over the web, and R does a decent job providing the functionality to not only read JSON but also to create JSON. There are at least three methods I know of to build JSON strings, and this post will cover the pros and cons of each method.

Method 1: Building JSON using paste

As a beginning R user, I didn’t have the awareness of how many great user-contributed packages are out there. So throughout the RSiteCatalyst source code you can see gems like:

1
2
3
4
5
6
7
8
#"metrics" would be a user input into a function arguments
metrics <- c("a", "b", "c")

#Loop over the metrics list, appending proper curly braces
metrics_conv <- lapply(metrics, function(x) paste('{"id":', '"', x, '"', '}', sep=""))

#Collapse the list into a proper comma separated string
metrics_final <- paste(metrics_conv, collapse=", ")

The code above loops over a character vector (using lapply instead of a for loop like a good R user!), appending curly braces, then flattening the list down to a string. While this code works, it’s a quite brittle way to build JSON. You end up needing to worry about matching quotation marks, remembering if you need curly braces, brackets or singletons…overall, it’s a maintenance nightmare to build strings this way.

Of course, if you have a really simple JSON string you need to build, paste() doesn’t have to be off-limits, but for a majority of the cases I’ve seen, it’s probably not a good idea.

Method 2: Building JSON using sprintf

Somewhere in the middle of building version 1 of RSiteCatalyst, I started learning Python. For those of you who aren’t familiar, Python has a string interpolation operator %, which allows you to do things like the following:

1
2
3
In [1]: print "Here's a string subtitution for my name: %s" %("Randy")

Out[1]: "Here's a string subtitution for my name: Randy"

Thinking that this was the most useful thing I’d ever seen in programming, I naturally searched to see if R had the same functionality. Of course, I quickly learned that all C-based languages have printf/sprintf, and R is no exception. So I started building JSON using sprintf in the following manner:

1
2
3
4
5
elements_list = sprintf('{"id":"%s",
                          "top": "%s",
                          "startingWith":"%s",
                          "search":{"type":"%s", "keywords":[%s]}
                          }', element, top, startingWith, searchType, searchKW2)

In this example, we’re now passing R objects into the sprintf() function, with %s tokens everywhere we need to substitute text. This is certainly an improvement over paste(), especially given that Adobe provides example JSON via their API explorer. So I copied the example strings, replaced their examples with my tokens and voilà! Better JSON string building.

Method 3: Building JSON using a package (jsonlite, rjson or RJSONIO)

While sprintf() allowed for much easier JSON, there is still a frequent code smell in RSiteCatalyst, as evidenced by the following:

1
2
3
4
5
6
7
8
9
#Converts report_suites to JSON
if(length(report_suites)>1){
  report_suites <- toJSON(report_suites)
} else {
  report_suites <- toJSON(list(report_suites))
}

#API request
json <- postRequest("ReportSuite.GetTrafficVars",paste('{"rsid_list":', report_suites , '}'))

At some point, I realized that using the toJSON() function from rjson would take care of the formatting R objects to strings, yet I didn’t make the leap to understanding that I could build the whole string using R objects translated by toJSON()! So I have more hard-to-maintain code where I’m checking the class/length of objects and formatting them. The efficient way to do this using rjson would be:

1
2
3
4
5
6
7
#Efficient method
library(rjson)
report_suites <- list(rsid_list=c("A", "B", "C"))
request.body <- toJSON(report_suites)

#API request
json <- postRequest("ReportSuite.GetTrafficVars", request.body)

With the code above, we’re building JSON in a very R-looking manner; just R objects and functions, and in return getting the output we want. While it’s slightly less obvious what is being created by request.body, there’s literally zero bracket-matching, quoting issues or anything else to worry about in building our JSON. That’s not to say that there isn’t a learning curve to using a JSON package, but I’d rather figure out whether I need a character vector or list than burn my eyes out looking for mismatched quotes and brackets!

Collaborating Makes You A Better Programmer

Like any pursuit, you can get pretty far on your own through hard work and self-study. However, I wouldn’t be nearly where I am without collaborating with others (especially learning about how to build JSON properly in R!). A majority of the RSiteCatalyst code for the upcoming version 1.4 was re-written by Willem Paling, where he added consistency to keyword arguments, switched to jsonlite for better JSON parsing to Data Frames, and most importantly for the topic of this post, cleaned up the method of building all the required JSON strings!

Edit 5/13: For a more thorough example of building complex JSON using jsonlite, check out this example from the v1.4 branch of RSiteCatalyst. The linked example R code populates the required arguments from this JSON outline provide by Adobe.

  • RSiteCatalyst Version 1.4.16 Release Notes
  • Using RSiteCatalyst With Microsoft PowerBI Desktop
  • RSiteCatalyst Version 1.4.14 Release Notes
  • RSiteCatalyst Version 1.4.13 Release Notes
  • RSiteCatalyst Version 1.4.12 (and 1.4.11) Release Notes
  • Self-Service Adobe Analytics Data Feeds!
  • RSiteCatalyst Version 1.4.10 Release Notes
  • WordPress to Jekyll: A 30x Speedup
  • Bulk Downloading Adobe Analytics Data
  • Adobe Analytics Clickstream Data Feed: Calculations and Outlier Analysis
  • Adobe: Give Credit. You DID NOT Write RSiteCatalyst.
  • RSiteCatalyst Version 1.4.8 Release Notes
  • Adobe Analytics Clickstream Data Feed: Loading To Relational Database
  • Calling RSiteCatalyst From Python
  • RSiteCatalyst Version 1.4.7 (and 1.4.6.) Release Notes
  • RSiteCatalyst Version 1.4.5 Release Notes
  • Getting Started: Adobe Analytics Clickstream Data Feed
  • RSiteCatalyst Version 1.4.4 Release Notes
  • RSiteCatalyst Version 1.4.3 Release Notes
  • RSiteCatalyst Version 1.4.2 Release Notes
  • Destroy Your Data Using Excel With This One Weird Trick!
  • RSiteCatalyst Version 1.4.1 Release Notes
  • Visualizing Website Pathing With Sankey Charts
  • Visualizing Website Structure With Network Graphs
  • RSiteCatalyst Version 1.4 Release Notes
  • Maybe I Don't Really Know R After All
  • Building JSON in R: Three Methods
  • Real-time Reporting with the Adobe Analytics API
  • RSiteCatalyst Version 1.3 Release Notes
  • Adobe Analytics Implementation Documentation in 60 Seconds
  • RSiteCatalyst Version 1.2 Release Notes
  • Clustering Search Keywords Using K-Means Clustering
  • RSiteCatalyst Version 1.1 Release Notes
  • Anomaly Detection Using The Adobe Analytics API
  • (not provided): Using R and the Google Analytics API
  • My Top 20 Least Useful Omniture Reports
  • For Maximum User Understanding, Customize the SiteCatalyst Menu
  • Effect Of Modified Bounce Rate In Google Analytics
  • Adobe Discover 3: First Impressions
  • Using Omniture SiteCatalyst Target Report To Calculate YOY growth
  • ODSC webinar: End-to-End Data Science Without Leaving the GPU
  • PyData NYC 2018: End-to-End Data Science Without Leaving the GPU
  • Data Science Without Leaving the GPU
  • Getting Started With OmniSci, Part 2: Electricity Dataset
  • Getting Started With OmniSci, Part 1: Docker Install and Loading Data
  • Parallelizing Distance Calculations Using A GPU With CUDAnative.jl
  • Building a Data Science Workstation (2017)
  • JuliaCon 2015: Everyday Analytics and Visualization (video)
  • Vega.jl, Rebooted
  • Sessionizing Log Data Using data.table [Follow-up #2]
  • Sessionizing Log Data Using dplyr [Follow-up]
  • Sessionizing Log Data Using SQL
  • Review: Data Science at the Command Line
  • Introducing Twitter.jl
  • Code Refactoring Using Metaprogramming
  • Evaluating BreakoutDetection
  • Creating A Stacked Bar Chart in Seaborn
  • Visualizing Analytics Languages With VennEuler.jl
  • String Interpolation for Fun and Profit
  • Using Julia As A "Glue" Language
  • Five Hard-Won Lessons Using Hive
  • Using SQL Workbench with Apache Hive
  • Getting Started With Hadoop, Final: Analysis Using Hive & Pig
  • Quickly Create Dummy Variables in a Data Frame
  • Using Amazon EC2 with IPython Notebook
  • Adding Line Numbers in IPython/Jupyter Notebooks
  • Fun With Just-In-Time Compiling: Julia, Python, R and pqR
  • Getting Started Using Hadoop, Part 4: Creating Tables With Hive
  • Tabular Data I/O in Julia
  • Hadoop Streaming with Amazon Elastic MapReduce, Python and mrjob
  • A Beginner's Look at Julia
  • Getting Started Using Hadoop, Part 3: Loading Data
  • Innovation Will Never Be At The Push Of A Button
  • Getting Started Using Hadoop, Part 2: Building a Cluster
  • Getting Started Using Hadoop, Part 1: Intro
  • Instructions for Installing & Using R on Amazon EC2
  • Video: SQL Queries in R using sqldf
  • Video: Overlay Histogram in R (Normal, Density, Another Series)
  • Video: R, RStudio, Rcmdr & rattle
  • Getting Started Using R, Part 2: Rcmdr
  • Getting Started Using R, Part 1: RStudio
  • Learning R Has Really Made Me Appreciate SAS