Learning R Has Really Made Me Appreciate SAS

For the past 18 months, it seems like all I’ve heard about in the digital marketing industry is “big data”, and with that, mentions of using Hadoop and R to solve these sorts of problems.  Why are these tools the most often mentioned?  Because they are open source, i.e. free of charge!

But as I’ve tried to learn R, I keep asking myself…are all of my colleagues out of their minds?  Or, am I just beyond learning something new?  As of right now, R is just one big hack on top of a hack to me, and the software is only “free” if you don’t consider lost productivity.

Need new functionality?  Just download another R package!

One of the biggest “pros” I see thrown around for R relative to a tool like SAS is that when new statistical techniques are invented, someone will code them in R immediately.  A company like SAS may take 5 years to implement the feature, or it may not get implemented at all.  That’s all fine and good, but the problem I’ve found is that there are 10 ways to do something in R, and I spend more time downloading packages (along with other packages that are dependencies) than I do learning A SINGLE WAY to do something correctly.

For example, take trying to get summary statistics by group.  In SAS, you use PROC SUMMARY, with either a BY statement or a CLASS statement.  It’s fairly simple and it works.

proc summary data=hs0;
  var _numeric_;
  class prgtype;
  output out=results mean= / autolabel autoname inherit;
run;

In R, I ran the following code, which should be roughly equivalent:

by(hs0, hs0$prgtype, mean)

Very simple, fewer lines…and technically wrong, throwing 6 unhelpful errors for a single line of code.  Because it was decided that calling “mean” on a data frame would be deprecated in R.  WHY???  It’s so simple, why modify the language like that?

According to the error message, I’m supposed to use colMeans instead…but when it comes to how, you’re on your own; the Help documentation is garbage.  Some combination of “by” and “colMeans” might work, but I don’t have an example to follow.
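For what it’s worth, here is a minimal sketch of how “by” and “colMeans” could be combined.  The hs0 data frame below is a made-up stand-in (only the prgtype column name comes from the code above; the read and write columns and all values are assumptions for illustration), so treat this as one possible approach rather than the definitive answer.

```r
# Hypothetical stand-in for hs0: only `prgtype` comes from the post;
# the other columns and every value are invented for illustration.
hs0 <- data.frame(
  prgtype = c("academic", "academic", "general", "general", "vocational"),
  read    = c(60, 50, 40, 44, 35),
  write   = c(58, 52, 45, 41, 39),
  stringsAsFactors = TRUE
)

# Keep only the numeric columns, because colMeans() rejects factors
# and character columns.
numeric_cols <- hs0[sapply(hs0, is.numeric)]

# Option 1: by() splits the rows by group and applies colMeans() to each piece.
means_by <- by(numeric_cols, hs0$prgtype, colMeans)
print(means_by)

# Option 2: aggregate() returns a plain data frame with one row per group,
# which is closer in shape to the SAS output data set.
means_df <- aggregate(numeric_cols, by = list(prgtype = hs0$prgtype), FUN = mean)
print(means_df)
```

Both options use only base R, so no extra packages or dependencies are needed.  The aggregate() version is probably the easier one to carry into later steps, since its result is an ordinary data frame.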

Google sent me to the Quick-R website, and I found a “descriptive statistics” article with by-group processing…with the recommendation of using the “psych” package or the “doBy” package.  But CRAN won’t let me download all of the dependencies, so again, I’m stuck trying to do the simplest thing in statistics.

Let’s be fast and run everything in RAM!

My next favorite hassle in R is that you are expected to continuously monitor how many data elements you have active in a workspace.  R holds its data entirely in RAM (as opposed to SAS, which uses a combination of RAM for processing and hard disks for storage), so if you want to do something really “big”, you will quickly choke your computer.  I tried to work with a single day of Omniture data from the raw data feed, and my MacBook Pro with 6GB of memory was shot.  I believe the file was 700,000 rows by 300 columns, but I could be misremembering.  That’s not even enough data to think about performance-tuning a program in SAS; any sloppy code will run quickly.
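To put rough numbers on that, here is a small sketch that estimates the footprint of a table that size and shows one way to keep an eye on the workspace.  It assumes every column is numeric (8 bytes per value), which is a simplification: a raw data feed is mostly text, and reading a file can need extra working memory on top of the final object, so the real figure may differ.  The `small` matrix is a hypothetical stand-in object.

```r
# Back-of-the-envelope estimate: R stores a numeric (double) value in 8 bytes.
rows <- 700000   # row count quoted in the post, which may be misremembered
cols <- 300
est_gib <- rows * cols * 8 / 1024^3
cat(sprintf("All-numeric estimate: %.2f GiB\n", est_gib))

# See what is actually held in the workspace, largest objects first.
small <- matrix(0, nrow = 1000, ncol = 300)   # hypothetical stand-in object
sizes <- sapply(ls(), function(nm) as.numeric(object.size(get(nm))))
print(sort(sizes, decreasing = TRUE))

# Drop objects that are no longer needed and let R reclaim the memory.
rm(small)
invisible(gc())
```

Under the all-numeric assumption the estimate comes to roughly 1.6 GiB for the finished object alone, which goes some way toward explaining why a 6GB laptop struggles once intermediate copies are in play.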

How does one solve these memory errors in R?  Porting to the Amazon cloud seems to be the most commonly given suggestion.  But that’s more setup time, getting an R instance over to Amazon, your data over to Amazon…and now you are renting hardware.

R is great for data visualization!

From what I’ve seen from the demo(graphics) tutorial, R does have some pretty impressive visualization capabilities.  Contour maps, histograms, boxplots…there seems to be a lot of capability here beyond the realm of a tool like Excel (which, besides not being free, isn’t really for visualization).  SAS has some graphics capabilities, but they are a bit hard to master.

But for all of the hassle of getting your data formatted properly, downloading endless packages, and avoiding memory errors, you could just pay for Tableau and get working.  Then, once you have your visualizations done in Tableau, if you are using Tableau Server you can share interactive dashboards with others.  As far as I know, R graphics are static image exports, so you’re stuck with “flat” presentations.

Maybe it’s just me

For R diehards, the above verbiage probably just sounds like whining from someone who is too new to appreciate the greatness of R or too stuck in the “old SAS way”.  That’s certainly possible.  But from my first several weeks of trying to use R, the level of frustration is way beyond anything I experienced when I was learning SAS.

Luckily, I don’t have any consulting projects that require R or SAS at the moment, so I can continue to try and learn why everyone thinks R is so great.  But from where I sit right now, the licensing fee from SAS doesn’t seem so bad when it allows me to get to doing productive work instead of building my own statistics software piece-by-piece.

Comments

  1. QuintonAnderson says:

    I am afraid that is the nature of open source. You just have to plow through the endless issues and get stuff done. You actually get quite good at this because you start getting a feel for the idioms that a particular community commonly employs, and then things start moving quickly. I am neither a SAS nor an R expert, but what you are describing is a normal open source thing: some love it, some hate it, and it will eventually take over.

    • @QuintonAnderson Thanks Quinton. Now that I’m 7 months into using R, I’ve gotten past the initial hurdle of dealing with the syntax warts of R. So using R isn’t as bad, though the amount of time searching for answers can still be quite daunting. It would’ve been great if they had named the language something distinct instead of a single letter, since R occurs quite a bit in the English language :)

  2. flaviomargarito says:

    @randyzwitch @QuintonAnderson You can use http://www.rseek.org/ instead of Google…

  3. Francis Lim says:

    I’m four months into R and couldn’t agree with you more!! I have also programmed in S-Plus for several years, but mostly for quick hits with multivariate analysis packages by Breiman & Friedman, Tibshirani, etc.

    I’m a diehard SAS-ophile. What would take me 30 mins to code in SAS Macros can take entire days or even a week in R. It’s maddening, and yet, I get that it’s like learning a new language so I’m trying to be patient. Will give it another four months before delivering my final verdict.

    But for now… SAS Macros rule!! =)

    • Randy Zwitch says:

      I forgot that I wrote this…let’s just say, I’m no longer the SAS enthusiast I once was. Now that I know Python (and R), I can no longer use SAS at all, just too inflexible.

      Hopefully with a few more months’ time, you’ll get the flow down with R and SAS will just become a distant memory :)
