Creating an OmniSci ODBC Connection in RStudio Server

Edit 10/1/2018: When I wrote this blog post, the company and product were named MapD. I’ve changed the title to reflect the new company name, but left the MapD references below to hopefully avoid confusion

MapD ODBC RStudio Server

In my post Installing MapD on Microsoft Azure, I showed how to install MapD Community Edition on Microsoft Azure, using Ubuntu 16.04 LTS as the base image. One thing I glossed over during the firewall/security section was that I opened ports for Jupyter Notebook and other data science tools, but I didn’t actually show how to install any of those tools.

For this post, I’ll cover how to install MapD ODBC drivers and create a connection within RStudio server.

1. Installing RStudio Server on Microsoft Azure

With an Ubuntu VM running MapD, installing RStudio Server takes but a handful of commands. The RStudio Server download/install page has fantastic instructions, but if you are looking for Azure-specific RStudio Server install instructions, this blog post from Jumping Rivers does a great job.

2. Installing an ODBC Driver Manager

There are two major ODBC driver managers for Linux and macOS: unixODBC and iODBC. I have had more overall ODBC driver installation success with unixODBC than iODBC; here are the instructions for building unixODBC from source:

#download source and extract
wget ftp://ftp.unixodbc.org/pub/unixODBC/unixODBC-2.3.7.tar.gz
gunzip unixODBC*.tar.gz
tar xvf unixODBC*.tar

#compile and install
cd unixODBC-2.3.7
./configure
make
sudo make install

If you want to check everything is installed correctly, you can run the following command:

odbc_config --cflags

#result
-DHAVE_UNISTD_H -DHAVE_PWD_H -DHAVE_SYS_TYPES_H -DHAVE_LONG_LONG -DSIZEOF_LONG_INT=8 -I/usr/local/include

3. Installing MapD ODBC Driver System-wide

With unixODBC installed, the next step is to install the MapD ODBC drivers. ODBC drivers for MapD are provided as part of MapD Enterprise Edition, so you’ll need to contact your sales representative to get the appropriate version for your MapD installation.

For Linux, the MapD ODBC drivers are provided as a tarball, which when extracted provides all of the necessary ODBC driver files:

#make a directory to extract files into
mkdir mapd_odbc && cd mapd_odbc
tar -xvf ../mapd_odbc_installer_linux_3.80.1.36.tar.gz

#move to /opt/mapd/mapd_odbc (or wherever the other MapD files are)
cd .. && mv mapd_odbc /opt/mapd/mapd_odbc

By convention, MapD suggests placing the ODBC drivers in the same directory as your installation (frequently, /opt/mapd). Wherever you choose to place the directory, you need add that location into the /etc/odbcinst.ini file:

[MapD Driver]
Driver          = /opt/mapd/mapd_odbc/libs/libODBC.so

At this point, we have everything we need to define a connection string within R using odbc:

library(odbc)

conn <- dbConnect(odbc::odbc(),
                  Driver    = "MapD Driver",
                  Server    = "localhost",
                  Database  = "mapd",
                  UID       = "mapd",
                  PWD       = "helloRusers!",
                  Port      = 9091)

Depending on your use case/security preferences, there are two downsides to this method: 1) the credentials are in plain-text in the middle of the script and 2) the RStudio Connection window also shows the credentials in connection window in plain-text until you delete the connection. This can be remedied by defining a DSN (data source name).

4. Defining A DSN

A DSN is what people usually think of when installing ODBC drivers, as it holds some/all of the actual details for connecting to the database. DSN files can be placed in two locations: system-wide in /etc/obdc.ini or in an individual user’s home directory (needs to be ~/.odbc.ini, a hidden file).

In order to have the credentials completely masked in the RStudio session, place the following in the /etc/obdc.ini file:

[MapD Production]
Driver=MapD Driver
PWD=helloRusers!
UID=mapd
HOST=localhost
DATABASE=mapd
PORT=9091

Within the RStudio Connection pane, we can now test our DSN:

MapD ODBC RStudio Server DSN Test

With the DSN defined, the R connection code becomes much shorter, with no credentials exposed within the R session:

library(DBI)
con <- dbConnect(odbc::odbc(), "MapD Production")

ODBC: A Big Bag Of Hurt, But Super Useful

While the instructions above aren’t the easiest to work through, once you have ODBC set up and working one time, it’s usually just a matter of appending various credentials to the existing files to add databases.

From a MapD perspective, ODBC is supported through our Enterprise Edition, but it is the slowest way to work with the database. Up to this point, we’ve focused mostly on supporting Python through the pymapd package and the MapD Ibis backend, but there’s no reason technical reason why R can’t also be a first-class citizen.

So if you’re interested in helping develop an R package for MapD, whether using reticulate to wrap pymapd or to help develop Apache Thrift bindings and Apache Arrow native code, send me a Twitter message or connect via LinkedIn (or any other way to contact me) and we’ll figure out how to collaborate!

  • RSiteCatalyst Version 1.4.16 Release Notes
  • Using RSiteCatalyst With Microsoft PowerBI Desktop
  • RSiteCatalyst Version 1.4.14 Release Notes
  • RSiteCatalyst Version 1.4.13 Release Notes
  • RSiteCatalyst Version 1.4.12 (and 1.4.11) Release Notes
  • Self-Service Adobe Analytics Data Feeds!
  • RSiteCatalyst Version 1.4.10 Release Notes
  • WordPress to Jekyll: A 30x Speedup
  • Bulk Downloading Adobe Analytics Data
  • Adobe Analytics Clickstream Data Feed: Calculations and Outlier Analysis
  • Adobe: Give Credit. You DID NOT Write RSiteCatalyst.
  • RSiteCatalyst Version 1.4.8 Release Notes
  • Adobe Analytics Clickstream Data Feed: Loading To Relational Database
  • Calling RSiteCatalyst From Python
  • RSiteCatalyst Version 1.4.7 (and 1.4.6.) Release Notes
  • RSiteCatalyst Version 1.4.5 Release Notes
  • Getting Started: Adobe Analytics Clickstream Data Feed
  • RSiteCatalyst Version 1.4.4 Release Notes
  • RSiteCatalyst Version 1.4.3 Release Notes
  • RSiteCatalyst Version 1.4.2 Release Notes
  • Destroy Your Data Using Excel With This One Weird Trick!
  • RSiteCatalyst Version 1.4.1 Release Notes
  • Visualizing Website Pathing With Sankey Charts
  • Visualizing Website Structure With Network Graphs
  • RSiteCatalyst Version 1.4 Release Notes
  • Maybe I Don't Really Know R After All
  • Building JSON in R: Three Methods
  • Real-time Reporting with the Adobe Analytics API
  • RSiteCatalyst Version 1.3 Release Notes
  • Adobe Analytics Implementation Documentation in 60 Seconds
  • RSiteCatalyst Version 1.2 Release Notes
  • Clustering Search Keywords Using K-Means Clustering
  • RSiteCatalyst Version 1.1 Release Notes
  • Anomaly Detection Using The Adobe Analytics API
  • (not provided): Using R and the Google Analytics API
  • My Top 20 Least Useful Omniture Reports
  • For Maximum User Understanding, Customize the SiteCatalyst Menu
  • Effect Of Modified Bounce Rate In Google Analytics
  • Adobe Discover 3: First Impressions
  • Using Omniture SiteCatalyst Target Report To Calculate YOY growth
  • ODSC webinar: End-to-End Data Science Without Leaving the GPU
  • PyData NYC 2018: End-to-End Data Science Without Leaving the GPU
  • Data Science Without Leaving the GPU
  • Getting Started With OmniSci, Part 2: Electricity Dataset
  • Getting Started With OmniSci, Part 1: Docker Install and Loading Data
  • Parallelizing Distance Calculations Using A GPU With CUDAnative.jl
  • Building a Data Science Workstation (2017)
  • JuliaCon 2015: Everyday Analytics and Visualization (video)
  • Vega.jl, Rebooted
  • Sessionizing Log Data Using data.table [Follow-up #2]
  • Sessionizing Log Data Using dplyr [Follow-up]
  • Sessionizing Log Data Using SQL
  • Review: Data Science at the Command Line
  • Introducing Twitter.jl
  • Code Refactoring Using Metaprogramming
  • Evaluating BreakoutDetection
  • Creating A Stacked Bar Chart in Seaborn
  • Visualizing Analytics Languages With VennEuler.jl
  • String Interpolation for Fun and Profit
  • Using Julia As A "Glue" Language
  • Five Hard-Won Lessons Using Hive
  • Using SQL Workbench with Apache Hive
  • Getting Started With Hadoop, Final: Analysis Using Hive & Pig
  • Quickly Create Dummy Variables in a Data Frame
  • Using Amazon EC2 with IPython Notebook
  • Adding Line Numbers in IPython/Jupyter Notebooks
  • Fun With Just-In-Time Compiling: Julia, Python, R and pqR
  • Getting Started Using Hadoop, Part 4: Creating Tables With Hive
  • Tabular Data I/O in Julia
  • Hadoop Streaming with Amazon Elastic MapReduce, Python and mrjob
  • A Beginner's Look at Julia
  • Getting Started Using Hadoop, Part 3: Loading Data
  • Innovation Will Never Be At The Push Of A Button
  • Getting Started Using Hadoop, Part 2: Building a Cluster
  • Getting Started Using Hadoop, Part 1: Intro
  • Instructions for Installing & Using R on Amazon EC2
  • Video: SQL Queries in R using sqldf
  • Video: Overlay Histogram in R (Normal, Density, Another Series)
  • Video: R, RStudio, Rcmdr & rattle
  • Getting Started Using R, Part 2: Rcmdr
  • Getting Started Using R, Part 1: RStudio
  • Learning R Has Really Made Me Appreciate SAS