Using Amazon EC2 with IPython Notebook

Last week, I wrote a guest blog post at Bad Hessian about how to use IPython Notebook along with Amazon EC2 as your data science & analytics platform. I won’t reproduce the whole article here, but if you are interested in step-by-step instructions on how to set up an Amazon EC2 instance to use IPython Notebook, see the SlideShare presentation (also available as a PDF download), which outlines the steps needed to set up a remote IPython Notebook environment.

If you already have experience setting up EC2 images and just need the IPython Notebook settings, here are the commands needed to set up a public IPython Notebook server.

#### Start IPython, generate SHA1 password to use for IPython Notebook server

$ ipython
Python 2.7.5 |Anaconda 1.8.0 (x86_64)| (default, Oct 24 2013, 07:02:20)
Type "copyright", "credits" or "license" for more information.

IPython 1.1.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: from IPython.lib import passwd

In [2]: passwd()
Enter password:
Verify password:
Out[2]: 'sha1:207eb1f4671f:92af695...'
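The string returned by passwd() is what you paste into c.NotebookApp.password further down. As far as I can tell from the IPython source of this era, it has the form algorithm:salt:hexdigest, where the digest is taken over the passphrase followed by a random 12-hex-character salt. The sketch below reproduces that assumed format with only the Python 3 standard library, just to show what the server stores and compares; make_password_hash and check_password_hash are hypothetical names, and for a real config you should keep using IPython.lib.passwd.

```python
import hashlib
import hmac
import secrets


def make_password_hash(passphrase, algorithm="sha1", salt_len=12):
    """Build an 'algorithm:salt:hexdigest' string (assumed passwd() format)."""
    # salt_len hex characters of randomness, like '207eb1f4671f'
    salt = secrets.token_hex(salt_len // 2)
    h = hashlib.new(algorithm)
    # Digest covers the UTF-8 passphrase followed by the ASCII salt
    h.update(passphrase.encode("utf-8") + salt.encode("ascii"))
    return ":".join((algorithm, salt, h.hexdigest()))


def check_password_hash(hashed, passphrase):
    """Return True if passphrase matches a string from make_password_hash."""
    algorithm, salt, digest = hashed.split(":", 2)
    h = hashlib.new(algorithm)
    h.update(passphrase.encode("utf-8") + salt.encode("ascii"))
    # Constant-time comparison avoids leaking how many characters matched
    return hmac.compare_digest(h.hexdigest(), digest)


if __name__ == "__main__":
    hashed = make_password_hash("correct horse")
    print(hashed)  # sha1:<12 hex chars>:<40 hex chars>; salt differs each run
    print(check_password_hash(hashed, "correct horse"))  # True
    print(check_password_hash(hashed, "wrong guess"))    # False
```

Because the salt is random, running passwd() twice with the same password gives two different strings; either one works in the config.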

#### Create nbserver profile

$ ipython profile create nbserver
[ProfileCreate] Generating default config file: u'/.ipython/profile_nbserver/ipython_config.py'
[ProfileCreate] Generating default config file: u'/.ipython/profile_nbserver/ipython_qtconsole_config.py'
[ProfileCreate] Generating default config file: u'/.ipython/profile_nbserver/ipython_notebook_config.py'
[ProfileCreate] Generating default config file: u'/.ipython/profile_nbserver/ipython_nbconvert_config.py'

#### Create self-signed SSL certificate

$ openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout mycert.pem -out mycert.pem
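Because -keyout and -out both point at mycert.pem, that single file should end up holding the unencrypted private key (that is what -nodes does) and the self-signed certificate, which is presumably why the notebook config only sets certfile. A quick sanity check before editing the config might look like this; pem_has_key_and_cert is a hypothetical helper, and it only looks at the PEM header lines, not at whether the key and certificate actually match.

```python
import re
from pathlib import Path

# -nodes leaves the key unencrypted; older OpenSSL releases label it
# "RSA PRIVATE KEY", newer ones just "PRIVATE KEY", so accept either
KEY_HEADER = re.compile(r"-----BEGIN (?:RSA )?PRIVATE KEY-----")
CERT_HEADER = "-----BEGIN CERTIFICATE-----"


def pem_has_key_and_cert(path):
    """Return True if the PEM file holds an unencrypted key and a certificate."""
    text = Path(path).read_text()
    return bool(KEY_HEADER.search(text)) and CERT_HEADER in text


# Example (path assumed to match c.NotebookApp.certfile):
# pem_has_key_and_cert("/home/ubuntu/certificates/mycert.pem")
```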

#### Modify ipython_notebook_config.py configuration file
#### Add these lines to the top of the file; no other changes are necessary
#### Substitute your own path to the .pem certificate and the password hash from passwd()

# Configuration file for ipython-notebook.

c = get_config()

# Kernel config
c.IPKernelApp.pylab = 'inline'  # optional: always enable inline plotting

# Notebook config
c.NotebookApp.certfile = u'/home/ubuntu/certificates/mycert.pem'
c.NotebookApp.ip = '*'
c.NotebookApp.open_browser = False
c.NotebookApp.password = u'sha1:207eb1f4671f:92af695...'
# It is a good idea to put it on a known, fixed port
c.NotebookApp.port = 8888
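If you set up more than one instance, the "add these lines to the top of the file" step can be scripted. The following is a minimal sketch: build_settings and prepend_settings are hypothetical helper names, the settings mirror the configuration above, and the config location in the usage comment assumes the ~/.ipython/profile_nbserver/ directory created by ipython profile create nbserver.

```python
from pathlib import Path


def build_settings(certfile, password_hash, port=8888):
    """Return the notebook settings from this post as config-file text."""
    lines = [
        "c = get_config()",
        "c.IPKernelApp.pylab = 'inline'",
        f"c.NotebookApp.certfile = {certfile!r}",
        "c.NotebookApp.ip = '*'",
        "c.NotebookApp.open_browser = False",
        f"c.NotebookApp.password = {password_hash!r}",
        f"c.NotebookApp.port = {port:d}",
    ]
    return "\n".join(lines) + "\n"


def prepend_settings(config_path, settings):
    """Insert settings at the top of config_path; return False if already there."""
    path = Path(config_path)
    original = path.read_text() if path.exists() else ""
    if original.startswith(settings):
        return False  # running the script twice should not duplicate the block
    path.write_text(settings + "\n" + original)
    return True


# Example usage (certificate path and hash are placeholders):
# cfg = Path.home() / ".ipython" / "profile_nbserver" / "ipython_notebook_config.py"
# prepend_settings(cfg, build_settings("/home/ubuntu/certificates/mycert.pem", "sha1:..."))
```

Repeating c = get_config() at the top is harmless, since it returns the same config object the generated file already uses.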

#### Start IPython Notebook on the remote server

$ ipython notebook --profile=nbserver
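With the server running, you can check from your local machine that port 8888 is reachable before browsing to https://your-instance-public-DNS:8888 (expect a browser warning, since the certificate is self-signed). The sketch below only tests whether a TCP connection succeeds; port_is_open is a hypothetical helper, and the hostname in the comment is a placeholder. A False result often means the instance's security group does not allow inbound TCP on 8888, or the server is not running.

```python
import socket


def port_is_open(host, port=8888, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS lookup failures
        return False


# Example (placeholder hostname for your EC2 instance):
# port_is_open("ec2-203-0-113-10.compute-1.amazonaws.com")
```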

Happy IPython Notebooking!


Adding Line Numbers in IPython/Jupyter Notebooks

Lately, I’ve been using Jupyter Notebooks for all of my Python and Julia coding. The ability to develop and submit small snippets of code and create plots inline is just so useful that it has broken the stranglehold of using an IDE while I’m coding. However, the one thing that was missing for a smooth transition was line numbers in the cells; luckily, this can be achieved in two ways.

Keyboard Shortcut

The easiest way to add line numbers to a Jupyter Notebook is the keyboard shortcut: press Ctrl-m to enter Command Mode, then press L. Select the cell you want line numbers in, then use the shortcut to toggle them on or off.


Add Line Numbers to All Cells at Startup

While the keyboard shortcut is great for toggling line numbers on and off, I prefer having line numbers always on. Luckily, the IPython Dev folks on Twitter were kind enough to explain how to do this: add a line of JavaScript to the custom.js file in your IPython profile.

I use OSX with the default ‘profile_default’ profile, so the path for my custom.js file for IPython is:

/Users/randyzwitch/.ipython/profile_default/static/custom/

You can do the same for IJulia:

/Users/randyzwitch/.ipython/profile_julia/static/custom/

If you are using an operating system other than OSX, or you are using OSX and don’t see a custom.js file in these locations, a quick search for custom.js will get you to the right file location. Once you open the custom.js file, you can place the line of JavaScript anywhere in the file, as long as it’s not inside any pre-existing function.
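That "quick search" can also be done with a few lines of Python. This is a minimal sketch assuming the profile layout shown above (~/.ipython/profile_name/static/custom/custom.js); find_custom_js is a hypothetical helper, and if your installation keeps custom.js somewhere else, pass that directory as root instead.

```python
from pathlib import Path


def find_custom_js(root=None):
    """List every custom.js below root (default: ~/.ipython), sorted by path."""
    base = Path(root) if root is not None else Path.home() / ".ipython"
    if not base.is_dir():
        return []  # nothing to search, e.g. IPython never ran on this machine
    return sorted(base.rglob("custom.js"))


if __name__ == "__main__":
    for path in find_custom_js():
        # In the profile_<name>/static/custom/ layout, this is the profile name
        print(path.parent.parent.parent.name, path)
```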

Once you place the line of JavaScript in your file, you’ll need to restart IPython/IJulia completely for the change to take effect. After that, you’ll have line numbers in every cell of every Notebook!

Edit 11/4/2015: Thanks to reader Nat Dunn, I’ve been made aware that the above method no longer works, which isn’t a surprise given how much has changed in the move from IPython Notebook to the Jupyter project over the past two years.

For the (currently) correct method of adding line numbers to Jupyter Notebook by default, please see Nat’s post with the correct instructions on modifying the custom.js file.


RSiteCatalyst Version 1.2 Release Notes

Version 1.2 of the RSiteCatalyst package to access the Adobe Analytics API is now available on CRAN! Changes include:

  • Removed RCurl package dependency
  • Changed argument order for GetAdminConsoleLog to avoid error when date not passed
  • Return proper numeric type for metric columns
  • Fixed bug in GetEVars function
  • Added validate:true flag to API to improve error reporting
  • Removed remaining references to Omniture

For most users, the only noticeable change will be that you no longer need to call as.numeric() on the metric columns of a data frame returned by an API call, since all functions now return those columns as the proper numeric type.

Changes from Development Version

If you installed the 1.2 development version directly from GitHub, the only difference between it and the stable CRAN version of the package is that support for the Adobe Analytics Real Time API has been removed. This functionality will continue to be developed on the 1.3 development branch on GitHub.

Testing

For this release, I’ve made a more concerted effort to test RSiteCatalyst on various platforms outside of OSX (where I do my development). RSiteCatalyst works in the following environments:

  • OSX Lion and prior
  • Ubuntu 12.04 LTS
  • Windows 7 64-bit SP1
  • Windows 8.1 64-bit
  • R 2.15.2 and newer
  • R and RStudio

If your environment is not listed above, RSiteCatalyst will likely still work, as there is no operating-system-specific code in the package. If you run into issues, verify that:

  • you have all package dependencies installed
  • your Adobe account has Web Service Access privileges (set in the Admin panel)
  • you have permission to access the report suites you are querying (also an Admin panel setting)
  • your company doesn’t have firewall settings that would block API access

Support

If you run into any problems with RSiteCatalyst, please file an issue on GitHub so it can be tracked properly. Note that I’m not an Adobe employee, so I can only provide so much support: in most cases I can’t validate your settings to ensure you are set up correctly (nor do I have any inside information about how the system works 🙂).

