randyzwitch.com | Visualizing Analytics Languages With VennEuler.jl

It often doesn’t take much to get me off track, and on a holiday weekend…well, I was just begging for a fun way to shirk. Enter Harlan Harris:

someone redo this area-prop'l Venn w/ my Julia pkg! http://t.co/Mh8rXZbRgY http://t.co/RDWNQHTw3S http://t.co/ljujd9DG0T via @revodavid

— Harlan Harris (@HarlanH) August 29, 2014

Hey, I’m someone looking for something to do! And I like writing Julia code! So let’s have a look at recreating this diagram in Julia using VennEuler.jl (IJulia Notebook link):

Source: Revolution R/KDNuggets

http://blog.revolutionanalytics.com/2014/08/r-tops-kdnuggets-data-analysis-software-poll-for-4th-consecutive-year.html

Installing VennEuler.jl

Because VennEuler.jl is not in METADATA as of the time of writing, instead of using Pkg.add() you’ll need to run:

Pkg.clone("https://github.com/HarlanH/VennEuler.jl.git")

Note that VennEuler uses some of the more exotic packages (at least to me) like NLopt and Cairo, so you might need to install a few additional dependencies along with the package.

Data

The data was a bit confusing to me at first, since the percentages add up to more than 100% (respondents could vote for more than one language). To create a dataset to use, I took the percentages, multiplied by 1000, and re-created the voting pattern. The data for the graph can be downloaded from this link.
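As a rough illustration of that re-creation step, here is a minimal sketch in plain Julia (written against current Julia 1.x rather than the older syntax used elsewhere in this post). The region shares are made-up placeholders, not the survey's figures, and `rebuild_votes` is a hypothetical helper name. The idea is simply that each Venn region's share, scaled by 1000, becomes that many identical rows of true/false flags.

```julia
# Hypothetical shares for each Venn region (NOT the survey's numbers).
# Each tuple of flags marks membership in (R, SAS, Python).
regions = [
    ((true,  false, false), 0.20),
    ((false, true,  false), 0.10),
    ((false, false, true),  0.15),
    ((true,  true,  false), 0.05),
    ((true,  false, true),  0.10),
    ((false, true,  true),  0.05),
    ((true,  true,  true),  0.05),
]

# Hypothetical helper: repeat each region's flags round(share * scale) times
# and stack the rows into a respondents-by-languages Bool matrix.
function rebuild_votes(regions; scale=1000)
    rows = Vector{Vector{Bool}}()
    for (flags, share) in regions
        n = round(Int, share * scale)
        for _ in 1:n
            push!(rows, collect(flags))
        end
    end
    # hcat gives languages-by-respondents; transpose to respondents-by-languages
    return permutedims(reduce(hcat, rows))
end

votes = rebuild_votes(regions)
println(size(votes))          # rows = simulated respondents, columns = languages
println(sum(votes, dims=1))   # per-language totals
```

The result has one row per simulated respondent and one column per language, which appears to be the layout the code below expects after reading the CSV.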

Code - Circles

With a few modifications, I basically re-purposed Harlan’s code from the package test files. The circle result is as follows:

using VennEuler

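# Read the survey CSV; with header=true, readcsv returns the data and the header row
data, labels = readcsv("/home/rzwitch/Desktop/kdnuggets_language_survey_2014.csv", header=true)
data = bool(data)      # convert the numeric entries to Bool
labels = vec(labels)   # flatten the one-row header matrix into a vector

# Circles
eo = make_euler_object(labels, data, EulerSpec()) # circles, for now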

(minf,minx,ret) = optimize(eo, random_state(eo), ftol=-1, xtol=0.0025, maxtime=120, pop=1000)
println("got $minf at $minx (returned $ret)")

render("/home/rzwitch/Desktop/kd.svg", eo, minx)

[Figure: VennEuler.jl circle diagram]

Since the percentages of R, SAS, and Python users aren’t dramatically different (49.81%, 33.42%, and 40.97%, respectively) and the shapes are circles, it’s a bit hard to tell that R is about 16 percentage points higher than SAS and 9 percentage points higher than Python.
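One way to see why the circles look so similar: in an area-proportional diagram the area tracks the share, so the radius only grows with the square root of the share. Here is a quick check using the three percentages quoted above (the helper name `radius_ratio` is just for illustration, and the syntax is current Julia 1.x):

```julia
# Shares of respondents quoted in the post, in percent.
shares = Dict("R" => 49.81, "SAS" => 33.42, "Python" => 40.97)

# Percentage-point gaps between R and the other two languages.
gap_sas    = round(shares["R"] - shares["SAS"], digits=2)
gap_python = round(shares["R"] - shares["Python"], digits=2)

# For circles whose areas are proportional to the shares,
# the ratio of radii is the square root of the ratio of areas.
radius_ratio(a, b) = sqrt(a / b)

r_vs_sas    = radius_ratio(shares["R"], shares["SAS"])
r_vs_python = radius_ratio(shares["R"], shares["Python"])

println("R vs SAS: $gap_sas points, radius ratio $(round(r_vs_sas, digits=3))")
println("R vs Python: $gap_python points, radius ratio $(round(r_vs_python, digits=3))")
```

So although R's share is roughly 1.5 times that of SAS, its circle's radius is only about 1.22 times as large, which is easy to miss by eye.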

Code - Rectangles

Alternatively, we can use rectangles to represent the areas:

using VennEuler

data, labels = readcsv("/home/rzwitch/Desktop/kdnuggets_language_survey_2014.csv", header=true)
data = bool(data)
labels = vec(labels)

# Rectangles
eo = make_euler_object(labels, data,
    [EulerSpec(:rectangle),
     EulerSpec(:rectangle, [.5, .5, .4], [0, 0, 0]),
     EulerSpec(:rectangle)],
    sizesum=.3)

(minf,minx,ret) = optimize_iteratively(eo, random_state(eo), ftol=-1, xtol=0.0025, maxtime=5, pop=100)
println("phase 1: got $minf at $minx (returned $ret)")
(minf,minx,ret) = optimize(eo, minx, ftol=-1, xtol=0.001, maxtime=30, pop=100)
println("phase 2: got $minf at $minx (returned $ret)")

render("/home/rzwitch/Desktop/kd-rects.svg", eo, minx)

[Figure: VennEuler.jl rectangle diagram]

Here, it’s slightly easier to see that SAS and Python cover about the same area and that R is larger, although the differing dimensions of the rectangles do obscure this a bit.

Summary

If I spent more time with this package, I’m sure I could make something even more aesthetically pleasing. And for that matter, it’s still a pre-production package that will no doubt get better in the future. But at the very least, there is a way to create an area-proportional representation of relationships using VennEuler.jl in Julia.
