It often doesn’t take much to get me off track, and on a holiday weekend…well, I was just begging for a fun way to shirk. Enter Harlan Harris:
someone redo this area-prop'l Venn w/ my Julia pkg! http://t.co/Mh8rXZbRgY http://t.co/RDWNQHTw3S http://t.co/ljujd9DG0T via @revodavid
— Harlan Harris (@HarlanH) August 29, 2014
Hey, I’m someone looking for something to do! And I like writing Julia code! So let’s have a look at recreating this diagram in Julia using VennEuler.jl (IJulia Notebook link):
Installing VennEuler.jl
Because VennEuler.jl is not in METADATA as of the time of writing, instead of using Pkg.add() you’ll need to run:
Pkg.clone("https://github.com/HarlanH/VennEuler.jl.git")
Note that VennEuler uses some of the more exotic packages (at least to me) like NLopt and Cairo, so you might need to have a few additional dependencies installed with the package.
Data
The data was a bit confusing to me at first, since the percentages add up to more than 100% (respondents could vote for more than one language). In order to create a dataset to use, I took the percentages, multiplied by 1000, then re-created the voting pattern. The data for the graph can be downloaded from this link.
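As a rough illustration of that expansion step, here is a minimal sketch that turns a table of answer patterns into one row per simulated respondent. The function name `expand_patterns`, the nominal respondent count of 1000, and every share in `patterns` are assumptions made up for the example; they are not the survey's figures and not necessarily how the downloadable CSV was built. The sketch is written for a current Julia release rather than the version used elsewhere in this post.

```julia
# Expand (pattern, share) pairs into a Bool matrix with one row per respondent.
# Columns are assumed to be R, SAS, Python, in that order.
function expand_patterns(patterns, n)
    rows = Vector{Vector{Bool}}()
    for (pattern, share) in patterns
        k = round(Int, share / 100 * n)  # respondents with exactly this pattern
        for _ in 1:k
            push!(rows, pattern)
        end
    end
    # hcat gives languages x respondents; transpose to respondents x languages
    return permutedims(reduce(hcat, rows))
end

# PLACEHOLDER shares in percent, chosen only so that they sum to 100
patterns = [
    ([true,  false, false], 30.0),  # R only
    ([false, true,  false], 20.0),  # SAS only
    ([false, false, true ], 20.0),  # Python only
    ([true,  false, true ], 15.0),  # R and Python
    ([true,  true,  true ],  5.0),  # all three
    ([false, false, false], 10.0),  # none of the three
]

data = expand_patterns(patterns, 1000)
println(size(data))          # rows x columns
println(sum(data, dims=1))   # respondents per language
```

Because one respondent can appear in several columns, the per-language totals can add up to more than the number of rows, which is the same effect that pushes the survey percentages past 100%.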
Code - Circles
With a few modifications, I basically re-purposed Harlan’s code from the package test files. The circle result is as follows:
using VennEuler
data, labels = readcsv("/home/rzwitch/Desktop/kdnuggets_language_survey_2014.csv", header=true)
data = bool(data)
labels = vec(labels)
#Circles
eo = make_euler_object(labels, data, EulerSpec()) # circles, for now
(minf,minx,ret) = optimize(eo, random_state(eo), ftol=-1, xtol=0.0025, maxtime=120, pop=1000)
println("got $minf at $minx (returned $ret)")
render("/home/rzwitch/Desktop/kd.svg", eo, minx)
Since the percentages of R, SAS, and Python users aren’t too dramatically different (49.81%, 33.42%, and 40.97%, respectively) and the shapes are circles, it’s a bit hard to tell that R is about 16 percentage points higher than SAS and 9 percentage points higher than Python.
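Part of the reason is geometric: when a circle's area is proportional to a share, its radius only grows with the square root of that share. The short sketch below works through the arithmetic for the three percentages quoted above. It assumes each circle's area is exactly proportional to its percentage, and the names `shares` and `radius` are made up for the example (it targets a current Julia release).

```julia
# Usage shares quoted above, in percent
shares = Dict("R" => 49.81, "SAS" => 33.42, "Python" => 40.97)

# Gaps between R and the other two languages, in percentage points
gap_sas    = shares["R"] - shares["SAS"]
gap_python = shares["R"] - shares["Python"]

# Radius of a circle whose area equals the given share: r = sqrt(area / pi)
radius(area) = sqrt(area / pi)

# How much wider the R circle is than each of the others
ratio_sas    = radius(shares["R"]) / radius(shares["SAS"])
ratio_python = radius(shares["R"]) / radius(shares["Python"])

println("R vs SAS:    gap = $(round(gap_sas, digits=2)) points, radius ratio = $(round(ratio_sas, digits=2))")
println("R vs Python: gap = $(round(gap_python, digits=2)) points, radius ratio = $(round(ratio_python, digits=2))")
```

Under that assumption, the R circle's radius is only about 22% larger than SAS's and about 10% larger than Python's, even though the areas differ by roughly 49% and 22%.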
Code - Rectangles
Alternatively, we can use rectangles to represent the areas:
using VennEuler
data, labels = readcsv("/home/rzwitch/Desktop/kdnuggets_language_survey_2014.csv", header=true)
data = bool(data)
labels = vec(labels)
# Rectangles
eo = make_euler_object(labels, data,
    [EulerSpec(:rectangle),
     EulerSpec(:rectangle, [.5, .5, .4], [0, 0, 0]),
     EulerSpec(:rectangle)],
    sizesum=.3)
(minf,minx,ret) = optimize_iteratively(eo, random_state(eo), ftol=-1, xtol=0.0025, maxtime=5, pop=100)
println("phase 1: got $minf at $minx (returned $ret)")
(minf,minx,ret) = optimize(eo, minx, ftol=-1, xtol=0.001, maxtime=30, pop=100)
println("phase 2: got $minf at $minx (returned $ret)")
render("/home/rzwitch/Desktop/kd-rects.svg", eo, minx)
Here, it’s slightly easier to see that SAS and Python are about the same area-wise and that R is larger, although the rectangles’ differing dimensions do obscure this a bit.
Summary
If I spent more time with this package, I’m sure I could make something even more aesthetically pleasing. And for that matter, it’s still a pre-production package that will no doubt get better in the future. But at the very least, there is a way to create an area-proportional representation of relationships using VennEuler.jl in Julia.