It often doesn’t take much to get me off track, and on a holiday weekend…well, I was just begging for a fun way to shirk. Enter Harlan Harris:
— Harlan Harris (@HarlanH) August 29, 2014
Hey, I’m someone looking for something to do! And I like writing Julia code! So let’s have a look at recreating this diagram in Julia using VennEuler.jl (IJulia Notebook link):
Because VennEuler.jl is not in METADATA as of the time of writing, Pkg.add() won’t find it; instead, you’ll need to clone the package directly from its Git repository with Pkg.clone().
Note that VennEuler uses some of the more exotic packages (at least to me) like NLopt and Cairo, so you might need to have a few additional dependencies installed with the package.
The data was a bit confusing to me at first, since the percentages add up to more than 100% (people could vote multiple times). In order to create a dataset to use, I took the percentages, multiplied by 1000, then re-created the voting pattern. The data for the graph can be downloaded from this link.
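A minimal sketch of that reconstruction step might look like the following. It is written for current Julia (1.x) rather than the 2014-era syntax used elsewhere in this post. The per-combination shares are made-up placeholders, not the survey's numbers, and the names `region_share` and `n_respondents` are hypothetical. The idea is to repeat each voting pattern in proportion to its share, which yields one row per simulated respondent and one Boolean column per language.

```julia
# Hypothetical shares for each combination of (R, SAS, Python) votes.
# These values are placeholders for illustration, NOT the survey's results.
labels = ["R", "SAS", "Python"]
region_share = [
    ((true,  false, false), 0.25),  # R only
    ((false, true,  false), 0.15),  # SAS only
    ((false, false, true),  0.20),  # Python only
    ((true,  true,  false), 0.05),  # R and SAS
    ((true,  false, true),  0.10),  # R and Python
    ((false, true,  true),  0.05),  # SAS and Python
    ((true,  true,  true),  0.20),  # all three
]

n_respondents = 1000

# Repeat each voting pattern round(share * n_respondents) times.
rows = Vector{NTuple{3,Bool}}()
for (pattern, share) in region_share
    append!(rows, fill(pattern, round(Int, share * n_respondents)))
end

# Arrange the patterns as a respondents-by-languages Bool matrix.
data = [row[j] for row in rows, j in 1:length(labels)]

# Per-language shares are the column means; they can sum to more than 1
# because a single respondent may have voted for several languages.
shares = vec(sum(data, dims=1)) ./ size(data, 1)
println(collect(zip(labels, shares)))
```

Because each respondent can appear in more than one column, the per-language shares overlap, which is why the published percentages add up to more than 100%.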
Code - Circles
With a few modifications, I basically re-purposed Harlan’s code from the package test files. The circle result is as follows:
```julia
using VennEuler

data, labels = readcsv("/home/rzwitch/Desktop/kdnuggets_language_survey_2014.csv", header=true)
data = bool(data)
labels = vec(labels)

#Circles
eo = make_euler_object(labels, data, EulerSpec()) # circles, for now

(minf,minx,ret) = optimize(eo, random_state(eo), ftol=-1, xtol=0.0025, maxtime=120, pop=1000)

println("got $minf at $minx (returned $ret)")

render("/home/rzwitch/Desktop/kd.svg", eo, minx)
```
Since the percentages of R, SAS, and Python users aren’t too dramatically different (Python, for example, sits at 40.97%) and the shapes are circles, it’s a bit hard to tell that R is about 16 percentage points higher than SAS and 9 percentage points higher than Python.
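Some quick arithmetic shows why circles understate these gaps. The sketch below assumes the diagram is area-proportional, so a circle's radius scales with the square root of its share. The Python share comes from the text; the R and SAS shares are approximations derived from the "about 9" and "about 16" point gaps, not exact survey figures. The helper `circle_radius` is a hypothetical name, and the code targets current Julia (1.x).

```julia
# Python's share is quoted in the text; R and SAS are approximated from
# the stated gaps (R is ~9 points above Python and ~16 points above SAS).
python_share = 0.4097
r_share      = python_share + 0.09
sas_share    = r_share - 0.16

# For an area-proportional circle, area = pi * radius^2,
# so the radius grows with the square root of the share.
circle_radius(share) = sqrt(share / pi)

area_ratio   = r_share / sas_share
radius_ratio = circle_radius(r_share) / circle_radius(sas_share)

println("R vs SAS, area ratio:   ", round(area_ratio, digits=2))
println("R vs SAS, radius ratio: ", round(radius_ratio, digits=2))
```

The radius ratio is the square root of the area ratio, so the difference in width is noticeably smaller than the difference in area. That is one reason the circles look closer in size than the percentages suggest.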
Code - Rectangles
Alternatively, we can use rectangles to represent the areas:
```julia
using VennEuler

data, labels = readcsv("/home/rzwitch/Desktop/kdnuggets_language_survey_2014.csv", header=true)
data = bool(data)
labels = vec(labels)

# Rectangles
eo = make_euler_object(labels, data, [EulerSpec(:rectangle), EulerSpec(:rectangle, [.5, .5, .4], [0, 0, 0]),
    EulerSpec(:rectangle)],
    sizesum=.3)

(minf,minx,ret) = optimize_iteratively(eo, random_state(eo), ftol=-1, xtol=0.0025, maxtime=5, pop=100)
println("phase 1: got $minf at $minx (returned $ret)")
(minf,minx,ret) = optimize(eo, minx, ftol=-1, xtol=0.001, maxtime=30, pop=100)
println("phase 2: got $minf at $minx (returned $ret)")

render("/home/rzwitch/Desktop/kd-rects.svg", eo, minx)
```
Here, it’s a slight bit easier to see that SAS and Python are about the same area-wise and that R is larger, although the different dimensions do obscure this fact a bit.
If I spent more time with this package, I’m sure I could make something even more aesthetically pleasing. And for that matter, it’s still a pre-production package that will no doubt get better in the future. But at the very least, there is a way to create an area-proportional representation of relationships using VennEuler.jl in Julia.