<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>randyzwitch.com</title>
    <description>Data Science, Programming, Technology, Design</description>
    <link>http://randyzwitch.com</link>

    
      
      
      
    
      
      
      <item>
        <title>A Beginner's Look at BenchmarkTools.jl</title>
        
          <description>&lt;p&gt;In all the years I’ve been programming in Julia, I’ve never really been concerned with performance. Which is to say, I’ve appreciated that &lt;em&gt;other people&lt;/em&gt; are interested in performance and have proven that Julia can be as fast as any other high-performance language out there. But I’ve never been one to pore over the &lt;a href=&quot;https://docs.julialang.org/en/v1/manual/performance-tips/&quot;&gt;Performance Tips&lt;/a&gt; section of the Julia manual trying to squeeze out every last bit of performance.&lt;/p&gt;

&lt;p&gt;But now that I’ve released &lt;a href=&quot;https://www.omnisci.com/blog/announcing-omnisci.jl-a-julia-client-for-omnisci&quot;&gt;OmniSci.jl&lt;/a&gt;, and since one of our major selling points as a company is &lt;a href=&quot;https://www.omnisci.com/platform&quot;&gt;accelerated analytics&lt;/a&gt;, I figured it was time to stop assuming I wrote decent-ish code and really pay attention to performance. This post highlights my experience as a beginner and will hopefully show others how to get started optimizing their Julia code.&lt;/p&gt;

&lt;h2 id=&quot;read-the-manuals&quot;&gt;Read The Manuals!&lt;/h2&gt;

&lt;p&gt;As I mentioned above, I’ve written Julia for many years now, and in that time I’ve grown up with many of the tips in the performance tips section of the documentation. Things like &lt;a href=&quot;https://docs.julialang.org/en/v1/manual/performance-tips/#Write-%22type-stable%22-functions-1&quot;&gt;“write type stable functions”&lt;/a&gt; and &lt;a href=&quot;https://docs.julialang.org/en/v1/manual/performance-tips/#Avoid-global-variables-1&quot;&gt;“avoid global variables”&lt;/a&gt; are things that I’ve internalized as good programming practices, as opposed to doing them just because they are performant. But with this long familiarity with the language comes laziness, and by not reading the BenchmarkTools.jl documentation, I started off benchmarking incorrectly. Consider this example:&lt;/p&gt;

&lt;div class=&quot;language-julia highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;using&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Random&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;OmniSci&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;BenchmarkTools&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Base&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Threads&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;c&quot;&gt;#change defaults, since examples long-running&lt;/span&gt;
       &lt;span class=&quot;n&quot;&gt;BenchmarkTools&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DEFAULT_PARAMETERS&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;seconds&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1000&lt;/span&gt;
&lt;span class=&quot;mi&quot;&gt;1000&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;BenchmarkTools&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DEFAULT_PARAMETERS&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;samples&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;
&lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;c&quot;&gt;#generate test data&lt;/span&gt;
       &lt;span class=&quot;n&quot;&gt;gendata&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rand&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;typemin&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;typemax&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;gendata&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;generic&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt; with&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;method&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;int64_10x6&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;gendata&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;^&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;6&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Int64&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;c&quot;&gt;#Test whether broadcasting more/less efficient than pre-allocating results array&lt;/span&gt;
       &lt;span class=&quot;k&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt; preallocate&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

           &lt;span class=&quot;n&quot;&gt;v&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Vector&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;OmniSci&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TStringValue&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;}(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;undef&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;length&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;))&lt;/span&gt;

           &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;idx&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;length&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
               &lt;span class=&quot;n&quot;&gt;v&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;idx&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;OmniSci&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TStringValue&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;idx&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;])&lt;/span&gt;
           &lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;

           &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;v&lt;/span&gt;
       &lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;preallocate&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;generic&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt; with&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;method&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;nd&quot;&gt;@benchmark&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;v61&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;OmniSci&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TStringValue&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;int64_10x6&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;BenchmarkTools&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Trial&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;memory&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;estimate&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;  &lt;span class=&quot;mf&quot;&gt;297.55&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MiB&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;allocs&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;estimate&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;  &lt;span class=&quot;mi&quot;&gt;6000005&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;--------------&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;minimum&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;     &lt;span class=&quot;mf&quot;&gt;750.146&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ms&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.00&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;GC&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;median&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;      &lt;span class=&quot;mf&quot;&gt;1.014&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;29.38&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;GC&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;mean&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;        &lt;span class=&quot;mf&quot;&gt;1.151&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;28.38&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;GC&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;maximum&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;     &lt;span class=&quot;mf&quot;&gt;1.794&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;43.06&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;GC&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;--------------&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;samples&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;          &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;evals&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sample&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;     &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;nd&quot;&gt;@benchmark&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;v62&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;preallocate&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;int64_10x6&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;BenchmarkTools&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Trial&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;memory&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;estimate&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;  &lt;span class=&quot;mf&quot;&gt;297.55&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MiB&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;allocs&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;estimate&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;  &lt;span class=&quot;mi&quot;&gt;6000002&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;--------------&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;minimum&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;     &lt;span class=&quot;mf&quot;&gt;753.877&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ms&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.00&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;GC&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;median&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;      &lt;span class=&quot;mf&quot;&gt;1.021&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;28.30&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;GC&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;mean&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;        &lt;span class=&quot;mf&quot;&gt;1.158&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;28.10&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;GC&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;maximum&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;     &lt;span class=&quot;mf&quot;&gt;1.806&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;43.17&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;GC&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;--------------&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;samples&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;          &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;evals&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sample&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;     &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The benchmark above tests whether it’s worth &lt;a href=&quot;https://docs.julialang.org/en/v1/manual/performance-tips/#Pre-allocating-outputs-1&quot;&gt;pre-allocating the results array&lt;/a&gt; vs. using the more convenient &lt;a href=&quot;https://docs.julialang.org/en/v1/manual/functions/#man-vectorized-1&quot;&gt;dot broadcasting syntax&lt;/a&gt;. The idea here is that growing an array over and over can be inefficient when you know the result size at the outset. Yet comparing the times above, pre-allocating the array is &lt;em&gt;slightly worse&lt;/em&gt; on every statistic, even though we’re giving the compiler more knowledge up front. This didn’t sit well with me, so I consulted the BenchmarkTools.jl manual and found the following about &lt;a href=&quot;https://github.com/JuliaCI/BenchmarkTools.jl/blob/master/doc/manual.md#interpolating-values-into-benchmark-expressions&quot;&gt;variable interpolation&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;A good rule of thumb is that &lt;strong&gt;external variables should be explicitly interpolated into the benchmark expression&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Interpolating the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;int64_10x6&lt;/code&gt; input array into the benchmark expression makes it behave like a local variable instead of a global, and sure enough, we see roughly a &lt;strong&gt;6% improvement&lt;/strong&gt; in the minimum time when we pre-allocate the array (721.6 ms vs. 763.8 ms for broadcasting):&lt;/p&gt;

&lt;div class=&quot;language-julia highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;nd&quot;&gt;@benchmark&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;v61i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;OmniSci&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TStringValue&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;int64_10x6&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;BenchmarkTools&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Trial&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;memory&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;estimate&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;  &lt;span class=&quot;mf&quot;&gt;297.55&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MiB&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;allocs&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;estimate&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;  &lt;span class=&quot;mi&quot;&gt;6000002&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;--------------&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;minimum&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;     &lt;span class=&quot;mf&quot;&gt;763.817&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ms&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.00&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;GC&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;median&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;      &lt;span class=&quot;mf&quot;&gt;960.446&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ms&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;24.02&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;GC&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;mean&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;        &lt;span class=&quot;mf&quot;&gt;1.178&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;28.68&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;GC&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;maximum&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;     &lt;span class=&quot;mf&quot;&gt;1.886&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;45.11&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;GC&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;--------------&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;samples&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;          &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;evals&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sample&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;     &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;nd&quot;&gt;@benchmark&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;v62i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;preallocate&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;int64_10x6&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;BenchmarkTools&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Trial&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;memory&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;estimate&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;  &lt;span class=&quot;mf&quot;&gt;297.55&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MiB&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;allocs&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;estimate&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;  &lt;span class=&quot;mi&quot;&gt;6000002&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;--------------&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;minimum&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;     &lt;span class=&quot;mf&quot;&gt;721.597&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ms&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.00&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;GC&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;median&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;      &lt;span class=&quot;mf&quot;&gt;1.072&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;30.45&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;GC&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;mean&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;        &lt;span class=&quot;mf&quot;&gt;1.234&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;32.92&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;GC&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;maximum&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;     &lt;span class=&quot;mf&quot;&gt;1.769&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;44.51&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;GC&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;--------------&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;samples&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;          &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;evals&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sample&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;     &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Whether or not that 6% improvement holds up over time, at least conceptually we’re no longer worse off for pre-allocating, which fits my mental model of how things should work.&lt;/p&gt;
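
&lt;p&gt;For readers without OmniSci.jl installed, the same interpolation effect can be tried on any global array. The snippet below is a minimal sketch, not the benchmark from this post: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;data&lt;/code&gt; and the use of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sum&lt;/code&gt; are stand-ins chosen for illustration, and the size of the difference will depend on the function and the machine.&lt;/p&gt;

&lt;div class=&quot;language-julia highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;using BenchmarkTools

# Stand-in data (an assumption for illustration): a non-constant global array
data = rand(Int64, 10^6)

# Without interpolation, the benchmark refers to `data` as a global variable
t_global = @belapsed sum(data)

# With `$`, the value of `data` is interpolated into the benchmark expression
t_local = @belapsed sum($data)

# @belapsed returns the minimum observed time in seconds
println(&quot;global: &quot;, t_global, &quot; s, interpolated: &quot;, t_local, &quot; s&quot;)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;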

&lt;h2 id=&quot;evaluate-your-benchmark-over-the-range-of-inputs-you-care-about&quot;&gt;Evaluate Your Benchmark Over the Range of Inputs You Care About&lt;/h2&gt;

&lt;p&gt;In the comparison above, I evaluate the benchmark over &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;10^6&lt;/code&gt; observations. How did I choose 1 million as the “right” number of events to test, instead of just testing 1 or 10 events? My general goal for benchmarking this code is to speed up the methods of loading data into an OmniSciDB database. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TStringValue&lt;/code&gt; is one of the internal methods used in a row-wise table load, converting whatever data is present in an array or DataFrame from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;::Type{T}&lt;/code&gt; into &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;String&lt;/code&gt; (think iterating over a text file by line). Since users trying to accelerate their database operations are probably going to be working with millions to billions of data points, I’m interested in understanding how the functions perform at these volumes of data.&lt;/p&gt;

&lt;p&gt;The other conscious decision I made was the environment to test on. I could test this on massive CPU- and GPU-enabled servers, but I’m testing this on my Dell XPS 15 laptop. Why? Because I’m actually interested in how things perform under real-world conditions for a typical user. Testing the performance characteristics of a high-end server with tons of memory and cores would be fun, but I want to make sure any performance improvements are broadly applicable, rather than the result of throwing more hardware at the problem.&lt;/p&gt;

&lt;p&gt;I was less concerned with controlling for garbage collection, using a fresh session before each measurement, or other “best case scenario” optimizations. I would expect my users to be more analytics and data science focused, so re-using the same session is going to be common. If the performance improvements aren’t completely obvious, I’m not going to incorporate them into the codebase.&lt;/p&gt;
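
&lt;p&gt;One way to check how a function behaves across input sizes is to run the same benchmark in a loop. The following is a rough sketch under stated assumptions: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;convert_all&lt;/code&gt; is a hypothetical stand-in for the conversion step (it uses Base’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;string&lt;/code&gt; rather than anything from OmniSci.jl), and the three sizes are only illustrative.&lt;/p&gt;

&lt;div class=&quot;language-julia highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;using BenchmarkTools

# Hypothetical stand-in for the conversion step: turn each value into a String
convert_all(x) = string.(x)

# Run the same benchmark at several input sizes (sizes chosen for illustration)
for n in (10^2, 10^4, 10^6)
    input = rand(Int64, n)
    # Interpolate the input so it is not benchmarked as a global
    t = @belapsed convert_all($input)
    println(&quot;n = &quot;, n, &quot;: &quot;, round(t * 1e3, digits = 3), &quot; ms minimum&quot;)
end
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If the time per element stays roughly constant as the input grows, results at one size are more likely to carry over to the volumes users actually load.&lt;/p&gt;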

&lt;h2 id=&quot;case-study-speeding-up-tstringvalue&quot;&gt;Case Study: Speeding Up &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TStringValue&lt;/code&gt;&lt;/h2&gt;

&lt;p&gt;For my test, I benchmark the following methods:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;broadcasting: current library default&lt;/li&gt;
  &lt;li&gt;pre-allocating result array&lt;/li&gt;
  &lt;li&gt;pre-allocated result array with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@inbounds&lt;/code&gt; macro&lt;/li&gt;
  &lt;li&gt;pre-allocated result array with threads&lt;/li&gt;
  &lt;li&gt;pre-allocated result array with threads and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@inbounds&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
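
&lt;p&gt;The five variants listed above could look roughly like the sketch below. This is not the OmniSci.jl code: Base’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;string&lt;/code&gt; stands in for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TStringValue&lt;/code&gt;, and all function names are made up for illustration. The threaded versions only differ from the serial ones when Julia is started with more than one thread, for example by setting the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JULIA_NUM_THREADS&lt;/code&gt; environment variable.&lt;/p&gt;

&lt;div class=&quot;language-julia highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;using BenchmarkTools, Base.Threads

# Assumption: `string` stands in for OmniSci.TStringValue in every variant

# 1. Broadcasting
broadcasted(x) = string.(x)

# 2. Pre-allocated result array
function preallocated(x)
    v = Vector{String}(undef, length(x))
    for idx in 1:length(x)
        v[idx] = string(x[idx])
    end
    return v
end

# 3. Pre-allocated result array, bounds checks skipped inside the loop
function preallocated_inbounds(x)
    v = Vector{String}(undef, length(x))
    @inbounds for idx in 1:length(x)
        v[idx] = string(x[idx])
    end
    return v
end

# 4. Pre-allocated result array, loop iterations split across threads
function threaded(x)
    v = Vector{String}(undef, length(x))
    @threads for idx in 1:length(x)
        v[idx] = string(x[idx])
    end
    return v
end

# 5. Threads plus @inbounds on the element assignment
function threaded_inbounds(x)
    v = Vector{String}(undef, length(x))
    @threads for idx in 1:length(x)
        @inbounds v[idx] = string(x[idx])
    end
    return v
end

# Smaller input than in the post, to keep the sketch quick to run
input = rand(Int64, 10^5)

for f in (broadcasted, preallocated, preallocated_inbounds, threaded, threaded_inbounds)
    # Interpolate both the function and its input into the benchmark
    t = @belapsed $f($input)
    println(rpad(string(f), 24), round(t * 1e3, digits = 3), &quot; ms minimum&quot;)
end
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Each loop iteration writes to its own slot of the pre-allocated array, which is what makes the threaded versions safe without any locking.&lt;/p&gt;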

&lt;h3 id=&quot;10x6-observations&quot;&gt;10^6 observations&lt;/h3&gt;

&lt;div id=&quot;ts_106&quot; style=&quot;height:400px;width:950px;&quot;&gt;&lt;/div&gt;
&lt;script type=&quot;text/javascript&quot;&gt;

    // Initialize after dom ready
    var myChart = echarts.init(document.getElementById(&quot;ts_106&quot;));

    // Load data into the ECharts instance
    myChart.setOption(
{&quot;xAxis&quot;:[{&quot;splitNumber&quot;:5,&quot;axisLabel&quot;:{&quot;show&quot;:true,&quot;interval&quot;:&quot;auto&quot;,&quot;rotate&quot;:0,&quot;inside&quot;:false,&quot;formatter&quot;:&quot;{value}&quot;,&quot;margin&quot;:8},&quot;data&quot;:[&quot;broadcast&quot;,&quot;pre-allocate&quot;,&quot;pre-allocate/inbounds&quot;,&quot;threads&quot;,&quot;threads/inbounds&quot;],&quot;scale&quot;:false,&quot;gridIndex&quot;:0,&quot;minInterval&quot;:0,&quot;zlevel&quot;:0,&quot;triggerEvent&quot;:false,&quot;z&quot;:0,&quot;inverse&quot;:false,&quot;nameLocation&quot;:&quot;middle&quot;,&quot;nameGap&quot;:30,&quot;silent&quot;:true,&quot;type&quot;:&quot;category&quot;}],&quot;ec_charttype&quot;:&quot;xy plot&quot;,&quot;series&quot;:[{&quot;name&quot;:&quot;Min&quot;,&quot;yAxisIndex&quot;:0,&quot;xAxisIndex&quot;:0,&quot;smooth&quot;:false,&quot;data&quot;:[752.568,748.719,738.013,249.117,241.585],&quot;markLine&quot;:{&quot;data&quot;:[],&quot;lineStyle&quot;:{}},&quot;type&quot;:&quot;bar&quot;},{&quot;name&quot;:&quot;Median&quot;,&quot;yAxisIndex&quot;:0,&quot;xAxisIndex&quot;:0,&quot;smooth&quot;:false,&quot;data&quot;:[990.071,988.012,967.184,253.161,246.792],&quot;markLine&quot;:{&quot;data&quot;:[],&quot;lineStyle&quot;:{}},&quot;type&quot;:&quot;bar&quot;}],&quot;theme&quot;:{&quot;geo&quot;:{&quot;label&quot;:{&quot;normal&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#000000&quot;}},&quot;emphasis&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;rgb(100,0,0)&quot;}}},&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderColor&quot;:&quot;#444444&quot;,&quot;borderWidth&quot;:0.5,&quot;areaColor&quot;:&quot;#eeeeee&quot;},&quot;emphasis&quot;:{&quot;borderColor&quot;:&quot;#444444&quot;,&quot;borderWidth&quot;:1,&quot;areaColor&quot;:&quot;rgba(255,215,0,0.8)&quot;}}},&quot;parallel&quot;:{&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;},&quot;emphasis&quot;:{&quot;border
Width&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;}}},&quot;markPoint&quot;:{&quot;label&quot;:{&quot;normal&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#eeeeee&quot;}},&quot;emphasis&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#eeeeee&quot;}}}},&quot;visualMap&quot;:{&quot;color&quot;:[&quot;#e01f54&quot;,&quot;#e7dbc3&quot;]},&quot;funnel&quot;:{&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;},&quot;emphasis&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;}}},&quot;bar&quot;:{&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;barBorderColor&quot;:&quot;#ccc&quot;,&quot;barBorderWidth&quot;:0},&quot;emphasis&quot;:{&quot;barBorderColor&quot;:&quot;#ccc&quot;,&quot;barBorderWidth&quot;:0}}},&quot;map&quot;:{&quot;label&quot;:{&quot;normal&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#000000&quot;}},&quot;emphasis&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;rgb(100,0,0)&quot;}}},&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderColor&quot;:&quot;#444444&quot;,&quot;borderWidth&quot;:0.5,&quot;areaColor&quot;:&quot;#eeeeee&quot;},&quot;emphasis&quot;:{&quot;borderColor&quot;:&quot;#444444&quot;,&quot;borderWidth&quot;:1,&quot;areaColor&quot;:&quot;rgba(255,215,0,0.8)&quot;}}},&quot;scatter&quot;:{&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;},&quot;emphasis&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;}}},&quot;pie&quot;:{&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;},&quot;emphasis&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;}}},&quot;graph&quot;:{&quot;label&quot;:{&quot;normal&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#eeeeee&quot;}}},&quot;symbolSize&quot;:4,&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:0,&quot;borderCo
lor&quot;:&quot;#ccc&quot;}},&quot;smooth&quot;:false,&quot;symbol&quot;:&quot;emptyCircle&quot;,&quot;color&quot;:[&quot;#e01f54&quot;,&quot;#001852&quot;,&quot;#f5e8c8&quot;,&quot;#b8d2c7&quot;,&quot;#c6b38e&quot;,&quot;#a4d8c2&quot;,&quot;#f3d999&quot;,&quot;#d3758f&quot;,&quot;#dcc392&quot;,&quot;#2e4783&quot;,&quot;#82b6e9&quot;,&quot;#ff6347&quot;,&quot;#a092f1&quot;,&quot;#0a915d&quot;,&quot;#eaf889&quot;,&quot;#6699FF&quot;,&quot;#ff6666&quot;,&quot;#3cb371&quot;,&quot;#d5b158&quot;,&quot;#38b6b6&quot;],&quot;lineStyle&quot;:{&quot;normal&quot;:{&quot;color&quot;:&quot;#aaaaaa&quot;,&quot;width&quot;:1}}},&quot;backgroundColor&quot;:&quot;rgba(0,0,0,0)&quot;,&quot;line&quot;:{&quot;symbolSize&quot;:4,&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:1}},&quot;smooth&quot;:false,&quot;symbol&quot;:&quot;emptyCircle&quot;,&quot;lineStyle&quot;:{&quot;normal&quot;:{&quot;width&quot;:2}}},&quot;candlestick&quot;:{&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderColor0&quot;:&quot;#b8d2c7&quot;,&quot;color&quot;:&quot;#e01f54&quot;,&quot;borderColor&quot;:&quot;#f5e8c8&quot;,&quot;borderWidth&quot;:1,&quot;color0&quot;:&quot;#001852&quot;}}},&quot;sankey&quot;:{&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;},&quot;emphasis&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;}}},&quot;valueAxis&quot;:{&quot;axisLine&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#333&quot;}},&quot;axisLabel&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#333&quot;},&quot;show&quot;:true},&quot;splitLine&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:[&quot;#ccc&quot;]}},&quot;splitArea&quot;:{&quot;areaStyle&quot;:{&quot;color&quot;:[&quot;rgba(250,250,250,0.3)&quot;,&quot;rgba(200,200,200,0.3)&quot;]},&quot;show&quot;:false},&quot;axisTick&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#3
33&quot;}}},&quot;toolbox&quot;:{&quot;iconStyle&quot;:{&quot;normal&quot;:{&quot;borderColor&quot;:&quot;#999999&quot;},&quot;emphasis&quot;:{&quot;borderColor&quot;:&quot;#666666&quot;}}},&quot;categoryAxis&quot;:{&quot;axisLine&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#333&quot;}},&quot;axisLabel&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#333&quot;},&quot;show&quot;:true},&quot;splitLine&quot;:{&quot;show&quot;:false,&quot;lineStyle&quot;:{&quot;color&quot;:[&quot;#ccc&quot;]}},&quot;splitArea&quot;:{&quot;areaStyle&quot;:{&quot;color&quot;:[&quot;rgba(250,250,250,0.3)&quot;,&quot;rgba(200,200,200,0.3)&quot;]},&quot;show&quot;:false},&quot;axisTick&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#333&quot;}}},&quot;tooltip&quot;:{&quot;axisPointer&quot;:{&quot;crossStyle&quot;:{&quot;color&quot;:&quot;#cccccc&quot;,&quot;width&quot;:1},&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#cccccc&quot;,&quot;width&quot;:1}}},&quot;timeline&quot;:{&quot;label&quot;:{&quot;normal&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#293c55&quot;}},&quot;emphasis&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#293c55&quot;}}},&quot;controlStyle&quot;:{&quot;normal&quot;:{&quot;color&quot;:&quot;#293c55&quot;,&quot;borderColor&quot;:&quot;#293c55&quot;,&quot;borderWidth&quot;:0.5},&quot;emphasis&quot;:{&quot;color&quot;:&quot;#293c55&quot;,&quot;borderColor&quot;:&quot;#293c55&quot;,&quot;borderWidth&quot;:0.5}},&quot;checkpointStyle&quot;:{&quot;color&quot;:&quot;#e43c59&quot;,&quot;borderColor&quot;:&quot;rgba(194,53,49,0.5)&quot;},&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;color&quot;:&quot;#293c55&quot;,&quot;borderWidth&quot;:1},&quot;emphasis&quot;:{&quot;color&quot;:&quot;#a9334c&quot;}},&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#293c55&quot;,&quot;width&quot;:1}},&quot;radar&quot;:{&quot;symbolSize&quot;:4,&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:1
}},&quot;smooth&quot;:false,&quot;symbol&quot;:&quot;emptyCircle&quot;,&quot;lineStyle&quot;:{&quot;normal&quot;:{&quot;width&quot;:2}}},&quot;logAxis&quot;:{&quot;axisLine&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#333&quot;}},&quot;axisLabel&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#333&quot;},&quot;show&quot;:true},&quot;splitLine&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:[&quot;#ccc&quot;]}},&quot;splitArea&quot;:{&quot;areaStyle&quot;:{&quot;color&quot;:[&quot;rgba(250,250,250,0.3)&quot;,&quot;rgba(200,200,200,0.3)&quot;]},&quot;show&quot;:false},&quot;axisTick&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#333&quot;}}},&quot;textStyle&quot;:{},&quot;gauge&quot;:{&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;},&quot;emphasis&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;}}},&quot;boxplot&quot;:{&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;},&quot;emphasis&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;}}},&quot;color&quot;:[&quot;#e01f54&quot;,&quot;#001852&quot;,&quot;#f5e8c8&quot;,&quot;#b8d2c7&quot;,&quot;#c6b38e&quot;,&quot;#a4d8c2&quot;,&quot;#f3d999&quot;,&quot;#d3758f&quot;,&quot;#dcc392&quot;,&quot;#2e4783&quot;,&quot;#82b6e9&quot;,&quot;#ff6347&quot;,&quot;#a092f1&quot;,&quot;#0a915d&quot;,&quot;#eaf889&quot;,&quot;#6699FF&quot;,&quot;#ff6666&quot;,&quot;#3cb371&quot;,&quot;#d5b158&quot;,&quot;#38b6b6&quot;],&quot;title&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#333333&quot;},&quot;subtextStyle&quot;:{&quot;color&quot;:&quot;#aaaaaa&quot;}},&quot;dataZoom&quot;:{&quot;dataBackgroundColor&quot;:&quot;rgba(47,69,84,0.3)&quot;,&quot;textStyle&quot;:{&quot;color&quot;:&quot;#333333&quot;},&quot;handleSize&quot;:&quot;100%&quot;,&quot;handleColor&quot;:&quot;#a7b7cc
&quot;,&quot;fillerColor&quot;:&quot;rgba(167,183,204,0.4)&quot;,&quot;backgroundColor&quot;:&quot;rgba(47,69,84,0)&quot;},&quot;timeAxis&quot;:{&quot;axisLine&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#333&quot;}},&quot;axisLabel&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#333&quot;},&quot;show&quot;:true},&quot;splitLine&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:[&quot;#ccc&quot;]}},&quot;splitArea&quot;:{&quot;areaStyle&quot;:{&quot;color&quot;:[&quot;rgba(250,250,250,0.3)&quot;,&quot;rgba(200,200,200,0.3)&quot;]},&quot;show&quot;:false},&quot;axisTick&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#333&quot;}}},&quot;legend&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#333333&quot;}}},&quot;yAxis&quot;:[{&quot;splitNumber&quot;:5,&quot;axisLabel&quot;:{&quot;show&quot;:true,&quot;interval&quot;:&quot;auto&quot;,&quot;rotate&quot;:0,&quot;inside&quot;:false,&quot;formatter&quot;:&quot;{value}&quot;,&quot;margin&quot;:8},&quot;scale&quot;:false,&quot;gridIndex&quot;:0,&quot;minInterval&quot;:0,&quot;zlevel&quot;:0,&quot;triggerEvent&quot;:false,&quot;z&quot;:0,&quot;inverse&quot;:false,&quot;nameLocation&quot;:&quot;middle&quot;,&quot;nameGap&quot;:50,&quot;silent&quot;:true,&quot;type&quot;:&quot;value&quot;}],&quot;toolbox&quot;:{&quot;feature&quot;:{},&quot;orient&quot;:&quot;vertical&quot;,&quot;itemSize&quot;:15,&quot;height&quot;:&quot;auto&quot;,&quot;zlevel&quot;:0,&quot;z&quot;:2,&quot;itemGap&quot;:20,&quot;right&quot;:&quot;auto&quot;,&quot;top&quot;:&quot;center&quot;,&quot;width&quot;:&quot;auto&quot;,&quot;show&quot;:false,&quot;showTitle&quot;:true},&quot;ec_width&quot;:1000,&quot;ec_height&quot;:500,&quot;tooltip&quot;:{&quot;triggerOn&quot;:&quot;mousemove&quot;,&quot;enterable&quot;:true,&quot;borderColor&quot;:&quot;#333&quot;,&quot;transitionDuration&quot;:0.4,&quot;hideDelay&quot;:100,&quot;padding&quot;:5,&quot;showDelay&quot;:0,&quo
t;borderWidth&quot;:0,&quot;showContent&quot;:true,&quot;backgroundColor&quot;:&quot;rgba(50,50,50,0.7)&quot;,&quot;trigger&quot;:&quot;item&quot;,&quot;alwaysShowContent&quot;:false,&quot;confine&quot;:false,&quot;show&quot;:true},&quot;grid&quot;:[{&quot;height&quot;:&quot;auto&quot;,&quot;show&quot;:false,&quot;width&quot;:&quot;auto&quot;,&quot;backgroundColor&quot;:&quot;transparent&quot;}],&quot;aria&quot;:{&quot;show&quot;:true},&quot;title&quot;:[{&quot;left&quot;:&quot;left&quot;,&quot;borderColor&quot;:&quot;transparent&quot;,&quot;bottom&quot;:&quot;auto&quot;,&quot;padding&quot;:5,&quot;zlevel&quot;:0,&quot;borderWidth&quot;:1,&quot;target&quot;:&quot;blank&quot;,&quot;z&quot;:2,&quot;itemGap&quot;:5,&quot;shadowOffsetY&quot;:0,&quot;shadowOffsetX&quot;:0,&quot;right&quot;:&quot;auto&quot;,&quot;top&quot;:&quot;auto&quot;,&quot;subtarget&quot;:&quot;blank&quot;,&quot;show&quot;:true}],&quot;ec_renderer&quot;:&quot;canvas&quot;,&quot;legend&quot;:{&quot;itemWidth&quot;:25,&quot;data&quot;:[&quot;Min&quot;,&quot;Median&quot;],&quot;borderColor&quot;:&quot;transparent&quot;,&quot;orient&quot;:&quot;horizontal&quot;,&quot;bottom&quot;:&quot;auto&quot;,&quot;height&quot;:&quot;auto&quot;,&quot;zlevel&quot;:0,&quot;padding&quot;:5,&quot;borderWidth&quot;:1,&quot;inactiveColor&quot;:&quot;#ccc&quot;,&quot;z&quot;:2,&quot;align&quot;:&quot;auto&quot;,&quot;itemGap&quot;:10,&quot;itemHeight&quot;:14,&quot;backgroundColor&quot;:&quot;transparent&quot;,&quot;shadowOffsetY&quot;:0,&quot;shadowOffsetX&quot;:0,&quot;right&quot;:&quot;auto&quot;,&quot;top&quot;:&quot;auto&quot;,&quot;width&quot;:&quot;auto&quot;,&quot;selectedMode&quot;:true,&quot;show&quot;:true}} );
&lt;/script&gt;

&lt;p&gt;The first three bars on the left compare the single-threaded methods. Pre-allocating the output array is marginally faster than broadcasting, and adding the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@inbounds&lt;/code&gt; macro is incrementally faster still, but neither provides enough speedup to be worth implementing. The difference between the red (minimum) and blue (median) bars represents a garbage collection occurring, but again, the three methods aren’t different enough to show anything interesting.&lt;/p&gt;

&lt;p&gt;For the multi-threaded tests, I’m using 6 threads (one per physical core), and we’re seeing roughly a &lt;strong&gt;3x speedup&lt;/strong&gt;. As with the single-threaded tests above, using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@inbounds&lt;/code&gt; is only marginally faster, not enough to justify implementing it widely given the cost in code complexity. Interestingly, these multi-threaded benchmarks didn’t trigger garbage collection &lt;em&gt;at all&lt;/em&gt; across my five iterations; I’m not sure whether this is specific to threading, but it’s something to explore outside of this blog post.&lt;/p&gt;
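
&lt;p&gt;As a rough illustration of the pattern behind the threaded bars, here is a minimal sketch. The function &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;convert_one&lt;/code&gt; is a hypothetical stand-in for the per-element work, not the actual OmniSci.jl conversion, and the number of threads is whatever Julia was started with (for example via the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JULIA_NUM_THREADS&lt;/code&gt; environment variable).&lt;/p&gt;

```julia
using Base.Threads

# Hypothetical stand-in for the per-element conversion work
convert_one(v::Int64) = string(v)

# Broadcasting: the output array is allocated implicitly
broadcast_version(x) = convert_one.(x)

# Pre-allocate the output, then fill it using all available threads.
# Each iteration writes to its own slot, so no locking is needed.
function threaded_version(x::Vector{Int64})
    out = Vector{String}(undef, length(x))
    @threads for i in eachindex(x)
        out[i] = convert_one(x[i])
    end
    return out
end

x = rand(Int64, 10^4)
same_result = threaded_version(x) == broadcast_version(x)
```

&lt;p&gt;Both versions produce the same output; only the way the work is scheduled differs, which is why the pre-allocation step comes along for free with the threaded approach.&lt;/p&gt;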

&lt;h3 id=&quot;10x7-observations&quot;&gt;10x7 observations&lt;/h3&gt;

&lt;p&gt;To see how these calculation methods might change at a larger scale, I increased the number of observations by an order of magnitude and saw the following results:&lt;/p&gt;

&lt;div id=&quot;ts_108&quot; style=&quot;height:400px;width:950px;&quot;&gt;&lt;/div&gt;
&lt;script type=&quot;text/javascript&quot;&gt;

    // Initialize after dom ready
    var myChart = echarts.init(document.getElementById(&quot;ts_108&quot;));

    // Load data into the ECharts instance
    myChart.setOption(
{&quot;xAxis&quot;:[{&quot;splitNumber&quot;:5,&quot;axisLabel&quot;:{&quot;show&quot;:true,&quot;interval&quot;:&quot;auto&quot;,&quot;rotate&quot;:0,&quot;inside&quot;:false,&quot;formatter&quot;:&quot;{value}&quot;,&quot;margin&quot;:8},&quot;data&quot;:[&quot;broadcast&quot;,&quot;pre-allocate&quot;,&quot;pre-allocate/inbounds&quot;,&quot;threads&quot;,&quot;threads/inbounds&quot;],&quot;scale&quot;:false,&quot;gridIndex&quot;:0,&quot;minInterval&quot;:0,&quot;zlevel&quot;:0,&quot;triggerEvent&quot;:false,&quot;z&quot;:0,&quot;inverse&quot;:false,&quot;nameLocation&quot;:&quot;middle&quot;,&quot;nameGap&quot;:30,&quot;silent&quot;:true,&quot;type&quot;:&quot;category&quot;}],&quot;ec_charttype&quot;:&quot;xy plot&quot;,&quot;series&quot;:[{&quot;name&quot;:&quot;Min&quot;,&quot;yAxisIndex&quot;:0,&quot;xAxisIndex&quot;:0,&quot;smooth&quot;:false,&quot;data&quot;:[26.316,27.064,26.219,2.717,2.641],&quot;markLine&quot;:{&quot;data&quot;:[],&quot;lineStyle&quot;:{}},&quot;type&quot;:&quot;bar&quot;},{&quot;name&quot;:&quot;Median&quot;,&quot;yAxisIndex&quot;:0,&quot;xAxisIndex&quot;:0,&quot;smooth&quot;:false,&quot;data&quot;:[39.332,38.925,39.387,17.659,16.659],&quot;markLine&quot;:{&quot;data&quot;:[],&quot;lineStyle&quot;:{}},&quot;type&quot;:&quot;bar&quot;}],&quot;theme&quot;:{&quot;geo&quot;:{&quot;label&quot;:{&quot;normal&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#000000&quot;}},&quot;emphasis&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;rgb(100,0,0)&quot;}}},&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderColor&quot;:&quot;#444444&quot;,&quot;borderWidth&quot;:0.5,&quot;areaColor&quot;:&quot;#eeeeee&quot;},&quot;emphasis&quot;:{&quot;borderColor&quot;:&quot;#444444&quot;,&quot;borderWidth&quot;:1,&quot;areaColor&quot;:&quot;rgba(255,215,0,0.8)&quot;}}},&quot;parallel&quot;:{&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;},&quot;emphasis&quot;:{&quot;borderWidth&quot;:
0,&quot;borderColor&quot;:&quot;#ccc&quot;}}},&quot;markPoint&quot;:{&quot;label&quot;:{&quot;normal&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#eeeeee&quot;}},&quot;emphasis&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#eeeeee&quot;}}}},&quot;visualMap&quot;:{&quot;color&quot;:[&quot;#e01f54&quot;,&quot;#e7dbc3&quot;]},&quot;funnel&quot;:{&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;},&quot;emphasis&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;}}},&quot;bar&quot;:{&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;barBorderColor&quot;:&quot;#ccc&quot;,&quot;barBorderWidth&quot;:0},&quot;emphasis&quot;:{&quot;barBorderColor&quot;:&quot;#ccc&quot;,&quot;barBorderWidth&quot;:0}}},&quot;map&quot;:{&quot;label&quot;:{&quot;normal&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#000000&quot;}},&quot;emphasis&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;rgb(100,0,0)&quot;}}},&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderColor&quot;:&quot;#444444&quot;,&quot;borderWidth&quot;:0.5,&quot;areaColor&quot;:&quot;#eeeeee&quot;},&quot;emphasis&quot;:{&quot;borderColor&quot;:&quot;#444444&quot;,&quot;borderWidth&quot;:1,&quot;areaColor&quot;:&quot;rgba(255,215,0,0.8)&quot;}}},&quot;scatter&quot;:{&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;},&quot;emphasis&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;}}},&quot;pie&quot;:{&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;},&quot;emphasis&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;}}},&quot;graph&quot;:{&quot;label&quot;:{&quot;normal&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#eeeeee&quot;}}},&quot;symbolSize&quot;:4,&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&q
uot;#ccc&quot;}},&quot;smooth&quot;:false,&quot;symbol&quot;:&quot;emptyCircle&quot;,&quot;color&quot;:[&quot;#e01f54&quot;,&quot;#001852&quot;,&quot;#f5e8c8&quot;,&quot;#b8d2c7&quot;,&quot;#c6b38e&quot;,&quot;#a4d8c2&quot;,&quot;#f3d999&quot;,&quot;#d3758f&quot;,&quot;#dcc392&quot;,&quot;#2e4783&quot;,&quot;#82b6e9&quot;,&quot;#ff6347&quot;,&quot;#a092f1&quot;,&quot;#0a915d&quot;,&quot;#eaf889&quot;,&quot;#6699FF&quot;,&quot;#ff6666&quot;,&quot;#3cb371&quot;,&quot;#d5b158&quot;,&quot;#38b6b6&quot;],&quot;lineStyle&quot;:{&quot;normal&quot;:{&quot;color&quot;:&quot;#aaaaaa&quot;,&quot;width&quot;:1}}},&quot;backgroundColor&quot;:&quot;rgba(0,0,0,0)&quot;,&quot;line&quot;:{&quot;symbolSize&quot;:4,&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:1}},&quot;smooth&quot;:false,&quot;symbol&quot;:&quot;emptyCircle&quot;,&quot;lineStyle&quot;:{&quot;normal&quot;:{&quot;width&quot;:2}}},&quot;candlestick&quot;:{&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderColor0&quot;:&quot;#b8d2c7&quot;,&quot;color&quot;:&quot;#e01f54&quot;,&quot;borderColor&quot;:&quot;#f5e8c8&quot;,&quot;borderWidth&quot;:1,&quot;color0&quot;:&quot;#001852&quot;}}},&quot;sankey&quot;:{&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;},&quot;emphasis&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;}}},&quot;valueAxis&quot;:{&quot;axisLine&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#333&quot;}},&quot;axisLabel&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#333&quot;},&quot;show&quot;:true},&quot;splitLine&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:[&quot;#ccc&quot;]}},&quot;splitArea&quot;:{&quot;areaStyle&quot;:{&quot;color&quot;:[&quot;rgba(250,250,250,0.3)&quot;,&quot;rgba(200,200,200,0.3)&quot;]},&quot;show&quot;:false},&quot;axisTick&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#333&quot;}}},
&quot;toolbox&quot;:{&quot;iconStyle&quot;:{&quot;normal&quot;:{&quot;borderColor&quot;:&quot;#999999&quot;},&quot;emphasis&quot;:{&quot;borderColor&quot;:&quot;#666666&quot;}}},&quot;categoryAxis&quot;:{&quot;axisLine&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#333&quot;}},&quot;axisLabel&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#333&quot;},&quot;show&quot;:true},&quot;splitLine&quot;:{&quot;show&quot;:false,&quot;lineStyle&quot;:{&quot;color&quot;:[&quot;#ccc&quot;]}},&quot;splitArea&quot;:{&quot;areaStyle&quot;:{&quot;color&quot;:[&quot;rgba(250,250,250,0.3)&quot;,&quot;rgba(200,200,200,0.3)&quot;]},&quot;show&quot;:false},&quot;axisTick&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#333&quot;}}},&quot;tooltip&quot;:{&quot;axisPointer&quot;:{&quot;crossStyle&quot;:{&quot;color&quot;:&quot;#cccccc&quot;,&quot;width&quot;:1},&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#cccccc&quot;,&quot;width&quot;:1}}},&quot;timeline&quot;:{&quot;label&quot;:{&quot;normal&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#293c55&quot;}},&quot;emphasis&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#293c55&quot;}}},&quot;controlStyle&quot;:{&quot;normal&quot;:{&quot;color&quot;:&quot;#293c55&quot;,&quot;borderColor&quot;:&quot;#293c55&quot;,&quot;borderWidth&quot;:0.5},&quot;emphasis&quot;:{&quot;color&quot;:&quot;#293c55&quot;,&quot;borderColor&quot;:&quot;#293c55&quot;,&quot;borderWidth&quot;:0.5}},&quot;checkpointStyle&quot;:{&quot;color&quot;:&quot;#e43c59&quot;,&quot;borderColor&quot;:&quot;rgba(194,53,49,0.5)&quot;},&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;color&quot;:&quot;#293c55&quot;,&quot;borderWidth&quot;:1},&quot;emphasis&quot;:{&quot;color&quot;:&quot;#a9334c&quot;}},&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#293c55&quot;,&quot;width&quot;:1}},&quot;radar&quot;:{&quot;symbolSize&quot;:4,&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:1}},&quot;smo
oth&quot;:false,&quot;symbol&quot;:&quot;emptyCircle&quot;,&quot;lineStyle&quot;:{&quot;normal&quot;:{&quot;width&quot;:2}}},&quot;logAxis&quot;:{&quot;axisLine&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#333&quot;}},&quot;axisLabel&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#333&quot;},&quot;show&quot;:true},&quot;splitLine&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:[&quot;#ccc&quot;]}},&quot;splitArea&quot;:{&quot;areaStyle&quot;:{&quot;color&quot;:[&quot;rgba(250,250,250,0.3)&quot;,&quot;rgba(200,200,200,0.3)&quot;]},&quot;show&quot;:false},&quot;axisTick&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#333&quot;}}},&quot;textStyle&quot;:{},&quot;gauge&quot;:{&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;},&quot;emphasis&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;}}},&quot;boxplot&quot;:{&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;},&quot;emphasis&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;}}},&quot;color&quot;:[&quot;#e01f54&quot;,&quot;#001852&quot;,&quot;#f5e8c8&quot;,&quot;#b8d2c7&quot;,&quot;#c6b38e&quot;,&quot;#a4d8c2&quot;,&quot;#f3d999&quot;,&quot;#d3758f&quot;,&quot;#dcc392&quot;,&quot;#2e4783&quot;,&quot;#82b6e9&quot;,&quot;#ff6347&quot;,&quot;#a092f1&quot;,&quot;#0a915d&quot;,&quot;#eaf889&quot;,&quot;#6699FF&quot;,&quot;#ff6666&quot;,&quot;#3cb371&quot;,&quot;#d5b158&quot;,&quot;#38b6b6&quot;],&quot;title&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#333333&quot;},&quot;subtextStyle&quot;:{&quot;color&quot;:&quot;#aaaaaa&quot;}},&quot;dataZoom&quot;:{&quot;dataBackgroundColor&quot;:&quot;rgba(47,69,84,0.3)&quot;,&quot;textStyle&quot;:{&quot;color&quot;:&quot;#333333&quot;},&quot;handleSize&quot;:&quot;100%&quot;,&quot;handleColor&quot;:&quot;#a7b7cc&quot;,&quot
;fillerColor&quot;:&quot;rgba(167,183,204,0.4)&quot;,&quot;backgroundColor&quot;:&quot;rgba(47,69,84,0)&quot;},&quot;timeAxis&quot;:{&quot;axisLine&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#333&quot;}},&quot;axisLabel&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#333&quot;},&quot;show&quot;:true},&quot;splitLine&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:[&quot;#ccc&quot;]}},&quot;splitArea&quot;:{&quot;areaStyle&quot;:{&quot;color&quot;:[&quot;rgba(250,250,250,0.3)&quot;,&quot;rgba(200,200,200,0.3)&quot;]},&quot;show&quot;:false},&quot;axisTick&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#333&quot;}}},&quot;legend&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#333333&quot;}}},&quot;yAxis&quot;:[{&quot;splitNumber&quot;:5,&quot;axisLabel&quot;:{&quot;show&quot;:true,&quot;interval&quot;:&quot;auto&quot;,&quot;rotate&quot;:0,&quot;inside&quot;:false,&quot;formatter&quot;:&quot;{value}&quot;,&quot;margin&quot;:8},&quot;scale&quot;:false,&quot;gridIndex&quot;:0,&quot;minInterval&quot;:0,&quot;zlevel&quot;:0,&quot;triggerEvent&quot;:false,&quot;z&quot;:0,&quot;inverse&quot;:false,&quot;nameLocation&quot;:&quot;middle&quot;,&quot;nameGap&quot;:50,&quot;silent&quot;:true,&quot;type&quot;:&quot;value&quot;}],&quot;toolbox&quot;:{&quot;feature&quot;:{},&quot;orient&quot;:&quot;vertical&quot;,&quot;itemSize&quot;:15,&quot;height&quot;:&quot;auto&quot;,&quot;zlevel&quot;:0,&quot;z&quot;:2,&quot;itemGap&quot;:20,&quot;right&quot;:&quot;auto&quot;,&quot;top&quot;:&quot;center&quot;,&quot;width&quot;:&quot;auto&quot;,&quot;show&quot;:false,&quot;showTitle&quot;:true},&quot;ec_width&quot;:1000,&quot;ec_height&quot;:500,&quot;tooltip&quot;:{&quot;triggerOn&quot;:&quot;mousemove&quot;,&quot;enterable&quot;:true,&quot;borderColor&quot;:&quot;#333&quot;,&quot;transitionDuration&quot;:0.4,&quot;hideDelay&quot;:100,&quot;padding&quot;:5,&quot;showDelay&quot;:0,&quot;borderWidt
h&quot;:0,&quot;showContent&quot;:true,&quot;backgroundColor&quot;:&quot;rgba(50,50,50,0.7)&quot;,&quot;trigger&quot;:&quot;item&quot;,&quot;alwaysShowContent&quot;:false,&quot;confine&quot;:false,&quot;show&quot;:true},&quot;grid&quot;:[{&quot;height&quot;:&quot;auto&quot;,&quot;show&quot;:false,&quot;width&quot;:&quot;auto&quot;,&quot;backgroundColor&quot;:&quot;transparent&quot;}],&quot;aria&quot;:{&quot;show&quot;:true},&quot;color&quot;:[&quot;#10222B&quot;,&quot;#95AB63&quot;,&quot;#BDD684&quot;,&quot;#E2F0D6&quot;,&quot;#F6FFE0&quot;],&quot;title&quot;:[{&quot;left&quot;:&quot;left&quot;,&quot;borderColor&quot;:&quot;transparent&quot;,&quot;bottom&quot;:&quot;auto&quot;,&quot;padding&quot;:5,&quot;zlevel&quot;:0,&quot;borderWidth&quot;:1,&quot;target&quot;:&quot;blank&quot;,&quot;z&quot;:2,&quot;itemGap&quot;:5,&quot;shadowOffsetY&quot;:0,&quot;shadowOffsetX&quot;:0,&quot;right&quot;:&quot;auto&quot;,&quot;top&quot;:&quot;auto&quot;,&quot;subtarget&quot;:&quot;blank&quot;,&quot;show&quot;:true}],&quot;ec_renderer&quot;:&quot;canvas&quot;,&quot;legend&quot;:{&quot;itemWidth&quot;:25,&quot;data&quot;:[&quot;Min&quot;,&quot;Median&quot;],&quot;borderColor&quot;:&quot;transparent&quot;,&quot;orient&quot;:&quot;horizontal&quot;,&quot;bottom&quot;:&quot;auto&quot;,&quot;height&quot;:&quot;auto&quot;,&quot;zlevel&quot;:0,&quot;padding&quot;:5,&quot;borderWidth&quot;:1,&quot;inactiveColor&quot;:&quot;#ccc&quot;,&quot;z&quot;:2,&quot;align&quot;:&quot;auto&quot;,&quot;itemGap&quot;:10,&quot;itemHeight&quot;:14,&quot;backgroundColor&quot;:&quot;transparent&quot;,&quot;shadowOffsetY&quot;:0,&quot;shadowOffsetX&quot;:0,&quot;right&quot;:&quot;auto&quot;,&quot;top&quot;:&quot;auto&quot;,&quot;width&quot;:&quot;auto&quot;,&quot;selectedMode&quot;:true,&quot;show&quot;:true}});
&lt;/script&gt;

&lt;p&gt;As with the 1 million observation tests, there isn’t much difference between the three single-threaded methods. All three are within a few percent of each other (all three methods triggered garbage collection in each of their five runs).&lt;/p&gt;

&lt;p&gt;For the multi-threaded tests, an interesting performance scenario emerged. As in the 1 million point tests, it’s possible to get a run where garbage collection isn’t triggered, which leads to a large min/median difference in the multi-threaded tests. If you can avoid garbage collection, using six threads here gives nearly a &lt;strong&gt;10x speedup&lt;/strong&gt;, and at the median, where both single-threaded and multi-threaded runs trigger garbage collection, you still get a &lt;strong&gt;2x speedup&lt;/strong&gt;.&lt;/p&gt;
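
&lt;p&gt;To make those speedup figures concrete, the ratios can be computed directly from the values plotted in the chart above. This is just arithmetic on those numbers; the variable and function names are illustrative.&lt;/p&gt;

```julia
# Values from the chart above, in the order:
# broadcast, pre-allocate, pre-allocate/inbounds, threads, threads/inbounds
min_times    = [26.316, 27.064, 26.219, 2.717, 2.641]
median_times = [39.332, 38.925, 39.387, 17.659, 16.659]

# Speedup is the single-threaded time divided by the multi-threaded time
speedup(single, multi) = single / multi

# Compare broadcast (index 1) against threads (index 4)
min_speedup    = speedup(min_times[1], min_times[4])        # roughly 9.7
median_speedup = speedup(median_times[1], median_times[4])  # roughly 2.2
```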

&lt;h2 id=&quot;parallelism--compiler-hinting&quot;&gt;Parallelism &amp;gt; Compiler Hinting&lt;/h2&gt;

&lt;p&gt;In the case study above, I’ve demonstrated that for this problem, threading is the first thing to pursue to speed up the OmniSci.jl load table methods. While pre-allocating the output array and using the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@inbounds&lt;/code&gt; macro did show some slight speedups, using threads to perform the calculations is where the largest improvements occurred. The pre-allocation step falls out naturally from the way I wrote the threading methods, so I’ll incorporate that too. Disabling bounds-checking on arrays using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@inbounds&lt;/code&gt; seems more dangerous than it is worth, even though none of these methods should ever get outside of their bounds.&lt;/p&gt;

&lt;p&gt;Overall, I hope this post has demonstrated that you don’t have to fancy yourself a high-frequency trader or a bit-twiddler to find ways to improve your Julia code. The first step is reading the manuals for benchmarking, and then like any other pursuit, the only way to get a feeling for what works is to try things.&lt;/p&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;All of the code for this blog post can be found in this &lt;a href=&quot;https://gist.github.com/randyzwitch/dbe9ce13aa819a1306d62610bb58b173&quot;&gt;GitHub gist&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
</description>
        
        <pubDate>Mon, 16 Dec 2019 00:00:00 +0000</pubDate>
        <link>
        http://randyzwitch.com/benchmarktools-julia-benchmarking/</link>
        <guid isPermaLink="true">http://randyzwitch.com/benchmarktools-julia-benchmarking/</guid>
<content type="html" xml:base="/benchmarktools-julia-benchmarking/">&lt;p&gt;For the number of years I’ve been programming using Julia, I’ve never really been concerned with performance. Which is to say, I’ve appreciated that &lt;em&gt;other people&lt;/em&gt; are interested in performance and have proven that Julia can be as fast as any other high-performance language out there. But I’ve never been one to pore over the &lt;a href=&quot;https://docs.julialang.org/en/v1/manual/performance-tips/&quot;&gt;Performance Tips&lt;/a&gt; section of the Julia manual trying to squeeze out every last bit of performance.&lt;/p&gt;

&lt;p&gt;But now that I’ve released &lt;a href=&quot;https://www.omnisci.com/blog/announcing-omnisci.jl-a-julia-client-for-omnisci&quot;&gt;OmniSci.jl&lt;/a&gt;, and as a company one of our major selling points is &lt;a href=&quot;https://www.omnisci.com/platform&quot;&gt;accelerated analytics&lt;/a&gt;, I figured it was time to stop assuming I wrote decent-ish code and really pay attention to performance. This post highlights my experience as a beginner, and hopefully will show how others can get started in learning to optimize their Julia code.&lt;/p&gt;

&lt;h2 id=&quot;read-the-manuals&quot;&gt;Read The Manuals!&lt;/h2&gt;

&lt;p&gt;As I mentioned above, I’ve written Julia for many years now, and in that time I’ve grown up with many of the tips in the performance tips section of the documentation. Things like &lt;a href=&quot;https://docs.julialang.org/en/v1/manual/performance-tips/#Write-%22type-stable%22-functions-1&quot;&gt;“write type stable functions”&lt;/a&gt; and &lt;a href=&quot;https://docs.julialang.org/en/v1/manual/performance-tips/#Avoid-global-variables-1&quot;&gt;“avoid global variables”&lt;/a&gt; are things that I’ve internalized as good programming practices, as opposed to doing them just because they are performant. But with this long familiarity with the language comes laziness, and by not reading the BenchmarkTools.jl documentation, I started off benchmarking incorrectly. Consider this example:&lt;/p&gt;

&lt;div class=&quot;language-julia highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;using&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Random&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;OmniSci&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;BenchmarkTools&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Base&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Threads&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;c&quot;&gt;#change defaults, since examples long-running&lt;/span&gt;
       &lt;span class=&quot;n&quot;&gt;BenchmarkTools&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DEFAULT_PARAMETERS&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;seconds&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1000&lt;/span&gt;
&lt;span class=&quot;mi&quot;&gt;1000&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;BenchmarkTools&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DEFAULT_PARAMETERS&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;samples&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;
&lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;c&quot;&gt;#generate test data&lt;/span&gt;
       &lt;span class=&quot;n&quot;&gt;gendata&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rand&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;typemin&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;typemax&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;gendata&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;generic&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt; with&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;method&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;int64_10x6&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;gendata&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;^&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;6&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Int64&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;c&quot;&gt;#Test whether broadcasting more/less efficient than pre-allocating results array&lt;/span&gt;
       &lt;span class=&quot;k&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt; preallocate&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

           &lt;span class=&quot;n&quot;&gt;v&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Vector&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;OmniSci&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TStringValue&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;}(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;undef&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;length&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;))&lt;/span&gt;

           &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;idx&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;length&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
               &lt;span class=&quot;n&quot;&gt;v&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;idx&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;OmniSci&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TStringValue&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;idx&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;])&lt;/span&gt;
           &lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;

           &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;v&lt;/span&gt;
       &lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;preallocate&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;generic&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt; with&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;method&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;nd&quot;&gt;@benchmark&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;v61&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;OmniSci&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TStringValue&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;int64_10x6&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;BenchmarkTools&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Trial&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;memory&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;estimate&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;  &lt;span class=&quot;mf&quot;&gt;297.55&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MiB&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;allocs&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;estimate&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;  &lt;span class=&quot;mi&quot;&gt;6000005&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;--------------&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;minimum&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;     &lt;span class=&quot;mf&quot;&gt;750.146&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ms&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.00&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;GC&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;median&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;      &lt;span class=&quot;mf&quot;&gt;1.014&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;29.38&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;GC&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;mean&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;        &lt;span class=&quot;mf&quot;&gt;1.151&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;28.38&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;GC&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;maximum&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;     &lt;span class=&quot;mf&quot;&gt;1.794&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;43.06&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;GC&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;--------------&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;samples&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;          &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;evals&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sample&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;     &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;nd&quot;&gt;@benchmark&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;v62&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;preallocate&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;int64_10x6&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;BenchmarkTools&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Trial&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;memory&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;estimate&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;  &lt;span class=&quot;mf&quot;&gt;297.55&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MiB&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;allocs&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;estimate&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;  &lt;span class=&quot;mi&quot;&gt;6000002&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;--------------&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;minimum&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;     &lt;span class=&quot;mf&quot;&gt;753.877&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ms&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.00&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;GC&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;median&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;      &lt;span class=&quot;mf&quot;&gt;1.021&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;28.30&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;GC&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;mean&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;        &lt;span class=&quot;mf&quot;&gt;1.158&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;28.10&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;GC&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;maximum&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;     &lt;span class=&quot;mf&quot;&gt;1.806&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;43.17&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;GC&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;--------------&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;samples&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;          &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;evals&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sample&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;     &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The benchmark above tests whether it’s worth &lt;a href=&quot;https://docs.julialang.org/en/v1/manual/performance-tips/#Pre-allocating-outputs-1&quot;&gt;pre-allocating the results array&lt;/a&gt; vs. using the more convenient &lt;a href=&quot;https://docs.julialang.org/en/v1/manual/functions/#man-vectorized-1&quot;&gt;dot broadcasting syntax&lt;/a&gt;. The idea here is that growing an array over and over can be inefficient when you know the result size at the outset. Yet comparing the times above, pre-allocating the array is &lt;em&gt;slightly worse&lt;/em&gt; on every statistic (minimum, median, mean and maximum), even though we’re passing the compiler more knowledge up front. This didn’t sit well with me, so I consulted the BenchmarkTools.jl manual and found the following about &lt;a href=&quot;https://github.com/JuliaCI/BenchmarkTools.jl/blob/master/doc/manual.md#interpolating-values-into-benchmark-expressions&quot;&gt;variable interpolation&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;A good rule of thumb is that &lt;strong&gt;external variables should be explicitly interpolated into the benchmark expression&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Interpolating the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;int64_10x6&lt;/code&gt; input array into the benchmark expression with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$&lt;/code&gt; takes it from being a global variable to a local, and sure enough, we see roughly a &lt;strong&gt;6% improvement&lt;/strong&gt; in the minimum time when we pre-allocate the array (721.6 ms vs. 763.8 ms):&lt;/p&gt;

&lt;div class=&quot;language-julia highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;nd&quot;&gt;@benchmark&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;v61i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;OmniSci&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TStringValue&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;int64_10x6&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;BenchmarkTools&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Trial&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;memory&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;estimate&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;  &lt;span class=&quot;mf&quot;&gt;297.55&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MiB&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;allocs&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;estimate&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;  &lt;span class=&quot;mi&quot;&gt;6000002&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;--------------&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;minimum&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;     &lt;span class=&quot;mf&quot;&gt;763.817&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ms&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.00&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;GC&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;median&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;      &lt;span class=&quot;mf&quot;&gt;960.446&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ms&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;24.02&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;GC&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;mean&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;        &lt;span class=&quot;mf&quot;&gt;1.178&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;28.68&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;GC&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;maximum&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;     &lt;span class=&quot;mf&quot;&gt;1.886&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;45.11&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;GC&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;--------------&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;samples&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;          &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;evals&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sample&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;     &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;nd&quot;&gt;@benchmark&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;v62i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;preallocate&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;int64_10x6&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;BenchmarkTools&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Trial&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;memory&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;estimate&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;  &lt;span class=&quot;mf&quot;&gt;297.55&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MiB&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;allocs&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;estimate&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;  &lt;span class=&quot;mi&quot;&gt;6000002&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;--------------&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;minimum&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;     &lt;span class=&quot;mf&quot;&gt;721.597&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ms&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.00&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;GC&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;median&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;      &lt;span class=&quot;mf&quot;&gt;1.072&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;30.45&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;GC&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;mean&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;        &lt;span class=&quot;mf&quot;&gt;1.234&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;32.92&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;GC&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;maximum&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;     &lt;span class=&quot;mf&quot;&gt;1.769&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;44.51&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;GC&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;--------------&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;samples&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;          &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;evals&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sample&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;     &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Whether or not that 6% improvement holds up over repeated runs (with only 5 samples, the median and mean times above still favor broadcasting), at least the minimum time no longer shows pre-allocating as worse, which fits my mental model of how things should work.&lt;/p&gt;
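
&lt;p&gt;One way to check whether a difference like this is more than noise is to let BenchmarkTools compare the two trials directly. The following is a minimal sketch, not the code used for the numbers above: it assumes &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;string&lt;/code&gt; as a stand-in for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OmniSci.TStringValue&lt;/code&gt; so that it runs without the package, and the function name, input size and sample counts are hypothetical.&lt;/p&gt;

&lt;div class=&quot;language-julia highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;using BenchmarkTools

# Assumption: string() stands in for OmniSci.TStringValue
x = rand(Int64, 10^4)

# Hypothetical pre-allocating version of the conversion
function preallocate_strings(x)
    v = Vector{String}(undef, length(x))
    for idx in eachindex(x)
        v[idx] = string(x[idx])
    end
    return v
end

# Interpolate the global with $ so it is treated as a local in the benchmark
broadcast_trial = @benchmark string.($x) samples=20 seconds=5
prealloc_trial  = @benchmark preallocate_strings($x) samples=20 seconds=5

# Ratio of minimum times; a value below 1 means pre-allocation was faster
r = ratio(minimum(prealloc_trial), minimum(broadcast_trial))
println(time(r))

# judge labels the difference as an improvement, a regression or invariant,
# using a 5% time tolerance by default
println(judge(minimum(prealloc_trial), minimum(broadcast_trial)))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;With the default tolerance, a gap of around 6% would sit close to the boundary between “invariant” and “improvement”, so re-running the comparison a few times is a reasonable sanity check.&lt;/p&gt;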

&lt;h2 id=&quot;evaluate-your-benchmark-over-the-range-of-inputs-you-care-about&quot;&gt;Evaluate Your Benchmark Over the Range of Inputs You Care About&lt;/h2&gt;

&lt;p&gt;In the comparison above, I evaluate the benchmark over &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;10^6&lt;/code&gt; observations. How did I choose 1 million as the “right” number of events to test, instead of just testing 1 or 10 events? My general goal for benchmarking this code is to speed up the methods of loading data into an OmniSciDB database. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TStringValue&lt;/code&gt; is one of the internal methods used in a row-wise table load, converting whatever data is present in an array or DataFrame from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;::Type{T}&lt;/code&gt; into &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;String&lt;/code&gt; (think iterating over a text file by line). Since users trying to accelerate their database operations are probably going to be working with millions to billions of data points, I’m interested in understanding how the functions perform at these volumes of data.&lt;/p&gt;
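
&lt;p&gt;To see how a function behaves across a range of input sizes rather than at a single point, the benchmark can be run in a loop. This is a rough sketch under stated assumptions: the sizes are hypothetical and kept small so it finishes quickly, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;string&lt;/code&gt; again stands in for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OmniSci.TStringValue&lt;/code&gt;.&lt;/p&gt;

&lt;div class=&quot;language-julia highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;using BenchmarkTools

# Hypothetical sizes for illustration; the post itself focuses on 10^6 and up
sizes = (10^3, 10^4, 10^5)

# Minimum elapsed time in seconds for each input size
timings = Dict{Int,Float64}()
for n in sizes
    data = rand(Int64, n)
    # @belapsed returns the minimum time in seconds; $data interpolates the local
    timings[n] = @belapsed string.($data) samples=5 seconds=10
end

for n in sizes
    # Nanoseconds per element shows whether the cost scales linearly with size
    println(n, &quot; elements: &quot;, round(timings[n] / n * 1e9, digits=1), &quot; ns per element&quot;)
end
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If the per-element cost stays roughly flat as the size grows, results at one size are likely to carry over to larger loads; if it drifts, the larger sizes are the ones worth measuring.&lt;/p&gt;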

&lt;p&gt;The other conscious decision I made was the environment to test on. I could test this on massive CPU- and GPU-enabled servers, but I’m testing this on my Dell XPS 15 laptop. Why? Because I’m actually interested in how things perform under real-world conditions for a realistic user. Testing the performance characteristics of a high-end server with tons of memory and cores would be fun, but I want to make sure any performance improvements are broadly applicable, rather than the result of throwing more hardware at the problem.&lt;/p&gt;

&lt;p&gt;Controlling for garbage collection, using a fresh session before each measurement, and other “best case scenario” optimizations were less important to me. I would expect my users to be more analytics and data science focused, so re-using the same session is going to be common. If the performance improvements aren’t completely obvious, I’m not going to incorporate them into the codebase.&lt;/p&gt;
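
&lt;p&gt;For readers who do want to control for garbage collection, BenchmarkTools exposes the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;gctrial&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;gcsample&lt;/code&gt; parameters. The sketch below is only an illustration of those parameters, with a hypothetical input and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;string&lt;/code&gt; standing in for the real conversion; it is not how the results in this post were produced.&lt;/p&gt;

&lt;div class=&quot;language-julia highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;using BenchmarkTools

# Hypothetical input; string() stands in for OmniSci.TStringValue
data = rand(Int64, 10^5)

# Defaults: gctrial=true runs the garbage collector once before the trial,
# gcsample=false leaves it alone between samples
shared = @benchmark string.($data) samples=5 seconds=60

# Stricter variant: also force a collection before every sample
isolated = @benchmark string.($data) samples=5 seconds=60 gcsample=true

# gctime reports the nanoseconds spent in GC for an estimate
println(&quot;median GC time, default:  &quot;, gctime(median(shared)), &quot; ns&quot;)
println(&quot;median GC time, gcsample: &quot;, gctime(median(isolated)), &quot; ns&quot;)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;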

&lt;h2 id=&quot;case-study-speeding-up-tstringvalue&quot;&gt;Case Study: Speeding Up &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TStringValue&lt;/code&gt;&lt;/h2&gt;

&lt;p&gt;For my test, I benchmark the following methods:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;broadcasting: current library default&lt;/li&gt;
  &lt;li&gt;pre-allocating result array&lt;/li&gt;
  &lt;li&gt;pre-allocated result array with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@inbounds&lt;/code&gt; macro&lt;/li&gt;
  &lt;li&gt;pre-allocated result array with threads&lt;/li&gt;
  &lt;li&gt;pre-allocated result array with threads and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@inbounds&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
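
&lt;p&gt;The five variants can be sketched roughly as follows. This is an illustration rather than the library’s actual code: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;convert_one&lt;/code&gt; is a hypothetical stand-in for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OmniSci.TStringValue&lt;/code&gt;, and all of the function names are made up for this example.&lt;/p&gt;

&lt;div class=&quot;language-julia highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;using Base.Threads

# Assumption: stand-in for OmniSci.TStringValue so the sketch runs on its own
convert_one(x) = string(x)

# 1. broadcasting
via_broadcast(x) = convert_one.(x)

# 2. pre-allocated result array
function via_prealloc(x)
    v = Vector{String}(undef, length(x))
    for idx in eachindex(x)
        v[idx] = convert_one(x[idx])
    end
    return v
end

# 3. pre-allocated with @inbounds (skips bounds checks inside the loop)
function via_prealloc_inbounds(x)
    v = Vector{String}(undef, length(x))
    @inbounds for idx in eachindex(x)
        v[idx] = convert_one(x[idx])
    end
    return v
end

# 4. pre-allocated with threads; each iteration writes only to its own slot
function via_threads(x)
    v = Vector{String}(undef, length(x))
    @threads for idx in eachindex(x)
        v[idx] = convert_one(x[idx])
    end
    return v
end

# 5. threads plus @inbounds
function via_threads_inbounds(x)
    v = Vector{String}(undef, length(x))
    @threads for idx in eachindex(x)
        @inbounds v[idx] = convert_one(x[idx])
    end
    return v
end

# Sanity check: every variant should produce the same result
data = rand(Int64, 1000)
variants = (via_broadcast, via_prealloc, via_prealloc_inbounds, via_threads, via_threads_inbounds)
results = [f(data) for f in variants]
println(all(r == results[1] for r in results))
println(&quot;threads available: &quot;, nthreads())
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The threaded variants only help when Julia is started with more than one thread, for example by setting the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JULIA_NUM_THREADS&lt;/code&gt; environment variable before launching the session.&lt;/p&gt;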

&lt;h3 id=&quot;10x6-observations&quot;&gt;10x6 observations&lt;/h3&gt;

&lt;div id=&quot;ts_106&quot; style=&quot;height:400px;width:950px;&quot;&gt;&lt;/div&gt;
&lt;script type=&quot;text/javascript&quot;&gt;

    // Initialize after dom ready
    var myChart = echarts.init(document.getElementById(&quot;ts_106&quot;));

    // Load data into the ECharts instance
    myChart.setOption(
{&quot;xAxis&quot;:[{&quot;splitNumber&quot;:5,&quot;axisLabel&quot;:{&quot;show&quot;:true,&quot;interval&quot;:&quot;auto&quot;,&quot;rotate&quot;:0,&quot;inside&quot;:false,&quot;formatter&quot;:&quot;{value}&quot;,&quot;margin&quot;:8},&quot;data&quot;:[&quot;broadcast&quot;,&quot;pre-allocate&quot;,&quot;pre-allocate/inbounds&quot;,&quot;threads&quot;,&quot;threads/inbounds&quot;],&quot;scale&quot;:false,&quot;gridIndex&quot;:0,&quot;minInterval&quot;:0,&quot;zlevel&quot;:0,&quot;triggerEvent&quot;:false,&quot;z&quot;:0,&quot;inverse&quot;:false,&quot;nameLocation&quot;:&quot;middle&quot;,&quot;nameGap&quot;:30,&quot;silent&quot;:true,&quot;type&quot;:&quot;category&quot;}],&quot;ec_charttype&quot;:&quot;xy plot&quot;,&quot;series&quot;:[{&quot;name&quot;:&quot;Min&quot;,&quot;yAxisIndex&quot;:0,&quot;xAxisIndex&quot;:0,&quot;smooth&quot;:false,&quot;data&quot;:[752.568,748.719,738.013,249.117,241.585],&quot;markLine&quot;:{&quot;data&quot;:[],&quot;lineStyle&quot;:{}},&quot;type&quot;:&quot;bar&quot;},{&quot;name&quot;:&quot;Median&quot;,&quot;yAxisIndex&quot;:0,&quot;xAxisIndex&quot;:0,&quot;smooth&quot;:false,&quot;data&quot;:[990.071,988.012,967.184,253.161,246.792],&quot;markLine&quot;:{&quot;data&quot;:[],&quot;lineStyle&quot;:{}},&quot;type&quot;:&quot;bar&quot;}],&quot;theme&quot;:{&quot;geo&quot;:{&quot;label&quot;:{&quot;normal&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#000000&quot;}},&quot;emphasis&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;rgb(100,0,0)&quot;}}},&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderColor&quot;:&quot;#444444&quot;,&quot;borderWidth&quot;:0.5,&quot;areaColor&quot;:&quot;#eeeeee&quot;},&quot;emphasis&quot;:{&quot;borderColor&quot;:&quot;#444444&quot;,&quot;borderWidth&quot;:1,&quot;areaColor&quot;:&quot;rgba(255,215,0,0.8)&quot;}}},&quot;parallel&quot;:{&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;},&quot;emphasis&quot;:{&quot;border
Width&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;}}},&quot;markPoint&quot;:{&quot;label&quot;:{&quot;normal&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#eeeeee&quot;}},&quot;emphasis&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#eeeeee&quot;}}}},&quot;visualMap&quot;:{&quot;color&quot;:[&quot;#e01f54&quot;,&quot;#e7dbc3&quot;]},&quot;funnel&quot;:{&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;},&quot;emphasis&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;}}},&quot;bar&quot;:{&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;barBorderColor&quot;:&quot;#ccc&quot;,&quot;barBorderWidth&quot;:0},&quot;emphasis&quot;:{&quot;barBorderColor&quot;:&quot;#ccc&quot;,&quot;barBorderWidth&quot;:0}}},&quot;map&quot;:{&quot;label&quot;:{&quot;normal&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#000000&quot;}},&quot;emphasis&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;rgb(100,0,0)&quot;}}},&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderColor&quot;:&quot;#444444&quot;,&quot;borderWidth&quot;:0.5,&quot;areaColor&quot;:&quot;#eeeeee&quot;},&quot;emphasis&quot;:{&quot;borderColor&quot;:&quot;#444444&quot;,&quot;borderWidth&quot;:1,&quot;areaColor&quot;:&quot;rgba(255,215,0,0.8)&quot;}}},&quot;scatter&quot;:{&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;},&quot;emphasis&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;}}},&quot;pie&quot;:{&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;},&quot;emphasis&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;}}},&quot;graph&quot;:{&quot;label&quot;:{&quot;normal&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#eeeeee&quot;}}},&quot;symbolSize&quot;:4,&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:0,&quot;borderCo
lor&quot;:&quot;#ccc&quot;}},&quot;smooth&quot;:false,&quot;symbol&quot;:&quot;emptyCircle&quot;,&quot;color&quot;:[&quot;#e01f54&quot;,&quot;#001852&quot;,&quot;#f5e8c8&quot;,&quot;#b8d2c7&quot;,&quot;#c6b38e&quot;,&quot;#a4d8c2&quot;,&quot;#f3d999&quot;,&quot;#d3758f&quot;,&quot;#dcc392&quot;,&quot;#2e4783&quot;,&quot;#82b6e9&quot;,&quot;#ff6347&quot;,&quot;#a092f1&quot;,&quot;#0a915d&quot;,&quot;#eaf889&quot;,&quot;#6699FF&quot;,&quot;#ff6666&quot;,&quot;#3cb371&quot;,&quot;#d5b158&quot;,&quot;#38b6b6&quot;],&quot;lineStyle&quot;:{&quot;normal&quot;:{&quot;color&quot;:&quot;#aaaaaa&quot;,&quot;width&quot;:1}}},&quot;backgroundColor&quot;:&quot;rgba(0,0,0,0)&quot;,&quot;line&quot;:{&quot;symbolSize&quot;:4,&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:1}},&quot;smooth&quot;:false,&quot;symbol&quot;:&quot;emptyCircle&quot;,&quot;lineStyle&quot;:{&quot;normal&quot;:{&quot;width&quot;:2}}},&quot;candlestick&quot;:{&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderColor0&quot;:&quot;#b8d2c7&quot;,&quot;color&quot;:&quot;#e01f54&quot;,&quot;borderColor&quot;:&quot;#f5e8c8&quot;,&quot;borderWidth&quot;:1,&quot;color0&quot;:&quot;#001852&quot;}}},&quot;sankey&quot;:{&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;},&quot;emphasis&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;}}},&quot;valueAxis&quot;:{&quot;axisLine&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#333&quot;}},&quot;axisLabel&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#333&quot;},&quot;show&quot;:true},&quot;splitLine&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:[&quot;#ccc&quot;]}},&quot;splitArea&quot;:{&quot;areaStyle&quot;:{&quot;color&quot;:[&quot;rgba(250,250,250,0.3)&quot;,&quot;rgba(200,200,200,0.3)&quot;]},&quot;show&quot;:false},&quot;axisTick&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#3
33&quot;}}},&quot;toolbox&quot;:{&quot;iconStyle&quot;:{&quot;normal&quot;:{&quot;borderColor&quot;:&quot;#999999&quot;},&quot;emphasis&quot;:{&quot;borderColor&quot;:&quot;#666666&quot;}}},&quot;categoryAxis&quot;:{&quot;axisLine&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#333&quot;}},&quot;axisLabel&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#333&quot;},&quot;show&quot;:true},&quot;splitLine&quot;:{&quot;show&quot;:false,&quot;lineStyle&quot;:{&quot;color&quot;:[&quot;#ccc&quot;]}},&quot;splitArea&quot;:{&quot;areaStyle&quot;:{&quot;color&quot;:[&quot;rgba(250,250,250,0.3)&quot;,&quot;rgba(200,200,200,0.3)&quot;]},&quot;show&quot;:false},&quot;axisTick&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#333&quot;}}},&quot;tooltip&quot;:{&quot;axisPointer&quot;:{&quot;crossStyle&quot;:{&quot;color&quot;:&quot;#cccccc&quot;,&quot;width&quot;:1},&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#cccccc&quot;,&quot;width&quot;:1}}},&quot;timeline&quot;:{&quot;label&quot;:{&quot;normal&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#293c55&quot;}},&quot;emphasis&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#293c55&quot;}}},&quot;controlStyle&quot;:{&quot;normal&quot;:{&quot;color&quot;:&quot;#293c55&quot;,&quot;borderColor&quot;:&quot;#293c55&quot;,&quot;borderWidth&quot;:0.5},&quot;emphasis&quot;:{&quot;color&quot;:&quot;#293c55&quot;,&quot;borderColor&quot;:&quot;#293c55&quot;,&quot;borderWidth&quot;:0.5}},&quot;checkpointStyle&quot;:{&quot;color&quot;:&quot;#e43c59&quot;,&quot;borderColor&quot;:&quot;rgba(194,53,49,0.5)&quot;},&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;color&quot;:&quot;#293c55&quot;,&quot;borderWidth&quot;:1},&quot;emphasis&quot;:{&quot;color&quot;:&quot;#a9334c&quot;}},&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#293c55&quot;,&quot;width&quot;:1}},&quot;radar&quot;:{&quot;symbolSize&quot;:4,&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:1
}},&quot;smooth&quot;:false,&quot;symbol&quot;:&quot;emptyCircle&quot;,&quot;lineStyle&quot;:{&quot;normal&quot;:{&quot;width&quot;:2}}},&quot;logAxis&quot;:{&quot;axisLine&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#333&quot;}},&quot;axisLabel&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#333&quot;},&quot;show&quot;:true},&quot;splitLine&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:[&quot;#ccc&quot;]}},&quot;splitArea&quot;:{&quot;areaStyle&quot;:{&quot;color&quot;:[&quot;rgba(250,250,250,0.3)&quot;,&quot;rgba(200,200,200,0.3)&quot;]},&quot;show&quot;:false},&quot;axisTick&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#333&quot;}}},&quot;textStyle&quot;:{},&quot;gauge&quot;:{&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;},&quot;emphasis&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;}}},&quot;boxplot&quot;:{&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;},&quot;emphasis&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;}}},&quot;color&quot;:[&quot;#e01f54&quot;,&quot;#001852&quot;,&quot;#f5e8c8&quot;,&quot;#b8d2c7&quot;,&quot;#c6b38e&quot;,&quot;#a4d8c2&quot;,&quot;#f3d999&quot;,&quot;#d3758f&quot;,&quot;#dcc392&quot;,&quot;#2e4783&quot;,&quot;#82b6e9&quot;,&quot;#ff6347&quot;,&quot;#a092f1&quot;,&quot;#0a915d&quot;,&quot;#eaf889&quot;,&quot;#6699FF&quot;,&quot;#ff6666&quot;,&quot;#3cb371&quot;,&quot;#d5b158&quot;,&quot;#38b6b6&quot;],&quot;title&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#333333&quot;},&quot;subtextStyle&quot;:{&quot;color&quot;:&quot;#aaaaaa&quot;}},&quot;dataZoom&quot;:{&quot;dataBackgroundColor&quot;:&quot;rgba(47,69,84,0.3)&quot;,&quot;textStyle&quot;:{&quot;color&quot;:&quot;#333333&quot;},&quot;handleSize&quot;:&quot;100%&quot;,&quot;handleColor&quot;:&quot;#a7b7cc
&quot;,&quot;fillerColor&quot;:&quot;rgba(167,183,204,0.4)&quot;,&quot;backgroundColor&quot;:&quot;rgba(47,69,84,0)&quot;},&quot;timeAxis&quot;:{&quot;axisLine&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#333&quot;}},&quot;axisLabel&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#333&quot;},&quot;show&quot;:true},&quot;splitLine&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:[&quot;#ccc&quot;]}},&quot;splitArea&quot;:{&quot;areaStyle&quot;:{&quot;color&quot;:[&quot;rgba(250,250,250,0.3)&quot;,&quot;rgba(200,200,200,0.3)&quot;]},&quot;show&quot;:false},&quot;axisTick&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#333&quot;}}},&quot;legend&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#333333&quot;}}},&quot;yAxis&quot;:[{&quot;splitNumber&quot;:5,&quot;axisLabel&quot;:{&quot;show&quot;:true,&quot;interval&quot;:&quot;auto&quot;,&quot;rotate&quot;:0,&quot;inside&quot;:false,&quot;formatter&quot;:&quot;{value}&quot;,&quot;margin&quot;:8},&quot;scale&quot;:false,&quot;gridIndex&quot;:0,&quot;minInterval&quot;:0,&quot;zlevel&quot;:0,&quot;triggerEvent&quot;:false,&quot;z&quot;:0,&quot;inverse&quot;:false,&quot;nameLocation&quot;:&quot;middle&quot;,&quot;nameGap&quot;:50,&quot;silent&quot;:true,&quot;type&quot;:&quot;value&quot;}],&quot;toolbox&quot;:{&quot;feature&quot;:{},&quot;orient&quot;:&quot;vertical&quot;,&quot;itemSize&quot;:15,&quot;height&quot;:&quot;auto&quot;,&quot;zlevel&quot;:0,&quot;z&quot;:2,&quot;itemGap&quot;:20,&quot;right&quot;:&quot;auto&quot;,&quot;top&quot;:&quot;center&quot;,&quot;width&quot;:&quot;auto&quot;,&quot;show&quot;:false,&quot;showTitle&quot;:true},&quot;ec_width&quot;:1000,&quot;ec_height&quot;:500,&quot;tooltip&quot;:{&quot;triggerOn&quot;:&quot;mousemove&quot;,&quot;enterable&quot;:true,&quot;borderColor&quot;:&quot;#333&quot;,&quot;transitionDuration&quot;:0.4,&quot;hideDelay&quot;:100,&quot;padding&quot;:5,&quot;showDelay&quot;:0,&quo
t;borderWidth&quot;:0,&quot;showContent&quot;:true,&quot;backgroundColor&quot;:&quot;rgba(50,50,50,0.7)&quot;,&quot;trigger&quot;:&quot;item&quot;,&quot;alwaysShowContent&quot;:false,&quot;confine&quot;:false,&quot;show&quot;:true},&quot;grid&quot;:[{&quot;height&quot;:&quot;auto&quot;,&quot;show&quot;:false,&quot;width&quot;:&quot;auto&quot;,&quot;backgroundColor&quot;:&quot;transparent&quot;}],&quot;aria&quot;:{&quot;show&quot;:true},&quot;title&quot;:[{&quot;left&quot;:&quot;left&quot;,&quot;borderColor&quot;:&quot;transparent&quot;,&quot;bottom&quot;:&quot;auto&quot;,&quot;padding&quot;:5,&quot;zlevel&quot;:0,&quot;borderWidth&quot;:1,&quot;target&quot;:&quot;blank&quot;,&quot;z&quot;:2,&quot;itemGap&quot;:5,&quot;shadowOffsetY&quot;:0,&quot;shadowOffsetX&quot;:0,&quot;right&quot;:&quot;auto&quot;,&quot;top&quot;:&quot;auto&quot;,&quot;subtarget&quot;:&quot;blank&quot;,&quot;show&quot;:true}],&quot;ec_renderer&quot;:&quot;canvas&quot;,&quot;legend&quot;:{&quot;itemWidth&quot;:25,&quot;data&quot;:[&quot;Min&quot;,&quot;Median&quot;],&quot;borderColor&quot;:&quot;transparent&quot;,&quot;orient&quot;:&quot;horizontal&quot;,&quot;bottom&quot;:&quot;auto&quot;,&quot;height&quot;:&quot;auto&quot;,&quot;zlevel&quot;:0,&quot;padding&quot;:5,&quot;borderWidth&quot;:1,&quot;inactiveColor&quot;:&quot;#ccc&quot;,&quot;z&quot;:2,&quot;align&quot;:&quot;auto&quot;,&quot;itemGap&quot;:10,&quot;itemHeight&quot;:14,&quot;backgroundColor&quot;:&quot;transparent&quot;,&quot;shadowOffsetY&quot;:0,&quot;shadowOffsetX&quot;:0,&quot;right&quot;:&quot;auto&quot;,&quot;top&quot;:&quot;auto&quot;,&quot;width&quot;:&quot;auto&quot;,&quot;selectedMode&quot;:true,&quot;show&quot;:true}} );
&lt;/script&gt;

&lt;p&gt;The first three bars on the left compare the single-threaded methods. Pre-allocating the output array is marginally faster than broadcasting, and adding the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@inbounds&lt;/code&gt; macro is incrementally faster still, but neither change provides enough speedup to be worth implementing. The difference between the red and the blue bars (the minimum and median times) reflects garbage collection occurring, but again, the three methods aren’t different enough to show anything interesting.&lt;/p&gt;

&lt;p&gt;For the multi-threaded tests, I’m using 6 threads (one per physical core), and we’re seeing roughly a &lt;strong&gt;3x speedup&lt;/strong&gt;. As with the single-threaded tests above, using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@inbounds&lt;/code&gt; is only marginally faster, which is not enough to justify the increased code complexity. Interestingly, the multi-threaded benchmarks didn’t trigger garbage collection &lt;em&gt;at all&lt;/em&gt; across my five iterations; I’m not sure whether this is specific to threading, but it’s something to explore outside of this blog post.&lt;/p&gt;

&lt;h3 id=&quot;10x7-observations&quot;&gt;10x7 observations&lt;/h3&gt;

&lt;p&gt;To see how these calculation methods behave at a larger scale, I increased the number of observations by an order of magnitude, to 10 million, and saw the following results:&lt;/p&gt;

&lt;div id=&quot;ts_108&quot; style=&quot;height:400px;width:950px;&quot;&gt;&lt;/div&gt;
&lt;script type=&quot;text/javascript&quot;&gt;

    // Initialize after dom ready
    var myChart = echarts.init(document.getElementById(&quot;ts_108&quot;));

    // Load data into the ECharts instance
    myChart.setOption(
{&quot;xAxis&quot;:[{&quot;splitNumber&quot;:5,&quot;axisLabel&quot;:{&quot;show&quot;:true,&quot;interval&quot;:&quot;auto&quot;,&quot;rotate&quot;:0,&quot;inside&quot;:false,&quot;formatter&quot;:&quot;{value}&quot;,&quot;margin&quot;:8},&quot;data&quot;:[&quot;broadcast&quot;,&quot;pre-allocate&quot;,&quot;pre-allocate/inbounds&quot;,&quot;threads&quot;,&quot;threads/inbounds&quot;],&quot;scale&quot;:false,&quot;gridIndex&quot;:0,&quot;minInterval&quot;:0,&quot;zlevel&quot;:0,&quot;triggerEvent&quot;:false,&quot;z&quot;:0,&quot;inverse&quot;:false,&quot;nameLocation&quot;:&quot;middle&quot;,&quot;nameGap&quot;:30,&quot;silent&quot;:true,&quot;type&quot;:&quot;category&quot;}],&quot;ec_charttype&quot;:&quot;xy plot&quot;,&quot;series&quot;:[{&quot;name&quot;:&quot;Min&quot;,&quot;yAxisIndex&quot;:0,&quot;xAxisIndex&quot;:0,&quot;smooth&quot;:false,&quot;data&quot;:[26.316,27.064,26.219,2.717,2.641],&quot;markLine&quot;:{&quot;data&quot;:[],&quot;lineStyle&quot;:{}},&quot;type&quot;:&quot;bar&quot;},{&quot;name&quot;:&quot;Median&quot;,&quot;yAxisIndex&quot;:0,&quot;xAxisIndex&quot;:0,&quot;smooth&quot;:false,&quot;data&quot;:[39.332,38.925,39.387,17.659,16.659],&quot;markLine&quot;:{&quot;data&quot;:[],&quot;lineStyle&quot;:{}},&quot;type&quot;:&quot;bar&quot;}],&quot;theme&quot;:{&quot;geo&quot;:{&quot;label&quot;:{&quot;normal&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#000000&quot;}},&quot;emphasis&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;rgb(100,0,0)&quot;}}},&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderColor&quot;:&quot;#444444&quot;,&quot;borderWidth&quot;:0.5,&quot;areaColor&quot;:&quot;#eeeeee&quot;},&quot;emphasis&quot;:{&quot;borderColor&quot;:&quot;#444444&quot;,&quot;borderWidth&quot;:1,&quot;areaColor&quot;:&quot;rgba(255,215,0,0.8)&quot;}}},&quot;parallel&quot;:{&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;},&quot;emphasis&quot;:{&quot;borderWidth&quot;:
0,&quot;borderColor&quot;:&quot;#ccc&quot;}}},&quot;markPoint&quot;:{&quot;label&quot;:{&quot;normal&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#eeeeee&quot;}},&quot;emphasis&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#eeeeee&quot;}}}},&quot;visualMap&quot;:{&quot;color&quot;:[&quot;#e01f54&quot;,&quot;#e7dbc3&quot;]},&quot;funnel&quot;:{&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;},&quot;emphasis&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;}}},&quot;bar&quot;:{&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;barBorderColor&quot;:&quot;#ccc&quot;,&quot;barBorderWidth&quot;:0},&quot;emphasis&quot;:{&quot;barBorderColor&quot;:&quot;#ccc&quot;,&quot;barBorderWidth&quot;:0}}},&quot;map&quot;:{&quot;label&quot;:{&quot;normal&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#000000&quot;}},&quot;emphasis&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;rgb(100,0,0)&quot;}}},&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderColor&quot;:&quot;#444444&quot;,&quot;borderWidth&quot;:0.5,&quot;areaColor&quot;:&quot;#eeeeee&quot;},&quot;emphasis&quot;:{&quot;borderColor&quot;:&quot;#444444&quot;,&quot;borderWidth&quot;:1,&quot;areaColor&quot;:&quot;rgba(255,215,0,0.8)&quot;}}},&quot;scatter&quot;:{&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;},&quot;emphasis&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;}}},&quot;pie&quot;:{&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;},&quot;emphasis&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;}}},&quot;graph&quot;:{&quot;label&quot;:{&quot;normal&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#eeeeee&quot;}}},&quot;symbolSize&quot;:4,&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&q
uot;#ccc&quot;}},&quot;smooth&quot;:false,&quot;symbol&quot;:&quot;emptyCircle&quot;,&quot;color&quot;:[&quot;#e01f54&quot;,&quot;#001852&quot;,&quot;#f5e8c8&quot;,&quot;#b8d2c7&quot;,&quot;#c6b38e&quot;,&quot;#a4d8c2&quot;,&quot;#f3d999&quot;,&quot;#d3758f&quot;,&quot;#dcc392&quot;,&quot;#2e4783&quot;,&quot;#82b6e9&quot;,&quot;#ff6347&quot;,&quot;#a092f1&quot;,&quot;#0a915d&quot;,&quot;#eaf889&quot;,&quot;#6699FF&quot;,&quot;#ff6666&quot;,&quot;#3cb371&quot;,&quot;#d5b158&quot;,&quot;#38b6b6&quot;],&quot;lineStyle&quot;:{&quot;normal&quot;:{&quot;color&quot;:&quot;#aaaaaa&quot;,&quot;width&quot;:1}}},&quot;backgroundColor&quot;:&quot;rgba(0,0,0,0)&quot;,&quot;line&quot;:{&quot;symbolSize&quot;:4,&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:1}},&quot;smooth&quot;:false,&quot;symbol&quot;:&quot;emptyCircle&quot;,&quot;lineStyle&quot;:{&quot;normal&quot;:{&quot;width&quot;:2}}},&quot;candlestick&quot;:{&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderColor0&quot;:&quot;#b8d2c7&quot;,&quot;color&quot;:&quot;#e01f54&quot;,&quot;borderColor&quot;:&quot;#f5e8c8&quot;,&quot;borderWidth&quot;:1,&quot;color0&quot;:&quot;#001852&quot;}}},&quot;sankey&quot;:{&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;},&quot;emphasis&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;}}},&quot;valueAxis&quot;:{&quot;axisLine&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#333&quot;}},&quot;axisLabel&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#333&quot;},&quot;show&quot;:true},&quot;splitLine&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:[&quot;#ccc&quot;]}},&quot;splitArea&quot;:{&quot;areaStyle&quot;:{&quot;color&quot;:[&quot;rgba(250,250,250,0.3)&quot;,&quot;rgba(200,200,200,0.3)&quot;]},&quot;show&quot;:false},&quot;axisTick&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#333&quot;}}},
&quot;toolbox&quot;:{&quot;iconStyle&quot;:{&quot;normal&quot;:{&quot;borderColor&quot;:&quot;#999999&quot;},&quot;emphasis&quot;:{&quot;borderColor&quot;:&quot;#666666&quot;}}},&quot;categoryAxis&quot;:{&quot;axisLine&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#333&quot;}},&quot;axisLabel&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#333&quot;},&quot;show&quot;:true},&quot;splitLine&quot;:{&quot;show&quot;:false,&quot;lineStyle&quot;:{&quot;color&quot;:[&quot;#ccc&quot;]}},&quot;splitArea&quot;:{&quot;areaStyle&quot;:{&quot;color&quot;:[&quot;rgba(250,250,250,0.3)&quot;,&quot;rgba(200,200,200,0.3)&quot;]},&quot;show&quot;:false},&quot;axisTick&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#333&quot;}}},&quot;tooltip&quot;:{&quot;axisPointer&quot;:{&quot;crossStyle&quot;:{&quot;color&quot;:&quot;#cccccc&quot;,&quot;width&quot;:1},&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#cccccc&quot;,&quot;width&quot;:1}}},&quot;timeline&quot;:{&quot;label&quot;:{&quot;normal&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#293c55&quot;}},&quot;emphasis&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#293c55&quot;}}},&quot;controlStyle&quot;:{&quot;normal&quot;:{&quot;color&quot;:&quot;#293c55&quot;,&quot;borderColor&quot;:&quot;#293c55&quot;,&quot;borderWidth&quot;:0.5},&quot;emphasis&quot;:{&quot;color&quot;:&quot;#293c55&quot;,&quot;borderColor&quot;:&quot;#293c55&quot;,&quot;borderWidth&quot;:0.5}},&quot;checkpointStyle&quot;:{&quot;color&quot;:&quot;#e43c59&quot;,&quot;borderColor&quot;:&quot;rgba(194,53,49,0.5)&quot;},&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;color&quot;:&quot;#293c55&quot;,&quot;borderWidth&quot;:1},&quot;emphasis&quot;:{&quot;color&quot;:&quot;#a9334c&quot;}},&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#293c55&quot;,&quot;width&quot;:1}},&quot;radar&quot;:{&quot;symbolSize&quot;:4,&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:1}},&quot;smo
oth&quot;:false,&quot;symbol&quot;:&quot;emptyCircle&quot;,&quot;lineStyle&quot;:{&quot;normal&quot;:{&quot;width&quot;:2}}},&quot;logAxis&quot;:{&quot;axisLine&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#333&quot;}},&quot;axisLabel&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#333&quot;},&quot;show&quot;:true},&quot;splitLine&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:[&quot;#ccc&quot;]}},&quot;splitArea&quot;:{&quot;areaStyle&quot;:{&quot;color&quot;:[&quot;rgba(250,250,250,0.3)&quot;,&quot;rgba(200,200,200,0.3)&quot;]},&quot;show&quot;:false},&quot;axisTick&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#333&quot;}}},&quot;textStyle&quot;:{},&quot;gauge&quot;:{&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;},&quot;emphasis&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;}}},&quot;boxplot&quot;:{&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;},&quot;emphasis&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;}}},&quot;color&quot;:[&quot;#e01f54&quot;,&quot;#001852&quot;,&quot;#f5e8c8&quot;,&quot;#b8d2c7&quot;,&quot;#c6b38e&quot;,&quot;#a4d8c2&quot;,&quot;#f3d999&quot;,&quot;#d3758f&quot;,&quot;#dcc392&quot;,&quot;#2e4783&quot;,&quot;#82b6e9&quot;,&quot;#ff6347&quot;,&quot;#a092f1&quot;,&quot;#0a915d&quot;,&quot;#eaf889&quot;,&quot;#6699FF&quot;,&quot;#ff6666&quot;,&quot;#3cb371&quot;,&quot;#d5b158&quot;,&quot;#38b6b6&quot;],&quot;title&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#333333&quot;},&quot;subtextStyle&quot;:{&quot;color&quot;:&quot;#aaaaaa&quot;}},&quot;dataZoom&quot;:{&quot;dataBackgroundColor&quot;:&quot;rgba(47,69,84,0.3)&quot;,&quot;textStyle&quot;:{&quot;color&quot;:&quot;#333333&quot;},&quot;handleSize&quot;:&quot;100%&quot;,&quot;handleColor&quot;:&quot;#a7b7cc&quot;,&quot
;fillerColor&quot;:&quot;rgba(167,183,204,0.4)&quot;,&quot;backgroundColor&quot;:&quot;rgba(47,69,84,0)&quot;},&quot;timeAxis&quot;:{&quot;axisLine&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#333&quot;}},&quot;axisLabel&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#333&quot;},&quot;show&quot;:true},&quot;splitLine&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:[&quot;#ccc&quot;]}},&quot;splitArea&quot;:{&quot;areaStyle&quot;:{&quot;color&quot;:[&quot;rgba(250,250,250,0.3)&quot;,&quot;rgba(200,200,200,0.3)&quot;]},&quot;show&quot;:false},&quot;axisTick&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#333&quot;}}},&quot;legend&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#333333&quot;}}},&quot;yAxis&quot;:[{&quot;splitNumber&quot;:5,&quot;axisLabel&quot;:{&quot;show&quot;:true,&quot;interval&quot;:&quot;auto&quot;,&quot;rotate&quot;:0,&quot;inside&quot;:false,&quot;formatter&quot;:&quot;{value}&quot;,&quot;margin&quot;:8},&quot;scale&quot;:false,&quot;gridIndex&quot;:0,&quot;minInterval&quot;:0,&quot;zlevel&quot;:0,&quot;triggerEvent&quot;:false,&quot;z&quot;:0,&quot;inverse&quot;:false,&quot;nameLocation&quot;:&quot;middle&quot;,&quot;nameGap&quot;:50,&quot;silent&quot;:true,&quot;type&quot;:&quot;value&quot;}],&quot;toolbox&quot;:{&quot;feature&quot;:{},&quot;orient&quot;:&quot;vertical&quot;,&quot;itemSize&quot;:15,&quot;height&quot;:&quot;auto&quot;,&quot;zlevel&quot;:0,&quot;z&quot;:2,&quot;itemGap&quot;:20,&quot;right&quot;:&quot;auto&quot;,&quot;top&quot;:&quot;center&quot;,&quot;width&quot;:&quot;auto&quot;,&quot;show&quot;:false,&quot;showTitle&quot;:true},&quot;ec_width&quot;:1000,&quot;ec_height&quot;:500,&quot;tooltip&quot;:{&quot;triggerOn&quot;:&quot;mousemove&quot;,&quot;enterable&quot;:true,&quot;borderColor&quot;:&quot;#333&quot;,&quot;transitionDuration&quot;:0.4,&quot;hideDelay&quot;:100,&quot;padding&quot;:5,&quot;showDelay&quot;:0,&quot;borderWidt
h&quot;:0,&quot;showContent&quot;:true,&quot;backgroundColor&quot;:&quot;rgba(50,50,50,0.7)&quot;,&quot;trigger&quot;:&quot;item&quot;,&quot;alwaysShowContent&quot;:false,&quot;confine&quot;:false,&quot;show&quot;:true},&quot;grid&quot;:[{&quot;height&quot;:&quot;auto&quot;,&quot;show&quot;:false,&quot;width&quot;:&quot;auto&quot;,&quot;backgroundColor&quot;:&quot;transparent&quot;}],&quot;aria&quot;:{&quot;show&quot;:true},&quot;color&quot;:[&quot;#10222B&quot;,&quot;#95AB63&quot;,&quot;#BDD684&quot;,&quot;#E2F0D6&quot;,&quot;#F6FFE0&quot;],&quot;title&quot;:[{&quot;left&quot;:&quot;left&quot;,&quot;borderColor&quot;:&quot;transparent&quot;,&quot;bottom&quot;:&quot;auto&quot;,&quot;padding&quot;:5,&quot;zlevel&quot;:0,&quot;borderWidth&quot;:1,&quot;target&quot;:&quot;blank&quot;,&quot;z&quot;:2,&quot;itemGap&quot;:5,&quot;shadowOffsetY&quot;:0,&quot;shadowOffsetX&quot;:0,&quot;right&quot;:&quot;auto&quot;,&quot;top&quot;:&quot;auto&quot;,&quot;subtarget&quot;:&quot;blank&quot;,&quot;show&quot;:true}],&quot;ec_renderer&quot;:&quot;canvas&quot;,&quot;legend&quot;:{&quot;itemWidth&quot;:25,&quot;data&quot;:[&quot;Min&quot;,&quot;Median&quot;],&quot;borderColor&quot;:&quot;transparent&quot;,&quot;orient&quot;:&quot;horizontal&quot;,&quot;bottom&quot;:&quot;auto&quot;,&quot;height&quot;:&quot;auto&quot;,&quot;zlevel&quot;:0,&quot;padding&quot;:5,&quot;borderWidth&quot;:1,&quot;inactiveColor&quot;:&quot;#ccc&quot;,&quot;z&quot;:2,&quot;align&quot;:&quot;auto&quot;,&quot;itemGap&quot;:10,&quot;itemHeight&quot;:14,&quot;backgroundColor&quot;:&quot;transparent&quot;,&quot;shadowOffsetY&quot;:0,&quot;shadowOffsetX&quot;:0,&quot;right&quot;:&quot;auto&quot;,&quot;top&quot;:&quot;auto&quot;,&quot;width&quot;:&quot;auto&quot;,&quot;selectedMode&quot;:true,&quot;show&quot;:true}});
&lt;/script&gt;

&lt;p&gt;As at the 1 million observation scale, there isn’t much difference between the three single-threaded methods. All three are within a few percent of each other (all three methods triggered garbage collection in each of their five runs).&lt;/p&gt;

&lt;p&gt;For the multi-threaded tests, an interesting performance scenario emerged. As in the 1 million observation tests, it’s possible to get a run where garbage collection isn’t triggered, which leads to a large difference between the minimum and the median in the multi-threaded tests. If you can avoid garbage collection, using six threads here gives nearly a &lt;strong&gt;10x speedup&lt;/strong&gt;, and at the median, where both the single-threaded and multi-threaded methods trigger garbage collection, you still get roughly a &lt;strong&gt;2x speedup&lt;/strong&gt;.&lt;/p&gt;
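
&lt;p&gt;As a rough check on those ratios, the sketch below recomputes them from the values plotted in the chart above. The variable names and the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;speedup&lt;/code&gt; helper are introduced here purely for illustration, and the timings are copied from the chart in whatever units it was plotted in.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;# Timings copied from the chart above (10 million observations).
# Order: broadcast, pre-allocate, pre-allocate/inbounds, threads, threads/inbounds
min_times    = [26.316, 27.064, 26.219, 2.717, 2.641]
median_times = [39.332, 38.925, 39.387, 17.659, 16.659]

# Illustrative helper: speedup of plain threading (index 4)
# relative to broadcasting (index 1)
speedup(times) = times[1] / times[4]

println(round(speedup(min_times), digits=1))     # prints 9.7
println(round(speedup(median_times), digits=1))  # prints 2.2
&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;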

&lt;h2 id=&quot;parallelism--compiler-hinting&quot;&gt;Parallelism &amp;gt; Compiler Hinting&lt;/h2&gt;

&lt;p&gt;In the case study above, I’ve demonstrated that for this problem, threading is the first approach to pursue for speeding up the OmniSci.jl load table methods. While pre-allocating the output array and using the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@inbounds&lt;/code&gt; macro did show some slight speedups, using threads to perform the calculations is where the largest improvements occurred. The pre-allocation step falls naturally out of the way I wrote the threading methods, so I’ll incorporate that too. Disabling bounds-checking on arrays using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@inbounds&lt;/code&gt; seems more dangerous than it is worth, even though none of these methods should ever get outside of their bounds.&lt;/p&gt;
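
&lt;p&gt;To illustrate why pre-allocation comes along with threading, here is a minimal sketch of the general pattern, not the actual OmniSci.jl code. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;transform&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;transform_threaded&lt;/code&gt; functions are hypothetical stand-ins, and the sketch assumes a Julia 1.x session started with more than one thread.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;# Hypothetical element-wise transformation standing in for the real conversion
transform(x::Float64) = 2.0 * x + 1.0

function transform_threaded(values::Vector{Float64})
    # The output has to exist before the loop so that each thread can
    # write to its own indices; this is the pre-allocation step
    out = Vector{Float64}(undef, length(values))

    # Iterations are split across the threads Julia was started with
    # (for example, set through the JULIA_NUM_THREADS environment variable)
    Threads.@threads for i in eachindex(values)
        out[i] = transform(values[i])  # bounds checks stay on: no @inbounds
    end

    return out
end

# Example call on a small random vector
transform_threaded(rand(10))
&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Because each iteration writes to a distinct index of the output, the threads never touch the same element, which is what makes this loop safe to split up.&lt;/p&gt;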

&lt;p&gt;Overall, I hope this post has demonstrated that you don’t have to fancy yourself a high-frequency trader or a bit-twiddler to find ways to improve your Julia code. The first step is reading the manuals for benchmarking, and then, as with any other pursuit, the only way to get a feeling for what works is to try things.&lt;/p&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;All of the code for this blog post can be found in this &lt;a href=&quot;https://gist.github.com/randyzwitch/dbe9ce13aa819a1306d62610bb58b173&quot;&gt;GitHub gist&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;</content>
      </item>

      <item>
        <title>Parallelizing Distance Calculations Using A GPU With CUDAnative.jl</title>
        
          <description>&lt;p&gt;Hacker News discussion: &lt;a href=&quot;https://news.ycombinator.com/item?id=15021244&quot;&gt;link&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://randyzwitch.com/notebooks/cudanative_haversine_julia_example.ipynb&quot;&gt;Code as Julia Jupyter Notebook&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Julia has a reputation as a “fast” language, in that it’s possible to write high-performing code in it. However, what I appreciate most about Julia is not just that the code is fast, but rather that Julia makes high-performance concepts &lt;em&gt;accessible&lt;/em&gt; without requiring a deep computer science or compiled-language background (neither of which I possess!).&lt;/p&gt;

&lt;p&gt;With version 0.6 of Julia, another milestone has been reached in the “accessible” high-performance category: the ability to &lt;a href=&quot;https://julialang.org/blog/2017/03/cudanative&quot;&gt;run Julia code natively on NVIDIA GPUs&lt;/a&gt; through the &lt;a href=&quot;https://github.com/JuliaGPU/CUDAnative.jl&quot;&gt;CUDAnative.jl&lt;/a&gt; package. While CUDAnative.jl is still very much in its development stages, the package is already far enough along that within a few hours, as a complete beginner to GPU programming, I was able to see speedups in excess of 20x for my toy example of calculating haversine distance.&lt;/p&gt;

&lt;h2 id=&quot;getting-started&quot;&gt;Getting Started&lt;/h2&gt;

&lt;p&gt;The &lt;a href=&quot;https://julialang.org/blog/2017/03/cudanative&quot;&gt;CUDAnative.jl introduction blog post&lt;/a&gt; and &lt;a href=&quot;http://juliagpu.github.io/CUDAnative.jl/stable/#Installation-1&quot;&gt;documentation&lt;/a&gt; cover the installation process in depth, so I won’t repeat the details here. I’m already a regular compile-from-source Julia user, and I found the installation process pretty easy on my &lt;a href=&quot;http://randyzwitch.com/building-data-science-workstation-2017/&quot;&gt;CUDA-enabled Ubuntu workstation&lt;/a&gt;. If you can already run TensorFlow, Keras or other GPU tutorials on your computer, getting CUDAnative.jl to work shouldn’t take more than 10-15 minutes.&lt;/p&gt;

&lt;h2 id=&quot;julia-cpu-implementation&quot;&gt;Julia CPU Implementation&lt;/h2&gt;

&lt;p&gt;To get a feel for what sort of speedup I could expect from using a GPU, I wrote a naive implementation of a distance matrix calculation in Julia:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;c&quot;&gt;#https://github.com/quinnj/Rosetta-Julia/blob/master/src/Haversine.jl&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;haversine&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lat1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Float32&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lon1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Float32&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lat2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Float32&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lon2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Float32&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;6372.8&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;asin&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sqrt&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sind&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lat2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lat1&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;^&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cosd&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span 
class=&quot;n&quot;&gt;lat1&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cosd&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lat2&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sind&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lon2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lon1&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;^&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;))&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt; pairwise_dist&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lat&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Vector&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Float32&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;},&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lon&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Vector&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Float32&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;})&lt;/span&gt;

    &lt;span class=&quot;c&quot;&gt;#Pre-allocate, since size is known&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;length&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lat&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Array&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Float32&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;}(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;c&quot;&gt;#Brute force fill in each cell, ignore that distance [i,j] = distance [j,i]&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;
            &lt;span class=&quot;nd&quot;&gt;@inbounds&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;haversine&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lat&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lon&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lat&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;j&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lon&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;j&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;])&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Example benchmark call&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;lat10000&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rand&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Float32&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10000&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;.*&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;45&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;lon10000&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rand&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Float32&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10000&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;.*&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;120&lt;/span&gt;
&lt;span class=&quot;nd&quot;&gt;@time&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;native_julia_cellwise&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pairwise_dist&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lat10000&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lon10000&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;The above code takes two vectors of lat/lon values, then calculates the &lt;a href=&quot;https://rosettacode.org/wiki/Haversine_formula&quot;&gt;haversine distance&lt;/a&gt; between every pair of points. This algorithm is naive in that a distance matrix is symmetric (i.e. the distance from A to B is the same as from B to A), so I could’ve done half the work by setting &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;result[i,j]&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;result[j,i]&lt;/code&gt; to the same value, but as a measure of work for a benchmark this toy example is fine. Also note that this implementation runs on a single core; no CPU-core-level parallelization has been implemented.&lt;/p&gt;

&lt;p&gt;Or to put all that another way: if someone wanted to tackle this problem without thinking very hard, the implementation might look like this.&lt;/p&gt;
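
&lt;p&gt;For reference, the symmetric shortcut mentioned above could look something like the sketch below. This is not the code that was benchmarked: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;haversine_km&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pairwise_dist_sym&lt;/code&gt; are hypothetical names, the distance formula is copied from the GPU kernel later in this post, and the syntax assumes Julia 1.x.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;# Haversine distance in km (hypothetical helper; same formula and radius as the GPU kernel)
function haversine_km(lat1::Float32, lon1::Float32, lat2::Float32, lon2::Float32)
    a = sind((lat2 - lat1) / 2)^2 + cosd(lat1) * cosd(lat2) * sind((lon2 - lon1) / 2)^2
    # Clamp to 1 so rounding error cannot push asin outside its domain
    return 2 * 6372.8f0 * asin(min(1f0, sqrt(a)))
end

# Fill only the cells above the diagonal and mirror each value (assumed name, not from the post)
function pairwise_dist_sym(lat::Vector{Float32}, lon::Vector{Float32})
    n = length(lat)
    result = zeros(Float32, n, n)   # the diagonal stays 0
    for j in 1:n
        for i in 1:(j - 1)
            d = haversine_km(lat[i], lon[i], lat[j], lon[j])
            result[i, j] = d
            result[j, i] = d
        end
    end
    return result
end
&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Only the n(n-1)/2 cells above the diagonal are computed; each value is then mirrored below the diagonal, which roughly halves the number of haversine evaluations.&lt;/p&gt;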

&lt;h2 id=&quot;cudanativejl-implementation&quot;&gt;CUDAnative.jl Implementation&lt;/h2&gt;

&lt;p&gt;There are two parts to the CUDAnative.jl implementation: the kernel (i.e. the actual calculation) and the boilerplate code for coordinating data transfers between the CPU and GPU.&lt;/p&gt;

&lt;h4 id=&quot;kernel-code&quot;&gt;Kernel Code&lt;/h4&gt;

&lt;p&gt;The kernel code has similarities to the CPU implementation, with a few key differences:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The method signature takes one lat/lon point plus the full lat/lon vectors, rather than computing every pairwise distance in a single call&lt;/li&gt;
  &lt;li&gt;Boilerplate code for thread index on the GPU (0-indexed vs. normal Julia 1-indexing)&lt;/li&gt;
  &lt;li&gt;The trigonometric functions need to be prepended with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CUDAnative.&lt;/code&gt;, to differentiate that the GPU functions aren’t the same as the functions from Base Julia&lt;/li&gt;
  &lt;li&gt;Rather than return an array as part of the function return, we use the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;out&lt;/code&gt; argument to write directly to the GPU memory&lt;/li&gt;
&lt;/ul&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;k&quot;&gt;using&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CUDAnative&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CUDAdrv&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Calculate one point vs. all other points simultaneously&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt; kernel_haversine&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;latpoint&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Float32&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lonpoint&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Float32&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lat&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;AbstractVector&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Float32&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;},&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lon&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;AbstractVector&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Float32&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;},&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;AbstractVector&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Float32&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;})&lt;/span&gt;

    &lt;span class=&quot;c&quot;&gt;#Thread index&lt;/span&gt;
    &lt;span class=&quot;c&quot;&gt;#Need to do the n-1 dance, since CUDA expects 0 and Julia does 1-indexing&lt;/span&gt;
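    &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;blockIdx&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;blockDim&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;threadIdx&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;length&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;nothing&lt;/span&gt; &lt;span class=&quot;c&quot;&gt;#Surplus threads past the end of out do no work&lt;/span&gt;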
    &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;  &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;6372.8&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CUDAnative&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;asin&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CUDAnative&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sqrt&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CUDAnative&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sind&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;latpoint&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lat&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;])&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;^&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CUDAnative&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cosd&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lat&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;])&lt;/span&gt; &lt;span 
class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CUDAnative&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cosd&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;latpoint&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CUDAnative&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sind&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lonpoint&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lon&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;])&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;^&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;))&lt;/span&gt;

    &lt;span class=&quot;c&quot;&gt;#Return nothing, since we're writing directly to the out array allocated on GPU&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;nothing&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
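
&lt;p&gt;The thread-index arithmetic can be checked on the CPU with plain integers. The following is a minimal sketch, assuming the 1-based block and thread indices used in the kernel above and the thread/block calculation used in the coordination code below; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;global_index&lt;/code&gt; is a hypothetical helper, and the syntax assumes Julia 1.x.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;# 1-based global index, the same expression the kernel uses (hypothetical helper)
global_index(block, blockdim, thread) = (block - 1) * blockdim + thread

n = 10_000                  # number of points
threads = min(n, 1024)      # threads per block
blocks = cld(n, threads)    # ceiling division, same result as Int(ceil(n / threads))

first_index = global_index(1, threads, 1)             # 1
last_index = global_index(blocks, threads, threads)   # 10240
surplus = last_index - n                              # 240 threads with no element to work on
&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Because the block count is rounded up, the last block can contain threads whose index lies past the end of the arrays, which is why kernels of this shape usually guard the write with a bounds check.&lt;/p&gt;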

&lt;h4 id=&quot;coordination-code&quot;&gt;Coordination Code&lt;/h4&gt;

&lt;p&gt;The coordination code is similar to what you might see in a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;main()&lt;/code&gt; function in C or Java, where the kernel is applied to the input data. I am using the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dev&lt;/code&gt; keyword with the default value of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CuDevice(0)&lt;/code&gt; to indicate that the code should be run on the first (in my case, only) GPU device.&lt;/p&gt;

&lt;p&gt;The remainder of the code has comments on its purpose, primarily:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Transfer Julia CPU arrays to GPU arrays (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CuArray&lt;/code&gt;)&lt;/li&gt;
  &lt;li&gt;Set number of threads/blocks&lt;/li&gt;
  &lt;li&gt;Calculate distance between a point and all other points in the array, write back to CPU&lt;/li&gt;
&lt;/ul&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;c&quot;&gt;#validated kernel_haversine/distmat returns same answer as CPU haversine method (not shown)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt; distmat&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lat&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Vector&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Float32&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;},&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lon&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Vector&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Float32&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;};&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dev&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CuDevice&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CuDevice&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;))&lt;/span&gt;

    &lt;span class=&quot;c&quot;&gt;#Create a context&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;ctx&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CuContext&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dev&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;c&quot;&gt;#Change to objects with CUDA context&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;length&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lat&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;d_lat&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CuArray&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lat&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;d_lon&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CuArray&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lon&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;d_out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CuArray&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Vector&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Float32&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;}(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;))&lt;/span&gt;

    &lt;span class=&quot;c&quot;&gt;#Calculate number of calculations, threads, blocks&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;len&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;threads&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;min&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1024&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;blocks&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Int&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ceil&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;threads&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;))&lt;/span&gt;

    &lt;span class=&quot;c&quot;&gt;#Julia side accumulation of results to relieve GPU memory pressure&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;accum&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Array&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Float32&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;}(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;c&quot;&gt;# run and time the test&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;secs&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CUDAdrv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nd&quot;&gt;@elapsed&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;begin&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;
            &lt;span class=&quot;nd&quot;&gt;@cuda&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;blocks&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;threads&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kernel_haversine&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lat&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lon&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d_lat&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d_lon&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d_out&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;accum&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Vector&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Float32&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;}(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;d_out&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;

    &lt;span class=&quot;c&quot;&gt;#Clean up context&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;destroy!&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ctx&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;c&quot;&gt;#Return timing and bring results back to Julia&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;secs&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;accum&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Example benchmark call&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;timing&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;distmat&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lat10000&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lon10000&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;result&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;≈&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;native_julia_cellwise&lt;/span&gt; &lt;span class=&quot;c&quot;&gt;#validate results equivalent CPU and GPU&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;The code is written to process one row of the distance matrix at a time to minimize GPU memory usage. By copying the results back to the CPU after each loop iteration, I have &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;n-1&lt;/code&gt; extra GPU-to-CPU transfers, which is less performant than calculating all the distances first and then transferring them once, but my consumer-grade GPU with 6GB of RAM would otherwise run out of memory before completing the calculation.&lt;/p&gt;
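
&lt;p&gt;To put rough numbers on that memory constraint, here is a back-of-the-envelope sketch. The helper names are hypothetical, the syntax assumes Julia 1.x, and the figures only count the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Float32&lt;/code&gt; output values, not the input vectors or any other GPU allocations.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;# Bytes needed to hold the full n x n result vs. a single row of it (hypothetical helpers)
bytes_full(n) = n^2 * sizeof(Float32)
bytes_row(n) = n * sizeof(Float32)
gib(bytes) = bytes / 2^30

n = 100_000
full_gib = gib(bytes_full(n))    # about 37.25 GiB, far more than a 6GB card holds
row_gib = gib(bytes_row(n))      # about 0.0004 GiB (400,000 bytes) per row
&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Holding one row of output on the GPU at a time keeps the device footprint tiny, at the price of one transfer per row.&lt;/p&gt;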

&lt;h2 id=&quot;performance&quot;&gt;Performance&lt;/h2&gt;

&lt;p&gt;The performance characteristics of the CPU and GPU calculations are below for various sizes of distance matrices. Having not done any GPU calculations before, I was surprised to see how much of a penalty there is for writing back and forth to the GPU. As you can see from the navy-blue line, the execution time is nearly constant (roughly 0.14 to 0.17 seconds) for matrices of size 1 to 1000, representing the fixed cost of moving the data from the CPU to the GPU.&lt;/p&gt;

&lt;p&gt;Of course, once we get above 1000x1000 matrices, the GPU really starts to shine. Due to the log scale, it’s a bit hard to see the differences in magnitude, but at 10000x10000 the GPU is already roughly 9x faster (5.62s CPU vs. 0.64s GPU), and at 100000x100000 there is a &lt;strong&gt;23x&lt;/strong&gt; reduction in execution time (565.008s CPU vs. 24.32s GPU).&lt;/p&gt;
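
&lt;p&gt;The speedup at each matrix size can be recomputed from the timings plotted in the chart. A small sketch, assuming Julia 1.x syntax; the variable names are mine and the numbers are the chart’s data points, in seconds:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;# Timings in seconds, taken from the chart data in this post
sizes = [1, 10, 100, 1000, 10000, 100000]
cpu = [6.0e-6, 1.7e-5, 0.001091, 0.090409, 5.620437, 565.008425]
gpu = [0.14232168, 0.15084915, 0.15897949, 0.16998644, 0.6376571, 24.32015]

# Element-wise ratio: values below 1 mean the CPU was faster at that size
speedup = cpu ./ gpu

for (n, s) in zip(sizes, speedup)
    println(n, &quot;x&quot;, n, &quot;: &quot;, round(s; sigdigits = 3))
end
&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;The ratio stays below 1 up to 1000x1000, where the fixed transfer cost dominates, and only then turns in the GPU’s favor.&lt;/p&gt;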

&lt;div id=&quot;linep&quot; style=&quot;height:400px;width:800px;&quot;&gt;&lt;/div&gt;
&lt;script type=&quot;text/javascript&quot;&gt;
    // Initialize after dom ready
    var myChart = echarts.init(document.getElementById(&quot;linep&quot;));

    // Load data into the ECharts instance
    myChart.setOption(
{&quot;xAxis&quot;:[{&quot;splitNumber&quot;:5,&quot;axisLine&quot;:{&quot;show&quot;:false,&quot;onZero&quot;:true,&quot;lineStyle&quot;:{&quot;normal&quot;:{},&quot;emphasis&quot;:{}}},&quot;axisLabel&quot;:{&quot;show&quot;:true,&quot;interval&quot;:&quot;auto&quot;,&quot;rotate&quot;:0,&quot;inside&quot;:false,&quot;formatter&quot;:&quot;{value}&quot;,&quot;margin&quot;:8},&quot;scale&quot;:true,&quot;gridIndex&quot;:0,&quot;name&quot;:&quot;Matrix dimensions (square)&quot;,&quot;minInterval&quot;:0,&quot;zlevel&quot;:0,&quot;triggerEvent&quot;:false,&quot;z&quot;:0,&quot;splitLine&quot;:{&quot;show&quot;:false,&quot;interval&quot;:&quot;auto&quot;,&quot;lineStyle&quot;:{&quot;normal&quot;:{},&quot;emphasis&quot;:{}}},&quot;inverse&quot;:false,&quot;nameLocation&quot;:&quot;middle&quot;,&quot;nameGap&quot;:30,&quot;silent&quot;:true,&quot;type&quot;:&quot;log&quot;}],&quot;ec_charttype&quot;:&quot;xy plot&quot;,&quot;series&quot;:[{&quot;name&quot;:&quot;CPU&quot;,&quot;yAxisIndex&quot;:0,&quot;xAxisIndex&quot;:0,&quot;smooth&quot;:true,&quot;data&quot;:[[1.0,6.0e-6],[10.0,1.7e-5],[100.0,0.001091],[1000.0,0.090409],[10000.0,5.620437],[100000.0,565.008425]],&quot;markLine&quot;:{&quot;data&quot;:[],&quot;lineStyle&quot;:{&quot;normal&quot;:{},&quot;emphasis&quot;:{}}},&quot;large&quot;:true,&quot;type&quot;:&quot;line&quot;,&quot;largeThreshold&quot;:2000},{&quot;name&quot;:&quot;GPU&quot;,&quot;yAxisIndex&quot;:0,&quot;xAxisIndex&quot;:0,&quot;smooth&quot;:true,&quot;data&quot;:[[1.0,0.14232168],[10.0,0.15084915],[100.0,0.15897949],[1000.0,0.16998644],[10000.0,0.6376571],[100000.0,24.32015]],&quot;markLine&quot;:{&quot;data&quot;:[],&quot;lineStyle&quot;:{&quot;normal&quot;:{},&quot;emphasis&quot;:{}}},&quot;large&quot;:true,&quot;type&quot;:&quot;line&quot;,&quot;largeThreshold&quot;:2000}],&quot;theme&quot;:{&quot;geo&quot;:{&quot;label&quot;:{&quot;normal&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#000000&quot;}},&quot;emphasis&quot;:{&quot;textSty
le&quot;:{&quot;color&quot;:&quot;rgb(100,0,0)&quot;}}},&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderColor&quot;:&quot;#444444&quot;,&quot;borderWidth&quot;:0.5,&quot;areaColor&quot;:&quot;#eeeeee&quot;},&quot;emphasis&quot;:{&quot;borderColor&quot;:&quot;#444444&quot;,&quot;borderWidth&quot;:1,&quot;areaColor&quot;:&quot;rgba(255,215,0,0.8)&quot;}}},&quot;parallel&quot;:{&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;},&quot;emphasis&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;}}},&quot;markPoint&quot;:{&quot;label&quot;:{&quot;normal&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#eeeeee&quot;}},&quot;emphasis&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#eeeeee&quot;}}}},&quot;visualMap&quot;:{&quot;color&quot;:[&quot;#e01f54&quot;,&quot;#e7dbc3&quot;]},&quot;funnel&quot;:{&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;},&quot;emphasis&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;}}},&quot;bar&quot;:{&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;barBorderColor&quot;:&quot;#ccc&quot;,&quot;barBorderWidth&quot;:0},&quot;emphasis&quot;:{&quot;barBorderColor&quot;:&quot;#ccc&quot;,&quot;barBorderWidth&quot;:0}}},&quot;map&quot;:{&quot;label&quot;:{&quot;normal&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#000000&quot;}},&quot;emphasis&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;rgb(100,0,0)&quot;}}},&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderColor&quot;:&quot;#444444&quot;,&quot;borderWidth&quot;:0.5,&quot;areaColor&quot;:&quot;#eeeeee&quot;},&quot;emphasis&quot;:{&quot;borderColor&quot;:&quot;#444444&quot;,&quot;borderWidth&quot;:1,&quot;areaColor&quot;:&quot;rgba(255,215,0,0.8)&quot;}}},&quot;scatter&quot;:{&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;},&quot;emphas
is&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;}}},&quot;pie&quot;:{&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;},&quot;emphasis&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;}}},&quot;graph&quot;:{&quot;label&quot;:{&quot;normal&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#eeeeee&quot;}}},&quot;symbolSize&quot;:4,&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;}},&quot;smooth&quot;:false,&quot;symbol&quot;:&quot;emptyCircle&quot;,&quot;color&quot;:[&quot;#e01f54&quot;,&quot;#001852&quot;,&quot;#f5e8c8&quot;,&quot;#b8d2c7&quot;,&quot;#c6b38e&quot;,&quot;#a4d8c2&quot;,&quot;#f3d999&quot;,&quot;#d3758f&quot;,&quot;#dcc392&quot;,&quot;#2e4783&quot;,&quot;#82b6e9&quot;,&quot;#ff6347&quot;,&quot;#a092f1&quot;,&quot;#0a915d&quot;,&quot;#eaf889&quot;,&quot;#6699FF&quot;,&quot;#ff6666&quot;,&quot;#3cb371&quot;,&quot;#d5b158&quot;,&quot;#38b6b6&quot;],&quot;lineStyle&quot;:{&quot;normal&quot;:{&quot;color&quot;:&quot;#aaaaaa&quot;,&quot;width&quot;:1}}},&quot;backgroundColor&quot;:&quot;rgba(0,0,0,0)&quot;,&quot;line&quot;:{&quot;symbolSize&quot;:4,&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:1}},&quot;smooth&quot;:false,&quot;symbol&quot;:&quot;emptyCircle&quot;,&quot;lineStyle&quot;:{&quot;normal&quot;:{&quot;width&quot;:2}}},&quot;candlestick&quot;:{&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderColor0&quot;:&quot;#b8d2c7&quot;,&quot;color&quot;:&quot;#e01f54&quot;,&quot;borderColor&quot;:&quot;#f5e8c8&quot;,&quot;borderWidth&quot;:1,&quot;color0&quot;:&quot;#001852&quot;}}},&quot;sankey&quot;:{&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;},&quot;emphasis&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;}}},&quot;valueAxis&quot;:{&quot;axisLine&quot;:{&quot;show&quot;
:true,&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#333&quot;}},&quot;axisLabel&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#333&quot;},&quot;show&quot;:true},&quot;splitLine&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:[&quot;#ccc&quot;]}},&quot;splitArea&quot;:{&quot;areaStyle&quot;:{&quot;color&quot;:[&quot;rgba(250,250,250,0.3)&quot;,&quot;rgba(200,200,200,0.3)&quot;]},&quot;show&quot;:false},&quot;axisTick&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#333&quot;}}},&quot;toolbox&quot;:{&quot;iconStyle&quot;:{&quot;normal&quot;:{&quot;borderColor&quot;:&quot;#999999&quot;},&quot;emphasis&quot;:{&quot;borderColor&quot;:&quot;#666666&quot;}}},&quot;categoryAxis&quot;:{&quot;axisLine&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#333&quot;}},&quot;axisLabel&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#333&quot;},&quot;show&quot;:true},&quot;splitLine&quot;:{&quot;show&quot;:false,&quot;lineStyle&quot;:{&quot;color&quot;:[&quot;#ccc&quot;]}},&quot;splitArea&quot;:{&quot;areaStyle&quot;:{&quot;color&quot;:[&quot;rgba(250,250,250,0.3)&quot;,&quot;rgba(200,200,200,0.3)&quot;]},&quot;show&quot;:false},&quot;axisTick&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#333&quot;}}},&quot;tooltip&quot;:{&quot;axisPointer&quot;:{&quot;crossStyle&quot;:{&quot;color&quot;:&quot;#cccccc&quot;,&quot;width&quot;:1},&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#cccccc&quot;,&quot;width&quot;:1}}},&quot;timeline&quot;:{&quot;label&quot;:{&quot;normal&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#293c55&quot;}},&quot;emphasis&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#293c55&quot;}}},&quot;controlStyle&quot;:{&quot;normal&quot;:{&quot;color&quot;:&quot;#293c55&quot;,&quot;borderColor&quot;:&quot;#293c55&quot;,&quot;borderWidth&quot;:0.5},&quot;emphasis&quot;:{&quot;color&quot;:&quot;#293c55&quot;,&quot;borderColor&quot;:&quot;#29
3c55&quot;,&quot;borderWidth&quot;:0.5}},&quot;checkpointStyle&quot;:{&quot;color&quot;:&quot;#e43c59&quot;,&quot;borderColor&quot;:&quot;rgba(194,53,49,0.5)&quot;},&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;color&quot;:&quot;#293c55&quot;,&quot;borderWidth&quot;:1},&quot;emphasis&quot;:{&quot;color&quot;:&quot;#a9334c&quot;}},&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#293c55&quot;,&quot;width&quot;:1}},&quot;radar&quot;:{&quot;symbolSize&quot;:4,&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:1}},&quot;smooth&quot;:false,&quot;symbol&quot;:&quot;emptyCircle&quot;,&quot;lineStyle&quot;:{&quot;normal&quot;:{&quot;width&quot;:2}}},&quot;logAxis&quot;:{&quot;axisLine&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#333&quot;}},&quot;axisLabel&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#333&quot;},&quot;show&quot;:true},&quot;splitLine&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:[&quot;#ccc&quot;]}},&quot;splitArea&quot;:{&quot;areaStyle&quot;:{&quot;color&quot;:[&quot;rgba(250,250,250,0.3)&quot;,&quot;rgba(200,200,200,0.3)&quot;]},&quot;show&quot;:false},&quot;axisTick&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#333&quot;}}},&quot;textStyle&quot;:{},&quot;gauge&quot;:{&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;},&quot;emphasis&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;}}},&quot;boxplot&quot;:{&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:1},&quot;emphasis&quot;:{&quot;borderWidth&quot;:2}}},&quot;color&quot;:[&quot;#e01f54&quot;,&quot;#001852&quot;,&quot;#f5e8c8&quot;,&quot;#b8d2c7&quot;,&quot;#c6b38e&quot;,&quot;#a4d8c2&quot;,&quot;#f3d999&quot;,&quot;#d3758f&quot;,&quot;#dcc392&quot;,&quot;#2e4783&quot;,&quot;#82b6e9&quot;,&quot;#ff6347&quot;,&quot;#a092f1&quot;,&quot;#0a915d&quot;,&quot;#eaf889&quot;,&quot;#6699FF&quot;,&quo
t;#ff6666&quot;,&quot;#3cb371&quot;,&quot;#d5b158&quot;,&quot;#38b6b6&quot;],&quot;title&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#333333&quot;},&quot;subtextStyle&quot;:{&quot;color&quot;:&quot;#aaaaaa&quot;}},&quot;dataZoom&quot;:{&quot;dataBackgroundColor&quot;:&quot;rgba(47,69,84,0.3)&quot;,&quot;textStyle&quot;:{&quot;color&quot;:&quot;#333333&quot;},&quot;handleSize&quot;:&quot;100%&quot;,&quot;handleColor&quot;:&quot;#a7b7cc&quot;,&quot;fillerColor&quot;:&quot;rgba(167,183,204,0.4)&quot;,&quot;backgroundColor&quot;:&quot;rgba(47,69,84,0)&quot;},&quot;timeAxis&quot;:{&quot;axisLine&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#333&quot;}},&quot;axisLabel&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#333&quot;},&quot;show&quot;:true},&quot;splitLine&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:[&quot;#ccc&quot;]}},&quot;splitArea&quot;:{&quot;areaStyle&quot;:{&quot;color&quot;:[&quot;rgba(250,250,250,0.3)&quot;,&quot;rgba(200,200,200,0.3)&quot;]},&quot;show&quot;:false},&quot;axisTick&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#333&quot;}}},&quot;legend&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#333333&quot;}}},&quot;yAxis&quot;:[{&quot;splitNumber&quot;:5,&quot;axisLine&quot;:{&quot;show&quot;:false,&quot;onZero&quot;:true,&quot;lineStyle&quot;:{&quot;normal&quot;:{},&quot;emphasis&quot;:{}}},&quot;axisLabel&quot;:{&quot;show&quot;:true,&quot;interval&quot;:&quot;auto&quot;,&quot;rotate&quot;:0,&quot;inside&quot;:false,&quot;formatter&quot;:&quot;{value}&quot;,&quot;margin&quot;:8},&quot;scale&quot;:true,&quot;gridIndex&quot;:0,&quot;name&quot;:&quot;Time in 
seconds&quot;,&quot;minInterval&quot;:0,&quot;zlevel&quot;:0,&quot;triggerEvent&quot;:false,&quot;z&quot;:0,&quot;inverse&quot;:false,&quot;nameLocation&quot;:&quot;middle&quot;,&quot;nameGap&quot;:50,&quot;silent&quot;:true,&quot;type&quot;:&quot;log&quot;}],&quot;toolbox&quot;:{&quot;feature&quot;:{},&quot;orient&quot;:&quot;vertical&quot;,&quot;itemSize&quot;:15,&quot;height&quot;:&quot;auto&quot;,&quot;zlevel&quot;:0,&quot;z&quot;:2,&quot;itemGap&quot;:20,&quot;right&quot;:&quot;auto&quot;,&quot;top&quot;:&quot;center&quot;,&quot;width&quot;:&quot;auto&quot;,&quot;show&quot;:false,&quot;showTitle&quot;:true},&quot;ec_width&quot;:800,&quot;ec_height&quot;:400,&quot;grid&quot;:[{&quot;height&quot;:&quot;auto&quot;,&quot;show&quot;:false,&quot;width&quot;:&quot;auto&quot;,&quot;backgroundColor&quot;:&quot;transparent&quot;}],&quot;title&quot;:[{&quot;left&quot;:&quot;center&quot;,&quot;borderColor&quot;:&quot;transparent&quot;,&quot;bottom&quot;:&quot;auto&quot;,&quot;padding&quot;:5,&quot;zlevel&quot;:0,&quot;borderWidth&quot;:1,&quot;target&quot;:&quot;blank&quot;,&quot;z&quot;:2,&quot;itemGap&quot;:5,&quot;shadowOffsetY&quot;:0,&quot;shadowOffsetX&quot;:0,&quot;right&quot;:&quot;auto&quot;,&quot;top&quot;:&quot;auto&quot;,&quot;subtarget&quot;:&quot;blank&quot;,&quot;textStyle&quot;:{&quot;fontFamily&quot;:&quot;sans-serif&quot;,&quot;fontStyle&quot;:&quot;normal&quot;,&quot;color&quot;:&quot;#000&quot;,&quot;fontSize&quot;:14,&quot;fontWeight&quot;:&quot;normal&quot;},&quot;show&quot;:true,&quot;text&quot;:&quot;Haversine distance: CPU vs. 
GPU&quot;}],&quot;legend&quot;:{&quot;itemWidth&quot;:25,&quot;data&quot;:[&quot;CPU&quot;,&quot;GPU&quot;],&quot;borderColor&quot;:&quot;transparent&quot;,&quot;orient&quot;:&quot;horizontal&quot;,&quot;bottom&quot;:&quot;auto&quot;,&quot;height&quot;:&quot;auto&quot;,&quot;zlevel&quot;:0,&quot;padding&quot;:5,&quot;borderWidth&quot;:1,&quot;inactiveColor&quot;:&quot;#ccc&quot;,&quot;z&quot;:2,&quot;align&quot;:&quot;auto&quot;,&quot;itemGap&quot;:10,&quot;itemHeight&quot;:14,&quot;backgroundColor&quot;:&quot;transparent&quot;,&quot;shadowOffsetY&quot;:0,&quot;shadowOffsetX&quot;:0,&quot;right&quot;:&quot;right&quot;,&quot;top&quot;:&quot;middle&quot;,&quot;width&quot;:&quot;auto&quot;,&quot;selectedMode&quot;:true,&quot;show&quot;:true}}
);
&lt;/script&gt;

&lt;h2 id=&quot;what-i-learned&quot;&gt;What I Learned&lt;/h2&gt;

&lt;p&gt;There are myriad things I learned from this project, but most important is that GPGPU processing can be accessible for people like myself without a CS background. Julia isn’t the first high-level language to provide CUDA functionality, but the fact that the code is so similar to native Julia makes GPU computing something I can include in my toolbox &lt;em&gt;today&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Over time, I’m sure I’ll get better results as I learn more about CUDA, as CUDAnative.jl continues to smooth out the rough edges, etc. But the fact that, as a beginner, I could achieve such large speedups with just an hour or two of coding and sparse CUDAnative.jl documentation bodes well for the future of GPU computing in Julia.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://randyzwitch.com/notebooks/cudanative_haversine_julia_example.ipynb&quot;&gt;Code as Julia Jupyter Notebook&lt;/a&gt;&lt;/p&gt;
</description>
        
        <pubDate>Mon, 14 Aug 2017 00:00:00 +0000</pubDate>
        <link>
        http://randyzwitch.com/cudanative-jl-julia/</link>
        <guid isPermaLink="true">http://randyzwitch.com/cudanative-jl-julia/</guid>
        <content type="html" xml:base="/cudanative-jl-julia/">&lt;p&gt;Hacker News discussion: &lt;a href=&quot;https://news.ycombinator.com/item?id=15021244&quot;&gt;link&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://randyzwitch.com/notebooks/cudanative_haversine_julia_example.ipynb&quot;&gt;Code as Julia Jupyter Notebook&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Julia has a reputation as a “fast” language in that it’s possible to write high-performing code. However, what I appreciate most about Julia is not just that the code is fast, but rather that Julia makes high-performance concepts &lt;em&gt;accessible&lt;/em&gt; without requiring a deep computer science or compiled language background (neither of which I possess!).&lt;/p&gt;

&lt;p&gt;For version 0.6 of Julia, another milestone has been reached in the “accessible” high-performance category: the ability to &lt;a href=&quot;https://julialang.org/blog/2017/03/cudanative&quot;&gt;run Julia code natively on NVIDIA GPUs&lt;/a&gt; through the &lt;a href=&quot;https://github.com/JuliaGPU/CUDAnative.jl&quot;&gt;CUDAnative.jl&lt;/a&gt; package. While CUDAnative.jl is still very much in its development stages, the package is already far enough along that within a few hours, as a complete beginner to GPU programming, I was able to see speedups in excess of 20x for my toy example calculating haversine distance.&lt;/p&gt;

&lt;h2 id=&quot;getting-started&quot;&gt;Getting Started&lt;/h2&gt;

&lt;p&gt;The &lt;a href=&quot;https://julialang.org/blog/2017/03/cudanative&quot;&gt;CUDAnative.jl introduction blog post&lt;/a&gt; and &lt;a href=&quot;http://juliagpu.github.io/CUDAnative.jl/stable/#Installation-1&quot;&gt;documentation&lt;/a&gt; cover the installation process in depth, so I won’t repeat the details here. I’m already a regular compile-from-source Julia user, and I found the installation process pretty easy on my &lt;a href=&quot;http://randyzwitch.com/building-data-science-workstation-2017/&quot;&gt;CUDA-enabled Ubuntu workstation&lt;/a&gt;. If you can already run TensorFlow, Keras or other GPU tutorials on your computer, getting CUDAnative.jl to work shouldn’t take more than 10-15 minutes.&lt;/p&gt;

&lt;h2 id=&quot;julia-cpu-implementation&quot;&gt;Julia CPU Implementation&lt;/h2&gt;

&lt;p&gt;To get a feel for what sort of speedup I could expect from using a GPU, I wrote a naive implementation of a distance matrix calculation in Julia:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;c&quot;&gt;#https://github.com/quinnj/Rosetta-Julia/blob/master/src/Haversine.jl&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;haversine&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lat1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Float32&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lon1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Float32&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lat2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Float32&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lon2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Float32&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;6372.8&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;asin&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sqrt&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sind&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lat2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lat1&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;^&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cosd&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span 
class=&quot;n&quot;&gt;lat1&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cosd&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lat2&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sind&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lon2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lon1&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;^&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;))&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt; pairwise_dist&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lat&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Vector&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Float32&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;},&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lon&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Vector&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Float32&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;})&lt;/span&gt;

    &lt;span class=&quot;c&quot;&gt;#Pre-allocate, since size is known&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;length&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lat&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Array&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Float32&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;}(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;c&quot;&gt;#Brute force fill in each cell, ignore that distance [i,j] = distance [j,i]&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;
            &lt;span class=&quot;nd&quot;&gt;@inbounds&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;haversine&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lat&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lon&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lat&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;j&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lon&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;j&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;])&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Example benchmark call&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;lat10000&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rand&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Float32&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10000&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;.*&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;45&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;lon10000&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rand&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Float32&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10000&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;.*&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;120&lt;/span&gt;
&lt;span class=&quot;nd&quot;&gt;@time&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;native_julia_cellwise&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pairwise_dist&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lat10000&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lon10000&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;The above code takes vectors of lat/lon values, then calculates the &lt;a href=&quot;https://rosettacode.org/wiki/Haversine_formula&quot;&gt;haversine distance&lt;/a&gt; between every pair of points. This algorithm is naive in that a distance matrix is symmetric (i.e. the distance from A to B is the same as the distance from B to A), so I could’ve done half the work by setting &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;result[i,j]&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;result[j,i]&lt;/code&gt; to the same value, but as a measure of work for a benchmark this toy example is fine. Also note that this implementation runs on a single core; no CPU-core-level parallelization has been implemented.&lt;/p&gt;

&lt;p&gt;Or to put all that another way: if someone wanted to tackle this problem without thinking very hard, the implementation might look like this.&lt;/p&gt;
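
&lt;p&gt;As a rough illustration of the “half the work” idea mentioned above, here is a minimal sketch that computes each pair once and mirrors the value across the diagonal. The name &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pairwise_dist_symmetric&lt;/code&gt; is hypothetical and not part of the original notebook, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;zeros&lt;/code&gt; is used for allocation only so the snippet does not depend on a particular Julia version:&lt;/p&gt;

```julia
# Same formula as the haversine method above, repeated so this sketch runs on its own
haversine(lat1::Float32, lon1::Float32, lat2::Float32, lon2::Float32) =
    2 * 6372.8 * asin(sqrt(sind((lat2 - lat1) / 2)^2 + cosd(lat1) * cosd(lat2) * sind((lon2 - lon1) / 2)^2))

# Hypothetical variant of pairwise_dist: compute each pair once and mirror it
function pairwise_dist_symmetric(lat::Vector{Float32}, lon::Vector{Float32})
    n = length(lat)

    # zeros() also fills the diagonal, where the distance from a point to itself is 0
    result = zeros(Float32, n, n)

    # Only visit the cells above the diagonal (i strictly before j)
    for j in 2:n
        for i in 1:(j - 1)
            d = haversine(lat[i], lon[i], lat[j], lon[j])
            result[i, j] = d
            result[j, i] = d
        end
    end

    return result
end

# Small example input, generated the same way as the benchmark vectors above
lat100 = rand(Float32, 100) .* 45
lon100 = rand(Float32, 100) .* -120
dist100 = pairwise_dist_symmetric(lat100, lon100)
```

&lt;p&gt;This roughly halves the number of haversine evaluations. The timings in this post still use the brute-force version, since the point is to have a fixed amount of work to compare.&lt;/p&gt;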

&lt;h2 id=&quot;cudanativejl-implementation&quot;&gt;CUDAnative.jl Implementation&lt;/h2&gt;

&lt;p&gt;There are two parts to the CUDAnative.jl implementation: the kernel (i.e. the actual calculation) and the boilerplate code that coordinates moving data between the CPU and GPU.&lt;/p&gt;

&lt;h4 id=&quot;kernel-code&quot;&gt;Kernel Code&lt;/h4&gt;

&lt;p&gt;The kernel code has similarities to the CPU implementation, with a few key differences:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The method signature takes one lat/lon point plus the lat/lon vectors, so each call computes one point against all others rather than the full pairwise distance matrix&lt;/li&gt;
  &lt;li&gt;Boilerplate code for thread index on the GPU (0-indexed vs. normal Julia 1-indexing)&lt;/li&gt;
  &lt;li&gt;The trigonometric and square-root functions need to be prefixed with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CUDAnative.&lt;/code&gt; to indicate that the GPU functions aren’t the same as the functions from Base Julia&lt;/li&gt;
  &lt;li&gt;Rather than return an array as part of the function return, we pass in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;out&lt;/code&gt; argument and write directly to the GPU memory&lt;/li&gt;
&lt;/ul&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;k&quot;&gt;using&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CUDAnative&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CUDAdrv&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Calculate one point vs. all other points simultaneously&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt; kernel_haversine&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;latpoint&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Float32&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lonpoint&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Float32&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lat&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;AbstractVector&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Float32&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;},&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lon&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;AbstractVector&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Float32&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;},&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;AbstractVector&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Float32&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;})&lt;/span&gt;

    &lt;span class=&quot;c&quot;&gt;#Thread index&lt;/span&gt;
    &lt;span class=&quot;c&quot;&gt;#Need to do the n-1 dance, since CUDA expects 0 and Julia does 1-indexing&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;blockIdx&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;blockDim&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;threadIdx&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;

    &lt;span class=&quot;c&quot;&gt;#Guard: the last block may contain threads past the end of the arrays&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;length&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;nothing&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;  &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;6372.8&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CUDAnative&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;asin&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CUDAnative&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sqrt&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CUDAnative&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sind&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;latpoint&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lat&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;])&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;^&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CUDAnative&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cosd&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lat&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;])&lt;/span&gt; &lt;span 
class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CUDAnative&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cosd&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;latpoint&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CUDAnative&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sind&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lonpoint&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lon&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;])&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;^&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;))&lt;/span&gt;

    &lt;span class=&quot;c&quot;&gt;#Return nothing, since we're writing directly to the out array allocated on GPU&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;nothing&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
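
&lt;p&gt;The thread-index line in the kernel can be sanity-checked on the CPU without a GPU. The sketch below assumes, as the kernel’s formula implies, that the block and thread indices start at 1; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;global_index&lt;/code&gt; and the launch sizes are made up for illustration and are not part of CUDAnative.jl:&lt;/p&gt;

```julia
# Hypothetical CPU-side helper that mirrors the index line in kernel_haversine.
# With 1-based block and thread indices, only the block number needs the
# "minus one" adjustment to land on 1-based array positions.
global_index(block_idx::Int, block_dim::Int, thread_idx::Int) =
    (block_idx - 1) * block_dim + thread_idx

# Enumerate every (block, thread) pair for a small made-up launch:
# 3 blocks of 4 threads each
block_dim = 4
blocks = 3
indices = [global_index(b, block_dim, t) for b in 1:blocks for t in 1:block_dim]

# Each array position from 1 to blocks * block_dim should appear exactly once
println(indices)
```

&lt;p&gt;Every position shows up once and in order, which is what lets each GPU thread own exactly one element of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;out&lt;/code&gt;.&lt;/p&gt;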

&lt;h4 id=&quot;coordination-code&quot;&gt;Coordination Code&lt;/h4&gt;

&lt;p&gt;The coordination code is similar to what you might see in a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;main()&lt;/code&gt; function in C or Java, where the kernel is applied to the input data. I am using the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dev&lt;/code&gt; keyword with the default value of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CuDevice(0)&lt;/code&gt; to indicate that the code should be run on the first (in my case, only) GPU device.&lt;/p&gt;

&lt;p&gt;The remainder of the code has comments on its purpose, primarily:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Transfer Julia CPU arrays to GPU arrays (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CuArray&lt;/code&gt;)&lt;/li&gt;
  &lt;li&gt;Set number of threads/blocks&lt;/li&gt;
  &lt;li&gt;Calculate distance between a point and all other points in the array, write back to CPU&lt;/li&gt;
&lt;/ul&gt;
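
&lt;p&gt;To make the “set number of threads/blocks” step concrete, here is a small CPU-only sketch of that arithmetic. The helper name &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;launch_config&lt;/code&gt; is hypothetical, and the cap of 1024 threads per block is simply the value used in this post; the limit on other hardware may differ:&lt;/p&gt;

```julia
# Hypothetical helper reproducing the thread/block arithmetic used in distmat.
# Assumes n is at least 1.
function launch_config(n::Int; max_threads::Int = 1024)
    threads = min(n, max_threads)
    # cld is integer ceiling division, the same result as Int(ceil(n / threads))
    blocks = cld(n, threads)
    return threads, blocks
end

for n in (10, 1024, 10000)
    threads, blocks = launch_config(n)
    # Launched threads that have no array element to work on
    surplus = threads * blocks - n
    println("n = ", n, ": threads = ", threads, ", blocks = ", blocks, ", surplus = ", surplus)
end
```

&lt;p&gt;For 10,000 points this gives 10 blocks of 1024 threads, which is 240 more threads than there are array elements. Kernels launched this way are typically written so that threads whose index falls past the end of the arrays do nothing.&lt;/p&gt;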

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;c&quot;&gt;#validated kernel_haversine/distmat returns same answer as CPU haversine method (not shown)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt; distmat&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lat&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Vector&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Float32&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;},&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lon&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Vector&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Float32&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;};&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dev&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CuDevice&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CuDevice&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;))&lt;/span&gt;

    &lt;span class=&quot;c&quot;&gt;#Create a context&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;ctx&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CuContext&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dev&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;c&quot;&gt;#Change to objects with CUDA context&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;length&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lat&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;d_lat&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CuArray&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lat&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;d_lon&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CuArray&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lon&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;d_out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CuArray&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Vector&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Float32&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;}(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;))&lt;/span&gt;

    &lt;span class=&quot;c&quot;&gt;#Calculate number of calculations, threads, blocks&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;len&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;threads&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;min&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1024&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;blocks&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Int&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ceil&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;threads&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;))&lt;/span&gt;

    &lt;span class=&quot;c&quot;&gt;#Julia side accumulation of results to relieve GPU memory pressure&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;accum&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Array&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Float32&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;}(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;c&quot;&gt;# run and time the test&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;secs&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CUDAdrv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nd&quot;&gt;@elapsed&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;begin&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;
            &lt;span class=&quot;nd&quot;&gt;@cuda&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;blocks&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;threads&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kernel_haversine&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lat&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lon&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d_lat&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d_lon&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d_out&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;accum&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Vector&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Float32&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;}(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;d_out&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;

    &lt;span class=&quot;c&quot;&gt;#Clean up context&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;destroy!&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ctx&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;c&quot;&gt;#Return timing and bring results back to Julia&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;secs&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;accum&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Example benchmark call&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;timing&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;distmat&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lat10000&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lon10000&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;result&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;≈&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;native_julia_cellwise&lt;/span&gt; &lt;span class=&quot;c&quot;&gt;#validate results equivalent CPU and GPU&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;The code processes one row of the distance matrix at a time to minimize GPU memory usage. Copying the results back to the CPU after each loop iteration means &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;n-1&lt;/code&gt; extra GPU-to-CPU transfers, which is slower than calculating all the distances first and transferring once. Without this approach, though, my consumer-grade GPU with 6GB of RAM would run out of memory before completing the calculation.&lt;/p&gt;
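
&lt;p&gt;To put rough numbers on that memory constraint, here is a back-of-the-envelope sketch in plain Julia (no GPU required). The helper &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;float32_bytes&lt;/code&gt; is a hypothetical name used only for illustration, and the estimate counts only the raw &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Float32&lt;/code&gt; payload, ignoring any driver or allocator overhead:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;# Hypothetical helper for illustration: bytes needed to store `count` Float32 values
float32_bytes(count) = count * sizeof(Float32)

n = 100_000

# Full n x n distance matrix held in GPU memory at once
full_matrix_gib = float32_bytes(n * n) / 2^30

# A single row of n distances, as produced by each loop iteration
one_row_kib = float32_bytes(n) / 2^10

println(&quot;Full matrix (GiB): &quot;, full_matrix_gib)
println(&quot;One row (KiB):     &quot;, one_row_kib)
&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;For &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;n = 100000&lt;/code&gt;, the full matrix works out to roughly 37 GiB, far more than a 6GB card can hold, while a single row is only about 390 KiB.&lt;/p&gt;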

&lt;h2 id=&quot;performance&quot;&gt;Performance&lt;/h2&gt;

&lt;p&gt;The performance characteristics of the CPU and GPU calculations are below for various sizes of distance matrices. Having not done any GPU calculations before, I was surprised to see how much of a penalty there is for writing back and forth to the GPU. As you can see from the navy-blue line, the execution time is roughly constant (about 0.14 to 0.17 seconds) for matrices of size 1 to 1000, representing the fixed cost of moving the data between the CPU and the GPU.&lt;/p&gt;

&lt;p&gt;Of course, once we get above 1000x1000 matrices, the GPU really starts to shine. Due to the log scale, it’s a bit hard to see the magnitude differences, but at 100000x100000 there is a &lt;strong&gt;23x&lt;/strong&gt; reduction in execution time (565.008s CPU vs. 24.32s GPU).&lt;/p&gt;
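
&lt;p&gt;The ratios are easy to recompute from the timings plotted in the chart. This is just a small sketch: the variable names are mine, and the values are copied from the chart’s CPU and GPU series:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;# Timings in seconds, copied from the chart's CPU and GPU series
sizes = [1, 10, 100, 1_000, 10_000, 100_000]
cpu   = [6.0e-6, 1.7e-5, 0.001091, 0.090409, 5.620437, 565.008425]
gpu   = [0.14232168, 0.15084915, 0.15897949, 0.16998644, 0.6376571, 24.32015]

# A ratio above 1 means the GPU was faster; below 1 means the CPU was faster
speedup = cpu ./ gpu

for (n, s) in zip(sizes, speedup)
    println(n, &quot;x&quot;, n, &quot;: &quot;, s)
end
&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;With these numbers the ratio is about 0.53 at 1000x1000 (the CPU still wins), about 8.8 at 10000x10000, and about 23.2 at 100000x100000, so the crossover lies somewhere between 1,000 and 10,000 points.&lt;/p&gt;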

&lt;div id=&quot;linep&quot; style=&quot;height:400px;width:800px;&quot;&gt;&lt;/div&gt;
&lt;script type=&quot;text/javascript&quot;&gt;
    // Initialize after dom ready
    var myChart = echarts.init(document.getElementById(&quot;linep&quot;));

    // Load data into the ECharts instance
    myChart.setOption(
{&quot;xAxis&quot;:[{&quot;splitNumber&quot;:5,&quot;axisLine&quot;:{&quot;show&quot;:false,&quot;onZero&quot;:true,&quot;lineStyle&quot;:{&quot;normal&quot;:{},&quot;emphasis&quot;:{}}},&quot;axisLabel&quot;:{&quot;show&quot;:true,&quot;interval&quot;:&quot;auto&quot;,&quot;rotate&quot;:0,&quot;inside&quot;:false,&quot;formatter&quot;:&quot;{value}&quot;,&quot;margin&quot;:8},&quot;scale&quot;:true,&quot;gridIndex&quot;:0,&quot;name&quot;:&quot;Matrix dimensions (square)&quot;,&quot;minInterval&quot;:0,&quot;zlevel&quot;:0,&quot;triggerEvent&quot;:false,&quot;z&quot;:0,&quot;splitLine&quot;:{&quot;show&quot;:false,&quot;interval&quot;:&quot;auto&quot;,&quot;lineStyle&quot;:{&quot;normal&quot;:{},&quot;emphasis&quot;:{}}},&quot;inverse&quot;:false,&quot;nameLocation&quot;:&quot;middle&quot;,&quot;nameGap&quot;:30,&quot;silent&quot;:true,&quot;type&quot;:&quot;log&quot;}],&quot;ec_charttype&quot;:&quot;xy plot&quot;,&quot;series&quot;:[{&quot;name&quot;:&quot;CPU&quot;,&quot;yAxisIndex&quot;:0,&quot;xAxisIndex&quot;:0,&quot;smooth&quot;:true,&quot;data&quot;:[[1.0,6.0e-6],[10.0,1.7e-5],[100.0,0.001091],[1000.0,0.090409],[10000.0,5.620437],[100000.0,565.008425]],&quot;markLine&quot;:{&quot;data&quot;:[],&quot;lineStyle&quot;:{&quot;normal&quot;:{},&quot;emphasis&quot;:{}}},&quot;large&quot;:true,&quot;type&quot;:&quot;line&quot;,&quot;largeThreshold&quot;:2000},{&quot;name&quot;:&quot;GPU&quot;,&quot;yAxisIndex&quot;:0,&quot;xAxisIndex&quot;:0,&quot;smooth&quot;:true,&quot;data&quot;:[[1.0,0.14232168],[10.0,0.15084915],[100.0,0.15897949],[1000.0,0.16998644],[10000.0,0.6376571],[100000.0,24.32015]],&quot;markLine&quot;:{&quot;data&quot;:[],&quot;lineStyle&quot;:{&quot;normal&quot;:{},&quot;emphasis&quot;:{}}},&quot;large&quot;:true,&quot;type&quot;:&quot;line&quot;,&quot;largeThreshold&quot;:2000}],&quot;theme&quot;:{&quot;geo&quot;:{&quot;label&quot;:{&quot;normal&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#000000&quot;}},&quot;emphasis&quot;:{&quot;textSty
le&quot;:{&quot;color&quot;:&quot;rgb(100,0,0)&quot;}}},&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderColor&quot;:&quot;#444444&quot;,&quot;borderWidth&quot;:0.5,&quot;areaColor&quot;:&quot;#eeeeee&quot;},&quot;emphasis&quot;:{&quot;borderColor&quot;:&quot;#444444&quot;,&quot;borderWidth&quot;:1,&quot;areaColor&quot;:&quot;rgba(255,215,0,0.8)&quot;}}},&quot;parallel&quot;:{&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;},&quot;emphasis&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;}}},&quot;markPoint&quot;:{&quot;label&quot;:{&quot;normal&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#eeeeee&quot;}},&quot;emphasis&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#eeeeee&quot;}}}},&quot;visualMap&quot;:{&quot;color&quot;:[&quot;#e01f54&quot;,&quot;#e7dbc3&quot;]},&quot;funnel&quot;:{&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;},&quot;emphasis&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;}}},&quot;bar&quot;:{&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;barBorderColor&quot;:&quot;#ccc&quot;,&quot;barBorderWidth&quot;:0},&quot;emphasis&quot;:{&quot;barBorderColor&quot;:&quot;#ccc&quot;,&quot;barBorderWidth&quot;:0}}},&quot;map&quot;:{&quot;label&quot;:{&quot;normal&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#000000&quot;}},&quot;emphasis&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;rgb(100,0,0)&quot;}}},&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderColor&quot;:&quot;#444444&quot;,&quot;borderWidth&quot;:0.5,&quot;areaColor&quot;:&quot;#eeeeee&quot;},&quot;emphasis&quot;:{&quot;borderColor&quot;:&quot;#444444&quot;,&quot;borderWidth&quot;:1,&quot;areaColor&quot;:&quot;rgba(255,215,0,0.8)&quot;}}},&quot;scatter&quot;:{&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;},&quot;emphas
is&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;}}},&quot;pie&quot;:{&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;},&quot;emphasis&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;}}},&quot;graph&quot;:{&quot;label&quot;:{&quot;normal&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#eeeeee&quot;}}},&quot;symbolSize&quot;:4,&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;}},&quot;smooth&quot;:false,&quot;symbol&quot;:&quot;emptyCircle&quot;,&quot;color&quot;:[&quot;#e01f54&quot;,&quot;#001852&quot;,&quot;#f5e8c8&quot;,&quot;#b8d2c7&quot;,&quot;#c6b38e&quot;,&quot;#a4d8c2&quot;,&quot;#f3d999&quot;,&quot;#d3758f&quot;,&quot;#dcc392&quot;,&quot;#2e4783&quot;,&quot;#82b6e9&quot;,&quot;#ff6347&quot;,&quot;#a092f1&quot;,&quot;#0a915d&quot;,&quot;#eaf889&quot;,&quot;#6699FF&quot;,&quot;#ff6666&quot;,&quot;#3cb371&quot;,&quot;#d5b158&quot;,&quot;#38b6b6&quot;],&quot;lineStyle&quot;:{&quot;normal&quot;:{&quot;color&quot;:&quot;#aaaaaa&quot;,&quot;width&quot;:1}}},&quot;backgroundColor&quot;:&quot;rgba(0,0,0,0)&quot;,&quot;line&quot;:{&quot;symbolSize&quot;:4,&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:1}},&quot;smooth&quot;:false,&quot;symbol&quot;:&quot;emptyCircle&quot;,&quot;lineStyle&quot;:{&quot;normal&quot;:{&quot;width&quot;:2}}},&quot;candlestick&quot;:{&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderColor0&quot;:&quot;#b8d2c7&quot;,&quot;color&quot;:&quot;#e01f54&quot;,&quot;borderColor&quot;:&quot;#f5e8c8&quot;,&quot;borderWidth&quot;:1,&quot;color0&quot;:&quot;#001852&quot;}}},&quot;sankey&quot;:{&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;},&quot;emphasis&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;}}},&quot;valueAxis&quot;:{&quot;axisLine&quot;:{&quot;show&quot;
:true,&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#333&quot;}},&quot;axisLabel&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#333&quot;},&quot;show&quot;:true},&quot;splitLine&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:[&quot;#ccc&quot;]}},&quot;splitArea&quot;:{&quot;areaStyle&quot;:{&quot;color&quot;:[&quot;rgba(250,250,250,0.3)&quot;,&quot;rgba(200,200,200,0.3)&quot;]},&quot;show&quot;:false},&quot;axisTick&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#333&quot;}}},&quot;toolbox&quot;:{&quot;iconStyle&quot;:{&quot;normal&quot;:{&quot;borderColor&quot;:&quot;#999999&quot;},&quot;emphasis&quot;:{&quot;borderColor&quot;:&quot;#666666&quot;}}},&quot;categoryAxis&quot;:{&quot;axisLine&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#333&quot;}},&quot;axisLabel&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#333&quot;},&quot;show&quot;:true},&quot;splitLine&quot;:{&quot;show&quot;:false,&quot;lineStyle&quot;:{&quot;color&quot;:[&quot;#ccc&quot;]}},&quot;splitArea&quot;:{&quot;areaStyle&quot;:{&quot;color&quot;:[&quot;rgba(250,250,250,0.3)&quot;,&quot;rgba(200,200,200,0.3)&quot;]},&quot;show&quot;:false},&quot;axisTick&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#333&quot;}}},&quot;tooltip&quot;:{&quot;axisPointer&quot;:{&quot;crossStyle&quot;:{&quot;color&quot;:&quot;#cccccc&quot;,&quot;width&quot;:1},&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#cccccc&quot;,&quot;width&quot;:1}}},&quot;timeline&quot;:{&quot;label&quot;:{&quot;normal&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#293c55&quot;}},&quot;emphasis&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#293c55&quot;}}},&quot;controlStyle&quot;:{&quot;normal&quot;:{&quot;color&quot;:&quot;#293c55&quot;,&quot;borderColor&quot;:&quot;#293c55&quot;,&quot;borderWidth&quot;:0.5},&quot;emphasis&quot;:{&quot;color&quot;:&quot;#293c55&quot;,&quot;borderColor&quot;:&quot;#29
3c55&quot;,&quot;borderWidth&quot;:0.5}},&quot;checkpointStyle&quot;:{&quot;color&quot;:&quot;#e43c59&quot;,&quot;borderColor&quot;:&quot;rgba(194,53,49,0.5)&quot;},&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;color&quot;:&quot;#293c55&quot;,&quot;borderWidth&quot;:1},&quot;emphasis&quot;:{&quot;color&quot;:&quot;#a9334c&quot;}},&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#293c55&quot;,&quot;width&quot;:1}},&quot;radar&quot;:{&quot;symbolSize&quot;:4,&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:1}},&quot;smooth&quot;:false,&quot;symbol&quot;:&quot;emptyCircle&quot;,&quot;lineStyle&quot;:{&quot;normal&quot;:{&quot;width&quot;:2}}},&quot;logAxis&quot;:{&quot;axisLine&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#333&quot;}},&quot;axisLabel&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#333&quot;},&quot;show&quot;:true},&quot;splitLine&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:[&quot;#ccc&quot;]}},&quot;splitArea&quot;:{&quot;areaStyle&quot;:{&quot;color&quot;:[&quot;rgba(250,250,250,0.3)&quot;,&quot;rgba(200,200,200,0.3)&quot;]},&quot;show&quot;:false},&quot;axisTick&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#333&quot;}}},&quot;textStyle&quot;:{},&quot;gauge&quot;:{&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;},&quot;emphasis&quot;:{&quot;borderWidth&quot;:0,&quot;borderColor&quot;:&quot;#ccc&quot;}}},&quot;boxplot&quot;:{&quot;itemStyle&quot;:{&quot;normal&quot;:{&quot;borderWidth&quot;:1},&quot;emphasis&quot;:{&quot;borderWidth&quot;:2}}},&quot;color&quot;:[&quot;#e01f54&quot;,&quot;#001852&quot;,&quot;#f5e8c8&quot;,&quot;#b8d2c7&quot;,&quot;#c6b38e&quot;,&quot;#a4d8c2&quot;,&quot;#f3d999&quot;,&quot;#d3758f&quot;,&quot;#dcc392&quot;,&quot;#2e4783&quot;,&quot;#82b6e9&quot;,&quot;#ff6347&quot;,&quot;#a092f1&quot;,&quot;#0a915d&quot;,&quot;#eaf889&quot;,&quot;#6699FF&quot;,&quo
t;#ff6666&quot;,&quot;#3cb371&quot;,&quot;#d5b158&quot;,&quot;#38b6b6&quot;],&quot;title&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#333333&quot;},&quot;subtextStyle&quot;:{&quot;color&quot;:&quot;#aaaaaa&quot;}},&quot;dataZoom&quot;:{&quot;dataBackgroundColor&quot;:&quot;rgba(47,69,84,0.3)&quot;,&quot;textStyle&quot;:{&quot;color&quot;:&quot;#333333&quot;},&quot;handleSize&quot;:&quot;100%&quot;,&quot;handleColor&quot;:&quot;#a7b7cc&quot;,&quot;fillerColor&quot;:&quot;rgba(167,183,204,0.4)&quot;,&quot;backgroundColor&quot;:&quot;rgba(47,69,84,0)&quot;},&quot;timeAxis&quot;:{&quot;axisLine&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#333&quot;}},&quot;axisLabel&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#333&quot;},&quot;show&quot;:true},&quot;splitLine&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:[&quot;#ccc&quot;]}},&quot;splitArea&quot;:{&quot;areaStyle&quot;:{&quot;color&quot;:[&quot;rgba(250,250,250,0.3)&quot;,&quot;rgba(200,200,200,0.3)&quot;]},&quot;show&quot;:false},&quot;axisTick&quot;:{&quot;show&quot;:true,&quot;lineStyle&quot;:{&quot;color&quot;:&quot;#333&quot;}}},&quot;legend&quot;:{&quot;textStyle&quot;:{&quot;color&quot;:&quot;#333333&quot;}}},&quot;yAxis&quot;:[{&quot;splitNumber&quot;:5,&quot;axisLine&quot;:{&quot;show&quot;:false,&quot;onZero&quot;:true,&quot;lineStyle&quot;:{&quot;normal&quot;:{},&quot;emphasis&quot;:{}}},&quot;axisLabel&quot;:{&quot;show&quot;:true,&quot;interval&quot;:&quot;auto&quot;,&quot;rotate&quot;:0,&quot;inside&quot;:false,&quot;formatter&quot;:&quot;{value}&quot;,&quot;margin&quot;:8},&quot;scale&quot;:true,&quot;gridIndex&quot;:0,&quot;name&quot;:&quot;Time in 
seconds&quot;,&quot;minInterval&quot;:0,&quot;zlevel&quot;:0,&quot;triggerEvent&quot;:false,&quot;z&quot;:0,&quot;inverse&quot;:false,&quot;nameLocation&quot;:&quot;middle&quot;,&quot;nameGap&quot;:50,&quot;silent&quot;:true,&quot;type&quot;:&quot;log&quot;}],&quot;toolbox&quot;:{&quot;feature&quot;:{},&quot;orient&quot;:&quot;vertical&quot;,&quot;itemSize&quot;:15,&quot;height&quot;:&quot;auto&quot;,&quot;zlevel&quot;:0,&quot;z&quot;:2,&quot;itemGap&quot;:20,&quot;right&quot;:&quot;auto&quot;,&quot;top&quot;:&quot;center&quot;,&quot;width&quot;:&quot;auto&quot;,&quot;show&quot;:false,&quot;showTitle&quot;:true},&quot;ec_width&quot;:800,&quot;ec_height&quot;:400,&quot;grid&quot;:[{&quot;height&quot;:&quot;auto&quot;,&quot;show&quot;:false,&quot;width&quot;:&quot;auto&quot;,&quot;backgroundColor&quot;:&quot;transparent&quot;}],&quot;title&quot;:[{&quot;left&quot;:&quot;center&quot;,&quot;borderColor&quot;:&quot;transparent&quot;,&quot;bottom&quot;:&quot;auto&quot;,&quot;padding&quot;:5,&quot;zlevel&quot;:0,&quot;borderWidth&quot;:1,&quot;target&quot;:&quot;blank&quot;,&quot;z&quot;:2,&quot;itemGap&quot;:5,&quot;shadowOffsetY&quot;:0,&quot;shadowOffsetX&quot;:0,&quot;right&quot;:&quot;auto&quot;,&quot;top&quot;:&quot;auto&quot;,&quot;subtarget&quot;:&quot;blank&quot;,&quot;textStyle&quot;:{&quot;fontFamily&quot;:&quot;sans-serif&quot;,&quot;fontStyle&quot;:&quot;normal&quot;,&quot;color&quot;:&quot;#000&quot;,&quot;fontSize&quot;:14,&quot;fontWeight&quot;:&quot;normal&quot;},&quot;show&quot;:true,&quot;text&quot;:&quot;Haversine distance: CPU vs. 
GPU&quot;}],&quot;legend&quot;:{&quot;itemWidth&quot;:25,&quot;data&quot;:[&quot;CPU&quot;,&quot;GPU&quot;],&quot;borderColor&quot;:&quot;transparent&quot;,&quot;orient&quot;:&quot;horizontal&quot;,&quot;bottom&quot;:&quot;auto&quot;,&quot;height&quot;:&quot;auto&quot;,&quot;zlevel&quot;:0,&quot;padding&quot;:5,&quot;borderWidth&quot;:1,&quot;inactiveColor&quot;:&quot;#ccc&quot;,&quot;z&quot;:2,&quot;align&quot;:&quot;auto&quot;,&quot;itemGap&quot;:10,&quot;itemHeight&quot;:14,&quot;backgroundColor&quot;:&quot;transparent&quot;,&quot;shadowOffsetY&quot;:0,&quot;shadowOffsetX&quot;:0,&quot;right&quot;:&quot;right&quot;,&quot;top&quot;:&quot;middle&quot;,&quot;width&quot;:&quot;auto&quot;,&quot;selectedMode&quot;:true,&quot;show&quot;:true}}
);
&lt;/script&gt;

&lt;h2 id=&quot;what-i-learned&quot;&gt;What I Learned&lt;/h2&gt;

&lt;p&gt;There are myriad things I learned from this project, but most important is that GPGPU processing can be accessible for people like myself without a CS background. Julia isn’t the first high-level language to provide CUDA functionality, but the fact that the code is so similar to native Julia makes GPU computing something I can include in my toolbox &lt;em&gt;today&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Over time, I’m sure I’ll get better results as I learn more about CUDA and as CUDAnative.jl continues to smooth out the rough edges. But the fact that, as a beginner, I could achieve such large speedups with just an hour or two of coding and only sparse CUDAnative.jl documentation bodes well for the future of GPU computing in Julia.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://randyzwitch.com/notebooks/cudanative_haversine_julia_example.ipynb&quot;&gt;Code as Julia Jupyter Notebook&lt;/a&gt;&lt;/p&gt;</content>
      </item>
      
      
    
      
      
      
    
      
      
      
    
      
      
      
    
      
      
      
    
      
      
      
    
      
      
      <item>
        <title>WordPress to Jekyll: A 30x Speedup</title>
        
          <description>&lt;p&gt;About a month ago, I switched this blog from WordPress hosted on Bluehost to Jekyll on GitHub Pages. I suspected that a static website would be faster than HTML generated via PHP, and it is certainly cheaper (GitHub Pages is “free”). But it wasn’t until I needed a dataset for some data visualization development that I realized how much of an improvement it has been!&lt;/p&gt;

&lt;h2 id=&quot;packages-packages-packages&quot;&gt;Packages, Packages, Packages&lt;/h2&gt;

&lt;p&gt;With the release of v0.5 of Julia, I’ve been working (less) on updating my packages and making new packages (more), because making new stuff is more fun than maintaining old stuff! One of the packages I’ve been building is for the &lt;a href=&quot;http://echarts.baidu.com/&quot;&gt;ECharts visualization library&lt;/a&gt; (v3) from Baidu. While Julia doesn’t necessarily need another visualization library, visualization is something I’m interested in and learning is easier when you’re solving problems you like. And since the world doesn’t need another Iris example, I decided to share some real world website performance data :)&lt;/p&gt;

&lt;h2 id=&quot;line-chart&quot;&gt;Line Chart&lt;/h2&gt;

&lt;p&gt;One of the first features I developed for &lt;a href=&quot;https://github.com/randyzwitch/ECharts.jl&quot;&gt;ECharts.jl&lt;/a&gt; was X-Y charts, which I posit is the most common chart type in business. One thing that is great about the underlying ECharts JavaScript library is that interactivity is really easy to achieve:&lt;/p&gt;

&lt;div id=&quot;linep&quot; style=&quot;height:400px;width:800px;&quot;&gt;&lt;/div&gt;
&lt;script type=&quot;text/javascript&quot;&gt;
    // Initialize after dom ready
    var myChart = echarts.init(document.getElementById(&quot;linep&quot;));

    // Load data into the ECharts instance
    myChart.setOption({&quot;xAxis&quot;:[{&quot;scale&quot;:false,&quot;gridIndex&quot;:0,&quot;splitNumber&quot;:5,&quot;minInterval&quot;:0,&quot;silent&quot;:true,&quot;data&quot;:[&quot;2016-06-20&quot;,&quot;2016-06-21&quot;,&quot;2016-06-22&quot;,&quot;2016-06-23&quot;,&quot;2016-06-24&quot;,&quot;2016-06-25&quot;,&quot;2016-06-26&quot;,&quot;2016-06-27&quot;,&quot;2016-06-28&quot;,&quot;2016-06-29&quot;,&quot;2016-06-30&quot;,&quot;2016-07-01&quot;,&quot;2016-07-02&quot;,&quot;2016-07-03&quot;,&quot;2016-07-04&quot;,&quot;2016-07-05&quot;,&quot;2016-07-06&quot;,&quot;2016-07-07&quot;,&quot;2016-07-08&quot;,&quot;2016-07-09&quot;,&quot;2016-07-10&quot;,&quot;2016-07-11&quot;,&quot;2016-07-12&quot;,&quot;2016-07-13&quot;,&quot;2016-07-14&quot;,&quot;2016-07-15&quot;,&quot;2016-07-16&quot;,&quot;2016-07-17&quot;,&quot;2016-07-18&quot;,&quot;2016-07-19&quot;,&quot;2016-07-20&quot;,&quot;2016-07-21&quot;,&quot;2016-07-22&quot;,&quot;2016-07-23&quot;,&quot;2016-07-24&quot;,&quot;2016-07-25&quot;,&quot;2016-07-26&quot;,&quot;2016-07-27&quot;,&quot;2016-07-28&quot;,&quot;2016-07-29&quot;,&quot;2016-07-30&quot;,&quot;2016-07-31&quot;,&quot;2016-08-01&quot;,&quot;2016-08-02&quot;,&quot;2016-08-03&quot;,&quot;2016-08-04&quot;,&quot;2016-08-05&quot;,&quot;2016-08-06&quot;,&quot;2016-08-07&quot;,&quot;2016-08-08&quot;,&quot;2016-08-09&quot;,&quot;2016-08-10&quot;,&quot;2016-08-11&quot;,&quot;2016-08-12&quot;,&quot;2016-08-13&quot;,&quot;2016-08-14&quot;,&quot;2016-08-15&quot;,&quot;2016-08-16&quot;,&quot;2016-08-17&quot;,&quot;2016-08-18&quot;,&quot;2016-08-19&quot;,&quot;2016-08-20&quot;,&quot;2016-08-21&quot;,&quot;2016-08-22&quot;,&quot;2016-08-23&quot;,&quot;2016-08-24&quot;,&quot;2016-08-25&quot;,&quot;2016-08-26&quot;,&quot;2016-08-27&quot;,&quot;2016-08-28&quot;,&quot;2016-08-29&quot;,&quot;2016-08-30&quot;,&quot;2016-08-31&quot;,&quot;2016-09-01&quot;,&quot;2016-09-02&quot;,&quot;2016-09-03&quot;,&quot;2016-09-04&quot;,&quot;2016-09-05&quot;,&quot;2016-09-06&quot
;,&quot;2016-09-07&quot;,&quot;2016-09-08&quot;,&quot;2016-09-09&quot;,&quot;2016-09-10&quot;,&quot;2016-09-11&quot;,&quot;2016-09-12&quot;,&quot;2016-09-13&quot;,&quot;2016-09-14&quot;,&quot;2016-09-15&quot;,&quot;2016-09-16&quot;,&quot;2016-09-17&quot;,&quot;2016-09-18&quot;,&quot;2016-09-19&quot;,&quot;2016-09-20&quot;,&quot;2016-09-21&quot;,&quot;2016-09-22&quot;,&quot;2016-09-23&quot;,&quot;2016-09-24&quot;,&quot;2016-09-25&quot;,&quot;2016-09-26&quot;,&quot;2016-09-27&quot;,&quot;2016-09-28&quot;,&quot;2016-09-29&quot;,&quot;2016-09-30&quot;,&quot;2016-10-01&quot;,&quot;2016-10-02&quot;,&quot;2016-10-03&quot;,&quot;2016-10-04&quot;,&quot;2016-10-05&quot;,&quot;2016-10-06&quot;,&quot;2016-10-07&quot;],&quot;inverse&quot;:false,&quot;type&quot;:&quot;category&quot;,&quot;nameLocation&quot;:&quot;middle&quot;,&quot;nameGap&quot;:30}],&quot;yAxis&quot;:[{&quot;scale&quot;:false,&quot;gridIndex&quot;:0,&quot;name&quot;:&quot;Load time in ms&quot;,&quot;splitNumber&quot;:5,&quot;minInterval&quot;:0,&quot;silent&quot;:true,&quot;inverse&quot;:false,&quot;type&quot;:&quot;value&quot;,&quot;nameLocation&quot;:&quot;middle&quot;,&quot;nameGap&quot;:50}],&quot;toolbox&quot;:{&quot;feature&quot;:{&quot;dataView&quot;:{&quot;show&quot;:true,&quot;title&quot;:&quot;Data View&quot;,&quot;lang&quot;:[&quot;Data View&quot;,&quot;Cancel&quot;,&quot;Refresh&quot;]},&quot;restore&quot;:{&quot;show&quot;:true,&quot;title&quot;:&quot;Restore&quot;},&quot;saveAsImage&quot;:{&quot;show&quot;:true,&quot;title&quot;:&quot;Save As 
PNG&quot;},&quot;magicType&quot;:{&quot;show&quot;:true,&quot;title&quot;:{&quot;line&quot;:&quot;Line&quot;,&quot;bar&quot;:&quot;Bar&quot;,&quot;tiled&quot;:&quot;Tiled&quot;,&quot;chord&quot;:&quot;Chord&quot;,&quot;stack&quot;:&quot;Stack&quot;,&quot;pie&quot;:&quot;Pie&quot;,&quot;force&quot;:&quot;Force&quot;,&quot;funnel&quot;:&quot;Funnel&quot;},&quot;type&quot;:[&quot;bar&quot;,&quot;line&quot;]}},&quot;itemSize&quot;:15,&quot;orient&quot;:&quot;vertical&quot;,&quot;height&quot;:&quot;auto&quot;,&quot;zlevel&quot;:0,&quot;z&quot;:2,&quot;itemGap&quot;:20,&quot;right&quot;:&quot;auto&quot;,&quot;top&quot;:&quot;center&quot;,&quot;width&quot;:&quot;auto&quot;,&quot;show&quot;:true,&quot;showTitle&quot;:true},&quot;ec_width&quot;:800,&quot;ec_height&quot;:400,&quot;ec_charttype&quot;:&quot;xy plot&quot;,&quot;color&quot;:[&quot;#2C3E50&quot;,&quot;#E74C3C&quot;,&quot;#ECF0F1&quot;,&quot;#3498DB&quot;,&quot;#2980B9&quot;],&quot;title&quot;:[{&quot;left&quot;:&quot;left&quot;,&quot;borderColor&quot;:&quot;transparent&quot;,&quot;bottom&quot;:&quot;auto&quot;,&quot;padding&quot;:5,&quot;zlevel&quot;:0,&quot;borderWidth&quot;:1,&quot;target&quot;:&quot;blank&quot;,&quot;z&quot;:2,&quot;itemGap&quot;:5,&quot;shadowOffsetY&quot;:0,&quot;shadowOffsetX&quot;:0,&quot;right&quot;:&quot;auto&quot;,&quot;subtext&quot;:&quot;Switching from WordPress on Bluehost to Jekyll on GitHub 
(2016/09/06)&quot;,&quot;top&quot;:&quot;auto&quot;,&quot;subtarget&quot;:&quot;blank&quot;,&quot;show&quot;:true,&quot;text&quot;:&quot;randyzwitch.com&quot;}],&quot;dataZoom&quot;:[{&quot;show&quot;:true}],&quot;series&quot;:[{&quot;name&quot;:&quot;loadtime_ms&quot;,&quot;data&quot;:[1282,1728,1047,1111,1027,643,757,1049,1201,1265,1617,1145,614,673,1023,1323,1117,1048,904,647,830,761,759,607,1141,1022,864,743,866,1328,1147,973,1178,1093,927,998,1195,1167,1023,1329,1051,929,1037,897,1197,1179,1402,1018,605,2261,2059,2383,2402,1385,2068,2290,2627,1862,2494,2753,1556,898,926,1158,1253,1403,655,497,544,526,503,575,545,628,467,518,568,513,386,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null],&quot;smooth&quot;:false,&quot;minSize&quot;:&quot;0%&quot;,&quot;type&quot;:&quot;line&quot;,&quot;maxSize&quot;:&quot;100%&quot;},{&quot;name&quot;:&quot;post&quot;,&quot;data&quot;:[null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,386,629,533,453,279,193,83,45,40,46,44,29,34,46,36,29,32,40,35,32,47,43,36,38,36,26,35,35,35,32,40,33],&quot;smooth&quot;:false,&quot;minSize&quot;:&quot;0%&quot;,&quot;type&quot;:&quot;line&quot;,&quot;maxSize&quot;:&quot;100%&quot;}]});
&lt;/script&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;k&quot;&gt;using&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ECharts&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataFrames&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Read in data&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;readtable&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;/assets/data/website_time_data.csv&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Make data two different series that overlap, so endpoint touches&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pre&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;[(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;2016-09-06&quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;?&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;nothing&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;zip&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;date&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;loadtime_ms&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;])]&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;post&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;[(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;2016-09-06&quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;?&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;nothing&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;zip&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;date&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;loadtime_ms&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;])]&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Graph code&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;l&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;line&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;date&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hcat&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pre&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;post&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]))&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;l&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ec_width&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;800&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;seriesnames!&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;l&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;loadtime_ms&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;post&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;])&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;colorscheme!&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;l&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;palette&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;acw&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;FlatUI&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;))&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;yAxis!&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;l&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;Load time in ms&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;title!&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;l&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;text&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;randyzwitch.com&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
          &lt;span class=&quot;n&quot;&gt;subtext&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;Switching from WordPress on Bluehost to Jekyll on GitHub (2016/09/06)&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;toolbox!&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;l&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;chartTypes&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;bar&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;line&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;])&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;slider!&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;l&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Even though I switched from WordPress to Jekyll on 9/6/2016, it appears that the page cache for Google Webmaster Tools didn’t really expire until 9/12/2016 or so. In the average case, the load time went from 1128ms to 38ms! Of course, this isn’t really a &lt;em&gt;fair&lt;/em&gt; comparison, as presumably GitHub Pages runs on much better hardware than the cheap Bluehost hosting I have, and I didn’t reimplement most of the garbage I had on the WordPress version of the blog. But from a user-experience standpoint, good lord what an improvement!&lt;/p&gt;
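
&lt;p&gt;To sanity-check those averages, here is a minimal sketch that recomputes them from the values plotted in the line chart. It assumes a current Julia 1.x release, where &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;mean&lt;/code&gt; lives in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Statistics&lt;/code&gt; standard library (so it is not the v0.5-era syntax used in the rest of this post), and the variable names are only for illustration:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;using Statistics  # standard library in Julia 1.x (assumption: not Julia v0.5)

# WordPress era: daily load times in ms, 2016-06-20 through 2016-09-06
wordpress = [1282, 1728, 1047, 1111, 1027, 643, 757, 1049, 1201, 1265,
             1617, 1145, 614, 673, 1023, 1323, 1117, 1048, 904, 647,
             830, 761, 759, 607, 1141, 1022, 864, 743, 866, 1328,
             1147, 973, 1178, 1093, 927, 998, 1195, 1167, 1023, 1329,
             1051, 929, 1037, 897, 1197, 1179, 1402, 1018, 605, 2261,
             2059, 2383, 2402, 1385, 2068, 2290, 2627, 1862, 2494, 2753,
             1556, 898, 926, 1158, 1253, 1403, 655, 497, 544, 526,
             503, 575, 545, 628, 467, 518, 568, 513, 386]

# Jekyll era once the cache had expired: 2016-09-12 through 2016-10-07
jekyll = [83, 45, 40, 46, 44, 29, 34, 46, 36, 29, 32, 40, 35,
          32, 47, 43, 36, 38, 36, 26, 35, 35, 35, 32, 40, 33]

wp_mean = mean(wordpress)
jk_mean = mean(jekyll)

println(&quot;WordPress mean: &quot;, round(wp_mean, digits = 1), &quot; ms&quot;)
println(&quot;Jekyll mean:    &quot;, round(jk_mean, digits = 1), &quot; ms&quot;)
println(&quot;Speedup:        &quot;, round(wp_mean / jk_mean, digits = 1), &quot;x&quot;)
&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;The two means should land close to the figures quoted above, which puts the speedup just shy of 30x.&lt;/p&gt;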

&lt;h2 id=&quot;box-plots&quot;&gt;Box Plots&lt;/h2&gt;

&lt;p&gt;To test out further functionality, here are some box plots of the load time variation:&lt;/p&gt;

&lt;div id=&quot;boxp&quot; style=&quot;height:400px;width:800px;&quot;&gt;&lt;/div&gt;
&lt;script type=&quot;text/javascript&quot;&gt;
    // Initialize after dom ready
    var myChartp = echarts.init(document.getElementById(&quot;boxp&quot;));

    // Load data into the ECharts instance
    myChartp.setOption({&quot;xAxis&quot;:[{&quot;splitNumber&quot;:5,&quot;boundaryGap&quot;:true,&quot;data&quot;:[&quot;WordPress&quot;,&quot;Jekyll&quot;],&quot;scale&quot;:false,&quot;gridIndex&quot;:0,&quot;minInterval&quot;:0,&quot;inverse&quot;:false,&quot;nameLocation&quot;:&quot;middle&quot;,&quot;nameGap&quot;:30,&quot;silent&quot;:true,&quot;type&quot;:&quot;category&quot;}],&quot;yAxis&quot;:[{&quot;splitNumber&quot;:5,&quot;scale&quot;:false,&quot;gridIndex&quot;:0,&quot;name&quot;:&quot;Load time in ms&quot;,&quot;minInterval&quot;:0,&quot;min&quot;:0,&quot;inverse&quot;:false,&quot;nameLocation&quot;:&quot;middle&quot;,&quot;nameGap&quot;:50,&quot;silent&quot;:true,&quot;type&quot;:&quot;value&quot;}],&quot;toolbox&quot;:{&quot;feature&quot;:{&quot;dataView&quot;:{&quot;show&quot;:true,&quot;title&quot;:&quot;Data View&quot;,&quot;lang&quot;:[&quot;Data View&quot;,&quot;Cancel&quot;,&quot;Refresh&quot;]},&quot;restore&quot;:{&quot;show&quot;:true,&quot;title&quot;:&quot;Restore&quot;},&quot;saveAsImage&quot;:{&quot;show&quot;:true,&quot;title&quot;:&quot;Save As 
PNG&quot;}},&quot;itemSize&quot;:15,&quot;orient&quot;:&quot;vertical&quot;,&quot;height&quot;:&quot;auto&quot;,&quot;zlevel&quot;:0,&quot;z&quot;:2,&quot;itemGap&quot;:20,&quot;right&quot;:&quot;auto&quot;,&quot;top&quot;:&quot;center&quot;,&quot;width&quot;:&quot;auto&quot;,&quot;show&quot;:true,&quot;showTitle&quot;:true},&quot;ec_width&quot;:800,&quot;ec_height&quot;:400,&quot;ec_charttype&quot;:&quot;box&quot;,&quot;color&quot;:[&quot;#004358&quot;,&quot;#1F8A70&quot;,&quot;#BEDB39&quot;,&quot;#FFE11A&quot;,&quot;#FD7400&quot;],&quot;title&quot;:[{&quot;left&quot;:&quot;left&quot;,&quot;borderColor&quot;:&quot;transparent&quot;,&quot;bottom&quot;:&quot;auto&quot;,&quot;padding&quot;:5,&quot;zlevel&quot;:0,&quot;borderWidth&quot;:1,&quot;target&quot;:&quot;blank&quot;,&quot;z&quot;:2,&quot;itemGap&quot;:5,&quot;shadowOffsetY&quot;:0,&quot;shadowOffsetX&quot;:0,&quot;right&quot;:&quot;auto&quot;,&quot;subtext&quot;:&quot;Switching from WordPress on Bluehost to Jekyll on GitHub (2016/09/06)&quot;,&quot;top&quot;:&quot;auto&quot;,&quot;subtarget&quot;:&quot;blank&quot;,&quot;show&quot;:true,&quot;text&quot;:&quot;randyzwitch.com&quot;}],&quot;series&quot;:[{&quot;name&quot;:&quot;boxplot&quot;,&quot;data&quot;:[[-35.25,750.0,1037.0,1273.5,2058.75],[19.75,33.25,36.0,42.25,55.75]],&quot;smooth&quot;:false,&quot;minSize&quot;:&quot;0%&quot;,&quot;type&quot;:&quot;boxplot&quot;,&quot;maxSize&quot;:&quot;100%&quot;},{&quot;name&quot;:&quot;outliers&quot;,&quot;data&quot;:[[&quot;WordPress&quot;,2261.0],[&quot;WordPress&quot;,2059.0],[&quot;WordPress&quot;,2383.0],[&quot;WordPress&quot;,2402.0],[&quot;WordPress&quot;,2068.0],[&quot;WordPress&quot;,2290.0],[&quot;WordPress&quot;,2627.0],[&quot;WordPress&quot;,2494.0],[&quot;WordPress&quot;,2753.0],[&quot;Jekyll&quot;,83.0]],&quot;smooth&quot;:false,&quot;minSize&quot;:&quot;0%&quot;,&quot;type&quot;:&quot;scatter&quot;,&quot;maxSize&quot;:&quot;100%&quot;}]}
);
&lt;/script&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;k&quot;&gt;using&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ECharts&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataFrames&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Read in data&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;readtable&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;/Users/randyzwitch/Desktop/website_load_time.csv&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pre&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;[(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;2016-09-06&quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;?&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;nothing&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;zip&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;date&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;loadtime_ms&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;])]&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;post&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;[(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;2016-09-12&quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;?&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;nothing&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;zip&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;date&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;loadtime_ms&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;])]&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Remove nulls&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;pre&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pre&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;nothing&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;post&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;post&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;nothing&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Graph code&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;box&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;([&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pre&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;post&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;names&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;WordPress&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;Jekyll&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;])&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ec_width&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;800&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;colorscheme!&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;palette&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;acw&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;VitaminC&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;))&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;yAxis!&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;Load time in ms&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nameGap&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;50&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;min&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;title!&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;text&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;randyzwitch.com&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
           &lt;span class=&quot;n&quot;&gt;subtext&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;Switching from WordPress on Bluehost to Jekyll on GitHub (2016/09/06)&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;toolbox!&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Usually, a box plot comparison that is as smushed as the Jekyll plot vs the WordPress one would be a poor visualization, but in this case I think it actually works. The load time for the Jekyll version of this blog is so quick and so consistent that it would barely register as an outlier if it were WordPress! It’s crazy to think that the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-1.5 * IQR&lt;/code&gt; time for WordPress is the mean/median/min load time of Jekyll.&lt;/p&gt;
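
&lt;p&gt;The whisker arithmetic can be checked by hand. This is a small sketch that assumes the usual Tukey convention, with fences placed 1.5 * IQR beyond the quartiles, and takes the two quartiles from the WordPress series in the box plot data above; the variable names are just for illustration:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;# First and third quartiles of the WordPress load times (ms), read from the chart data
q1, q3 = 750.0, 1273.5

iqr = q3 - q1                  # interquartile range: 523.5
lower_fence = q1 - 1.5 * iqr   # -35.25
upper_fence = q3 + 1.5 * iqr   # 2058.75

println((lower_fence, upper_fence))
&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Those two fences match the ends of the WordPress whiskers in the chart, and the nine WordPress points drawn as outliers all sit above 2058.75 ms. The Jekyll median of 36 ms falls in the bottom stretch of that range, far below the WordPress first quartile of 750 ms.&lt;/p&gt;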

&lt;h2 id=&quot;where-to-go-next&quot;&gt;Where To Go Next?&lt;/h2&gt;
&lt;p&gt;This blog post is really just an interesting finding from my experience moving to Jekyll on GitHub. As it stands now, ECharts.jl is still in pre-METADATA mode. Right now, I assume that this would be a useful enough package to submit to METADATA some day, but I guess that depends on how much further I get smoothing the rough edges. If there are people who are interested in cleaning up this package further, I’d absolutely love to collaborate.&lt;/p&gt;
</description>
        
        <pubDate>Mon, 10 Oct 2016 00:00:00 +0000</pubDate>
        <link>
        http://randyzwitch.com/wordpress-jekyll-30x-speedup/</link>
        <guid isPermaLink="true">http://randyzwitch.com/wordpress-jekyll-30x-speedup/</guid>
        <content type="html" xml:base="/wordpress-jekyll-30x-speedup/">&lt;p&gt;About a month ago, I switched this blog from WordPress hosted on Bluehost to Jekyll on GitHub Pages. I suspected moving to a static website would be faster than generating HTML via PHP, and it is certainly cheaper (GitHub Pages is “free”). But it wasn’t until I needed a dataset for some data visualization development that I realized how much of an improvement it has been!&lt;/p&gt;

&lt;h2 id=&quot;packages-packages-packages&quot;&gt;Packages, Packages, Packages&lt;/h2&gt;

&lt;p&gt;With the release of v0.5 of Julia, I’ve been working (less) on updating my packages and making new packages (more), because making new stuff is more fun than maintaining old stuff! One of the packages I’ve been building is for the &lt;a href=&quot;http://echarts.baidu.com/&quot;&gt;ECharts visualization library&lt;/a&gt; (v3) from Baidu. While Julia doesn’t necessarily need another visualization library, visualization is something I’m interested in and learning is easier when you’re solving problems you like. And since the world doesn’t need another Iris example, I decided to share some real world website performance data :)&lt;/p&gt;

&lt;h2 id=&quot;line-chart&quot;&gt;Line Chart&lt;/h2&gt;

&lt;p&gt;One of the first features I developed for &lt;a href=&quot;https://github.com/randyzwitch/ECharts.jl&quot;&gt;ECharts.jl&lt;/a&gt; was X-Y charts, which I posit is the most common chart type in business. One thing that is great about the underlying ECharts JavaScript library is that interactivity is really easy to achieve:&lt;/p&gt;

&lt;div id=&quot;linep&quot; style=&quot;height:400px;width:800px;&quot;&gt;&lt;/div&gt;
&lt;script type=&quot;text/javascript&quot;&gt;
    // Initialize after dom ready
    var myChart = echarts.init(document.getElementById(&quot;linep&quot;));

    // Load data into the ECharts instance
    myChart.setOption({&quot;xAxis&quot;:[{&quot;scale&quot;:false,&quot;gridIndex&quot;:0,&quot;splitNumber&quot;:5,&quot;minInterval&quot;:0,&quot;silent&quot;:true,&quot;data&quot;:[&quot;2016-06-20&quot;,&quot;2016-06-21&quot;,&quot;2016-06-22&quot;,&quot;2016-06-23&quot;,&quot;2016-06-24&quot;,&quot;2016-06-25&quot;,&quot;2016-06-26&quot;,&quot;2016-06-27&quot;,&quot;2016-06-28&quot;,&quot;2016-06-29&quot;,&quot;2016-06-30&quot;,&quot;2016-07-01&quot;,&quot;2016-07-02&quot;,&quot;2016-07-03&quot;,&quot;2016-07-04&quot;,&quot;2016-07-05&quot;,&quot;2016-07-06&quot;,&quot;2016-07-07&quot;,&quot;2016-07-08&quot;,&quot;2016-07-09&quot;,&quot;2016-07-10&quot;,&quot;2016-07-11&quot;,&quot;2016-07-12&quot;,&quot;2016-07-13&quot;,&quot;2016-07-14&quot;,&quot;2016-07-15&quot;,&quot;2016-07-16&quot;,&quot;2016-07-17&quot;,&quot;2016-07-18&quot;,&quot;2016-07-19&quot;,&quot;2016-07-20&quot;,&quot;2016-07-21&quot;,&quot;2016-07-22&quot;,&quot;2016-07-23&quot;,&quot;2016-07-24&quot;,&quot;2016-07-25&quot;,&quot;2016-07-26&quot;,&quot;2016-07-27&quot;,&quot;2016-07-28&quot;,&quot;2016-07-29&quot;,&quot;2016-07-30&quot;,&quot;2016-07-31&quot;,&quot;2016-08-01&quot;,&quot;2016-08-02&quot;,&quot;2016-08-03&quot;,&quot;2016-08-04&quot;,&quot;2016-08-05&quot;,&quot;2016-08-06&quot;,&quot;2016-08-07&quot;,&quot;2016-08-08&quot;,&quot;2016-08-09&quot;,&quot;2016-08-10&quot;,&quot;2016-08-11&quot;,&quot;2016-08-12&quot;,&quot;2016-08-13&quot;,&quot;2016-08-14&quot;,&quot;2016-08-15&quot;,&quot;2016-08-16&quot;,&quot;2016-08-17&quot;,&quot;2016-08-18&quot;,&quot;2016-08-19&quot;,&quot;2016-08-20&quot;,&quot;2016-08-21&quot;,&quot;2016-08-22&quot;,&quot;2016-08-23&quot;,&quot;2016-08-24&quot;,&quot;2016-08-25&quot;,&quot;2016-08-26&quot;,&quot;2016-08-27&quot;,&quot;2016-08-28&quot;,&quot;2016-08-29&quot;,&quot;2016-08-30&quot;,&quot;2016-08-31&quot;,&quot;2016-09-01&quot;,&quot;2016-09-02&quot;,&quot;2016-09-03&quot;,&quot;2016-09-04&quot;,&quot;2016-09-05&quot;,&quot;2016-09-06&quot;,&quot;2016-09-07&quot;,&quot;2016-09-08&quot;,&quot;2016-09-09&quot;,&quot;2016-09-10&quot;,&quot;2016-09-11&quot;,&quot;2016-09-12&quot;,&quot;2016-09-13&quot;,&quot;2016-09-14&quot;,&quot;2016-09-15&quot;,&quot;2016-09-16&quot;,&quot;2016-09-17&quot;,&quot;2016-09-18&quot;,&quot;2016-09-19&quot;,&quot;2016-09-20&quot;,&quot;2016-09-21&quot;,&quot;2016-09-22&quot;,&quot;2016-09-23&quot;,&quot;2016-09-24&quot;,&quot;2016-09-25&quot;,&quot;2016-09-26&quot;,&quot;2016-09-27&quot;,&quot;2016-09-28&quot;,&quot;2016-09-29&quot;,&quot;2016-09-30&quot;,&quot;2016-10-01&quot;,&quot;2016-10-02&quot;,&quot;2016-10-03&quot;,&quot;2016-10-04&quot;,&quot;2016-10-05&quot;,&quot;2016-10-06&quot;,&quot;2016-10-07&quot;],&quot;inverse&quot;:false,&quot;type&quot;:&quot;category&quot;,&quot;nameLocation&quot;:&quot;middle&quot;,&quot;nameGap&quot;:30}],&quot;yAxis&quot;:[{&quot;scale&quot;:false,&quot;gridIndex&quot;:0,&quot;name&quot;:&quot;Load time in ms&quot;,&quot;splitNumber&quot;:5,&quot;minInterval&quot;:0,&quot;silent&quot;:true,&quot;inverse&quot;:false,&quot;type&quot;:&quot;value&quot;,&quot;nameLocation&quot;:&quot;middle&quot;,&quot;nameGap&quot;:50}],&quot;toolbox&quot;:{&quot;feature&quot;:{&quot;dataView&quot;:{&quot;show&quot;:true,&quot;title&quot;:&quot;Data View&quot;,&quot;lang&quot;:[&quot;Data View&quot;,&quot;Cancel&quot;,&quot;Refresh&quot;]},&quot;restore&quot;:{&quot;show&quot;:true,&quot;title&quot;:&quot;Restore&quot;},&quot;saveAsImage&quot;:{&quot;show&quot;:true,&quot;title&quot;:&quot;Save As 
PNG&quot;},&quot;magicType&quot;:{&quot;show&quot;:true,&quot;title&quot;:{&quot;line&quot;:&quot;Line&quot;,&quot;bar&quot;:&quot;Bar&quot;,&quot;tiled&quot;:&quot;Tiled&quot;,&quot;chord&quot;:&quot;Chord&quot;,&quot;stack&quot;:&quot;Stack&quot;,&quot;pie&quot;:&quot;Pie&quot;,&quot;force&quot;:&quot;Force&quot;,&quot;funnel&quot;:&quot;Funnel&quot;},&quot;type&quot;:[&quot;bar&quot;,&quot;line&quot;]}},&quot;itemSize&quot;:15,&quot;orient&quot;:&quot;vertical&quot;,&quot;height&quot;:&quot;auto&quot;,&quot;zlevel&quot;:0,&quot;z&quot;:2,&quot;itemGap&quot;:20,&quot;right&quot;:&quot;auto&quot;,&quot;top&quot;:&quot;center&quot;,&quot;width&quot;:&quot;auto&quot;,&quot;show&quot;:true,&quot;showTitle&quot;:true},&quot;ec_width&quot;:800,&quot;ec_height&quot;:400,&quot;ec_charttype&quot;:&quot;xy plot&quot;,&quot;color&quot;:[&quot;#2C3E50&quot;,&quot;#E74C3C&quot;,&quot;#ECF0F1&quot;,&quot;#3498DB&quot;,&quot;#2980B9&quot;],&quot;title&quot;:[{&quot;left&quot;:&quot;left&quot;,&quot;borderColor&quot;:&quot;transparent&quot;,&quot;bottom&quot;:&quot;auto&quot;,&quot;padding&quot;:5,&quot;zlevel&quot;:0,&quot;borderWidth&quot;:1,&quot;target&quot;:&quot;blank&quot;,&quot;z&quot;:2,&quot;itemGap&quot;:5,&quot;shadowOffsetY&quot;:0,&quot;shadowOffsetX&quot;:0,&quot;right&quot;:&quot;auto&quot;,&quot;subtext&quot;:&quot;Switching from WordPress on Bluehost to Jekyll on GitHub 
(2016/09/06)&quot;,&quot;top&quot;:&quot;auto&quot;,&quot;subtarget&quot;:&quot;blank&quot;,&quot;show&quot;:true,&quot;text&quot;:&quot;randyzwitch.com&quot;}],&quot;dataZoom&quot;:[{&quot;show&quot;:true}],&quot;series&quot;:[{&quot;name&quot;:&quot;loadtime_ms&quot;,&quot;data&quot;:[1282,1728,1047,1111,1027,643,757,1049,1201,1265,1617,1145,614,673,1023,1323,1117,1048,904,647,830,761,759,607,1141,1022,864,743,866,1328,1147,973,1178,1093,927,998,1195,1167,1023,1329,1051,929,1037,897,1197,1179,1402,1018,605,2261,2059,2383,2402,1385,2068,2290,2627,1862,2494,2753,1556,898,926,1158,1253,1403,655,497,544,526,503,575,545,628,467,518,568,513,386,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null],&quot;smooth&quot;:false,&quot;minSize&quot;:&quot;0%&quot;,&quot;type&quot;:&quot;line&quot;,&quot;maxSize&quot;:&quot;100%&quot;},{&quot;name&quot;:&quot;post&quot;,&quot;data&quot;:[null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,386,629,533,453,279,193,83,45,40,46,44,29,34,46,36,29,32,40,35,32,47,43,36,38,36,26,35,35,35,32,40,33],&quot;smooth&quot;:false,&quot;minSize&quot;:&quot;0%&quot;,&quot;type&quot;:&quot;line&quot;,&quot;maxSize&quot;:&quot;100%&quot;}]});
&lt;/script&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;k&quot;&gt;using&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ECharts&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataFrames&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Read in data&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;readtable&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;/assets/data/website_time_data.csv&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Make data two different series that overlap, so endpoint touches&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pre&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;[(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;2016-09-06&quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;?&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;nothing&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;zip&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;date&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;loadtime_ms&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;])]&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;post&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;[(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;2016-09-06&quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;?&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;nothing&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;zip&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;date&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;loadtime_ms&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;])]&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Graph code&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;l&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;line&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;date&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hcat&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pre&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;post&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]))&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;l&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ec_width&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;800&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;seriesnames!&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;l&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;loadtime_ms&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;post&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;])&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;colorscheme!&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;l&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;palette&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;acw&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;FlatUI&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;))&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;yAxis!&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;l&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;Load time in ms&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;title!&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;l&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;text&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;randyzwitch.com&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
          &lt;span class=&quot;n&quot;&gt;subtext&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;Switching from WordPress on Bluehost to Jekyll on GitHub (2016/09/06)&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;toolbox!&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;l&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;chartTypes&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;bar&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;line&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;])&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;slider!&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;l&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Even though I switched from WordPress to Jekyll on 9/6/2016, it appears that the page cache for Google Webmaster Tools didn’t really expire until 9/12/2016 or so. In the average case, the load time went from 1128ms to 38ms! Of course, this isn’t really a &lt;em&gt;fair&lt;/em&gt; comparison, as presumably GitHub Pages runs on much better hardware than the cheap Bluehost hosting I have, and I didn’t reimplement most of the garbage I had on the WordPress version of the blog. But from a user-experience standpoint, good lord what an improvement!&lt;/p&gt;
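
&lt;p&gt;As a rough illustration of how a before/after average like this can be computed, here is a minimal Python sketch. It assumes the same split that the box plot code further down uses (through 2016-09-06 for WordPress, from 2016-09-12 onward for Jekyll). The sample rows and the function name average_before_and_after are made up for the example; they are not the real measurements.&lt;/p&gt;

```python
from statistics import mean

# Hypothetical (date, load time in ms) rows; the real CSV is not reproduced here.
samples = [
    ("2016-09-04", 1200),
    ("2016-09-05", 1000),
    ("2016-09-06", 800),
    ("2016-09-09", 400),  # stale-cache period, ignored by both averages
    ("2016-09-12", 40),
    ("2016-09-13", 36),
]


def average_before_and_after(rows, last_old_day, first_new_day):
    """Return (mean up to last_old_day, mean from first_new_day onward).

    ISO-8601 date strings sort chronologically, so plain string
    comparison is enough and no date parsing is needed.
    """
    before = [ms for day, ms in rows if last_old_day >= day]
    after = [ms for day, ms in rows if day >= first_new_day]
    return mean(before), mean(after)


old_avg, new_avg = average_before_and_after(samples, "2016-09-06", "2016-09-12")
print(old_avg, new_avg)
```

&lt;p&gt;Leaving out the days in between keeps the stale-cache period from pulling either average around.&lt;/p&gt;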

&lt;h2 id=&quot;box-plots&quot;&gt;Box Plots&lt;/h2&gt;

&lt;p&gt;To test out further functionality, here are some box plots of the load time variation:&lt;/p&gt;

&lt;div id=&quot;boxp&quot; style=&quot;height:400px;width:800px;&quot;&gt;&lt;/div&gt;
&lt;script type=&quot;text/javascript&quot;&gt;
    // Initialize after dom ready
    var myChartp = echarts.init(document.getElementById(&quot;boxp&quot;));

    // Load data into the ECharts instance
    myChartp.setOption({&quot;xAxis&quot;:[{&quot;splitNumber&quot;:5,&quot;boundaryGap&quot;:true,&quot;data&quot;:[&quot;WordPress&quot;,&quot;Jekyll&quot;],&quot;scale&quot;:false,&quot;gridIndex&quot;:0,&quot;minInterval&quot;:0,&quot;inverse&quot;:false,&quot;nameLocation&quot;:&quot;middle&quot;,&quot;nameGap&quot;:30,&quot;silent&quot;:true,&quot;type&quot;:&quot;category&quot;}],&quot;yAxis&quot;:[{&quot;splitNumber&quot;:5,&quot;scale&quot;:false,&quot;gridIndex&quot;:0,&quot;name&quot;:&quot;Load time in ms&quot;,&quot;minInterval&quot;:0,&quot;min&quot;:0,&quot;inverse&quot;:false,&quot;nameLocation&quot;:&quot;middle&quot;,&quot;nameGap&quot;:50,&quot;silent&quot;:true,&quot;type&quot;:&quot;value&quot;}],&quot;toolbox&quot;:{&quot;feature&quot;:{&quot;dataView&quot;:{&quot;show&quot;:true,&quot;title&quot;:&quot;Data View&quot;,&quot;lang&quot;:[&quot;Data View&quot;,&quot;Cancel&quot;,&quot;Refresh&quot;]},&quot;restore&quot;:{&quot;show&quot;:true,&quot;title&quot;:&quot;Restore&quot;},&quot;saveAsImage&quot;:{&quot;show&quot;:true,&quot;title&quot;:&quot;Save As 
PNG&quot;}},&quot;itemSize&quot;:15,&quot;orient&quot;:&quot;vertical&quot;,&quot;height&quot;:&quot;auto&quot;,&quot;zlevel&quot;:0,&quot;z&quot;:2,&quot;itemGap&quot;:20,&quot;right&quot;:&quot;auto&quot;,&quot;top&quot;:&quot;center&quot;,&quot;width&quot;:&quot;auto&quot;,&quot;show&quot;:true,&quot;showTitle&quot;:true},&quot;ec_width&quot;:800,&quot;ec_height&quot;:400,&quot;ec_charttype&quot;:&quot;box&quot;,&quot;color&quot;:[&quot;#004358&quot;,&quot;#1F8A70&quot;,&quot;#BEDB39&quot;,&quot;#FFE11A&quot;,&quot;#FD7400&quot;],&quot;title&quot;:[{&quot;left&quot;:&quot;left&quot;,&quot;borderColor&quot;:&quot;transparent&quot;,&quot;bottom&quot;:&quot;auto&quot;,&quot;padding&quot;:5,&quot;zlevel&quot;:0,&quot;borderWidth&quot;:1,&quot;target&quot;:&quot;blank&quot;,&quot;z&quot;:2,&quot;itemGap&quot;:5,&quot;shadowOffsetY&quot;:0,&quot;shadowOffsetX&quot;:0,&quot;right&quot;:&quot;auto&quot;,&quot;subtext&quot;:&quot;Switching from WordPress on Bluehost to Jekyll on GitHub (2016/09/06)&quot;,&quot;top&quot;:&quot;auto&quot;,&quot;subtarget&quot;:&quot;blank&quot;,&quot;show&quot;:true,&quot;text&quot;:&quot;randyzwitch.com&quot;}],&quot;series&quot;:[{&quot;name&quot;:&quot;boxplot&quot;,&quot;data&quot;:[[-35.25,750.0,1037.0,1273.5,2058.75],[19.75,33.25,36.0,42.25,55.75]],&quot;smooth&quot;:false,&quot;minSize&quot;:&quot;0%&quot;,&quot;type&quot;:&quot;boxplot&quot;,&quot;maxSize&quot;:&quot;100%&quot;},{&quot;name&quot;:&quot;outliers&quot;,&quot;data&quot;:[[&quot;WordPress&quot;,2261.0],[&quot;WordPress&quot;,2059.0],[&quot;WordPress&quot;,2383.0],[&quot;WordPress&quot;,2402.0],[&quot;WordPress&quot;,2068.0],[&quot;WordPress&quot;,2290.0],[&quot;WordPress&quot;,2627.0],[&quot;WordPress&quot;,2494.0],[&quot;WordPress&quot;,2753.0],[&quot;Jekyll&quot;,83.0]],&quot;smooth&quot;:false,&quot;minSize&quot;:&quot;0%&quot;,&quot;type&quot;:&quot;scatter&quot;,&quot;maxSize&quot;:&quot;100%&quot;}]}
);
&lt;/script&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;k&quot;&gt;using&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ECharts&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataFrames&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Read in data&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;readtable&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;/Users/randyzwitch/Desktop/website_load_time.csv&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pre&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;[(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;2016-09-06&quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;?&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;nothing&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;zip&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;date&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;loadtime_ms&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;])]&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;post&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;[(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;2016-09-12&quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;?&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;nothing&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;zip&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;date&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;loadtime_ms&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;])]&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Remove nulls&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;pre&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pre&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;nothing&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;post&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;post&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;nothing&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Graph code&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;box&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;([&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pre&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;post&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;names&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;WordPress&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;Jekyll&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;])&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ec_width&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;800&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;colorscheme!&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;palette&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;acw&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;VitaminC&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;))&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;yAxis!&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;Load time in ms&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nameGap&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;50&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;min&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;title!&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;text&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;randyzwitch.com&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
           &lt;span class=&quot;n&quot;&gt;subtext&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;Switching from WordPress on Bluehost to Jekyll on GitHub (2016/09/06)&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;toolbox!&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Usually, a box plot comparison that is as smushed as the Jekyll plot vs. the WordPress one would be a poor visualization, but in this case I think it actually works. The load time for the Jekyll version of this blog is so quick and so consistent that it would barely register as an outlier if it were WordPress! It’s crazy to think that the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-1.5 * IQR&lt;/code&gt; time for WordPress is the mean/median/min load time of Jekyll.&lt;/p&gt;
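
&lt;p&gt;For reference, the whisker ends in the chart data above look like the usual Q1 - 1.5 * IQR and Q3 + 1.5 * IQR limits. A small Python sketch reproduces them; tukey_fences is a name made up for this example, and the quartiles are copied from the boxplot series in the chart options:&lt;/p&gt;

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) whisker limits: q1 - k*IQR and q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# First and third quartiles as they appear in the chart's boxplot series.
wordpress = tukey_fences(750.0, 1273.5)
jekyll = tukey_fences(33.25, 42.25)

print(wordpress)  # (-35.25, 2058.75)
print(jekyll)     # (19.75, 55.75)
```

&lt;p&gt;The lower limit for WordPress works out to a negative number, so the entire Jekyll box (33.25 to 42.25) fits below the WordPress first quartile of 750.&lt;/p&gt;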

&lt;h2 id=&quot;where-to-go-next&quot;&gt;Where To Go Next?&lt;/h2&gt;
&lt;p&gt;This blog post is really just an interesting finding from my experience moving to Jekyll on GitHub. As it stands now, ECharts.jl is still in pre-METADATA mode. Right now, I assume that this would be a useful enough package to submit to METADATA some day, but I guess that depends on how much further I get smoothing the rough edges. If there are people who are interested in cleaning up this package further, I’d absolutely love to collaborate.&lt;/p&gt;</content>
      </item>
      
      
    
      
      
      
    
      
      
      
    
      
      
      
    
      
      
      
    
      
      
      
    
      
      
      
    
      
      
      
    
      
      
      
    
      
      
      <item>
        <title>A Million Text Files And A Single Laptop</title>
        
          <description>&lt;p&gt;&lt;img src=&quot;/wp-content/uploads/2016/01/million-files-size.png&quot; alt=&quot;GNU Parallel Cat Unix&quot; /&gt;&lt;/p&gt;

&lt;p&gt;More often than I would like, I receive datasets where the data has only been partially cleaned, such as the picture on the right: hundreds, thousands…even millions of tiny files. Usually when this happens, the data all have the same format (such as having been generated by sensors or other memory-constrained devices).&lt;/p&gt;

&lt;p&gt;The problem with data like this is that 1) it’s inconvenient to think about a dataset as a million individual pieces, 2) the data in aggregate are too large to hold in RAM, but 3) the data are small enough that using Hadoop or even a relational database seems like overkill.&lt;/p&gt;

&lt;p&gt;Surprisingly, with judicious use of &lt;a href=&quot;http://www.gnu.org/software/parallel/&quot;&gt;GNU Parallel&lt;/a&gt;, stream processing and a relatively modern computer, you can efficiently process annoying, “medium-sized” data as described above.&lt;/p&gt;

&lt;h2 id=&quot;data-generation&quot;&gt;Data Generation&lt;/h2&gt;

&lt;p&gt;For this blog post, I used a combination of R and Python to generate the data: the “Groceries” dataset from the &lt;a href=&quot;https://cran.r-project.org/web/packages/arules/vignettes/arules.pdf&quot;&gt;arules&lt;/a&gt; package for sampling transactions (with replacement), and the Python &lt;a href=&quot;https://github.com/joke2k/faker&quot;&gt;Faker (fake-factory)&lt;/a&gt; package to generate fake customer profiles and for creating the 1MM+ text files:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;c1&quot;&gt;#R Code
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;library&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;arules&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Groceries&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;write&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Groceries&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;groceries.txt&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sep&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;,&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;#Python Code
&lt;/span&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;random&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;csv&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;faker&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Faker&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;fake&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Faker&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;pandas&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataFrame&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;pandas&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pd&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;# Create customer file of 1,234,567 customers with fake data
# Use dataframe index as a way to generate unique customer id
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;customers&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;fake&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;simple_profile&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;range&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1234567&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)]&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;customer_df&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DataFrame&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;customers&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;customer_df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;cust_id&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;customer_df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;index&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;#Read in transactions file from arules package
&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;with&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;open&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;groceries.txt&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;transactions&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;readlines&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;#Remove new line character
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;transactions&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;transactions&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;#Generate transactions by cust_id
&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;#file format:
#cust_id::int
#store_id::int
#transaction_datetime::string/datetime
#items::string
&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;#for each customer...
&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;range&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1234567&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;#...create a file...
&lt;/span&gt;    &lt;span class=&quot;k&quot;&gt;with&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;open&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'/transactions/custfile_%s'&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'w'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;csvfile&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;trans&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;csv&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;writer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;csvfile&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delimiter&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;' '&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;quotechar&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'&quot;'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;quoting&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;csv&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;QUOTE_MINIMAL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;c1&quot;&gt;#...that contains all of the transactions they've ever made
&lt;/span&gt;        &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;range&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;random&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;randint&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;365&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)):&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;trans&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;writerow&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;([&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;fake&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;zipcode&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;fake&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;date_time_this_decade&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;before_now&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;after_now&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;False&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;transactions&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;random&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;randint&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;transactions&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)]])&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;h2 id=&quot;problem-1-concatenating-cat---outtxt-&quot;&gt;Problem 1: Concatenating (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cat * &amp;gt;&amp;gt; out.txt&lt;/code&gt; ?!)&lt;/h2&gt;

&lt;p&gt;The &lt;a href=&quot;http://man7.org/linux/man-pages/man1/cat.1.html&quot;&gt;cat&lt;/a&gt; utility in Unix-y systems is familiar to almost anyone who has ever opened a Terminal window: take some or all of the files in a folder and concatenate them into one big file. But something funny happens once you get enough files…&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-shell&quot; data-lang=&quot;shell&quot;&gt;&lt;span class=&quot;nv&quot;&gt;$ &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;cat&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; out.txt

&lt;span class=&quot;nt&quot;&gt;-bash&lt;/span&gt;: /bin/cat: Argument list too long&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;That’s a fun thought…too many files for the computer to keep track of. As it turns out, the operating system caps the total size of the argument list that can be handed to a new process (the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ARG_MAX&lt;/code&gt; limit); the shell expands the asterisk before &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cat&lt;/code&gt; ever runs, so the above statement tries to pass 1,234,567 file names to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cat&lt;/code&gt;, exceeds that limit, and you get an error message.&lt;/p&gt;

&lt;p&gt;One (naive) solution would be to loop over every file (a completely serial operation):&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-shell&quot; data-lang=&quot;shell&quot;&gt;&lt;span class=&quot;k&quot;&gt;for &lt;/span&gt;f &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;do &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;cat&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$f&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; ../transactions_cat/transactions.csv&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;done&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Roughly &lt;strong&gt;10,093 seconds&lt;/strong&gt; later, you’ll have your concatenated file. Nearly three hours is quite a coffee break…&lt;/p&gt;
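
&lt;p&gt;As an aside, the loop is not the only serial way around the “Argument list too long” error. The sketch below is an illustration rather than something timed for this post: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;getconf ARG_MAX&lt;/code&gt; prints the limit (in bytes) on the arguments handed to a new process, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;find&lt;/code&gt; with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-exec ... {} +&lt;/code&gt; packs as many file names into each &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cat&lt;/code&gt; call as that limit allows. It assumes it is run from inside the directory holding the files and that the output directory used by the loop above already exists:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-shell&quot; data-lang=&quot;shell&quot;&gt;# Show the limit, in bytes, on the argument list passed to a new process
getconf ARG_MAX

# Let find batch the file names so that no single cat call exceeds the limit
find . -maxdepth 1 -type f -exec cat {} + &amp;gt;&amp;gt; ../transactions_cat/transactions.csv&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;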

&lt;h2 id=&quot;solution-1-gnu-parallel--concatenation&quot;&gt;Solution 1: GNU Parallel &amp;amp; Concatenation&lt;/h2&gt;

&lt;p&gt;Above, I mentioned that looping over each file gets you past the error condition of too many arguments, but it is a serial operation. If you look at your computer usage during that operation, you’ll likely see that only a fraction of a core of your computer’s CPU is being utilized. We can greatly improve that through the use of GNU Parallel:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-shell&quot; data-lang=&quot;shell&quot;&gt;&lt;span class=&quot;nb&quot;&gt;ls&lt;/span&gt; | parallel &lt;span class=&quot;nt&quot;&gt;-m&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-j&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;$f&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;cat {} &amp;gt;&amp;gt; ../transactions_cat/transactions.csv&quot;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$f&lt;/code&gt; argument in the code is a placeholder for the number of concurrent jobs, to highlight that you can choose the level of parallelism; however, the speedup does not keep scaling linearly with the number of jobs, as shown below (&lt;a href=&quot;https://gist.github.com/randyzwitch/ee0f738b5895e059fa2a&quot;&gt;graph code, Julia&lt;/a&gt;):&lt;/p&gt;

&lt;div id=&quot;cat&quot;&gt;
&lt;/div&gt;

&lt;p&gt;Given that the graph represents a single run at each level of parallelism, it’s a bit difficult to say &lt;em&gt;exactly&lt;/em&gt; where the parallelism gets maxed out, but at roughly 10 concurrent jobs, there’s no additional benefit. It’s also interesting to point out what the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-m&lt;/code&gt; argument represents; by specifying &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-m&lt;/code&gt;, you allow multiple arguments (i.e. multiple text files) to be passed to each &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cat&lt;/code&gt; invocation, rather than starting one process per file. This &lt;em&gt;alone&lt;/em&gt; leads to an 8x speedup over the naive loop solution.&lt;/p&gt;

&lt;h2 id=&quot;problem-2-data--ram&quot;&gt;Problem 2: Data &amp;gt; RAM&lt;/h2&gt;

&lt;p&gt;Now that we have a single file, we’ve removed the “one million files” cognitive dissonance, but now we have a second problem: at 19.93GB, the amount of data exceeds the RAM in my laptop (2014 MBP, 16GB of RAM). So in order to do analysis, either a bigger machine is needed or processing has to be done in a streaming or “chunked” manner (such as using the &lt;a href=&quot;http://pandas.pydata.org/pandas-docs/stable/io.html#iterating-through-files-chunk-by-chunk&quot;&gt;“chunksize” keyword in pandas&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;But continuing on with our use of GNU Parallel, suppose we wanted to answer the following types of questions about our transactions data:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;How many unique products were sold?&lt;/li&gt;
  &lt;li&gt;How many transactions were there per day?&lt;/li&gt;
  &lt;li&gt;How many total items were sold per store, per month?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If it’s not clear from the list above, in all three questions there is an “embarrassingly parallel” portion of the computation. Let’s take a look at how to answer all three of these questions in a time- and RAM-efficient manner:&lt;/p&gt;

&lt;h5 id=&quot;q1-unique-products&quot;&gt;Q1: Unique Products&lt;/h5&gt;

&lt;p&gt;Given the format of the data file (each transaction’s products stored in a single comma-delimited field), this question is the hardest to parallelize, but using a neat trick with the &lt;a href=&quot;http://www.linfo.org/tr.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tr&lt;/code&gt;&lt;/a&gt; (translate) utility, we can map our data to one product per row as we stream over the file:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-shell&quot; data-lang=&quot;shell&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;c&quot;&gt;# Serial method (i.e. no parallelism)&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# This is a simple implementation of map &amp;amp; reduce; tr statements represent one map, sort -u statements one reducer&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;# cut -d ' ' -f 5- transactions.csv | \     - Using cut, take the 5th field onward from each row of the transactions.csv file&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# tr -d \&quot; | \                              - Using tr, trim off double-quotes. This leaves us with a comma-delimited string of products representing a transaction&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# tr ',' '\n' | \                           - Using tr, replace each comma with a newline, so each product lands on its own line&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# sort -u | \                               - Using sort, put similar items together, but only output the unique values&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# wc -l                                     - Count number of unique lines, which after de-duping, represents number of unique products&lt;/span&gt;

&lt;span class=&quot;nv&quot;&gt;$ &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;time cut&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-d&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;' '&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-f&lt;/span&gt; 5- transactions.csv | &lt;span class=&quot;nb&quot;&gt;tr&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-d&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt; | &lt;span class=&quot;nb&quot;&gt;tr&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;','&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'\n'&lt;/span&gt; | &lt;span class=&quot;nb&quot;&gt;sort&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-u&lt;/span&gt; | &lt;span class=&quot;nb&quot;&gt;wc&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-l&lt;/span&gt;
331

real	292m7.116s

&lt;span class=&quot;c&quot;&gt;# Parallelized version, default chunk size of 1MB. This will use 100% of all CPUs (real and virtual)&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# Also map &amp;amp; reduce; tr statements a single map, sort -u statements multiple reducers (8 by default)&lt;/span&gt;

&lt;span class=&quot;nv&quot;&gt;$ &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;time cut&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-d&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;' '&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-f&lt;/span&gt; 5- transactions.csv | &lt;span class=&quot;nb&quot;&gt;tr&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-d&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt; | &lt;span class=&quot;nb&quot;&gt;tr&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;','&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'\n'&lt;/span&gt; | parallel &lt;span class=&quot;nt&quot;&gt;--pipe&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;--block&lt;/span&gt; 1M &lt;span class=&quot;nb&quot;&gt;sort&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-u&lt;/span&gt; | &lt;span class=&quot;nb&quot;&gt;sort&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-u&lt;/span&gt; | &lt;span class=&quot;nb&quot;&gt;wc&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-l&lt;/span&gt;
331

&lt;span class=&quot;c&quot;&gt;# block size performance - Making block size smaller might improve performance&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# Number of jobs can also be manipulated (not evaluated)&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# --500K:               73m57.232s&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# --Default 1M:         75m55.268s (3.84x faster than serial)&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# --2M:                 79m30.950s&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# --3M:                 80m43.311s&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;The trick here is that we replace each comma in the delimited product list with a newline character; the effect is to take a single transaction row and return multiple rows, one for each product. Then we pass that down the pipeline, eventually using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sort -u&lt;/code&gt; to de-dup the list and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;wc -l&lt;/code&gt; to count the number of unique lines (i.e. products).&lt;/p&gt;

&lt;p&gt;In a serial fashion, it takes quite some time to calculate the number of unique products. Incorporating GNU Parallel, just using the defaults, gives nearly a 4x speedup!&lt;/p&gt;

&lt;h5 id=&quot;q2-transactions-by-day&quot;&gt;Q2: Transactions By Day&lt;/h5&gt;

&lt;p&gt;If the file format could be considered undesirable in question 1, for question 2 the format is perfect. Since each row represents a transaction, all we need to do is perform the equivalent of a SQL &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Group By&lt;/code&gt; on the date and count the rows:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-shell&quot; data-lang=&quot;shell&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;c&quot;&gt;# Data is at transaction level, so just need to do equivalent of 'group by' operation&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# Using cut again, we choose field 3, which is the date part of the timestamp&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# sort | uniq -c is a common pattern for doing a 'group by' count operation&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# Final tr step is to trim the leading quotation mark from date string&lt;/span&gt;

&lt;span class=&quot;nb&quot;&gt;time cut&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-d&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;' '&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-f&lt;/span&gt; 3 transactions.csv | &lt;span class=&quot;nb&quot;&gt;sort&lt;/span&gt; | &lt;span class=&quot;nb&quot;&gt;uniq&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-c&lt;/span&gt; | &lt;span class=&quot;nb&quot;&gt;tr&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-d&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;

real	76m51.223s

&lt;span class=&quot;c&quot;&gt;# Parallelized version&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# Quoting can be annoying when using parallel, so writing a Bash function is often much easier than dealing with escaping quotes&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# To do 'group by' operation using awk, need to use an associative array&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# Because we are doing parallel operations, need to pass awk output to awk again to return final counts&lt;/span&gt;

awksub &lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;awk&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'{a[$3]+=1;}END{for(i in a)print i&quot; &quot;a[i];}'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;nb&quot;&gt;export&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-f&lt;/span&gt; awksub
&lt;span class=&quot;nb&quot;&gt;time &lt;/span&gt;parallel &lt;span class=&quot;nt&quot;&gt;--pipe&lt;/span&gt; awksub &amp;lt; transactions.csv | &lt;span class=&quot;nb&quot;&gt;awk&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'{a[$1]+=$2;}END{for(i in a)print i&quot; &quot;a[i];}'&lt;/span&gt; | &lt;span class=&quot;nb&quot;&gt;tr&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-d&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt; | &lt;span class=&quot;nb&quot;&gt;sort

&lt;/span&gt;real	8m22.674s &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;9.17x faster than serial&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Using GNU Parallel starts to become complicated here, but you do get a 9x speed-up by counting rows by date in chunks, then “reducing” again by summing the per-chunk counts by date (a trick I picked up from this &lt;a href=&quot;http://www.rankfocus.com/use-cpu-cores-linux-commands/&quot;&gt;blog post&lt;/a&gt;).&lt;/p&gt;

&lt;h5 id=&quot;q3-total-items-per-store-per-month&quot;&gt;Q3: Total Items Per Store, Per Month&lt;/h5&gt;

&lt;p&gt;For this example, it could be that my command-line fu is weak, but the serial method actually turns out to be the fastest. Of course, at a 14-minute run time, the real-time benefits of parallelization aren’t that great.&lt;/p&gt;

&lt;p&gt;It may be possible that one of you out there knows how to do this correctly, but an interesting thing to note is that the serial version already uses 40-50% of the available CPU. So parallelization might yield a 2x speedup, but saving seven minutes per run isn’t worth spending hours trying to find the optimal settings.&lt;/p&gt;
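
&lt;p&gt;For reference, a serial version can be sketched with a single &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;awk&lt;/code&gt; pass. This is an illustration under assumptions, not the command that was timed above: it treats field 2 as the store, takes the year and month from field 3 (skipping the leading quotation mark), and counts the items in a transaction as the number of commas on the line plus one, which only holds if commas appear nowhere except between items:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-shell&quot; data-lang=&quot;shell&quot;&gt;# Serial sketch: total items per store, per month
# Assumed layout: $2 = store id, $3 = date with a leading double-quote, e.g. &quot;2015-03-04
# gsub returns the number of commas on the line; items in the row = commas + 1
awk '{ n = gsub(/,/, &quot;,&quot;); items[$2 &quot; &quot; substr($3, 2, 7)] += n + 1 }
     END { for (k in items) print k, items[k] }' transactions.csv | sort&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;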

&lt;h2 id=&quot;but-ive-got-multiple-files&quot;&gt;But, I’ve got MULTIPLE files…&lt;/h2&gt;

&lt;p&gt;The three examples above showed that it’s possible to process datasets larger than RAM in a realistic amount of time using GNU Parallel. However, the examples also showed that working with Unix utilities can become complicated rather quickly. Shell scripts can help move beyond the “one-liner” syndrome, when the pipeline gets so long you lose track of the logic, but eventually problems are more easily solved using other tools.&lt;/p&gt;

&lt;p&gt;The data that I generated at the beginning of this post represented two concepts: transactions and customers. Once you get to the point where you want to do joins, summarize by multiple columns, estimate models, etc., loading data into a database or an analytics environment like R or Python makes sense. But hopefully this post has shown that a laptop is capable of analyzing WAY more data than most people believe, using many tools written decades ago.&lt;/p&gt;
</description>
        
        <pubDate>Thu, 28 Jan 2016 09:53:42 +0000</pubDate>
        <link>
        http://randyzwitch.com/gnu-parallel-medium-data/</link>
        <guid isPermaLink="true">http://randyzwitch.com/gnu-parallel-medium-data/</guid>
        <content type="html" xml:base="/gnu-parallel-medium-data/">&lt;p&gt;&lt;img src=&quot;/wp-content/uploads/2016/01/million-files-size.png&quot; alt=&quot;GNU Parallel Cat Unix&quot; /&gt;&lt;/p&gt;

&lt;p&gt;More often than I would like, I receive datasets where the data has only been partially cleaned, such as the picture on the right: hundreds, thousands…even millions of tiny files. Usually when this happens, the data all have the same format (such as having been generated by sensors or other memory-constrained devices).&lt;/p&gt;

&lt;p&gt;The problem with data like this is that 1) it’s inconvenient to think about a dataset as a million individual pieces, 2) the data in aggregate are too large to hold in RAM, but 3) the data are small enough that using Hadoop or even a relational database seems like overkill.&lt;/p&gt;

&lt;p&gt;Surprisingly, with judicious use of &lt;a href=&quot;http://www.gnu.org/software/parallel/&quot;&gt;GNU Parallel&lt;/a&gt;, stream processing and a relatively modern computer, you can efficiently process annoying, “medium-sized” data as described above.&lt;/p&gt;

&lt;h2 id=&quot;data-generation&quot;&gt;Data Generation&lt;/h2&gt;

&lt;p&gt;For this blog post, I used a combination of R and Python to generate the data: the “Groceries” dataset from the &lt;a href=&quot;https://cran.r-project.org/web/packages/arules/vignettes/arules.pdf&quot;&gt;arules&lt;/a&gt; package for sampling transactions (with replacement), and the Python &lt;a href=&quot;https://github.com/joke2k/faker&quot;&gt;Faker (fake-factory)&lt;/a&gt; package to generate fake customer profiles and for creating the 1MM+ text files:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;c1&quot;&gt;#R Code
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;library&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;arules&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Groceries&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;write&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Groceries&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;groceries.txt&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sep&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;,&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;#Python Code
&lt;/span&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;random&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;csv&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;faker&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Faker&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;fake&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Faker&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;pandas&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataFrame&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;pandas&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pd&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;# Create customer file of 1,234,567 customers with fake data
# Use dataframe index as a way to generate unique customer id
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;customers&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;fake&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;simple_profile&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;range&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1234567&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)]&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;customer_df&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DataFrame&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;customers&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;customer_df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;cust_id&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;customer_df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;index&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;#Read in transactions file from arules package
&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;with&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;open&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;groceries.txt&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;transactions&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;readlines&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;#Remove new line character
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;transactions&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;transactions&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;#Generate transactions by cust_id
&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;#file format:
#cust_id::int
#store_id::int
#transaction_datetime::string/datetime
#items::string
&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;#for each customer...
&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;range&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1234567&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;#...create a file...
&lt;/span&gt;    &lt;span class=&quot;k&quot;&gt;with&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;open&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'/transactions/custfile_%s'&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'w'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;csvfile&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;trans&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;csv&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;writer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;csvfile&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delimiter&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;' '&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;quotechar&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'&quot;'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;quoting&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;csv&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;QUOTE_MINIMAL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;c1&quot;&gt;#...that contains all of the transactions they've ever made
&lt;/span&gt;        &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;range&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;random&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;randint&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;365&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)):&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;trans&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;writerow&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;([&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;fake&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;zipcode&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;fake&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;date_time_this_decade&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;before_now&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;after_now&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;False&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;transactions&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;random&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;randint&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;transactions&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)]])&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;h2 id=&quot;problem-1-concatenating-cat---outtxt-&quot;&gt;Problem 1: Concatenating (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cat * &amp;gt;&amp;gt; out.txt&lt;/code&gt; ?!)&lt;/h2&gt;

&lt;p&gt;The &lt;a href=&quot;http://man7.org/linux/man-pages/man1/cat.1.html&quot;&gt;cat&lt;/a&gt; utility in Unix-y systems is familiar to almost anyone who has ever opened up a Terminal window. Take some or all of the files in a folder, concatenate them together, and you get one big file. But something funny happens once you get enough files…&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-shell&quot; data-lang=&quot;shell&quot;&gt;&lt;span class=&quot;nv&quot;&gt;$ &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;cat&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; out.txt

&lt;span class=&quot;nt&quot;&gt;-bash&lt;/span&gt;: /bin/cat: Argument list too long&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;That’s a fun thought…too many files for the computer to keep track of. As it turns out, the operating system caps the total size of the argument list a program can be started with; the asterisk in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cat&lt;/code&gt; command gets expanded by the shell before &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cat&lt;/code&gt; runs, so the above statement passes 1,234,567 file names to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cat&lt;/code&gt; and you get an error message.&lt;/p&gt;
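
&lt;p&gt;The “Argument list too long” message comes from a limit on how much argument data can be handed to a newly started program. Here’s a minimal sketch for inspecting it, assuming bash and a system that provides &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;getconf&lt;/code&gt;; the numbers differ by machine, and the second figure is only a rough estimate of what the shell would hand to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cat&lt;/code&gt;:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-shell&quot; data-lang=&quot;shell&quot;&gt;# Limit, in bytes, on the arguments plus environment passed to a newly started program
getconf ARG_MAX

# Approximate size, in bytes, of the file names the asterisk expands to
# (printf is a bash builtin, so this expansion never has to be passed to a new program)
printf '%s\0' * | wc -c&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;If the second number comes out larger than the first, a bare asterisk on the command line of an external program is likely to fail the same way.&lt;/p&gt;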

&lt;p&gt;One (naive) solution would be to loop over every file (a completely serial operation):&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-shell&quot; data-lang=&quot;shell&quot;&gt;&lt;span class=&quot;k&quot;&gt;for &lt;/span&gt;f &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;do &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;cat&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$f&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; ../transactions_cat/transactions.csv&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;done&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Roughly &lt;strong&gt;10,093 seconds&lt;/strong&gt; later, you’ll have your concatenated file. Three hours is quite a coffee break…&lt;/p&gt;

&lt;h2 id=&quot;solution-1-gnu-parallel--concatenation&quot;&gt;Solution 1: GNU Parallel &amp;amp; Concatenation&lt;/h2&gt;

&lt;p&gt;Above, I mentioned that looping over each file gets you past the error condition of too many arguments, but it is a serial operation. If you look at your computer usage during that operation, you’ll likely see that only a fraction of a core of your computer’s CPU is being utilized. We can greatly improve that through the use of GNU Parallel:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-shell&quot; data-lang=&quot;shell&quot;&gt;&lt;span class=&quot;nb&quot;&gt;ls&lt;/span&gt; | parallel &lt;span class=&quot;nt&quot;&gt;-m&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-j&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;$f&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;cat {} &amp;gt;&amp;gt; ../transactions_cat/transactions.csv&quot;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$f&lt;/code&gt; argument in the code is to highlight that you can choose the level of parallelism; however, you will not get infinitely linear scaling, as shown below (&lt;a href=&quot;https://gist.github.com/randyzwitch/ee0f738b5895e059fa2a&quot;&gt;graph code, Julia&lt;/a&gt;):&lt;/p&gt;

&lt;div id=&quot;cat&quot;&gt;
&lt;/div&gt;

&lt;p&gt;Given that the graph represents a single run at each level of parallelism, it’s a bit difficult to say &lt;em&gt;exactly&lt;/em&gt; where the parallelism gets maxed out, but at roughly 10 concurrent jobs, there’s no additional benefit. It’s also interesting to point out what the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-m&lt;/code&gt; argument represents; by specifying &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-m&lt;/code&gt;, you allow multiple arguments (i.e. multiple text files) to be passed to each command that parallel launches. This &lt;em&gt;alone&lt;/em&gt; leads to an 8x speedup over the naive loop solution.&lt;/p&gt;
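
&lt;p&gt;To see what &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-m&lt;/code&gt; changes without touching any files, GNU Parallel’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--dry-run&lt;/code&gt; flag prints the commands it would run instead of running them. This is a small sketch, assuming GNU Parallel is installed; the four file names are made up for illustration:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-shell&quot; data-lang=&quot;shell&quot;&gt;# Without -m: each file name gets its own cat command
printf '%s\n' a.csv b.csv c.csv d.csv | parallel --dry-run cat {}

# With -m: several file names are packed into each cat command
printf '%s\n' a.csv b.csv c.csv d.csv | parallel -m -j 2 --dry-run cat {}&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Fewer, larger &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cat&lt;/code&gt; invocations mean far fewer processes to start, which is plausibly where much of the speedup comes from.&lt;/p&gt;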

&lt;h2 id=&quot;problem-2-data--ram&quot;&gt;Problem 2: Data &amp;gt; RAM&lt;/h2&gt;

&lt;p&gt;Now that we have a single file, we’ve removed the “one million files” cognitive dissonance, but now we have a second problem: at 19.93GB, the amount of data exceeds the RAM in my laptop (2014 MBP, 16GB of RAM). So in order to do analysis, either a bigger machine is needed or processing has to be done in a streaming or “chunked” manner (such as using the &lt;a href=&quot;http://pandas.pydata.org/pandas-docs/stable/io.html#iterating-through-files-chunk-by-chunk&quot;&gt;“chunksize” keyword in pandas&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;But continuing on with our use of GNU Parallel, suppose we wanted to answer the following types of questions about our transactions data:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;How many unique products were sold?&lt;/li&gt;
  &lt;li&gt;How many transactions were there per day?&lt;/li&gt;
  &lt;li&gt;How many total items were sold per store, per month?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If it’s not clear from the list above, in all three questions there is an “embarrassingly parallel” portion of the computation. Let’s take a look at how to answer all three of these questions in a time- and RAM-efficient manner:&lt;/p&gt;

&lt;h5 id=&quot;q1-unique-products&quot;&gt;Q1: Unique Products&lt;/h5&gt;

&lt;p&gt;Given the format of the data file (transactions in a single column array), this question is the hardest to parallelize, but using a neat trick with the &lt;a href=&quot;http://www.linfo.org/tr.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tr&lt;/code&gt;&lt;/a&gt; (transliterate) utility, we can map our data to one product per row as we stream over the file:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-shell&quot; data-lang=&quot;shell&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;c&quot;&gt;# Serial method (i.e. no parallelism)&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# This is a simple implementation of map &amp;amp; reduce; tr statements represent one map, sort -u statements one reducer&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;# cut -d ' ' -f 5- transactions.csv | \     - Using cut, take everything from the 5th column and over from the transactions.csv file&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# tr -d \&quot; | tr ',' '\n' | \                - Using tr, trim off double-quotes, then swap each comma for a newline. This leaves us with one product per row&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# sort -u | \                               - Using sort, put similar items together, but only output the unique values&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# wc -l                                     - Count number of unique lines, which after de-duping, represents number of unique products&lt;/span&gt;

&lt;span class=&quot;nv&quot;&gt;$ &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;time cut&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-d&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;' '&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-f&lt;/span&gt; 5- transactions.csv | &lt;span class=&quot;nb&quot;&gt;tr&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-d&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt; | &lt;span class=&quot;nb&quot;&gt;tr&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;','&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'\n'&lt;/span&gt; | &lt;span class=&quot;nb&quot;&gt;sort&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-u&lt;/span&gt; | &lt;span class=&quot;nb&quot;&gt;wc&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-l&lt;/span&gt;
331

real	292m7.116s

&lt;span class=&quot;c&quot;&gt;# Parallelized version, default chunk size of 1MB. This will use 100% of all CPUs (real and virtual)&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# Also map &amp;amp; reduce; tr statements a single map, sort -u statements multiple reducers (8 by default)&lt;/span&gt;

&lt;span class=&quot;nv&quot;&gt;$ &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;time cut&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-d&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;' '&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-f&lt;/span&gt; 5- transactions.csv | &lt;span class=&quot;nb&quot;&gt;tr&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-d&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt; | &lt;span class=&quot;nb&quot;&gt;tr&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;','&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'\n'&lt;/span&gt; | parallel &lt;span class=&quot;nt&quot;&gt;--pipe&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;--block&lt;/span&gt; 1M &lt;span class=&quot;nb&quot;&gt;sort&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-u&lt;/span&gt; | &lt;span class=&quot;nb&quot;&gt;sort&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-u&lt;/span&gt; | &lt;span class=&quot;nb&quot;&gt;wc&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-l&lt;/span&gt;
331

&lt;span class=&quot;c&quot;&gt;# block size performance - Making block size smaller might improve performance&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# Number of jobs can also be manipulated (not evaluated)&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# --500K:               73m57.232s&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# --Default 1M:         75m55.268s (3.84x faster than serial)&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# --2M:                 79m30.950s&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# --3M:                 80m43.311s&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;The trick here is that we swap the commas separating the products in each transaction for newline characters; the effect of this is taking a single transaction row and returning multiple rows, one for each product. Then we pass that down the line, eventually using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sort -u&lt;/code&gt; to de-dup the list and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;wc -l&lt;/code&gt; to count the number of unique lines (i.e. products).&lt;/p&gt;
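
&lt;p&gt;Here’s the same pipeline on two made-up rows, small enough to check by hand. The row layout (id, zip code, quoted timestamp, quoted product list, separated by spaces) is an assumption inferred from the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cut&lt;/code&gt; fields used above:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-shell&quot; data-lang=&quot;shell&quot;&gt;# Two hypothetical transactions sharing one product (eggs)
printf '%s\n' \
  '1 30303 &quot;2014-03-01 10:15:00&quot; &quot;milk,eggs,bread&quot;' \
  '2 30303 &quot;2014-03-01 11:42:10&quot; &quot;eggs,coffee&quot;' |
  cut -d ' ' -f 5- |   # keep only the product list
  tr -d \&quot; |           # drop the double-quotes
  tr ',' '\n' |        # one product per row
  sort -u |            # de-dup
  wc -l                # count what is left&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;The two rows contain four distinct products (milk, eggs, bread, coffee), so the count reported is 4.&lt;/p&gt;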

&lt;p&gt;In a serial fashion, it takes quite some time to calculate the number of unique products. Incorporating GNU Parallel, just using the defaults, gives nearly a 4x speedup!&lt;/p&gt;

&lt;h5 id=&quot;q2-transactions-by-day&quot;&gt;Q2: Transactions By Day&lt;/h5&gt;

&lt;p&gt;If the file format could be considered undesirable in question 1, for question 2 the format is perfect. Since each row represents a transaction, all we need to do is perform the equivalent of a SQL &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Group By&lt;/code&gt; on the date and count the rows:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-shell&quot; data-lang=&quot;shell&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;c&quot;&gt;# Data is at transaction level, so just need to do equivalent of 'group by' operation&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# Using cut again, we choose field 3, which is the date part of the timestamp&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# sort | uniq -c is a common pattern for doing a 'group by' count operation&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# Final tr step is to trim the leading quotation mark from date string&lt;/span&gt;

&lt;span class=&quot;nb&quot;&gt;time cut&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-d&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;' '&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-f&lt;/span&gt; 3 transactions.csv | &lt;span class=&quot;nb&quot;&gt;sort&lt;/span&gt; | &lt;span class=&quot;nb&quot;&gt;uniq&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-c&lt;/span&gt; | &lt;span class=&quot;nb&quot;&gt;tr&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-d&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;

real	76m51.223s

&lt;span class=&quot;c&quot;&gt;# Parallelized version&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# Quoting can be annoying when using parallel, so writing a Bash function is often much easier than dealing with escaping quotes&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# To do 'group by' operation using awk, need to use an associative array&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# Because we are doing parallel operations, need to pass awk output to awk again to return final counts&lt;/span&gt;

awksub &lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;awk&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'{a[$3]+=1;}END{for(i in a)print i&quot; &quot;a[i];}'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;nb&quot;&gt;export&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-f&lt;/span&gt; awksub
&lt;span class=&quot;nb&quot;&gt;time &lt;/span&gt;parallel &lt;span class=&quot;nt&quot;&gt;--pipe&lt;/span&gt; awksub &amp;lt; transactions.csv | &lt;span class=&quot;nb&quot;&gt;awk&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'{a[$1]+=$2;}END{for(i in a)print i&quot; &quot;a[i];}'&lt;/span&gt; | &lt;span class=&quot;nb&quot;&gt;tr&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-d&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt; | &lt;span class=&quot;nb&quot;&gt;sort

&lt;/span&gt;real	8m22.674s &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;9.05x faster than serial&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Using GNU Parallel starts to become complicated here, but you do get a 9x speed-up by counting rows by date in chunks, then “reducing” again by summing those counts into total rows by date (a trick I picked up at this &lt;a href=&quot;http://www.rankfocus.com/use-cpu-cores-linux-commands/&quot;&gt;blog post&lt;/a&gt;).&lt;/p&gt;
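
&lt;p&gt;To see why the second &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;awk&lt;/code&gt; sums column two rather than counting rows, here’s the two-stage “group by” with the chunks simulated by hand instead of by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;parallel --pipe&lt;/code&gt;. It’s only a sketch: the three rows are made up, and the chunk boundaries are chosen so that one date lands in both chunks:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-shell&quot; data-lang=&quot;shell&quot;&gt;# Same per-chunk counter as above: count rows per date (field 3)
awksub () { awk '{a[$3]+=1;}END{for(i in a)print i&quot; &quot;a[i];}'; }

# Two hand-made chunks standing in for the blocks parallel --pipe would create
chunk1='1 30303 &quot;2014-03-01 10:15:00&quot; &quot;milk&quot;
2 30303 &quot;2014-03-02 09:00:00&quot; &quot;eggs&quot;'
chunk2='3 30303 &quot;2014-03-01 18:30:00&quot; &quot;bread&quot;'

# Stage 1: each chunk produces its own partial count per date
# Stage 2: the second awk adds up the partial counts for each date
{ printf '%s\n' &quot;$chunk1&quot; | awksub; printf '%s\n' &quot;$chunk2&quot; | awksub; } |
  awk '{a[$1]+=$2;}END{for(i in a)print i&quot; &quot;a[i];}' | tr -d \&quot; | sort&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;2014-03-01 shows up once in each chunk, so stage 1 emits two separate rows for it with a count of 1 apiece; only by summing those does stage 2 arrive at &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;2014-03-01 2&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;2014-03-02 1&lt;/code&gt;.&lt;/p&gt;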

&lt;h5 id=&quot;q3-total-items-per-store-per-month&quot;&gt;Q3: Total Items Per Store, Per Month&lt;/h5&gt;

&lt;p&gt;For this example, it could be that my command-line fu is weak, but the serial method actually turns out to be the fastest. Of course, at a 14-minute run time, the real-time benefits to parallelization aren’t that great.&lt;/p&gt;

&lt;p&gt;It may be possible that one of you out there knows how to do this correctly, but an interesting thing to note is that the serial version already uses 40-50% of the available CPU. So parallelization might yield a 2x speedup, but saving seven minutes per run isn’t worth spending hours trying to find the optimal settings.&lt;/p&gt;

&lt;h2 id=&quot;but-ive-got-multiple-files&quot;&gt;But, I’ve got MULTIPLE files…&lt;/h2&gt;

&lt;p&gt;The three examples above showed that it’s possible to process datasets larger than RAM in a realistic amount of time using GNU Parallel. However, the examples also showed that working with Unix utilities can become complicated rather quickly. Shell scripts can help move beyond the “one-liner” syndrome, when the pipeline gets so long you lose track of the logic, but eventually problems are more easily solved using other tools.&lt;/p&gt;

&lt;p&gt;The data that I generated at the beginning of this post represented two concepts: transactions and customers. Once you get to the point where you want to do joins, summarize by multiple columns, estimate models, etc., loading data into a database or an analytics environment like R or Python makes sense. But hopefully this post has shown that a laptop is capable of analyzing WAY more data than most people believe, using many tools written decades ago.&lt;/p&gt;</content>
      </item>
      
      
    
      
      
      
    
      
      
      
    
      
      
      
    
      
      
      <item>
        <title>JuliaCon 2015: Everyday Analytics and Visualization (video)</title>
        
          <description>&lt;p&gt;At long last, here’s the video of my presentation from JuliaCon 2015, discussing common analytics tasks and visualization. This is really two talks, the first being an example of using the Citi Bike NYC API to analyze ridership of their public bike program, and the second a discussion of the Vega.jl package.&lt;/p&gt;

&lt;p&gt;Speaking at JuliaCon 2015 at MIT CSAIL is the professional highlight of my year; hopefully even more of you will attend next year.&lt;/p&gt;

&lt;p&gt;Enjoy!&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Edit: For those of you who would like to follow along using the actual &lt;a href=&quot;https://github.com/randyzwitch/juliacon2015&quot;&gt;presentation code&lt;/a&gt;, it is available on GitHub.&lt;/em&gt;&lt;/p&gt;

&lt;h2 id=&quot;citibank-bike-data&quot;&gt;Citi Bike Data&lt;/h2&gt;
&lt;iframe src=&quot;https://www.youtube.com/embed/0F8tC3ofH4g?start=135&quot; width=&quot;640&quot; height=&quot;360&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;allowfullscreen&quot;&gt;&lt;/iframe&gt;

&lt;h2 id=&quot;vegajl-presentation&quot;&gt;Vega.jl Presentation&lt;/h2&gt;
&lt;iframe src=&quot;https://www.youtube.com/embed/0F8tC3ofH4g?start=3005&quot; width=&quot;640&quot; height=&quot;360&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;allowfullscreen&quot;&gt;&lt;/iframe&gt;
</description>
        
        <pubDate>Fri, 14 Aug 2015 10:29:41 +0000</pubDate>
        <link>
        http://randyzwitch.com/juliacon-2015-everyday-analytics-and-visualization-video/</link>
        <guid isPermaLink="true">http://randyzwitch.com/juliacon-2015-everyday-analytics-and-visualization-video/</guid>
        <content type="html" xml:base="/juliacon-2015-everyday-analytics-and-visualization-video/">&lt;p&gt;At long last, here’s the video of my presentation from JuliaCon 2015, discussing common analytics tasks and visualization. This is really two talks, the first being an example of using the Citi Bike NYC API to analyze ridership of their public bike program, and the second a discussion of the Vega.jl package.&lt;/p&gt;

&lt;p&gt;Speaking at JuliaCon 2015 at MIT CSAIL is the professional highlight of my year; hopefully even more of you will attend next year.&lt;/p&gt;

&lt;p&gt;Enjoy!&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Edit: For those of you who would like to follow along using the actual &lt;a href=&quot;https://github.com/randyzwitch/juliacon2015&quot;&gt;presentation code&lt;/a&gt;, it is available on GitHub.&lt;/em&gt;&lt;/p&gt;

&lt;h2 id=&quot;citibank-bike-data&quot;&gt;Citi Bike Data&lt;/h2&gt;
&lt;iframe src=&quot;https://www.youtube.com/embed/0F8tC3ofH4g?start=135&quot; width=&quot;640&quot; height=&quot;360&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;allowfullscreen&quot;&gt;&lt;/iframe&gt;

&lt;h2 id=&quot;vegajl-presentation&quot;&gt;Vega.jl Presentation&lt;/h2&gt;
&lt;iframe src=&quot;https://www.youtube.com/embed/0F8tC3ofH4g?start=3005&quot; width=&quot;640&quot; height=&quot;360&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;allowfullscreen&quot;&gt;&lt;/iframe&gt;</content>
      </item>
      
      
    
      
      
      
    
      
      
      
    
      
      
      
    
      
      
      
    
      
      
      <item>
        <title>Vega.jl, Rebooted</title>
        
          <description>&lt;p&gt;&lt;img src=&quot;/wp-content/uploads/2015/05/pie-300x251.png&quot; alt=&quot;pie&quot; /&gt;
&lt;img src=&quot;/wp-content/uploads/2015/05/donut-e1432224478621.png&quot; alt=&quot;donut&quot; /&gt;&lt;/p&gt;

&lt;p style=&quot;text-align: center;&quot;&gt;
  Mmmmm, baked goods!
&lt;/p&gt;

&lt;h3 id=&quot;rebooting-vegajl&quot;&gt;Rebooting Vega.jl&lt;/h3&gt;

&lt;p&gt;Recently, I’ve found myself without a project to hack on, and I’ve always been interested in learning more about browser-based visualization. So I decided to revive the work that &lt;a href=&quot;https://github.com/johnmyleswhite&quot; target=&quot;_blank&quot;&gt;John Myles White&lt;/a&gt; had done in building &lt;a href=&quot;https://github.com/johnmyleswhite/Vega.jl&quot;&gt;Vega.jl&lt;/a&gt; nearly two years ago. And since I’ll be giving an analytics &amp;amp; visualization workshop at &lt;a href=&quot;http://juliacon.org/&quot; target=&quot;_blank&quot;&gt;JuliaCon 2015&lt;/a&gt;, I figure I better study the topic in a bit more depth.&lt;/p&gt;

&lt;h3 id=&quot;back-in-working-order&quot;&gt;Back In Working Order!&lt;/h3&gt;

&lt;p&gt;The first thing I tackled here was to upgrade the syntax to target v0.4 of Julia. This is just my developer preference, to avoid using &lt;a href=&quot;https://github.com/JuliaLang/Compat.jl&quot; target=&quot;_blank&quot;&gt;Compat.jl&lt;/a&gt; when there are so many more visualizations I’d like to support. So if you’re using v0.4, you shouldn’t see any deprecation errors; if you’re using v0.3, well, eventually you’ll use v0.4!&lt;/p&gt;

&lt;p&gt;Additionally, I modified the package to recognize the traction that Jupyter Notebook has gained in the community. Whereas the original version of Vega.jl only displayed output in a tab in a browser, I’ve overloaded the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;writemime&lt;/code&gt; method to display &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;:VegaVisualization&lt;/code&gt; inline for any environment that can display HTML. If you use Vega.jl from the REPL, you’ll still get the same default browser-opening behavior as existed before.&lt;/p&gt;

&lt;h3 id=&quot;the-first-visualizationyou-addedwas-a-pie-chart&quot;&gt;The First Visualization You Added Was A Pie Chart…&lt;/h3&gt;

&lt;h3 id=&quot;and-followed-with-a-donut-chart&quot;&gt;…And Followed With a Donut Chart?&lt;/h3&gt;

&lt;p&gt;Yup. I’m a troll like that. Besides, being loudly against pie charts is blowhardy (even if studies have shown that people are too stupid to evaluate them).&lt;/p&gt;

&lt;p&gt;Adding these two charts (besides trolling) was a proof of concept that I understood the codebase well enough to extend the package. Now that the syntax works on Julia v0.4, I understand how the package works (important!), and the workflow is improved by Jupyter Notebook support, I plan to create all of the visualizations featured in the &lt;a href=&quot;http://trifacta.github.io/vega/editor/&quot; target=&quot;_blank&quot;&gt;Trifacta Vega Editor&lt;/a&gt; and other standard visualizations such as boxplots. If the community has requests for the order of implementation, I’ll try to accommodate them. Just add a feature request on &lt;a href=&quot;https://github.com/johnmyleswhite/Vega.jl/issues&quot; target=&quot;_blank&quot;&gt;Vega.jl GitHub issues&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;why-not-gadfly-youre-not-starting-a-language-war-are-you&quot;&gt;Why Not Gadfly? You’re Not Starting A Language War, Are You?&lt;/h3&gt;

&lt;p&gt;No, I’m not that big of a troll. Besides, I don’t think we’ve squeezed all the juice (blood?!) out of the &lt;a href=&quot;http://blog.datacamp.com/r-or-python-for-data-analysis/&quot; target=&quot;_blank&quot;&gt;R vs. Python infographic&lt;/a&gt; yet; we don’t need another pointless debate.&lt;/p&gt;

&lt;p&gt;My sole reason for not improving &lt;a href=&quot;http://dcjones.github.io/Gadfly.jl/&quot; target=&quot;_blank&quot;&gt;Gadfly&lt;/a&gt; is just that I plain don’t understand how the codebase works! There are many amazing computer scientists &amp;amp; developers in the Julia community, and I’m not really one of them. I do, however, understand how to generate JSON strings and in that sense, Vega is the perfect platform for me to contribute.&lt;/p&gt;

&lt;h3 id=&quot;collaborators-wanted&quot;&gt;Collaborators Wanted!&lt;/h3&gt;

&lt;p&gt;If you’re interested in visualization, as well as learning Julia and/or contributing to a package, Vega.jl might be a good place to start. I’m always up for collaborating with people, and creating new visualizations isn’t that difficult (especially with the Trifacta examples). So hopefully some of you will be interested enough to join me in adding one more great visualization library to the Julia community.&lt;/p&gt;
</description>
        
        <pubDate>Thu, 21 May 2015 12:56:07 +0000</pubDate>
        <link>
        http://randyzwitch.com/vega-jl-julia/</link>
        <guid isPermaLink="true">http://randyzwitch.com/vega-jl-julia/</guid>
        <content type="html" xml:base="/vega-jl-julia/">&lt;p&gt;&lt;img src=&quot;/wp-content/uploads/2015/05/pie-300x251.png&quot; alt=&quot;pie&quot; /&gt;
&lt;img src=&quot;/wp-content/uploads/2015/05/donut-e1432224478621.png&quot; alt=&quot;donut&quot; /&gt;&lt;/p&gt;

&lt;p style=&quot;text-align: center;&quot;&gt;
  Mmmmm, baked goods!
&lt;/p&gt;

&lt;h3 id=&quot;rebooting-vegajl&quot;&gt;Rebooting Vega.jl&lt;/h3&gt;

&lt;p&gt;Recently, I’ve found myself without a project to hack on, and I’ve always been interested in learning more about browser-based visualization. So I decided to revive the work that &lt;a href=&quot;https://github.com/johnmyleswhite&quot; target=&quot;_blank&quot;&gt;John Myles White&lt;/a&gt; had done in building &lt;a href=&quot;https://github.com/johnmyleswhite/Vega.jl&quot;&gt;Vega.jl&lt;/a&gt; nearly two years ago. And since I’ll be giving an analytics &amp;amp; visualization workshop at &lt;a href=&quot;http://juliacon.org/&quot; target=&quot;_blank&quot;&gt;JuliaCon 2015&lt;/a&gt;, I figure I better study the topic in a bit more depth.&lt;/p&gt;

&lt;h3 id=&quot;back-in-working-order&quot;&gt;Back In Working Order!&lt;/h3&gt;

&lt;p&gt;The first thing I tackled here was to upgrade the syntax to target v0.4 of Julia. This is just my developer preference, to avoid using &lt;a href=&quot;https://github.com/JuliaLang/Compat.jl&quot; target=&quot;_blank&quot;&gt;Compat.jl&lt;/a&gt; when there are so many more visualizations I’d like to support. So if you’re using v0.4, you shouldn’t see any deprecation errors; if you’re using v0.3, well, eventually you’ll use v0.4!&lt;/p&gt;

&lt;p&gt;Additionally, I modified the package to recognize the traction that Jupyter Notebook has gained in the community. Whereas the original version of Vega.jl only displayed output in a tab in a browser, I’ve overloaded the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;writemime&lt;/code&gt; method to display &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;:VegaVisualization&lt;/code&gt; inline for any environment that can display HTML. If you use Vega.jl from the REPL, you’ll still get the same default browser-opening behavior as existed before.&lt;/p&gt;

&lt;h3 id=&quot;the-first-visualizationyou-addedwas-a-pie-chart&quot;&gt;The First Visualization You Added Was A Pie Chart…&lt;/h3&gt;

&lt;h3 id=&quot;and-followed-with-a-donut-chart&quot;&gt;…And Followed With a Donut Chart?&lt;/h3&gt;

&lt;p&gt;Yup. I’m a troll like that. Besides, being loudly against pie charts is blowhardy (even if studies have shown that people are too stupid to evaluate them).&lt;/p&gt;

&lt;p&gt;Adding these two charts (besides trolling) was a proof of concept that I understood the codebase well enough to extend the package. Now that the syntax works on Julia v0.4, I understand how the package works (important!), and the workflow is improved by Jupyter Notebook support, I plan to create all of the visualizations featured in the &lt;a href=&quot;http://trifacta.github.io/vega/editor/&quot; target=&quot;_blank&quot;&gt;Trifacta Vega Editor&lt;/a&gt; and other standard visualizations such as boxplots. If the community has requests for the order of implementation, I’ll try to accommodate them. Just add a feature request on &lt;a href=&quot;https://github.com/johnmyleswhite/Vega.jl/issues&quot; target=&quot;_blank&quot;&gt;Vega.jl GitHub issues&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;why-not-gadfly-youre-not-starting-a-language-war-are-you&quot;&gt;Why Not Gadfly? You’re Not Starting A Language War, Are You?&lt;/h3&gt;

&lt;p&gt;No, I’m not that big of a troll. Besides, I don’t think we’ve squeezed all the juice (blood?!) out of the &lt;a href=&quot;http://blog.datacamp.com/r-or-python-for-data-analysis/&quot; target=&quot;_blank&quot;&gt;R vs. Python infographic&lt;/a&gt; yet; we don’t need another pointless debate.&lt;/p&gt;

&lt;p&gt;My sole reason for not improving &lt;a href=&quot;http://dcjones.github.io/Gadfly.jl/&quot; target=&quot;_blank&quot;&gt;Gadfly&lt;/a&gt; is just that I plain don’t understand how the codebase works! There are many amazing computer scientists &amp;amp; developers in the Julia community, and I’m not really one of them. I do, however, understand how to generate JSON strings and in that sense, Vega is the perfect platform for me to contribute.&lt;/p&gt;

&lt;h3 id=&quot;collaborators-wanted&quot;&gt;Collaborators Wanted!&lt;/h3&gt;

&lt;p&gt;If you’re interested in visualization, as well as learning Julia and/or contributing to a package, Vega.jl might be a good place to start. I’m always up for collaborating with people, and creating new visualizations isn’t that difficult (especially with the Trifacta examples). So hopefully some of you will be interested enough to join me in adding one more great visualization library to the Julia community.&lt;/p&gt;</content>
      </item>
      
      
    
      
      
      
    
      
      
      
    
      
      
      
    
      
      
      
    
      
      
      
    
      
      
      <item>
        <title>Introducing Twitter.jl</title>
        
          <description>&lt;p&gt;This is possibly the latest “announcement” of a package ever, given that &lt;a href=&quot;https://github.com/randyzwitch/Twitter.jl&quot;&gt;Twitter.jl&lt;/a&gt; has existed on &lt;a href=&quot;https://github.com/JuliaLang/METADATA.jl&quot; title=&quot;Julia METADATA&quot;&gt;METADATA&lt;/a&gt; for nearly a year now, but that’s how things go sometimes. Here’s how to get started with Twitter.jl.&lt;/p&gt;

&lt;h2 id=&quot;hello-world&quot;&gt;Hello, World!&lt;/h2&gt;

&lt;p&gt;If ‘Hello, World!’ is the canonical example of getting started with a programming language, the Twitter API is becoming the first place to start for people wanting to learn about APIs. Authenticating with the Twitter API using Julia is similar to using the R or Python packages, except that rather than doing the OAuth “dance”, Twitter.jl takes all four authentication values in one function:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;span class=&quot;k&quot;&gt;using&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Twitter&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;apikey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;q8Qw7WJTVP...&quot;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;apisecret&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;FIichPpGJxiOssN...&quot;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;accesstoken&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;98689850-v0zZNr...&quot;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;accesstokensecret&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;w7bDg9K0c493T...&quot;&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;twitterauth&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;apikey&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;apisecret&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;accesstoken&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;accesstokensecret&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;All four of these values can be found after registering at the &lt;a href=&quot;https://dev.twitter.com/&quot;&gt;Twitter Developer page&lt;/a&gt; and creating an application. Having all four values in your script is less secure than providing just the API key and API secret; in the future, I’ll likely implement the full OAuth “handshake”. Keep in mind that this function currently performs no validation of your credentials; all it does is define a global variable &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;twittercred&lt;/code&gt; for later use by the various functions that create the OAuth headers. To shout “Hello, World!” to all of your Twitter followers, you can use the following code:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;span class=&quot;n&quot;&gt;post_status_update&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Hello, World!&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;h2 id=&quot;general-packagefunction-structure&quot;&gt;General Package/Function Structure&lt;/h2&gt;

&lt;p&gt;From the example above, you can see that the function naming follows the &lt;a href=&quot;https://dev.twitter.com/rest/public&quot;&gt;Twitter REST API&lt;/a&gt; naming convention, with the HTTP verb first and the endpoint as the remainder of the function name. As such, it’s a good idea at this early stage of the package to have the Twitter documentation open while using it, so that you can quickly find the methods you are looking for.&lt;/p&gt;

&lt;p&gt;For each function/API endpoint, I’ve gone through and determined which parameters are required; these are required arguments in the Julia functions. For everything else, each function takes an optional &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;options&lt;/code&gt; keyword argument, a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Dict{String, String}&lt;/code&gt; that can hold any option shown in the Twitter documentation. While this Dict structure allows for ultimate flexibility (and quick definition of functions!), I do realize that it’s less than optimal that the function signatures don’t tell you which optional arguments each Twitter endpoint allows.&lt;/p&gt;

&lt;p&gt;As an example, suppose you wanted to search for tweets containing the hashtag &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;#julialang&lt;/code&gt;. The minimum function call is as follows:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;span class=&quot;n&quot;&gt;julia_tweets&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;get_search_tweets&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;#julialang&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;By default, the API will return the 15 most recent tweets containing the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;#julialang&lt;/code&gt; hashtag. To return the most recent 100 tweets (the maximum per API ‘page’), you can pass the “count” parameter via the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;options&lt;/code&gt; Dict:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;span class=&quot;n&quot;&gt;julia_tweets_100&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;get_search_tweets&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;#julialang&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;options&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Dict&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;count&quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;100&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;))&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;h2 id=&quot;composite-types-and-dataframes-definitions&quot;&gt;Composite Types and DataFrames definitions&lt;/h2&gt;

&lt;p&gt;The Twitter API is structured into four return data types (&lt;a href=&quot;https://dev.twitter.com/overview/api/places&quot;&gt;Places&lt;/a&gt;, &lt;a href=&quot;https://dev.twitter.com/overview/api/users&quot;&gt;Users&lt;/a&gt;, &lt;a href=&quot;https://dev.twitter.com/overview/api/tweets&quot;&gt;Tweets&lt;/a&gt;, and &lt;a href=&quot;https://dev.twitter.com/overview/api/entities&quot;&gt;Entities&lt;/a&gt;), and I’ve mimicked these types using Julia &lt;a href=&quot;http://julia.readthedocs.org/en/latest/manual/types/#composite-types&quot;&gt;Composite Types&lt;/a&gt;. As such, most functions in Twitter.jl return an array of a specific type, such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Array{TWEETS,1}&lt;/code&gt; from the prior &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;#julialang&lt;/code&gt; search example. The benefit of defining custom types for the returned Twitter data is that rudimentary DataFrame methods have also been defined:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataFrame&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;julia_tweets_100&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;I describe these DataFrames as ‘rudimentary’ because they parse only the top level of the JSON into columns, which results in some DataFrame columns having complex data types such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Dict()&lt;/code&gt; (and within the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Dict()&lt;/code&gt;, nested Dicts!). As a running theme in this post, this is something I hope to get around to improving in the future.&lt;/p&gt;

&lt;h2 id=&quot;want-to-get-started-developing-julia-start-here&quot;&gt;Want to Get Started Developing Julia? Start Here!&lt;/h2&gt;

&lt;p&gt;One of the common questions I get asked is how to get started with Julia, both from a learning perspective and from a package development perspective. Hacking away on the core Julia codebase is great if you have the ability, but the code can certainly be intimidating (the people are quite friendly though). Creating a package isn’t necessarily hard, but you have to come up with an idea you want to implement. The third alternative is…&lt;/p&gt;

&lt;p&gt;…improve the Twitter package! If you go to the &lt;a href=&quot;https://github.com/randyzwitch/Twitter.jl&quot;&gt;GitHub page for Twitter.jl&lt;/a&gt;, you’ll see a long list of TODO items that need to be worked on. The hardest part (building the OAuth headers) has already been taken care of. What’s left is &lt;a href=&quot;http://randyzwitch.com/julia-metaprogramming-refactoring/&quot;&gt;refactoring the code for simplification&lt;/a&gt;, factoring out the &lt;a href=&quot;https://github.com/randyzwitch/OAuth.jl&quot;&gt;OAuth code in general into a new Julia library&lt;/a&gt; (also partially started), then building the Streaming API functions, cleaning up the DataFrame methods to remove the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Dict&lt;/code&gt; column types, paging through API results…and so on.&lt;/p&gt;

&lt;p&gt;So if any of you are on the sidelines wanting to get some practice developing packages, without needing to worry about learning astrophysics first, I’d love to collaborate. And if any Julia programming masters want to collaborate, well, that’s great too. All help and pull requests are welcome.&lt;/p&gt;

&lt;p&gt;In the meantime, hopefully some of you will find this package useful for natural language processing, social network analysis or even creating bots 😉&lt;/p&gt;
</description>
        
        <pubDate>Mon, 08 Dec 2014 17:12:58 +0000</pubDate>
        <link>
        http://randyzwitch.com/twitter-api-julia/</link>
        <guid isPermaLink="true">http://randyzwitch.com/twitter-api-julia/</guid>
        <content type="html" xml:base="/twitter-api-julia/">&lt;p&gt;This is possibly the latest “announcement” of a package ever, given that &lt;a href=&quot;https://github.com/randyzwitch/Twitter.jl&quot;&gt;Twitter.jl&lt;/a&gt; has existed on &lt;a href=&quot;https://github.com/JuliaLang/METADATA.jl&quot; title=&quot;Julia METADATA&quot;&gt;METADATA&lt;/a&gt; for nearly a year now, but that’s how things go sometimes. Here’s how to get started with Twitter.jl.&lt;/p&gt;

&lt;h2 id=&quot;hello-world&quot;&gt;Hello, World!&lt;/h2&gt;

&lt;p&gt;If ‘Hello, World!’ is the canonical example of getting started with a programming language, the Twitter API is becoming the first place to start for people wanting to learn about APIs. Authenticating with the Twitter API using Julia is similar to using the R or Python packages, except that rather than doing the OAuth “dance”, Twitter.jl takes all four authentication values in one function:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;span class=&quot;k&quot;&gt;using&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Twitter&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;apikey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;q8Qw7WJTVP...&quot;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;apisecret&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;FIichPpGJxiOssN...&quot;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;accesstoken&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;98689850-v0zZNr...&quot;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;accesstokensecret&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;w7bDg9K0c493T...&quot;&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;twitterauth&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;apikey&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;apisecret&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;accesstoken&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;accesstokensecret&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;All four of these values can be found after registering at the &lt;a href=&quot;https://dev.twitter.com/&quot;&gt;Twitter Developer page&lt;/a&gt; and creating an application. Having all four values in your script is less secure than providing just the API key and API secret; in the future, I’ll likely implement the full OAuth “handshake”. Keep in mind that this function currently performs no validation of your credentials; all it does is define a global variable &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;twittercred&lt;/code&gt; for later use by the various functions that create the OAuth headers. To shout “Hello, World!” to all of your Twitter followers, you can use the following code:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;span class=&quot;n&quot;&gt;post_status_update&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Hello, World!&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;h2 id=&quot;general-packagefunction-structure&quot;&gt;General Package/Function Structure&lt;/h2&gt;

&lt;p&gt;From the example above, you can see that the function naming follows the &lt;a href=&quot;https://dev.twitter.com/rest/public&quot;&gt;Twitter REST API&lt;/a&gt; naming convention, with the HTTP verb first and the endpoint as the remainder of the function name. As such, it’s a good idea at this early stage of the package to have the Twitter documentation open while using it, so that you can quickly find the methods you are looking for.&lt;/p&gt;

&lt;p&gt;For each function/API endpoint, I’ve gone through and determined which parameters are required; these are required arguments in the Julia functions. For everything else, each function takes an optional &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;options&lt;/code&gt; keyword argument, a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Dict{String, String}&lt;/code&gt; that can hold any option shown in the Twitter documentation. While this Dict structure allows for ultimate flexibility (and quick definition of functions!), I do realize that it’s less than optimal that the function signatures don’t tell you which optional arguments each Twitter endpoint allows.&lt;/p&gt;

&lt;p&gt;As an example, suppose you wanted to search for tweets containing the hashtag &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;#julialang&lt;/code&gt;. The minimum function call is as follows:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;span class=&quot;n&quot;&gt;julia_tweets&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;get_search_tweets&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;#julialang&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;By default, the API will return the 15 most recent tweets containing the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;#julialang&lt;/code&gt; hashtag. To return the most recent 100 tweets (the maximum per API ‘page’), you can pass the “count” parameter via the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;options&lt;/code&gt; Dict:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;span class=&quot;n&quot;&gt;julia_tweets_100&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;get_search_tweets&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;#julialang&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;options&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Dict&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;count&quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;100&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;))&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
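
&lt;p&gt;To make the split between required arguments and optional parameters concrete, here is a minimal sketch, written for current Julia (1.x), of how the two could be merged into one set of query parameters before a request is sent. The helper name &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;build_params&lt;/code&gt; is an assumption made for illustration; this is not necessarily how Twitter.jl does it internally:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;# Hypothetical helper, not part of Twitter.jl: merge the required search
# term with any optional parameters into a single Dict of query parameters.
function build_params(q::String; options = Dict{String, String}())
    params = copy(options)   # leave the caller's Dict untouched
    params[&quot;q&quot;] = q          # the required argument always wins
    return params
end

params = build_params(&quot;#julialang&quot;; options = Dict(&quot;count&quot; =&amp;gt; &quot;100&quot;))&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Copying the Dict first means a caller can reuse the same &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;options&lt;/code&gt; across several calls without it being modified behind their back.&lt;/p&gt;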

&lt;h2 id=&quot;composite-types-and-dataframes-definitions&quot;&gt;Composite Types and DataFrames definitions&lt;/h2&gt;

&lt;p&gt;The Twitter API is structured into four return data types (&lt;a href=&quot;https://dev.twitter.com/overview/api/places&quot;&gt;Places&lt;/a&gt;, &lt;a href=&quot;https://dev.twitter.com/overview/api/users&quot;&gt;Users&lt;/a&gt;, &lt;a href=&quot;https://dev.twitter.com/overview/api/tweets&quot;&gt;Tweets&lt;/a&gt;, and &lt;a href=&quot;https://dev.twitter.com/overview/api/entities&quot;&gt;Entities&lt;/a&gt;), and I’ve mimicked these types using Julia &lt;a href=&quot;http://julia.readthedocs.org/en/latest/manual/types/#composite-types&quot;&gt;Composite Types&lt;/a&gt;. As such, most functions in Twitter.jl return an array of a specific type, such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Array{TWEETS,1}&lt;/code&gt; from the prior &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;#julialang&lt;/code&gt; search example. The benefit of defining custom types for the returned Twitter data is that rudimentary DataFrame methods have also been defined:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataFrame&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;julia_tweets_100&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;I describe these DataFrames as ‘rudimentary’ because they parse only the top level of the JSON into columns, which results in some DataFrame columns having complex data types such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Dict()&lt;/code&gt; (and within the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Dict()&lt;/code&gt;, nested Dicts!). As a running theme in this post, this is something I hope to get around to improving in the future.&lt;/p&gt;
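
&lt;p&gt;One possible way to get rid of the nested Dict columns would be to flatten each parsed JSON object before building the DataFrame. The sketch below, written for current Julia (1.x), is not how Twitter.jl works today; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;flatten_dict&lt;/code&gt; is a hypothetical helper, and joining nested keys with an underscore is just one naming choice:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;# Hypothetical helper, not part of Twitter.jl: recursively flatten a nested
# Dict so that every value sits at the top level under a compound key.
function flatten_dict(d::AbstractDict, prefix::String = &quot;&quot;)
    flat = Dict{String, Any}()
    for (key, value) in d
        name = isempty(prefix) ? string(key) : string(prefix, &quot;_&quot;, key)
        if value isa AbstractDict
            merge!(flat, flatten_dict(value, name))   # descend one level
        else
            flat[name] = value
        end
    end
    return flat
end

# Made-up data standing in for one parsed tweet
tweet = Dict(&quot;id&quot; =&amp;gt; 1, &quot;user&quot; =&amp;gt; Dict(&quot;name&quot; =&amp;gt; &quot;julia&quot;, &quot;geo&quot; =&amp;gt; Dict(&quot;lat&quot; =&amp;gt; 40.0)))
flat = flatten_dict(tweet)   # keys: &quot;id&quot;, &quot;user_name&quot;, &quot;user_geo_lat&quot;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Each flattened key could then become its own DataFrame column, at the cost of much wider tables for deeply nested objects.&lt;/p&gt;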

&lt;h2 id=&quot;want-to-get-started-developing-julia-start-here&quot;&gt;Want to Get Started Developing Julia? Start Here!&lt;/h2&gt;

&lt;p&gt;One of the common questions I get asked is how to get started with Julia, both from a learning perspective and from a package development perspective. Hacking away on the core Julia codebase is great if you have the ability, but the code can certainly be intimidating (the people are quite friendly though). Creating a package isn’t necessarily hard, but you have to come up with an idea you want to implement. The third alternative is…&lt;/p&gt;

&lt;p&gt;…improve the Twitter package! If you go to the &lt;a href=&quot;https://github.com/randyzwitch/Twitter.jl&quot;&gt;GitHub page for Twitter.jl&lt;/a&gt;, you’ll see a long list of TODO items that need to be worked on. The hardest part (building the OAuth headers) has already been taken care of. What’s left is &lt;a href=&quot;http://randyzwitch.com/julia-metaprogramming-refactoring/&quot;&gt;refactoring the code for simplification&lt;/a&gt;, factoring out the &lt;a href=&quot;https://github.com/randyzwitch/OAuth.jl&quot;&gt;OAuth code in general into a new Julia library&lt;/a&gt; (also partially started), then building the Streaming API functions, cleaning up the DataFrame methods to remove the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Dict&lt;/code&gt; column types, paging through API results…and so on.&lt;/p&gt;

&lt;p&gt;So if any of you are on the sidelines wanting to get some practice developing packages, without needing to worry about learning astrophysics first, I’d love to collaborate. And if any Julia programming masters want to collaborate, well, that’s great too. All help and pull requests are welcome.&lt;/p&gt;

&lt;p&gt;In the meantime, hopefully some of you will find this package useful for natural language processing, social network analysis or even creating bots 😉&lt;/p&gt;</content>
      </item>
      <item>
        <title>Code Refactoring Using Metaprogramming</title>
        
          <description>&lt;p&gt;It’s been nearly a year since I wrote &lt;a href=&quot;https://github.com/randyzwitch/Twitter.jl/&quot;&gt;Twitter.jl&lt;/a&gt;, back when I seemingly had MUCH more free time. In these past 10 months, I’ve used Julia quite a bit to develop other packages, and I try to use it at work when I know I’m not going to be collaborating with others (since my colleagues don’t know Julia, not because it’s bad for collaboration!).&lt;/p&gt;

&lt;p&gt;One of the things that’s obvious from my earlier Julia code is that I didn’t understand how powerful metaprogramming can be, so here’s a simple example where I can replace 50 lines of Julia code with 10.&lt;/p&gt;

&lt;h2 id=&quot;ctrl-a-ctrl-c-ctrl-p-repeat&quot;&gt;CTRL-A, CTRL-C, CTRL-V. Repeat.&lt;/h2&gt;

&lt;p&gt;Admittedly, when I started on the Twitter package, I fully meant to go back and clean up the codebase, but moved on to something more fun instead. The Twitter package started out as a means of learning how to use the &lt;a href=&quot;https://github.com/JuliaWeb/Requests.jl&quot;&gt;Requests.jl&lt;/a&gt; library to make API calls and figuring out the OAuth syntax I needed (which itself should be factored out of Twitter.jl); after that, I copied and pasted the same basic function structure over and over. While that was fast, what I was left with was this (currently, the help.jl file in the Twitter package):&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;span class=&quot;c&quot;&gt;#############################################################&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;#&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# Help section Functions for Twitter API&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;#&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;#############################################################&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt; get_help_configuration&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;options&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Dict&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;}())&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;get_oauth&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;https://api.twitter.com/1.1/help/configuration.json&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;options&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;status&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;200&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;?&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;JSON&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;parse&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt; get_help_languages&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;options&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Dict&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;}())&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;get_oauth&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;https://api.twitter.com/1.1/help/languages.json&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;options&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;status&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;200&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;?&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;JSON&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;parse&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt; get_help_privacy&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;options&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Dict&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;}())&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;get_oauth&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;https://api.twitter.com/1.1/help/privacy.json&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;options&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;status&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;200&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;?&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;JSON&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;parse&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt; get_help_tos&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;options&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Dict&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;}())&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;get_oauth&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;https://api.twitter.com/1.1/help/tos.json&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;options&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;status&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;200&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;?&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;JSON&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;parse&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt; get_application_rate_limit_status&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;options&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Dict&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;}())&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;get_oauth&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;https://api.twitter.com/1.1/application/rate_limit_status.json&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;options&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;status&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;200&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;?&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;JSON&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;parse&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;It’s pretty clear that this is the exact same code pattern, right down to the spacing! The way to interpret this code is that these five Twitter API methods have no required inputs. Optionally, the ‘options’ keyword allows for specifying a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Dict()&lt;/code&gt; of options. For these five functions, there are no options you can pass to the Twitter API, so even this keyword is redundant. These are simple functions, so I don’t gain a lot by way of maintainability by using metaprogramming, but at the same time, one of the core tenets of programming is ‘Don’t Repeat Yourself’, so let’s clean this up.&lt;/p&gt;

&lt;h2 id=&quot;for-symbol-in-symbolslist&quot;&gt;For :symbol in symbolslist…&lt;/h2&gt;

&lt;p&gt;In order to clean this up, we need to pull out the parts that are unique to each function (the function name and the endpoint), then interpolate them into a single function definition that the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@eval&lt;/code&gt; macro evaluates, as follows:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;n&quot;&gt;funcname&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get_help_configuration&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get_help_languages&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get_help_privacy&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get_help_tos&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get_application_rate_limit_status&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;endpoint&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;help/configuration.json&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;help/languages.json&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;help/privacy.json&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;  &lt;span class=&quot;s&quot;&gt;&quot;help/tos.json&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;application/rate_limit_status.json&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;func&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;endp&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;zip&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;funcname&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;endpoint&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;nd&quot;&gt;@eval&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;($&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;func&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)(;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;options&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Dict&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;}())&lt;/span&gt;

        &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;get_oauth&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;https://api.twitter.com/1.1/&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;endp&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;options&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;status&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;200&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;?&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;JSON&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;parse&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;What’s happening in this code is that I define two tuples: one of function names (as symbols, denoted by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;:&lt;/code&gt;) and one of the API endpoints. We can then iterate over the two tuples, substituting the function names and endpoints into the code. When the package is loaded, this code evaluates, defining the five functions for use in the Twitter package.&lt;/p&gt;
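
&lt;p&gt;To see the interpolation at work without hitting the Twitter API, here is a minimal sketch, assuming Julia 1.x. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FakeResponse&lt;/code&gt; type and the stubbed &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;get_oauth&lt;/code&gt; are hypothetical stand-ins invented for illustration (the real function lives in the Twitter package), and the JSON parsing step is left out so that nothing outside of Base is needed:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;# Hypothetical stand-ins so the sketch runs offline; not the package's real code
struct FakeResponse
    status::Int
    data::String
end

# Stub: pretends every request succeeds and echoes the URL back as the body
get_oauth(url, options) = FakeResponse(200, url)

funcname = (:get_help_tos, :get_help_privacy)
endpoint = (&quot;help/tos.json&quot;, &quot;help/privacy.json&quot;)

for (func, endp) in zip(funcname, endpoint)
    # Build the full URL first, then splice the finished string into the definition
    url = &quot;https://api.twitter.com/1.1/$endp&quot;
    @eval function $(func)(; options=Dict{String, String}())
        r = get_oauth($url, options)
        return r.status == 200 ? r.data : r
    end
end

get_help_tos()      # returns &quot;https://api.twitter.com/1.1/help/tos.json&quot;
&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Building the URL in a local variable and splicing it in with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$url&lt;/code&gt; is just one way to write it. It does the same job as interpolating the string directly inside the function body, but keeps the substitution step a little easier to see.&lt;/p&gt;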

&lt;h2 id=&quot;wha&quot;&gt;Wha?&lt;/h2&gt;

&lt;p&gt;Yeah, so metaprogramming can be simple, but it can also be mind-bending. It’s one thing to not repeat yourself; it’s another to write something so complex that even YOU can’t remember how the code works. But somewhere in between lies a sweet spot where you can refactor whole swaths of code and streamline your codebase. Metaprogramming is used throughout the Julia codebase, so if you’re interested in seeing more examples, check out the Julia source code, the &lt;a href=&quot;https://github.com/JuliaWeb/Requests.jl/blob/master/src/Requests.jl&quot; title=&quot;Requests.jl code&quot;&gt;Requests.jl&lt;/a&gt; package (where I first saw this), or really anyone who actually knows what they are doing. I’m just a metaprogramming pretender at this point 🙂&lt;/p&gt;

&lt;p&gt;To read additional discussion around this specific example, see the Julia-Users discussion at: &lt;a href=&quot;https://groups.google.com/forum/#!topic/julia-users/zvJmqB2N0GQ&quot;&gt;https://groups.google.com/forum/#!topic/julia-users/zvJmqB2N0GQ&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Edit, 11/22/2014:&lt;/strong&gt; &lt;a href=&quot;http://www.reddit.com/r/Julia/comments/2mvtnr/code_refactoring_using_metaprogramming_in_julia/cma5g25&quot;&gt;DarthToaster on Reddit&lt;/a&gt; provided another fantastic way to approach refactoring, using macros:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;k&quot;&gt;macro&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt; endpoint&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;path&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;quote&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt; $&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;esc&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;))(;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;options&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Dict&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;}())&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;get_oauth&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;https://api.twitter.com/1.1/&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;path&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;options&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;status&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;200&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;?&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;JSON&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;parse&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;

&lt;span class=&quot;nd&quot;&gt;@endpoint&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;get_help_configuration&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;help/configuration.json&quot;&lt;/span&gt;
&lt;span class=&quot;nd&quot;&gt;@endpoint&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;get_help_languages&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;help/languages.json&quot;&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

</description>
        
        <pubDate>Tue, 18 Nov 2014 09:11:06 +0000</pubDate>
        <link>
        http://randyzwitch.com/julia-metaprogramming-refactoring/</link>
        <guid isPermaLink="true">http://randyzwitch.com/julia-metaprogramming-refactoring/</guid>
        <content type="html" xml:base="/julia-metaprogramming-refactoring/">&lt;p&gt;It’s been nearly a year since I wrote &lt;a href=&quot;https://github.com/randyzwitch/Twitter.jl/&quot;&gt;Twitter.jl&lt;/a&gt;, back when I seemingly had MUCH more free time. In these past 10 months, I’ve used Julia quite a bit to develop other packages, and I try to use it at work when I know I’m not going to be collaborating with others (since my colleagues don’t know Julia, not because it’s bad for collaboration!).&lt;/p&gt;

&lt;p&gt;One of the things that’s obvious from my earlier Julia code is that I didn’t understand how powerful metaprogramming can be, so here’s a simple example where I can replace 50 lines of Julia code with 10.&lt;/p&gt;

&lt;h2 id=&quot;ctrl-a-ctrl-c-ctrl-p-repeat&quot;&gt;CTRL-A, CTRL-C, CTRL-V. Repeat.&lt;/h2&gt;

&lt;p&gt;Admittedly, when I started on the Twitter package, I fully meant to go back and clean up the codebase, but I moved on to something more fun instead. The Twitter package started out as a means of learning how to use the &lt;a href=&quot;https://github.com/JuliaWeb/Requests.jl&quot;&gt;Requests.jl&lt;/a&gt; library to make API calls. I figured out the OAuth syntax I needed (which itself should be factored out of Twitter.jl), then copied and pasted the same basic function structure over and over. While that was fast, what I was left with was this (currently, the help.jl file in the Twitter package):&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;c&quot;&gt;#############################################################&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;#&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# Help section Functions for Twitter API&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;#&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;#############################################################&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt; get_help_configuration&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;options&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Dict&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;}())&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;get_oauth&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;https://api.twitter.com/1.1/help/configuration.json&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;options&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;status&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;200&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;?&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;JSON&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;parse&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt; get_help_languages&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;options&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Dict&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;}())&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;get_oauth&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;https://api.twitter.com/1.1/help/languages.json&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;options&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;status&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;200&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;?&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;JSON&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;parse&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt; get_help_privacy&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;options&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Dict&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;}())&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;get_oauth&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;https://api.twitter.com/1.1/help/privacy.json&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;options&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;status&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;200&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;?&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;JSON&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;parse&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt; get_help_tos&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;options&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Dict&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;}())&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;get_oauth&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;https://api.twitter.com/1.1/help/tos.json&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;options&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;status&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;200&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;?&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;JSON&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;parse&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt; get_application_rate_limit_status&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;options&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Dict&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;}())&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;get_oauth&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;https://api.twitter.com/1.1/application/rate_limit_status.json&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;options&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;status&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;200&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;?&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;JSON&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;parse&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;It’s pretty clear that this is the exact same code pattern, right down to the spacing! The way to interpret this code is that these five Twitter API methods have no required inputs. Optionally, the ‘options’ keyword allows for specifying a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Dict()&lt;/code&gt; of options. For these five functions, there are no options you can pass to the Twitter API, so even this keyword is redundant. These are simple functions, so I don’t gain a lot by way of maintainability by using metaprogramming, but at the same time, one of the core tenets of programming is ‘Don’t Repeat Yourself’, so let’s clean this up.&lt;/p&gt;

&lt;h2 id=&quot;for-symbol-in-symbolslist&quot;&gt;For :symbol in symbolslist…&lt;/h2&gt;

&lt;p&gt;In order to clean this up, we need to pull out the parts that are unique to each function (the function name and the endpoint), then interpolate them into a single function definition that the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@eval&lt;/code&gt; macro evaluates, as follows:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;n&quot;&gt;funcname&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get_help_configuration&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get_help_languages&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get_help_privacy&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get_help_tos&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get_application_rate_limit_status&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;endpoint&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;help/configuration.json&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;help/languages.json&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;help/privacy.json&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;  &lt;span class=&quot;s&quot;&gt;&quot;help/tos.json&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;application/rate_limit_status.json&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;func&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;endp&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;zip&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;funcname&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;endpoint&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;nd&quot;&gt;@eval&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;($&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;func&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)(;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;options&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Dict&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;}())&lt;/span&gt;

        &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;get_oauth&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;https://api.twitter.com/1.1/&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;endp&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;options&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;status&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;200&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;?&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;JSON&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;parse&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;What’s happening in this code is that I define two tuples: one of function names (as symbols, denoted by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;:&lt;/code&gt;) and one of the API endpoints. We can then iterate over the two tuples, substituting the function names and endpoints into the code. When the package is loaded, this code evaluates, defining the five functions for use in the Twitter package.&lt;/p&gt;
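
&lt;p&gt;To see the interpolation at work without hitting the Twitter API, here is a minimal sketch, assuming Julia 1.x. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FakeResponse&lt;/code&gt; type and the stubbed &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;get_oauth&lt;/code&gt; are hypothetical stand-ins invented for illustration (the real function lives in the Twitter package), and the JSON parsing step is left out so that nothing outside of Base is needed:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;# Hypothetical stand-ins so the sketch runs offline; not the package's real code
struct FakeResponse
    status::Int
    data::String
end

# Stub: pretends every request succeeds and echoes the URL back as the body
get_oauth(url, options) = FakeResponse(200, url)

funcname = (:get_help_tos, :get_help_privacy)
endpoint = (&quot;help/tos.json&quot;, &quot;help/privacy.json&quot;)

for (func, endp) in zip(funcname, endpoint)
    # Build the full URL first, then splice the finished string into the definition
    url = &quot;https://api.twitter.com/1.1/$endp&quot;
    @eval function $(func)(; options=Dict{String, String}())
        r = get_oauth($url, options)
        return r.status == 200 ? r.data : r
    end
end

get_help_tos()      # returns &quot;https://api.twitter.com/1.1/help/tos.json&quot;
&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Building the URL in a local variable and splicing it in with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$url&lt;/code&gt; is just one way to write it. It does the same job as interpolating the string directly inside the function body, but keeps the substitution step a little easier to see.&lt;/p&gt;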

&lt;h2 id=&quot;wha&quot;&gt;Wha?&lt;/h2&gt;

&lt;p&gt;Yeah, so metaprogramming can be simple, but it can also be mind-bending. It’s one thing to not repeat yourself; it’s another to write something so complex that even YOU can’t remember how the code works. But somewhere in between lies a sweet spot where you can refactor whole swaths of code and streamline your codebase. Metaprogramming is used throughout the Julia codebase, so if you’re interested in seeing more examples, check out the Julia source code, the &lt;a href=&quot;https://github.com/JuliaWeb/Requests.jl/blob/master/src/Requests.jl&quot; title=&quot;Requests.jl code&quot;&gt;Requests.jl&lt;/a&gt; package (where I first saw this), or code from really anyone who actually knows what they are doing. I’m just a metaprogramming pretender at this point 🙂&lt;/p&gt;

&lt;p&gt;To read additional discussion around this specific example, see the Julia-Users discussion at: &lt;a href=&quot;https://groups.google.com/forum/#!topic/julia-users/zvJmqB2N0GQ&quot;&gt;https://groups.google.com/forum/#!topic/julia-users/zvJmqB2N0GQ&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Edit, 11/22/2014:&lt;/strong&gt; &lt;a href=&quot;http://www.reddit.com/r/Julia/comments/2mvtnr/code_refactoring_using_metaprogramming_in_julia/cma5g25&quot;&gt;DarthToaster on Reddit&lt;/a&gt; provided another fantastic way to approach refactoring, using macros:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
3
4
5
6
7
8
9
10
11
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;k&quot;&gt;macro&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt; endpoint&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;path&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;quote&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt; $&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;esc&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;))(;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;options&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Dict&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;}())&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;get_oauth&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;https://api.twitter.com/1.1/&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;path&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;options&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;status&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;200&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;?&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;JSON&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;parse&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;

&lt;span class=&quot;nd&quot;&gt;@endpoint&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;get_help_configuration&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;help/configuration.json&quot;&lt;/span&gt;
&lt;span class=&quot;nd&quot;&gt;@endpoint&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;get_help_languages&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;help/languages.json&quot;&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;</content>
      </item>
      
      
    
      
      
      
    
      
      
      
    
      
      
      
    
      
      
      
    
      
      
      
    
      
      
      
    
      
      
      <item>
        <title>Visualizing Analytics Languages With VennEuler.jl</title>
        
          <description>&lt;p&gt;It often doesn’t take much to get me off track, and on a holiday weekend…well, I was just begging for a fun way to shirk. Enter Harlan Harris:&lt;/p&gt;

&lt;blockquote class=&quot;twitter-tweet&quot; data-cards=&quot;hidden&quot; data-partner=&quot;tweetdeck&quot;&gt;
  &lt;p&gt;
    someone redo this area-prop'l Venn w/ my Julia pkg! &lt;a href=&quot;http://t.co/Mh8rXZbRgY&quot;&gt;http://t.co/Mh8rXZbRgY&lt;/a&gt; &lt;a href=&quot;http://t.co/RDWNQHTw3S&quot;&gt;http://t.co/RDWNQHTw3S&lt;/a&gt; &lt;a href=&quot;http://t.co/ljujd9DG0T&quot;&gt;http://t.co/ljujd9DG0T&lt;/a&gt; via &lt;a href=&quot;https://twitter.com/revodavid&quot;&gt;@revodavid&lt;/a&gt;
  &lt;/p&gt;

  &lt;p&gt;
    — Harlan Harris (@HarlanH) &lt;a href=&quot;https://twitter.com/HarlanH/statuses/505365468363100160&quot;&gt;August 29, 2014&lt;/a&gt;
  &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Hey, I’m someone looking for something to do! And I like writing Julia code! So let’s have a look at recreating this diagram in Julia using VennEuler.jl (&lt;a title=&quot;VennEuler.jl example&quot; href=&quot;http://nbviewer.ipython.org/gist/randyzwitch/860e1d9ae5a12cb61b1b&quot; target=&quot;_blank&quot;&gt;IJulia Notebook link&lt;/a&gt;):&lt;/p&gt;

&lt;div style=&quot;width: 490px&quot; class=&quot;wp-caption alignnone&quot;&gt;
  &lt;img src=&quot;http://revolution-computing.typepad.com/.a/6a010534b1db25970b01a73e0af9c7970d-800wi&quot; alt=&quot;&quot; width=&quot;480&quot; height=&quot;427&quot; /&gt;

  &lt;p class=&quot;wp-caption-text&quot;&gt;
    Source: Revolution R/KDNuggets
  &lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a href=&quot;http://blog.revolutionanalytics.com/2014/08/r-tops-kdnuggets-data-analysis-software-poll-for-4th-consecutive-year.html&quot; target=&quot;_blank&quot;&gt;http://blog.revolutionanalytics.com/2014/08/r-tops-kdnuggets-data-analysis-software-poll-for-4th-consecutive-year.html&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;installing-venneulerjl&quot;&gt;Installing VennEuler.jl&lt;/h2&gt;

&lt;p&gt;Because VennEuler.jl is not in METADATA as of the time of writing, instead of using Pkg.add() you’ll need to run:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;n&quot;&gt;Pkg&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;clone&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;https://github.com/HarlanH/VennEuler.jl.git&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Note that VennEuler uses some of the more exotic packages (at least to me) like NLopt and Cairo, so you might need to have a few additional dependencies installed with the package.&lt;/p&gt;

&lt;h2 id=&quot;data&quot;&gt;Data&lt;/h2&gt;

&lt;p&gt;The data was a bit confusing to me at first, since the percentages add up to more than 100% (people could vote for more than one language). To create a dataset to use, I took the percentages, multiplied by 1000, then re-created the voting pattern. The data for the graph can be downloaded from &lt;a title=&quot;Dataset&quot; href=&quot;http://randyzwitch.com/wp-content/uploads/2014/08/kdnuggets_language_survey_2014.csv&quot; target=&quot;_blank&quot;&gt;this link&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;code---circles&quot;&gt;Code - Circles&lt;/h2&gt;

&lt;p&gt;With a few modifications, I basically re-purposed Harlan’s code from the &lt;a href=&quot;https://github.com/HarlanH/VennEuler.jl/blob/master/test/DC2.jl&quot;&gt;package test files&lt;/a&gt;. The circle result is as follows:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
3
4
5
6
7
8
9
10
11
12
13
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;k&quot;&gt;using&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;VennEuler&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;labels&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;readcsv&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;/home/rzwitch/Desktop/kdnuggets_language_survey_2014.csv&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;header&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;true&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bool&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;labels&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vec&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;labels&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Circles&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;eo&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;make_euler_object&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;labels&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;EulerSpec&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;c&quot;&gt;# circles, for now&lt;/span&gt;

&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;minf&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;minx&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ret&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;optimize&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;eo&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;random_state&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;eo&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ftol&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;xtol&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.0025&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;maxtime&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;120&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pop&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1000&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;println&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;got &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;minf at &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;minx (returned &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;ret)&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;render&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;/home/rzwitch/Desktop/kd.svg&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;eo&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;minx&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;&lt;img src=&quot;/wp-content/uploads/2014/08/venneulercircles.png&quot; alt=&quot;venneulercircles&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Since the percentages of R, SAS, and Python users aren’t dramatically different (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;49.81%&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;33.42%&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;40.97%&lt;/code&gt; respectively) and the shapes are circles, it’s a bit hard to tell that R is about 16 percentage points higher than SAS and 9 percentage points higher than Python.&lt;/p&gt;

&lt;h2 id=&quot;code--rectangles&quot;&gt;Code - Rectangles&lt;/h2&gt;

&lt;p&gt;Alternatively, we can use rectangles to represent the areas:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;k&quot;&gt;using&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;VennEuler&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;labels&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;readcsv&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;/home/rzwitch/Desktop/kdnuggets_language_survey_2014.csv&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;header&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;true&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bool&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;labels&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vec&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;labels&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;# Rectangles&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;eo&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;make_euler_object&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;labels&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;EulerSpec&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rectangle&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;EulerSpec&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rectangle&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]),&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;EulerSpec&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rectangle&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)],&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;sizesum&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;


&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;minf&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;minx&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ret&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;optimize_iteratively&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;eo&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;random_state&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;eo&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ftol&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;xtol&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.0025&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;maxtime&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pop&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;100&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;println&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;phase 1: got &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;minf at &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;minx (returned &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;ret)&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;minf&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;minx&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ret&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;optimize&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;eo&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;minx&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ftol&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;xtol&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.001&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;maxtime&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;30&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pop&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;100&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;println&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;phase 2: got &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;minf at &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;minx (returned &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;ret)&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;render&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;/home/rzwitch/Desktop/kd-rects.svg&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;eo&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;minx&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;&lt;img src=&quot;/wp-content/uploads/2014/08/venneulerrectangles.png&quot; alt=&quot;venneulerrectangles&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Here, it’s slightly easier to see that SAS and Python are about the same area-wise and that R is larger, although the different dimensions of the rectangles do obscure this a bit.&lt;/p&gt;

&lt;h2 id=&quot;summary&quot;&gt;Summary&lt;/h2&gt;

&lt;p&gt;If I spent more time with this package, I’m sure I could make something even more aesthetically pleasing. And for that matter, it’s still a pre-production package that will no doubt get better in the future. But at the very least, there is a way to create an area-proportional representation of relationships using VennEuler.jl in Julia.&lt;/p&gt;
</description>
        
        <pubDate>Fri, 29 Aug 2014 15:16:24 +0000</pubDate>
        <link>
        http://randyzwitch.com/visualizing-analytics-languages-venneuler-jl/</link>
        <guid isPermaLink="true">http://randyzwitch.com/visualizing-analytics-languages-venneuler-jl/</guid>
        <content type="html" xml:base="/visualizing-analytics-languages-venneuler-jl/">&lt;p&gt;It often doesn’t take much to get me off track, and on a holiday weekend…well, I was just begging for a fun way to shirk. Enter Harlan Harris:&lt;/p&gt;

&lt;blockquote class=&quot;twitter-tweet&quot; data-cards=&quot;hidden&quot; data-partner=&quot;tweetdeck&quot;&gt;
  &lt;p&gt;
    someone redo this area-prop'l Venn w/ my Julia pkg! &lt;a href=&quot;http://t.co/Mh8rXZbRgY&quot;&gt;http://t.co/Mh8rXZbRgY&lt;/a&gt; &lt;a href=&quot;http://t.co/RDWNQHTw3S&quot;&gt;http://t.co/RDWNQHTw3S&lt;/a&gt; &lt;a href=&quot;http://t.co/ljujd9DG0T&quot;&gt;http://t.co/ljujd9DG0T&lt;/a&gt; via &lt;a href=&quot;https://twitter.com/revodavid&quot;&gt;@revodavid&lt;/a&gt;
  &lt;/p&gt;

  &lt;p&gt;
    — Harlan Harris (@HarlanH) &lt;a href=&quot;https://twitter.com/HarlanH/statuses/505365468363100160&quot;&gt;August 29, 2014&lt;/a&gt;
  &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Hey, I’m someone looking for something to do! And I like writing Julia code! So let’s have a look at recreating this diagram in Julia using VennEuler.jl (&lt;a title=&quot;VennEuler.jl example&quot; href=&quot;http://nbviewer.ipython.org/gist/randyzwitch/860e1d9ae5a12cb61b1b&quot; target=&quot;_blank&quot;&gt;IJulia Notebook link&lt;/a&gt;):&lt;/p&gt;

&lt;div style=&quot;width: 490px&quot; class=&quot;wp-caption alignnone&quot;&gt;
  &lt;img src=&quot;http://revolution-computing.typepad.com/.a/6a010534b1db25970b01a73e0af9c7970d-800wi&quot; alt=&quot;&quot; width=&quot;480&quot; height=&quot;427&quot; /&gt;

  &lt;p class=&quot;wp-caption-text&quot;&gt;
    Source: Revolution R/KDNuggets
  &lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a href=&quot;http://blog.revolutionanalytics.com/2014/08/r-tops-kdnuggets-data-analysis-software-poll-for-4th-consecutive-year.html&quot; target=&quot;_blank&quot;&gt;http://blog.revolutionanalytics.com/2014/08/r-tops-kdnuggets-data-analysis-software-poll-for-4th-consecutive-year.html&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;installing-venneulerjl&quot;&gt;Installing VennEuler.jl&lt;/h2&gt;

&lt;p&gt;Because VennEuler.jl is not in METADATA as of the time of writing, instead of using Pkg.add() you’ll need to run:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;n&quot;&gt;Pkg&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;clone&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;https://github.com/HarlanH/VennEuler.jl.git&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Note that VennEuler uses some of the more exotic packages (at least to me) like NLopt and Cairo, so you might need to have a few additional dependencies installed with the package.&lt;/p&gt;

&lt;h2 id=&quot;data&quot;&gt;Data&lt;/h2&gt;

&lt;p&gt;The data was a bit confusing to me at first, since the percentages add up to more than 100% (people could vote for more than one language). To create a dataset to use, I took the percentages, multiplied by 1000, then re-created the voting pattern. The data for the graph can be downloaded from &lt;a title=&quot;Dataset&quot; href=&quot;http://randyzwitch.com/wp-content/uploads/2014/08/kdnuggets_language_survey_2014.csv&quot; target=&quot;_blank&quot;&gt;this link&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;code---circles&quot;&gt;Code - Circles&lt;/h2&gt;

&lt;p&gt;With a few modifications, I basically re-purposed Harlan’s code from the &lt;a href=&quot;https://github.com/HarlanH/VennEuler.jl/blob/master/test/DC2.jl&quot;&gt;package test files&lt;/a&gt;. The circle result is as follows:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
3
4
5
6
7
8
9
10
11
12
13
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;k&quot;&gt;using&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;VennEuler&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;labels&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;readcsv&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;/home/rzwitch/Desktop/kdnuggets_language_survey_2014.csv&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;header&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;true&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bool&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;labels&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vec&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;labels&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Circles&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;eo&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;make_euler_object&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;labels&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;EulerSpec&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;c&quot;&gt;# circles, for now&lt;/span&gt;

&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;minf&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;minx&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ret&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;optimize&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;eo&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;random_state&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;eo&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ftol&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;xtol&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.0025&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;maxtime&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;120&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pop&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1000&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;println&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;got &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;minf at &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;minx (returned &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;ret)&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;render&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;/home/rzwitch/Desktop/kd.svg&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;eo&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;minx&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;&lt;img src=&quot;/wp-content/uploads/2014/08/venneulercircles.png&quot; alt=&quot;venneulercircles&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Since the percentages of R, SAS, and Python users aren’t dramatically different (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;49.81%&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;33.42%&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;40.97%&lt;/code&gt; respectively) and the shapes are circles, it’s a bit hard to tell that R is about 16 percentage points higher than SAS and 9 percentage points higher than Python.&lt;/p&gt;
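
&lt;p&gt;As a rough check on those numbers, and on why circles make the gap hard to see, here is a small sketch. It assumes each circle’s area is proportional to the share of users, so the radii differ only by the square root of the ratio; the variable names are my own and are not part of VennEuler.jl.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;# Shares of survey respondents quoted above, in percent
r, sas, python = 49.81, 33.42, 40.97

# Differences in percentage points
gap_sas = round(r - sas, digits=2)        # 16.39
gap_python = round(r - python, digits=2)  # 8.84

# Assumption: circle area is proportional to share, so radius scales with sqrt(share)
radius_ratio_sas = sqrt(r / sas)        # roughly 1.22
radius_ratio_python = sqrt(r / python)  # roughly 1.10

println(gap_sas, &quot; &quot;, gap_python)
println(round(radius_ratio_sas, digits=2), &quot; &quot;, round(radius_ratio_python, digits=2))&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Under that assumption, the R circle’s radius is only about 22% larger than the SAS circle’s, which is part of why the difference is hard to judge by eye.&lt;/p&gt;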

&lt;h2 id=&quot;code--rectangles&quot;&gt;Code - Rectangles&lt;/h2&gt;

&lt;p&gt;Alternatively, we can use rectangles to represent the areas:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;k&quot;&gt;using&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;VennEuler&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;labels&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;readcsv&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;/home/rzwitch/Desktop/kdnuggets_language_survey_2014.csv&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;header&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;true&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bool&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;labels&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vec&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;labels&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;# Rectangles&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;eo&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;make_euler_object&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;labels&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;EulerSpec&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rectangle&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;EulerSpec&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rectangle&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]),&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;EulerSpec&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rectangle&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)],&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;sizesum&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;


&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;minf&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;minx&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ret&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;optimize_iteratively&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;eo&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;random_state&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;eo&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ftol&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;xtol&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.0025&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;maxtime&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pop&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;100&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;println&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;phase 1: got &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;minf at &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;minx (returned &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;ret)&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;minf&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;minx&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ret&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;optimize&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;eo&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;minx&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ftol&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;xtol&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.001&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;maxtime&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;30&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pop&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;100&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;println&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;phase 2: got &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;minf at &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;minx (returned &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;ret)&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;render&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;/home/rzwitch/Desktop/kd-rects.svg&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;eo&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;minx&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;&lt;img src=&quot;/wp-content/uploads/2014/08/venneulerrectangles.png&quot; alt=&quot;venneulerrectangles&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Here, it’s slightly easier to see that SAS and Python are about the same area-wise and that R is larger, although the different dimensions of the rectangles do obscure this a bit.&lt;/p&gt;

&lt;h2 id=&quot;summary&quot;&gt;Summary&lt;/h2&gt;

&lt;p&gt;If I spent more time with this package, I’m sure I could make something even more aesthetically pleasing. And for that matter, it’s still a pre-production package that will no doubt get better in the future. But at the very least, there is a way to create an area-proportional representation of relationships using VennEuler.jl in Julia.&lt;/p&gt;</content>
      </item>
      
      
    
      
      
      <item>
        <title>String Interpolation for Fun and Profit</title>
        
          <description>&lt;p&gt;In a previous post, I showed how I frequently use &lt;a href=&quot;http://randyzwitch.com/julia-odbc-jl/&quot;&gt;Julia as a ‘glue’ language&lt;/a&gt; to connect multiple systems in a complicated data pipeline. For this blog post, I will show two more examples where I use Julia for general programming, rather than for computationally-intense programs.&lt;/p&gt;

&lt;h2 id=&quot;string-buildingintroduction&quot;&gt;String Building: Introduction&lt;/h2&gt;

&lt;p&gt;The &lt;a href=&quot;http://docs.julialang.org/en/latest/manual/strings/&quot;&gt;Strings section of the Julia Manual&lt;/a&gt; provides a very in-depth treatment of the considerations when using strings within Julia. For the purposes of my examples, there are only three things to know:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Strings are immutable within Julia and 1-indexed&lt;/li&gt;
  &lt;li&gt;Strings are easily created using a syntax familiar from most languages:&lt;/li&gt;
&lt;/ul&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;authorname&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;randy zwitch&quot;&lt;/span&gt;
&lt;span class=&quot;s&quot;&gt;&quot;randy zwitch&quot;&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;typeof&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;authorname&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;ul&gt;
  &lt;li&gt;String interpolation is most easily done using dollar-sign notation. Additionally, parentheses can be used to avoid symbol ambiguity:&lt;/li&gt;
&lt;/ul&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;interpolated&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;the author of this blog post is &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$(authorname)&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;
&lt;span class=&quot;s&quot;&gt;&quot;the author of this blog post is randy zwitch&quot;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
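
&lt;p&gt;To make the ambiguity point concrete, here is a minimal sketch using a hypothetical &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;queryid&lt;/code&gt; variable. The parentheses mark exactly where the interpolated expression ends, which matters when the variable is immediately followed by more letters:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;queryid = 3  # hypothetical value, for illustration only

# Parentheses end the interpolation right after `queryid`.
# Without them, Julia would look for a variable named `queryidscore`.
column_alias = &quot;q$(queryid)score&quot;

# Arbitrary expressions also work inside the parentheses.
next_alias = &quot;q$(queryid + 1)score&quot;

println(column_alias)  # q3score
println(next_alias)    # q4score&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;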

&lt;p&gt;&lt;del&gt;If you are using large volumes of textual data, you’ll want to pay attention to the difference between the various string types that Julia provides (&lt;em&gt;UTF8/16/32, ASCII, Unicode, etc&lt;/em&gt;), but for the purposes of this blog post we’ll just be using the &lt;em&gt;ASCIIString&lt;/em&gt; type by not explicitly declaring the string type and only using ASCII characters.&lt;/del&gt;&lt;/p&gt;

&lt;p&gt;EDIT, 9/8/2016: Starting with version 0.5, Julia defaults to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;String&lt;/code&gt; type, which is UTF-8 encoded.&lt;/p&gt;
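
&lt;p&gt;A short sketch of what 1-indexing, immutability, and UTF-8 encoding look like in practice, assuming a Julia 1.x session (the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;accented&lt;/code&gt; string is my own example):&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;authorname = &quot;randy zwitch&quot;

# Indexing starts at 1 and returns a Char
first_char = authorname[1]

# Strings are immutable, so functions return a new String and leave the original unchanged
capitalized = uppercasefirst(authorname)

# With UTF-8, the number of characters and the number of bytes can differ
accented = &quot;café&quot;
n_chars = length(accented)      # 4 characters
n_bytes = ncodeunits(accented)  # 5 bytes, since é takes two bytes in UTF-8

println(first_char, &quot; &quot;, capitalized, &quot; &quot;, n_chars, &quot; &quot;, n_bytes)&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;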

&lt;h2 id=&quot;example-1-repetitive-queries&quot;&gt;Example 1: Repetitive Queries&lt;/h2&gt;

&lt;p&gt;As part of my data engineering responsibilities at work, I often get requests to pull a sample of every table in a new database in our Hadoop cluster. This type of request usually comes from the business owner, who wants to verify that the data set has been imported correctly but doesn’t want to write any queries. So using the &lt;a href=&quot;https://github.com/quinnj/ODBC.jl&quot;&gt;ODBC.jl&lt;/a&gt; package, I repeatedly run the same &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;select * from &amp;lt;tablename&amp;gt;&lt;/code&gt; query and save the results to individual .tab files:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;n&quot;&gt;_&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;_&lt;/span&gt;       &lt;span class=&quot;n&quot;&gt;_&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_&lt;/span&gt;     &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;A&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;fresh&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;approach&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;to&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;technical&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;computing&lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;     &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;    &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;Documentation&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;http&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;://&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;docs&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;julialang&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;org&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;_&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_&lt;/span&gt;   &lt;span class=&quot;n&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;__&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_&lt;/span&gt;   &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;  &lt;span class=&quot;kt&quot;&gt;Type&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;help()&quot;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;to&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;list&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;help&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;topics&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|/&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;err&quot;&gt;`&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;  &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;  &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;Version&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;0.3&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;prerelease&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;4028&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2014&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;07&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;02&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;23&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;42&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;UTC&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|\&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;__&lt;/span&gt;&lt;span class=&quot;err&quot;&gt;'&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;|\&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;__&lt;/span&gt;&lt;span class=&quot;err&quot;&gt;'&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;  &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;Commit&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2185&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bd1&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;11&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;days&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;old&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;master&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;__&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;                   &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;x86_64&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;w64&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mingw32&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;using&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ODBC&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ODBC&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;connect&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Production hiveserver2&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;usr&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pwd&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;ODBC&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Connection&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Object&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;----------------------&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Connection&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Data&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Source&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Production&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hiveserver2&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Production&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hiveserver2&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Connection&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Number&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Contains&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;resultset&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;?&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;No&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tables&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;show tables in db;&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;elapsed&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;0.167028049&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;seconds&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tbl&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tables&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tab_name&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;select * from db.&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$(tbl) &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;limit 1000;&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;output&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;C:&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\\&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;data_dump&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\\&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$(tbl)&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;.tab&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delim&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;sc&quot;&gt;'\t'&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;While the query is simple, writing/running this hundreds of times would be a waste of effort. So with a simple loop over the array of tables, I can provide a sample of hundreds of tables in .tab files with five lines of code.&lt;/p&gt;
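
&lt;p&gt;If you want to check the generated SQL before sending anything to the database, the same interpolation can be run as a dry run. This is only a sketch: the table names below are hypothetical stand-ins for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tab_name&lt;/code&gt; column, and no ODBC connection is involved.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;# Hypothetical table names standing in for the result of `show tables in db;`
tables = [&quot;customers&quot;, &quot;orders&quot;, &quot;interactions&quot;]

# Build one query string per table using string interpolation
queries = [&quot;select * from db.$(tbl) limit 1000;&quot; for tbl in tables]

# Build the matching output file name for each table
outfiles = [&quot;$(tbl).tab&quot; for tbl in tables]

# Print each query next to the file it would be written to
for (sql, outfile) in zip(queries, outfiles)
    println(outfile, &quot;: &quot;, sql)
end&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;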

&lt;h2 id=&quot;example-2-generating-query-code&quot;&gt;Example 2: Generating Query Code&lt;/h2&gt;

&lt;p&gt;In another task, I was asked to join a handful of Hive tables, then transpose the table from “long” to “wide”, so that each id value only had one row instead of multiple. This is fairly trivial to do using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CASE&lt;/code&gt; statements in SQL; the problem arises when you have thousands of potential row values to transpose into columns! Instead of getting carpal tunnel syndrome typing out thousands of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CASE&lt;/code&gt; statements, I decided to use Julia to generate the SQL code itself:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;c&quot;&gt;#Starting portion of query, the groupby columns&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;groupbycols&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;select
interact.interactionid,
interact.agentname,
interact.agentid,
interact.agentgroup,
interact.agentsupervisor,
interact.sitename,
interact.dnis,
interact.agentextension,
interact.interactiondirection,
interact.interactiontype,
interact.customerid,
interact.customercity,
interact.customerstate,
interact.interactiondatetime,
interact.durationinms,&quot;&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Generate CASE statements based on the number of possible values of queryid&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt; casestatements&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;repetitions&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Int64&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;queryid&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;repetitions&lt;/span&gt;
		&lt;span class=&quot;n&quot;&gt;println&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;MAX(CASE WHEN q.queryid = &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;queryid then q.score END) as q&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$(queryid)&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;score,&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;queryid&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;repetitions&lt;/span&gt;
		&lt;span class=&quot;n&quot;&gt;println&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;MIN(CASE WHEN q.queryid = &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;queryid then q.startoffsetinms END) as q&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$(queryid)&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;startoffset,&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;queryid&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;repetitions&lt;/span&gt;
		&lt;span class=&quot;n&quot;&gt;println&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;MAX(CASE WHEN q.queryid = &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;queryid then q.endoffsetinms END) as q&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$(queryid)&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;endoffset,&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;
	&lt;span class=&quot;c&quot;&gt;#Last clause, so repeat it up to number of repetitions minus 1, then do simple print to get line without comma at end&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;queryid&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;repetitions&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
		&lt;span class=&quot;n&quot;&gt;println&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;SUM(CASE WHEN q.queryid = &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;queryid and q.score &amp;gt; q.mediumthreshold THEN 1 END) as q&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$(queryid)&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;hits,&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;println&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;SUM(CASE WHEN q.queryid = &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;repetitions and q.score &amp;gt; q.mediumthreshold THEN 1 END) as q&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$(repetitions)&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;hits&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Ending table statement&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tablestatements&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;from db.table1 as interact
left join db.table2 as q on (interact.interactionid = q.interactionid)
left join db.table3 as t on (interact.interactionid = t.interactionid)
group by 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15;&quot;&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Submitting all of the statements on one line is usually frowned upon, but this will generate my SQL code&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;println&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;groupbycols&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;);&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;casestatements&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;);&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;println&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tablestatements&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;select&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;interact&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;interactionid&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;interact&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;agentname&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;interact&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;agentid&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;interact&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;agentgroup&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;interact&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;agentsupervisor&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;interact&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sitename&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;interact&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dnis&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;interact&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;agentextension&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;interact&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;interactiondirection&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;interact&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;interactiontype&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;interact&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;customerid&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;interact&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;customercity&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;interact&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;customerstate&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;interact&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;interactiondatetime&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;interact&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;durationinms&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;MAX&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CASE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;WHEN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;queryid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;then&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;score&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;END&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q1score&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;MAX&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CASE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;WHEN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;queryid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;then&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;score&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;END&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q2score&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;MAX&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CASE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;WHEN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;queryid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;then&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;score&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;END&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q3score&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;MAX&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CASE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;WHEN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;queryid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;then&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;score&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;END&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q4score&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;MAX&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CASE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;WHEN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;queryid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;then&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;score&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;END&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q5score&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;MIN&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CASE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;WHEN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;queryid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;then&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;startoffsetinms&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;END&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q1startoffset&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;MIN&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CASE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;WHEN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;queryid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;then&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;startoffsetinms&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;END&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q2startoffset&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;MIN&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CASE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;WHEN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;queryid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;then&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;startoffsetinms&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;END&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q3startoffset&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;MIN&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CASE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;WHEN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;queryid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;then&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;startoffsetinms&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;END&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q4startoffset&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;MIN&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CASE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;WHEN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;queryid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;then&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;startoffsetinms&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;END&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q5startoffset&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;MAX&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CASE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;WHEN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;queryid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;then&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;endoffsetinms&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;END&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q1endoffset&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;MAX&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CASE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;WHEN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;queryid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;then&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;endoffsetinms&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;END&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q2endoffset&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;MAX&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CASE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;WHEN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;queryid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;then&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;endoffsetinms&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;END&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q3endoffset&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;MAX&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CASE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;WHEN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;queryid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;then&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;endoffsetinms&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;END&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q4endoffset&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;MAX&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CASE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;WHEN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;queryid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;then&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;endoffsetinms&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;END&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q5endoffset&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;SUM&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CASE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;WHEN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;queryid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;and&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;score&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mediumthreshold&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;THEN&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;END&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q1hits&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;SUM&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CASE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;WHEN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;queryid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;and&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;score&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mediumthreshold&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;THEN&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;END&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q2hits&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;SUM&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CASE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;WHEN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;queryid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;and&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;score&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mediumthreshold&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;THEN&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;END&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q3hits&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;SUM&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CASE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;WHEN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;queryid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;and&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;score&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mediumthreshold&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;THEN&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;END&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q4hits&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;SUM&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CASE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;WHEN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;queryid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;and&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;score&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mediumthreshold&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;THEN&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;END&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q5hits&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;db&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;table1&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;interact&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;left&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;join&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;db&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;table2&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;on&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;interact&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;interactionid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;interactionid&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;left&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;join&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;db&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;table3&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;on&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;interact&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;interactionid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;interactionid&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;group&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;by&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;6&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;7&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;8&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;9&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;11&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;12&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;13&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;14&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;15&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;The example here only repeats the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CASE&lt;/code&gt; statements five times, which wouldn’t really be that much typing. However, for my actual application, the number of possible values was 2153, leading to a query result that was 8157 columns wide! Suffice it to say, I’d still be writing that code if I had decided to do it by hand.&lt;/p&gt;
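
&lt;p&gt;One possible variation, sketched here rather than taken from the code above: the special handling of the final clause (the one without a trailing comma) can be avoided by collecting the clauses and joining them. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hitclauses&lt;/code&gt; name is made up for illustration, and unlike the function above it returns a string instead of printing:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;# Hypothetical helper (not part of the original code): builds the hits clauses as one string
function hitclauses(repetitions)
    clauses = [&quot;SUM(CASE WHEN q.queryid = $queryid and q.score &amp;gt; q.mediumthreshold THEN 1 END) as q$(queryid)hits&quot; for queryid in 1:repetitions]
    # join places the separator only between elements, so the last clause gets no trailing comma
    return join(clauses, &quot;,\n&quot;)
end

println(hitclauses(5))&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Returning a string also makes it possible to pass the generated SQL to something other than the terminal, although printing works just as well for copy-and-paste use.&lt;/p&gt;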

&lt;h2 id=&quot;summary&quot;&gt;Summary&lt;/h2&gt;

&lt;p&gt;As with my ‘glue language’ post, I hope this post has shown that Julia can be used for more than grunting about microbenchmark performance. Whereas I used to use Python for weird string operations like this, I’m finding that the dollar-sign syntax in Julia feels more comfortable to me than the Python string formatting mini-language (although that’s not particularly difficult either). So if you’ve been hesitant to jump into learning Julia because you think it’s only useful for doing Mandelbrot calculations or complex linear algebra, rest assured that Julia is just as at home doing quick general programming tasks.&lt;/p&gt;
</description>
        
        <pubDate>Mon, 14 Jul 2014 12:01:10 +0000</pubDate>
        <link>
        http://randyzwitch.com/string-interpolation-julia/</link>
        <guid isPermaLink="true">http://randyzwitch.com/string-interpolation-julia/</guid>
        <content type="html" xml:base="/string-interpolation-julia/">&lt;p&gt;In a previous post, I showed how I frequently use &lt;a href=&quot;http://randyzwitch.com/julia-odbc-jl/&quot;&gt;Julia as a ‘glue’ language&lt;/a&gt; to connect multiple systems in a complicated data pipeline. For this blog post, I will show two more examples where I use Julia for general programming, rather than for computationally-intense programs.&lt;/p&gt;

&lt;h2 id=&quot;string-buildingintroduction&quot;&gt;String Building: Introduction&lt;/h2&gt;

&lt;p&gt;The &lt;a href=&quot;http://docs.julialang.org/en/latest/manual/strings/&quot;&gt;Strings section of the Julia Manual&lt;/a&gt; provides a very in-depth treatment of the considerations when using strings within Julia. For the purposes of my examples, there are only three things to know:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Strings are immutable within Julia and 1-indexed&lt;/li&gt;
  &lt;li&gt;Strings are easily created through a syntax familiar from most languages:&lt;/li&gt;
&lt;/ul&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;authorname&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;randy zwitch&quot;&lt;/span&gt;
&lt;span class=&quot;s&quot;&gt;&quot;randy zwitch&quot;&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;typeof&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;authorname&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;ul&gt;
  &lt;li&gt;String interpolation is most easily done using dollar-sign notation. Additionally, parentheses can be used to avoid symbol ambiguity:&lt;/li&gt;
&lt;/ul&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;interpolated&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;the author of this blog post is &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$(authorname)&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;
&lt;span class=&quot;s&quot;&gt;&quot;the author of this blog post is randy zwitch&quot;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
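
&lt;p&gt;To make the first and third points concrete, here is a minimal sketch. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tbl&lt;/code&gt; variable and its value are made up for illustration; everything else uses only indexing, concatenation and interpolation:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;authorname = &quot;randy zwitch&quot;

# Indexing starts at 1, so this is the first character
first_char = authorname[1]

# Strings are immutable, so assigning to an index throws an error
mutation_failed = try
    authorname[1] = 'R'
    false
catch
    true
end

# Build a new string instead of changing the old one
capitalized = &quot;R&quot; * authorname[2:end]

# Parentheses mark where the variable name ends; without them Julia
# would look for a variable called tbl_sample (tbl is a made-up name)
tbl = &quot;orders&quot;
filename = &quot;$(tbl)_sample.tab&quot;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;After running this, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;first_char&lt;/code&gt; is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;'r'&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;mutation_failed&lt;/code&gt; is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;true&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;capitalized&lt;/code&gt; is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&quot;Randy zwitch&quot;&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;filename&lt;/code&gt; is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&quot;orders_sample.tab&quot;&lt;/code&gt;.&lt;/p&gt;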

&lt;p&gt;&lt;del&gt;If you are using large volumes of textual data, you’ll want to pay attention to the difference between the various string types that Julia provides (&lt;em&gt;UTF8/16/32, ASCII, Unicode, etc&lt;/em&gt;), but for the purposes of this blog post we’ll just be using the &lt;em&gt;ASCIIString&lt;/em&gt; type by not explicitly declaring the string type and only using ASCII characters.&lt;/del&gt;&lt;/p&gt;

&lt;p&gt;EDIT, 9/8/2016: Starting with version 0.5, Julia defaults to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;String&lt;/code&gt; type, which uses the UTF-8 character encoding.&lt;/p&gt;
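
&lt;p&gt;One practical consequence of UTF-8, shown here as a small sketch rather than anything the examples below depend on: the number of characters and the number of bytes in a string can differ once non-ASCII characters are involved. The sample word is my own choice:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;# \u00ef is a single non-ASCII character (i with diaeresis)
s = &quot;na\u00efve&quot;

char_count = length(s)    # number of characters
byte_count = sizeof(s)    # number of bytes in the UTF-8 encoding
ascii_only = isascii(s)   # whether every character is ASCII&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Here &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;char_count&lt;/code&gt; is 5 while &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;byte_count&lt;/code&gt; is 6, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ascii_only&lt;/code&gt; is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;false&lt;/code&gt;. The strings in this post are all ASCII, so the distinction doesn’t come up again.&lt;/p&gt;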

&lt;h2 id=&quot;example-1-repetitive-queries&quot;&gt;Example 1: Repetitive Queries&lt;/h2&gt;

&lt;p&gt;As part of my data engineering responsibilities at work, I often get requests to pull a sample of every table in a new database in our Hadoop cluster. This type of request usually comes from the business owner, who wants to check that the data set has been imported correctly but doesn’t actually want to write any queries. So using the &lt;a href=&quot;https://github.com/quinnj/ODBC.jl&quot;&gt;ODBC.jl&lt;/a&gt; package, I run the same &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;select * from &amp;lt;tablename&amp;gt;&lt;/code&gt; query for each table and save the results to individual .tab files:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;n&quot;&gt;_&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;_&lt;/span&gt;       &lt;span class=&quot;n&quot;&gt;_&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_&lt;/span&gt;     &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;A&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;fresh&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;approach&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;to&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;technical&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;computing&lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;     &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;    &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;Documentation&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;http&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;://&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;docs&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;julialang&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;org&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;_&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_&lt;/span&gt;   &lt;span class=&quot;n&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;__&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_&lt;/span&gt;   &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;  &lt;span class=&quot;kt&quot;&gt;Type&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;help()&quot;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;to&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;list&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;help&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;topics&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|/&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;err&quot;&gt;`&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;  &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;  &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;Version&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;0.3&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;prerelease&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;4028&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2014&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;07&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;02&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;23&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;42&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;UTC&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|\&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;__&lt;/span&gt;&lt;span class=&quot;err&quot;&gt;'&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;|\&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;__&lt;/span&gt;&lt;span class=&quot;err&quot;&gt;'&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;  &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;Commit&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2185&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bd1&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;11&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;days&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;old&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;master&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;__&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;                   &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;x86_64&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;w64&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mingw32&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;using&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ODBC&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ODBC&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;connect&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Production hiveserver2&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;usr&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pwd&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;ODBC&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Connection&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Object&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;----------------------&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Connection&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Data&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Source&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Production&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hiveserver2&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Production&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hiveserver2&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Connection&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Number&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Contains&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;resultset&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;?&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;No&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tables&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;show tables in db;&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;elapsed&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;0.167028049&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;seconds&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tbl&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tables&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tab_name&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;select * from db.&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$(tbl) &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;limit 1000;&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;output&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;C:&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\\&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;data_dump&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\\&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$(tbl)&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;.tab&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delim&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;sc&quot;&gt;'\t'&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;While the query is simple, writing/running this hundreds of times would be a waste of effort. So with a simple loop over the array of tables, I can provide a sample of hundreds of tables in .tab files with five lines of code.&lt;/p&gt;
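
&lt;p&gt;If you want to check the generated strings before sending anything to the database, the interpolation can be pulled out into a small function. This is only a sketch: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;samplequery&lt;/code&gt; and the two table names are hypothetical, and no ODBC call is made:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;# Hypothetical helper: returns the query text and output path for one table
function samplequery(db, tbl; limit=1000, outdir=&quot;C:\\data_dump&quot;)
    sql = &quot;select * from $(db).$(tbl) limit $(limit);&quot;
    path = &quot;$(outdir)\\$(tbl).tab&quot;
    return sql, path
end

# Made-up table names standing in for the result of &quot;show tables in db;&quot;
for tbl in [&quot;customers&quot;, &quot;orders&quot;]
    sql, path = samplequery(&quot;db&quot;, tbl)
    println(path, &quot;: &quot;, sql)
end&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Each printed line pairs an output path with the query that would fill it, so a typo in the database name or the limit shows up before hundreds of queries are submitted.&lt;/p&gt;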

&lt;h2 id=&quot;example-2-generating-query-code&quot;&gt;Example 2: Generating Query Code&lt;/h2&gt;

&lt;p&gt;In another task, I was asked to join a handful of Hive tables, then transpose the table from “long” to “wide”, so that each id value only had one row instead of multiple. This is fairly trivial to do using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CASE&lt;/code&gt; statements in SQL; the problem arises when you have thousands of potential row values to transpose into columns! Instead of getting carpal tunnel syndrome typing out thousands of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CASE&lt;/code&gt; statements, I decided to use Julia to generate the SQL code itself:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;c&quot;&gt;#Starting portion of query, the groupby columns&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;groupbycols&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;select
interact.interactionid,
interact.agentname,
interact.agentid,
interact.agentgroup,
interact.agentsupervisor,
interact.sitename,
interact.dnis,
interact.agentextension,
interact.interactiondirection,
interact.interactiontype,
interact.customerid,
interact.customercity,
interact.customerstate,
interact.interactiondatetime,
interact.durationinms,&quot;&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Generate CASE statements based on the number of possible values of queryid&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt; casestatements&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;repetitions&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Int64&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;queryid&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;repetitions&lt;/span&gt;
		&lt;span class=&quot;n&quot;&gt;println&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;MAX(CASE WHEN q.queryid = &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;queryid then q.score END) as q&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$(queryid)&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;score,&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;queryid&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;repetitions&lt;/span&gt;
		&lt;span class=&quot;n&quot;&gt;println&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;MIN(CASE WHEN q.queryid = &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;queryid then q.startoffsetinms END) as q&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$(queryid)&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;startoffset,&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;queryid&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;repetitions&lt;/span&gt;
		&lt;span class=&quot;n&quot;&gt;println&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;MAX(CASE WHEN q.queryid = &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;queryid then q.endoffsetinms END) as q&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$(queryid)&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;endoffset,&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;
	&lt;span class=&quot;c&quot;&gt;#Last clause, so repeat it up to number of repetitions minus 1, then do simple print to get line without comma at end&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;queryid&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;repetitions&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
		&lt;span class=&quot;n&quot;&gt;println&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;SUM(CASE WHEN q.queryid = &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;queryid and q.score &amp;gt; q.mediumthreshold THEN 1 END) as q&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$(queryid)&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;hits,&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;println&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;SUM(CASE WHEN q.queryid = &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;repetitions and q.score &amp;gt; q.mediumthreshold THEN 1 END) as q&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$(repetitions)&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;hits&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;
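
# Possible variation (a sketch, not part of the original session): collect the
# clauses in an array and join() them, so the last line needs no special case
# and the SQL comes back as a string instead of being printed.
# caseclauses is a hypothetical name; the clause text is copied from above.
function caseclauses(repetitions::Int64)
    clauses = String[]
    for queryid in 1:repetitions
        push!(clauses, &quot;MAX(CASE WHEN q.queryid = $queryid then q.score END) as q$(queryid)score&quot;)
    end
    for queryid in 1:repetitions
        push!(clauses, &quot;MIN(CASE WHEN q.queryid = $queryid then q.startoffsetinms END) as q$(queryid)startoffset&quot;)
    end
    for queryid in 1:repetitions
        push!(clauses, &quot;MAX(CASE WHEN q.queryid = $queryid then q.endoffsetinms END) as q$(queryid)endoffset&quot;)
    end
    for queryid in 1:repetitions
        push!(clauses, &quot;SUM(CASE WHEN q.queryid = $queryid and q.score &amp;gt; q.mediumthreshold THEN 1 END) as q$(queryid)hits&quot;)
    end
    # join() puts the separator only between elements, never after the last one
    return join(clauses, &quot;,\n&quot;)
end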

&lt;span class=&quot;c&quot;&gt;#Ending table statement&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tablestatements&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;from db.table1 as interact
left join db.table2 as q on (interact.interactionid = q.interactionid)
left join db.table3 as t on (interact.interactionid = t.interactionid)
group by 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15;&quot;&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Submitting all of the statements on one line is usually frowned upon, but this will generate my SQL code&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;println&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;groupbycols&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;);&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;casestatements&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;);&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;println&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tablestatements&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;select&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;interact&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;interactionid&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;interact&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;agentname&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;interact&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;agentid&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;interact&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;agentgroup&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;interact&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;agentsupervisor&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;interact&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sitename&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;interact&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dnis&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;interact&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;agentextension&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;interact&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;interactiondirection&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;interact&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;interactiontype&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;interact&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;customerid&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;interact&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;customercity&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;interact&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;customerstate&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;interact&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;interactiondatetime&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;interact&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;durationinms&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;MAX&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CASE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;WHEN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;queryid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;then&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;score&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;END&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q1score&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;MAX&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CASE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;WHEN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;queryid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;then&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;score&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;END&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q2score&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;MAX&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CASE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;WHEN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;queryid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;then&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;score&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;END&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q3score&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;MAX&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CASE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;WHEN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;queryid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;then&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;score&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;END&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q4score&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;MAX&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CASE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;WHEN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;queryid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;then&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;score&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;END&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q5score&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;MIN&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CASE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;WHEN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;queryid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;then&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;startoffsetinms&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;END&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q1startoffset&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;MIN&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CASE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;WHEN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;queryid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;then&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;startoffsetinms&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;END&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q2startoffset&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;MIN&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CASE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;WHEN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;queryid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;then&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;startoffsetinms&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;END&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q3startoffset&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;MIN&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CASE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;WHEN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;queryid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;then&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;startoffsetinms&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;END&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q4startoffset&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;MIN&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CASE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;WHEN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;queryid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;then&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;startoffsetinms&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;END&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q5startoffset&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;MAX&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CASE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;WHEN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;queryid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;then&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;endoffsetinms&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;END&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q1endoffset&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;MAX&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CASE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;WHEN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;queryid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;then&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;endoffsetinms&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;END&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q2endoffset&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;MAX&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CASE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;WHEN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;queryid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;then&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;endoffsetinms&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;END&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q3endoffset&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;MAX&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CASE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;WHEN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;queryid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;then&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;endoffsetinms&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;END&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q4endoffset&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;MAX&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CASE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;WHEN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;queryid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;then&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;endoffsetinms&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;END&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q5endoffset&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;SUM&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CASE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;WHEN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;queryid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;and&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;score&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mediumthreshold&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;THEN&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;END&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q1hits&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;SUM&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CASE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;WHEN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;queryid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;and&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;score&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mediumthreshold&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;THEN&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;END&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q2hits&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;SUM&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CASE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;WHEN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;queryid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;and&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;score&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mediumthreshold&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;THEN&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;END&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q3hits&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;SUM&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CASE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;WHEN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;queryid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;and&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;score&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mediumthreshold&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;THEN&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;END&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q4hits&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;SUM&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CASE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;WHEN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;queryid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;and&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;score&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mediumthreshold&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;THEN&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;END&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q5hits&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;db&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;table1&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;interact&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;left&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;join&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;db&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;table2&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;on&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;interact&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;interactionid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;interactionid&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;left&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;join&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;db&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;table3&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;on&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;interact&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;interactionid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;interactionid&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;group&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;by&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;6&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;7&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;8&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;9&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;11&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;12&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;13&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;14&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;15&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;The example here only repeats the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CASE&lt;/code&gt; statements five times, which wouldn’t really be that much typing. However, for my actual application, the number of possible values was 2153, leading to a query result with 8157 columns! Suffice it to say, I’d still be writing that code if I had decided to do it by hand.&lt;/p&gt;

&lt;h2 id=&quot;summary&quot;&gt;Summary&lt;/h2&gt;

&lt;p&gt;Like my ‘glue language’ post, I hope this post has shown that Julia can be used for more than grunting about microbenchmark performance. Whereas I used to use Python for doing weird string operations like this, I’m finding that the dollar-sign syntax in Julia feels more comfortable for me than the Python string formatting mini-language (although that’s not particularly difficult either). So if you’ve been hesitant to jump into learning Julia because you think it’s only useful for doing Mandelbrot calculations or complex linear algebra, Julia is just as at home doing quick general programming tasks.&lt;/p&gt;</content>
      </item>
      
      
    
      
      
      
    
      
      
      <item>
        <title>Using Julia As A &quot;Glue&quot; Language</title>
        
          <description>&lt;p&gt;While much of the focus in the Julia community has been on the performance aspects of Julia relative to other scientific computing languages, Julia is also perfectly suited to ‘glue’ together multiple data sources/languages. In this blog post, I will cover how to create an interactive plot using &lt;a title=&quot;Gadfly.jl documentation&quot; href=&quot;http://dcjones.github.io/Gadfly.jl/&quot; target=&quot;_blank&quot;&gt;Gadfly.jl&lt;/a&gt;, by first preparing the data using Hadoop and &lt;a title=&quot;Teradata Aster&quot; href=&quot;http://www.asterdata.com/&quot; target=&quot;_blank&quot;&gt;Teradata Aster&lt;/a&gt; via &lt;a title=&quot;Julia ODBC&quot; href=&quot;https://github.com/quinnj/ODBC.jl&quot; target=&quot;_blank&quot;&gt;ODBC.jl&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The example problem I am going to solve is calculating and visualizing the number of airplanes in the air, by hour, in the U.S. for the year 1987. Because of the structure and storage of the underlying data, I will need to write some custom Hive code, upload the data to Teradata Aster via a command-line utility, re-calculate the number of flights per hour using a built-in Aster function, then use Julia to visualize the data.&lt;/p&gt;

&lt;h2 id=&quot;step-1-getting-data-from-hadoop&quot;&gt;Step 1: Getting Data From Hadoop&lt;/h2&gt;

&lt;p&gt;In a prior set of &lt;a title=&quot;Getting Started Using Hadoop, Part 3: Loading Data&quot; href=&quot;http://randyzwitch.com/uploading-data-hadoop-amazon-ec2-cloudera-part-3/&quot; target=&quot;_blank&quot;&gt;blog posts&lt;/a&gt;, I talked about loading the &lt;a title=&quot;Airline dataset&quot; href=&quot;http://stat-computing.org/dataexpo/2009/&quot; target=&quot;_blank&quot;&gt;airline dataset&lt;/a&gt; into Hadoop, then &lt;a title=&quot;Getting Started With Hadoop, Final: Analysis Using Hive &amp;amp; Pig&quot; href=&quot;http://randyzwitch.com/getting-started-hadoop-hive-pig/&quot; target=&quot;_blank&quot;&gt;analyzing the dataset using Hive or Pig&lt;/a&gt;. Using ODBC.jl, we can use Hive via Julia to submit our queries. The hardest part of setting up this process is making sure that you have the appropriate Hive drivers for your Hadoop cluster and credentials (which isn’t covered here). Once you have your DSN set up, running Hive queries is as easy as the following:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;k&quot;&gt;using&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ODBC&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Connect to Hadoop cluster via Hive (pre-defined Windows DSN in ODBC Manager)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;hiveconn&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ODBC&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;connect&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Production hiveserver2&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;usr&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;your-user-name&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pwd&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;your-password-here&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Clean data, return results directly to file&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;#Data returned will have origin of flight, flight takeoff, flight landing and elapsed time&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;hive_query_string&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;
&lt;span class=&quot;s&quot;&gt;&quot;select
origin,
from_unixtime(flight_takeoff_datetime_origin) as flight_takeoff_datetime_origin,
from_unixtime(flight_takeoff_datetime_origin + (actualelapsedtime * 60)) as flight_landing_datetime_origin,
actualelapsedtime
from
(select
origin,
unix_timestamp(CONCAT(year,&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;, month, &lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;, dayofmonth, &lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt; &lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;, SUBSTR(LPAD(deptime, 4, 0), 1, 2), &lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;, SUBSTR(LPAD(deptime, 4, 0), 3, 4), &lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;, &lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;00&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;))  as flight_takeoff_datetime_origin,
actualelapsedtime
from vw_airline
where year = 1987 and actualelapsedtime &amp;gt; 0) inner_query;&quot;&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Run query, save results directly to file&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;hive_query_string&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hiveconn&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;output&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;C:&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\\&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;airline_times.csv&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;delim&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;sc&quot;&gt;','&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;In this code, I’ve written my query as a Julia string, to keep my code easily modifiable. Then, I pass the Julia string object to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query()&lt;/code&gt; function, along with my ODBC &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;connection&lt;/code&gt; object. This query runs on Hadoop through Hive, then streams the result directly to my local hard drive, making this a very RAM efficient (though I/O inefficient!) operation.&lt;/p&gt;
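
&lt;p&gt;As a minimal sketch of why keeping the query in a string is convenient: Julia’s triple-quoted strings accept plain double quotes without backslash escapes, and the dollar sign interpolates a value into the text. The names &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;flight_year&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hive_query_sketch&lt;/code&gt; are hypothetical, and the query is a shortened stand-in for the one above rather than a replacement for it.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;# Hypothetical parameter, not in the original code: the year to filter on
flight_year = 1987

# Triple-quoted string: inner double quotes need no escaping,
# and $flight_year is replaced by the value of the variable
hive_query_sketch = &quot;&quot;&quot;
select
origin,
CONCAT(year, &quot;-&quot;, month, &quot;-&quot;, dayofmonth) as flight_date,
actualelapsedtime
from vw_airline
where year = $flight_year and actualelapsedtime &amp;gt; 0;&quot;&quot;&quot;

# Inspect the finished query text before sending it anywhere
println(hive_query_sketch)
&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Changing the year then means editing one variable instead of hunting through the query text.&lt;/p&gt;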

&lt;h2 id=&quot;step-2-shelling-out-to-load-data-to-aster&quot;&gt;Step 2: Shelling Out To Load Data To Aster&lt;/h2&gt;

&lt;p&gt;Now that I’ve created the file with my Hadoop results in it, I have a decision point: I can either A) do the rest of the analysis in Julia or B) use a different tool for my calculations. Because this is a toy example, I’m going to use Teradata Aster to do my calculations, which provides a convenient function called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;burst()&lt;/code&gt; to regularize timestamps into fixed intervals. But before I can use Aster to ‘burst’ my data, I first need to upload it to the database.&lt;/p&gt;

&lt;p&gt;While I could loop over the data within Julia and insert each record one at a time, Teradata provides a command-line utility to upload data in parallel. Running command-line scripts from within Julia is as easy as using the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;run()&lt;/code&gt; command, with each command surrounded in backticks:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;c&quot;&gt;#Connect to Aster (pre-defined Windows DSN in ODBC Manager)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;asterconn&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ODBC&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;connect&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;aster01&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;usr&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;your-user-name&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pwd&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;your-password&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Create table to hold airline results&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;create_airline_table_statement&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;
&lt;span class=&quot;s&quot;&gt;&quot;create table temp.airline
(origin varchar,
flight_takeoff_datetime_origin timestamp,
flight_landing_datetime_origin timestamp,
actualelapsedtime int,
partition key (origin))&quot;&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Execute query&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;create_airline_table_statement&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;asterconn&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Create airport table&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;#Data downloaded from http://openflights.org/data.html&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;create_airport_table_statement&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;
&lt;span class=&quot;s&quot;&gt;&quot;create table temp.airport
(airport_id int,
name varchar,
city varchar,
country varchar,
IATAFAA varchar,
ICAO varchar,
latitude float,
longitude float,
altitude int,
timezone float,
dst varchar,
partition key (country))&quot;&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Execute query&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;create_airport_table_statement&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;asterconn&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Upload data via run() command&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;#ncluster_loader utility already on Windows PATH&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;run&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sb&quot;&gt;`ncluster_loader -h 192.168.1.1 -U your-user-name -w your-password -d aster01 -c --skip-rows=1 --el-enabled --el-table e_dist_error_2 --el-schema temp temp.airline C:\\airline_times.csv`&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;run&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sb&quot;&gt;`ncluster_loader -h 192.168.1.1 -U your-user-name -w your-password -d aster01 -c --el-enabled --el-table e_dist_error_2 --el-schema temp temp.airport C:\\airports.dat`&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;While I could’ve run this at the command-line, having all of this within an IJulia Notebook keeps all my work together, should I need to re-run this in the future.&lt;/p&gt;
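
&lt;p&gt;If the host, table or file path changes between runs, the same dollar-sign interpolation works inside backticks. The sketch below only builds the command object and prints it; it never calls &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;run()&lt;/code&gt;, so it works without the loader installed. The variable names are hypothetical, and the flag list is trimmed from the commands above.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;# Hypothetical variables standing in for the literal values used above
loader_host = &quot;192.168.1.1&quot;
target_table = &quot;temp.airline&quot;
csv_path = &quot;C:\\airline_times.csv&quot;

# Backticks build a Cmd object; nothing executes until run() is called on it
load_cmd = `ncluster_loader -h $loader_host -c --skip-rows=1 $target_table $csv_path`

# Each interpolated value becomes exactly one argument, with no shell in between
println(load_cmd)
&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Because Julia passes the arguments directly to the program rather than through a shell, a path containing spaces would still arrive as a single argument.&lt;/p&gt;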

&lt;h2 id=&quot;step-3-using-aster-for-calculations&quot;&gt;Step 3: Using Aster For Calculations&lt;/h2&gt;

&lt;p&gt;With my data now loaded in Aster, I can normalize the timestamps to UTC, then ‘burst’ the data into regular time intervals. Again, all of this can be done via ODBC from within Julia:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;c&quot;&gt;#Normalize timestamps from local time to UTC time&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;aster_view_string&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;
create view temp.vw_airline_times_utc as
select
row_number() over(order by flight_takeoff_datetime_origin) as unique_flight_number,
origin,
flight_takeoff_datetime_origin,
flight_landing_datetime_origin,
flight_takeoff_datetime_origin - (INTERVAL '1 hour' * timezone) as flight_takeoff_datetime_utc,
flight_landing_datetime_origin - (INTERVAL '1 hour' * timezone) as flight_landing_datetime_utc,
timezone
from temp.airline
left join temp.airport on (airline.origin = airport.iatafaa);&quot;&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Execute query&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;aster_view_string&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;asterconn&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Teradata Aster SQL-MR functionality, accessed via ODBC query&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;burst_query_string&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;
&lt;span class=&quot;s&quot;&gt;&quot;create table temp.airline_burst_hour distribute by hash (origin) as
SELECT
*,
&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;INTERVAL_START&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;::date as calendar_date,
extract(HOUR from &lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;INTERVAL_START&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;) as hour_utc
FROM BURST(
     ON (select
        unique_flight_number,
        origin,
        flight_takeoff_datetime_utc,
        flight_landing_datetime_utc
        FROM temp.vw_airline_times_utc
)
     START_COLUMN('flight_takeoff_datetime_utc')
     END_COLUMN('flight_landing_datetime_utc')
     BURST_INTERVAL('3600')
);&quot;&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Execute query&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;burst_query_string&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;asterconn&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Since it might not be clear what I’m doing here, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;burst()&lt;/code&gt; function in Aster takes a row of data with a start and end timestamp, and (potentially) returns multiple rows which normalize the time between the timestamps. If you’re familiar with pandas in Python, it’s similar to calling &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;resample&lt;/code&gt; on a series of timestamps.&lt;/p&gt;
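
&lt;p&gt;To make the idea concrete, here is a rough sketch in plain Julia of what ‘bursting’ a single flight into hourly rows looks like. The helper &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;burst_hours&lt;/code&gt; is hypothetical and is not Aster’s implementation; it ignores details such as how Aster treats a flight that lands exactly on the hour.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;using Dates

# Hypothetical helper, not Aster's implementation: return the start of every
# hourly interval that a flight touches between takeoff and landing
function burst_hours(takeoff::DateTime, landing::DateTime)
    first_hour = floor(takeoff, Hour)
    last_hour = floor(landing, Hour)
    return collect(first_hour:Hour(1):last_hour)
end

# Made-up flight: takes off at 07:45 and lands at 09:10
takeoff = DateTime(1987, 10, 1, 7, 45)
landing = DateTime(1987, 10, 1, 9, 10)

# One input row becomes one output row per hour the flight was in the air
for interval_start in burst_hours(takeoff, landing)
    println(interval_start)
end
&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Counting the output rows per hour across all flights is then what gives the number of airplanes in the air.&lt;/p&gt;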

&lt;h2 id=&quot;step-4-download-smaller-data-into-julia-visualize&quot;&gt;Step 4: Download Smaller Data Into Julia, Visualize&lt;/h2&gt;

&lt;p&gt;Now that the data has been processed from Hadoop to Aster through a series of queries, we have a much smaller dataset that can be loaded into RAM and processed by Julia:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;c&quot;&gt;#Calculate the number of flights per hour per day&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;flights_query&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;
select
calendar_date,
hour_utc,
sum(1) as num_flights
from temp.airline_burst_hour
group by 1,2
order by 1,2;&quot;&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Bring results into Julia DataFrame&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;flights_per_day&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;flights_query&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;asterconn&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;using&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Gadfly&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Create boxplot, with one box plot per hour&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;set_default_plot_size&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;20&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cm&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;12&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cm&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;p&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;plot&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;flights_per_day&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;hour_utc&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;num_flights&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;Guide&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;xlabel&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Hour UTC&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;),&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;Guide&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ylabel&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Flights In Air&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;),&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;Guide&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;title&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Number of Flights In Air To/From U.S. By Hour - 1987&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;),&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;Scale&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y_continuous&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;minvalue&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;maxvalue&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;4000&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;),&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;Geom&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;boxplot&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;The Gadfly code above produces the following plot:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/airline_plot.png&quot; alt=&quot;gadfly&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Since this chart is in UTC, the trend might not be obvious at first. Because the airline dataset represents flights either leaving or returning to the United States, there are far fewer planes in the air overnight and in the early morning hours (UTC 7-10, or 2-5am Eastern). During the hours when the airports are open, there appears to be a limit of roughly 2500 planes per hour in the sky.&lt;/p&gt;

&lt;h2 id=&quot;why-not-do-all-of-this-in-julia&quot;&gt;Why Not Do All Of This In Julia?&lt;/h2&gt;

&lt;p&gt;At this point, you might wonder why I went through all of this effort. Couldn’t this all be done in Julia?&lt;/p&gt;

&lt;p&gt;Yes, you probably could do all of this work in Julia with a sufficiently large amount of RAM. As a proof-of-concept, I hope I’ve shown that there is much more to Julia than micro-benchmarking Julia’s speed relative to other scientific programming languages. You’ll notice that I haven’t used type annotations anywhere in my code, as none would really make sense (nor would they improve performance). And although this is a toy example purposely using multiple systems, I use Julia in this manner at work far more often than I use it for linear algebra or machine learning.&lt;/p&gt;

&lt;p&gt;So next time you’re tempted to use Python or R or shell scripting or whatever, consider Julia as well. Julia is just as at home as a scripting language as it is as a scientific computing language.&lt;/p&gt;
</description>
        
        <pubDate>Tue, 24 Jun 2014 08:57:31 +0000</pubDate>
        <link>
        http://randyzwitch.com/julia-odbc-jl/</link>
        <guid isPermaLink="true">http://randyzwitch.com/julia-odbc-jl/</guid>
        <content type="html" xml:base="/julia-odbc-jl/">&lt;p&gt;While much of the focus in the Julia community has been on the performance aspects of Julia relative to other scientific computing languages, Julia is also perfectly suited to ‘glue’ together multiple data sources/languages. In this blog post, I will cover how to create an interactive plot using &lt;a title=&quot;Gadfly.jl documentation&quot; href=&quot;http://dcjones.github.io/Gadfly.jl/&quot; target=&quot;_blank&quot;&gt;Gadfly.jl&lt;/a&gt;, by first preparing the data using Hadoop and &lt;a title=&quot;Teradata Aster&quot; href=&quot;http://www.asterdata.com/&quot; target=&quot;_blank&quot;&gt;Teradata Aster&lt;/a&gt; via &lt;a title=&quot;Julia ODBC&quot; href=&quot;https://github.com/quinnj/ODBC.jl&quot; target=&quot;_blank&quot;&gt;ODBC.jl&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The example problem I am going to solve is calculating and visualizing the number of airplanes in the air during each hour in the U.S. for the year 1987. Because of the structure and storage of the underlying data, I will need to write some custom Hive code, upload the data to Teradata Aster via a command-line utility, re-calculate the number of flights per hour using a built-in Aster function, then use Julia to visualize the data.&lt;/p&gt;

&lt;h2 id=&quot;step-1-getting-data-from-hadoop&quot;&gt;Step 1: Getting Data From Hadoop&lt;/h2&gt;

&lt;p&gt;In a prior set of &lt;a title=&quot;Getting Started Using Hadoop, Part 3: Loading Data&quot; href=&quot;http://randyzwitch.com/uploading-data-hadoop-amazon-ec2-cloudera-part-3/&quot; target=&quot;_blank&quot;&gt;blog posts&lt;/a&gt;, I talked about loading the &lt;a title=&quot;Airline dataset&quot; href=&quot;http://stat-computing.org/dataexpo/2009/&quot; target=&quot;_blank&quot;&gt;airline dataset&lt;/a&gt; into Hadoop, then &lt;a title=&quot;Getting Started With Hadoop, Final: Analysis Using Hive &amp;amp; Pig&quot; href=&quot;http://randyzwitch.com/getting-started-hadoop-hive-pig/&quot; target=&quot;_blank&quot;&gt;analyzing the dataset using Hive or Pig&lt;/a&gt;. Using ODBC.jl, we can use Hive via Julia to submit our queries. The hardest part of setting up this process is making sure that you have the appropriate Hive drivers for your Hadoop cluster and credentials (which isn’t covered here). Once you have your DSN set up, running Hive queries is as easy as the following:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;k&quot;&gt;using&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ODBC&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Connect to Hadoop cluster via Hive (pre-defined Windows DSN in ODBC Manager)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;hiveconn&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ODBC&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;connect&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Production hiveserver2&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;usr&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;your-user-name&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pwd&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;your-password-here&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Clean data, return results directly to file&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;#Data returned will have origin of flight, flight takeoff, flight landing and elapsed time&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;hive_query_string&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;
&lt;span class=&quot;s&quot;&gt;&quot;select
origin,
from_unixtime(flight_takeoff_datetime_origin) as flight_takeoff_datetime_origin,
from_unixtime(flight_takeoff_datetime_origin + (actualelapsedtime * 60)) as flight_landing_datetime_origin,
actualelapsedtime
from
(select
origin,
unix_timestamp(CONCAT(year,&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;, month, &lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;, dayofmonth, &lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt; &lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;, SUBSTR(LPAD(deptime, 4, 0), 1, 2), &lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;, SUBSTR(LPAD(deptime, 4, 0), 3, 4), &lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;, &lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;00&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;))  as flight_takeoff_datetime_origin,
actualelapsedtime
from vw_airline
where year = 1987 and actualelapsedtime &amp;gt; 0) inner_query;&quot;&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Run query, save results directly to file&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;hive_query_string&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hiveconn&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;output&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;C:&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\\&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;airline_times.csv&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;delim&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;sc&quot;&gt;','&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;In this code, I’ve written my query as a Julia string, to keep my code easily modifiable. Then, I pass the Julia string object to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query()&lt;/code&gt; function, along with my ODBC &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;connection&lt;/code&gt; object. This query runs on Hadoop through Hive, then streams the result directly to my local hard drive, making this a very RAM efficient (though I/O inefficient!) operation.&lt;/p&gt;

&lt;h2 id=&quot;step-2-shelling-out-to-load-data-to-aster&quot;&gt;Step 2: Shelling Out To Load Data To Aster&lt;/h2&gt;

&lt;p&gt;Once I’ve created the file with my Hadoop results in it, I have a decision point: I can either A) do the rest of the analysis in Julia or B) use a different tool for my calculations. Because this is a toy example, I’m going to use Teradata Aster to do my calculations, which provides a convenient function called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;burst()&lt;/code&gt; to regularize timestamps into fixed intervals. But before I can use Aster to ‘burst’ my data, I first need to upload it to the database.&lt;/p&gt;

&lt;p&gt;While I could loop over the data within Julia and insert each record one at a time, Teradata provides a command-line utility to upload data in parallel. Running command-line scripts from within Julia is as easy as using the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;run()&lt;/code&gt; command, with each command surrounded by backticks:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;c&quot;&gt;#Connect to Aster (pre-defined Windows DSN in ODBC Manager)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;asterconn&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ODBC&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;connect&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;aster01&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;usr&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;your-user-name&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pwd&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;your-password&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Create table to hold airline results&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;create_airline_table_statement&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;
&lt;span class=&quot;s&quot;&gt;&quot;create table temp.airline
(origin varchar,
flight_takeoff_datetime_origin timestamp,
flight_landing_datetime_origin timestamp,
actualelapsedtime int,
partition key (origin))&quot;&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Execute query&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;create_airline_table_statement&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;asterconn&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Create airport table&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;#Data downloaded from http://openflights.org/data.html&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;create_airport_table_statement&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;
&lt;span class=&quot;s&quot;&gt;&quot;create table temp.airport
(airport_id int,
name varchar,
city varchar,
country varchar,
IATAFAA varchar,
ICAO varchar,
latitude float,
longitude float,
altitude int,
timezone float,
dst varchar,
partition key (country))&quot;&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Execute query&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;create_airport_table_statement&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;asterconn&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Upload data via run() command&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;#ncluster_loader utility already on Windows PATH&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;run&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sb&quot;&gt;`ncluster_loader -h 192.168.1.1 -U your-user-name -w your-password -d aster01 -c --skip-rows=1 --el-enabled --el-table e_dist_error_2 --el-schema temp temp.airline C:\\airline_times.csv`&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;run&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sb&quot;&gt;`ncluster_loader -h 192.168.1.1 -U your-user-name -w your-password -d aster01 -c --el-enabled --el-table e_dist_error_2 --el-schema temp temp.airport C:\\airports.dat`&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;While I could’ve run this at the command line, having all of this within an IJulia Notebook keeps all my work together, should I need to re-run this in the future.&lt;/p&gt;
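
&lt;p&gt;To make the backtick syntax a bit more concrete, here is a minimal sketch of how commands behave in Julia. It assumes a current Julia version and a Unix-like system where &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;echo&lt;/code&gt; is available as a program; the file name is hypothetical and only there for illustration:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;# Hypothetical file name, used only for illustration
csv_file = &quot;airline_times.csv&quot;

# Backticks build a Cmd object; nothing is executed yet.
# The interpolated variable is passed as a single argument.
cmd = `echo Loading $csv_file`

# run() executes the command and throws an error on a non-zero exit code
run(cmd)

# read() runs the command and captures its standard output as a String
output = read(cmd, String)
&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Because the command is never handed to a shell, interpolated values don’t need manual quoting, which is handy when file paths contain spaces.&lt;/p&gt;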

&lt;h2 id=&quot;step-3-using-aster-for-calculations&quot;&gt;Step 3: Using Aster For Calculations&lt;/h2&gt;

&lt;p&gt;With my data now loaded in Aster, I can normalize the timestamps to UTC, then ‘burst’ the data into regular time intervals. Again, all of this can be done via ODBC from within Julia:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;c&quot;&gt;#Normalize timestamps from local time to UTC time&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;aster_view_string&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;
create view temp.vw_airline_times_utc as
select
row_number() over(order by flight_takeoff_datetime_origin) as unique_flight_number,
origin,
flight_takeoff_datetime_origin,
flight_landing_datetime_origin,
flight_takeoff_datetime_origin - (INTERVAL '1 hour' * timezone) as flight_takeoff_datetime_utc,
flight_landing_datetime_origin - (INTERVAL '1 hour' * timezone) as flight_landing_datetime_utc,
timezone
from temp.airline
left join temp.airport on (airline.origin = airport.iatafaa);&quot;&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Execute query&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;aster_view_string&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;asterconn&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Teradata Aster SQL-H functionality, accessed via ODBC query&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;burst_query_string&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;
&lt;span class=&quot;s&quot;&gt;&quot;create table temp.airline_burst_hour distribute by hash (origin) as
SELECT
*,
&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;INTERVAL_START&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;::date as calendar_date,
extract(HOUR from &lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;INTERVAL_START&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;) as hour_utc
FROM BURST(
     ON (select
        unique_flight_number,
        origin,
        flight_takeoff_datetime_utc,
        flight_landing_datetime_utc
        FROM temp.vw_airline_times_utc
)
     START_COLUMN('flight_takeoff_datetime_utc')
     END_COLUMN('flight_landing_datetime_utc')
     BURST_INTERVAL('3600')
);&quot;&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Execute query&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;burst_query_string&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;asterconn&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Since it might not be clear what I’m doing here: the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;burst()&lt;/code&gt; function in Aster takes a row of data with a start and end timestamp and (potentially) returns multiple rows, one for each fixed-length interval between the two timestamps. If you’re familiar with pandas in Python, the functionality is similar to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;resample&lt;/code&gt; on a series of timestamps.&lt;/p&gt;
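
&lt;p&gt;As a rough illustration of the idea (not Aster’s actual implementation), the sketch below splits a single flight into one piece per clock hour using Julia’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Dates&lt;/code&gt; standard library. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;burst_hourly&lt;/code&gt; helper is a hypothetical name, and aligning the pieces to clock hours is an assumption about how the intervals are cut:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;using Dates

# Hypothetical helper (not part of Aster or any package): split one
# (start, stop) interval into pieces that each stay within one clock hour
function burst_hourly(start::DateTime, stop::DateTime)
    pieces = Tuple{DateTime,DateTime}[]
    interval_start = start
    while interval_start &amp;lt; stop
        # End of the current clock hour, capped at the landing time
        interval_end = min(floor(interval_start, Hour) + Hour(1), stop)
        push!(pieces, (interval_start, interval_end))
        interval_start = interval_end
    end
    return pieces
end

# A flight from 09:40 to 11:15 touches three clock hours: 9, 10 and 11
burst_hourly(DateTime(1987, 10, 1, 9, 40), DateTime(1987, 10, 1, 11, 15))
&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Counting the resulting pieces per date and hour is what the aggregation in the next step does on the Aster side.&lt;/p&gt;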

&lt;h2 id=&quot;step-4-download-smaller-data-into-julia-visualize&quot;&gt;Step 4: Download Smaller Data Into Julia, Visualize&lt;/h2&gt;

&lt;p&gt;Now that the data has been processed from Hadoop to Aster through a series of queries, we have a much smaller dataset that can be loaded into RAM and processed by Julia:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;c&quot;&gt;#Calculate the number of flights per hour per day&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;flights_query&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;
select
calendar_date,
hour_utc,
sum(1) as num_flights
from temp.airline_burst_hour
group by 1,2
order by 1,2;&quot;&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Bring results into Julia DataFrame&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;flights_per_day&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;flights_query&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;asterconn&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;using&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Gadfly&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Create boxplot, with one box plot per hour&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;set_default_plot_size&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;20&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cm&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;12&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cm&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;p&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;plot&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;flights_per_day&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;hour_utc&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;num_flights&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;Guide&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;xlabel&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Hour UTC&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;),&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;Guide&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ylabel&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Flights In Air&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;),&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;Guide&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;title&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Number of Flights In Air To/From U.S. By Hour - 1987&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;),&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;Scale&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y_continuous&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;minvalue&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;maxvalue&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;4000&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;),&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;Geom&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;boxplot&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;The Gadfly code above produces the following plot:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/airline_plot.png&quot; alt=&quot;gadfly&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Since this chart is in UTC, the trend might not be obvious at first. Because the airline dataset represents flights either leaving or returning to the United States, there are far fewer planes in the air overnight and in the early morning hours (UTC 7-10, or 2-5am Eastern). During the hours when the airports are open, there appears to be a limit of roughly 2500 planes per hour in the sky.&lt;/p&gt;
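
&lt;p&gt;For reading the x-axis in local terms, the conversion is simple arithmetic. This small sketch assumes a fixed UTC-5 offset (Eastern Standard Time) and ignores daylight saving time; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;utc_to_eastern&lt;/code&gt; is a hypothetical helper name:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;# Assumes a fixed UTC-5 offset (Eastern Standard Time), ignoring daylight saving
utc_to_eastern(hour_utc) = mod(hour_utc - 5, 24)

# Hours 7 through 10 UTC map to 2, 3, 4 and 5 Eastern
utc_to_eastern.(7:10)
&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;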

&lt;h2 id=&quot;why-not-do-all-of-this-in-julia&quot;&gt;Why Not Do All Of This In Julia?&lt;/h2&gt;

&lt;p&gt;At this point, you might wonder why I went through all of this effort. Couldn’t this all be done in Julia?&lt;/p&gt;

&lt;p&gt;Yes, you probably could do all of this work in Julia with a sufficiently large amount of RAM. As a proof-of-concept, I hope I’ve shown that there is much more to Julia than micro-benchmarking Julia’s speed relative to other scientific programming languages. You’ll notice that I haven’t used type annotations anywhere in my code, as none would really make sense (nor would they improve performance). And although this is a toy example purposely using multiple systems, I use Julia in this manner at work far more often than I use it for linear algebra or machine learning.&lt;/p&gt;

&lt;p&gt;So next time you’re tempted to use Python or R or shell scripting or whatever, consider Julia as well. Julia is just as at home as a scripting language as it is as a scientific computing language.&lt;/p&gt;</content>
      </item>
      
      
    
      
      
      
    
      
      
      
    
      
      
      
    
      
      
      
    
      
      
      
    
      
      
      
    
      
      
      
    
      
      
      
    
      
      
      
    
      
      
      <item>
        <title>Adding Line Numbers in IPython/Jupyter Notebooks</title>
        
          <description>&lt;p&gt;Lately, I’ve been using Jupyter Notebooks for all of my Python and Julia coding. The ability to develop and submit small snippets of code and create plots inline is just so useful that it has broken the stranglehold of using an IDE while I’m coding. However, the one thing that was missing for a smooth transition was line numbers in the cells; luckily, this can be achieved in two ways.&lt;/p&gt;

&lt;h2 id=&quot;keyboard-shortcut&quot;&gt;Keyboard Shortcut&lt;/h2&gt;

&lt;p&gt;The easiest way to add line numbers to a Jupyter Notebook is to use the keyboard shortcut: press &lt;strong&gt;Ctrl-m&lt;/strong&gt; to enter Command Mode, then type &lt;strong&gt;L&lt;/strong&gt;. Just highlight the cell you want line numbers in, then hit the keyboard shortcut to toggle the line numbers.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/wp-content/uploads/2013/11/ipython-notebook-line-numbers.png&quot; alt=&quot;ipython-notebook-line-numbers&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;add-line-numbers-to-all-cells-at-startup&quot;&gt;Add Line Numbers to All Cells at Startup&lt;/h2&gt;

&lt;p&gt;&lt;del&gt;While the keyboard shortcut is great for toggling line numbers on/off, I prefer having line numbers always on. Luckily, the IPython Dev folks on Twitter were kind enough to explain how to do this:&lt;/del&gt;&lt;/p&gt;

&lt;blockquote class=&quot;twitter-tweet&quot; lang=&quot;en&quot;&gt;
  &lt;p style=&quot;text-align: center;&quot;&gt;
    &lt;del&gt;&lt;a href=&quot;https://twitter.com/randyzwitch&quot;&gt;@randyzwitch&lt;/a&gt; add `IPython.Cell.options_default.cm_config.lineNumbers = true;` to your custom.js&lt;/del&gt;
  &lt;/p&gt;

  &lt;p style=&quot;text-align: center;&quot;&gt;
    &lt;del&gt;— IPython Dev (@IPythonDev) &lt;a href=&quot;https://twitter.com/IPythonDev/statuses/394906726828236800&quot;&gt;October 28, 2013&lt;/a&gt;&lt;/del&gt;
  &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;del&gt;I use OSX with the default ‘profile_default’ profile, so the path for my custom.js file for IPython is:&lt;/del&gt;&lt;/p&gt;

&lt;pre&gt;&lt;del&gt;/Users/randyzwitch/.ipython/profile_default/static/custom/&lt;/del&gt;&lt;/pre&gt;

&lt;p&gt;&lt;del&gt;Similarly, you can do the same for IJulia:&lt;/del&gt;&lt;/p&gt;

&lt;pre&gt;&lt;del&gt;/Users/randyzwitch/.ipython/profile_julia/static/custom&lt;/del&gt;&lt;/pre&gt;

&lt;p&gt;&lt;del&gt;If you are using a different operating system than OSX, or you are using OSX and you don’t see a custom.js file in these locations, a quick search for custom.js will get you to the right file location. Once you open up the custom.js file, you can place the line of JavaScript anywhere in the file, as long as it’s not inside any of the pre-existing functions in the file.&lt;/del&gt;&lt;/p&gt;

&lt;p&gt;&lt;del&gt;Once you place the line of JavaScript in your file, you’ll need to restart IPython/IJulia completely for the change to take effect. After that, you’ll have line numbers in each cell, each Notebook!&lt;/del&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Edit 11/4/2015: Thanks to reader Nat Dunn, I’ve been made aware that the above method no longer works, which isn’t a surprise given the number of changes between IPython Notebook and the Jupyter project over the past 2 years.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;For the (currently) correct method of &lt;a href=&quot;https://www.webucator.com/blog/2015/11/show-line-numbers-by-default-in-ipython-notebook/&quot; target=&quot;_blank&quot;&gt;adding line numbers to Jupyter Notebook by default&lt;/a&gt;, please see &lt;a href=&quot;https://www.webucator.com/blog/2015/11/show-line-numbers-by-default-in-ipython-notebook/&quot; target=&quot;_blank&quot;&gt;Nat’s post&lt;/a&gt; with the correct instructions on modifying the custom.js file.&lt;/em&gt;&lt;/p&gt;
</description>
        
        <pubDate>Tue, 19 Nov 2013 08:48:18 +0000</pubDate>
        <link>
        http://randyzwitch.com/line-numbers-ipython-notebook/</link>
        <guid isPermaLink="true">http://randyzwitch.com/line-numbers-ipython-notebook/</guid>
        <content type="html" xml:base="/line-numbers-ipython-notebook/">&lt;p&gt;Lately, I’ve been using Jupyter Notebooks for all of my Python and Julia coding. The ability to develop and submit small snippets of code and create plots inline is just so useful that it has broken the stranglehold of using an IDE while I’m coding. However, the one thing that was missing for a smooth transition was line numbers in the cells; luckily, this can be achieved in two ways.&lt;/p&gt;

&lt;h2 id=&quot;keyboard-shortcut&quot;&gt;Keyboard Shortcut&lt;/h2&gt;

&lt;p&gt;The easiest way to add line numbers to a Jupyter Notebook is to use the keyboard shortcut, which is &lt;strong&gt;Ctrl-m&lt;/strong&gt; to enter Command Mode, then type&lt;strong&gt; L&lt;/strong&gt;. Just highlight the cell you are interested in adding line numbers to, then hit the keyboard shortcut to toggle the line numbers.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/wp-content/uploads/2013/11/ipython-notebook-line-numbers.png&quot; alt=&quot;ipython-notebook-line-numbers&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;add-line-numbers-to-all-cells-at-startup&quot;&gt;Add Line Numbers to All Cells at Startup&lt;/h2&gt;

&lt;p&gt;&lt;del&gt;While the keyboard shortcut is great for toggling line numbers on/off, I prefer having line numbers always on. Luckily, the IPython Dev folks on Twitter were kind enough to explain how to do this:&lt;/del&gt;&lt;/p&gt;

&lt;blockquote class=&quot;twitter-tweet&quot; lang=&quot;en&quot;&gt;
  &lt;p style=&quot;text-align: center;&quot;&gt;
    &lt;del&gt;&lt;a href=&quot;https://twitter.com/randyzwitch&quot;&gt;@randyzwitch&lt;/a&gt; add `IPython.Cell.options_default.cm_config.lineNumbers = true;` to your custom.js&lt;/del&gt;
  &lt;/p&gt;

  &lt;p style=&quot;text-align: center;&quot;&gt;
    &lt;del&gt;— IPython Dev (@IPythonDev) &lt;a href=&quot;https://twitter.com/IPythonDev/statuses/394906726828236800&quot;&gt;October 28, 2013&lt;/a&gt;&lt;/del&gt;
  &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;del&gt;I use OSX with the default ‘profile_default’ profile, so the path for my custom.js file for IPython is:&lt;/del&gt;&lt;/p&gt;

&lt;pre&gt;&lt;del&gt;/Users/randyzwitch/.ipython/profile_default/static/custom/&lt;/del&gt;&lt;/pre&gt;

&lt;p&gt;&lt;del&gt;Similarly, you can do the same for IJulia:&lt;/del&gt;&lt;/p&gt;

&lt;pre&gt;&lt;del&gt;/Users/randyzwitch/.ipython/profile_julia/static/custom&lt;/del&gt;&lt;/pre&gt;

&lt;p&gt;&lt;del&gt;If you are using a different operating system than OSX, or you are using OSX and you don’t see a custom.js file in these locations, a quick search for custom.js will get you to the right file location. Once you open up the custom.js file, you can place the line of JavaScript anywhere in the file, as long as it’s not inside any of the pre-existing functions in the file.&lt;/del&gt;&lt;/p&gt;

&lt;p&gt;&lt;del&gt;Once you place the line of JavaScript in your file, you’ll need to restart IPython/IJulia completely for the change to take effect. After that, you’ll have line numbers in each cell, each Notebook!&lt;/del&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Edit 11/4/2015: Thanks to reader Nat Dunn, I’ve been made aware that the above method no longer works, which isn’t a surprise given the number of changes between IPython Notebook and the Jupyter project over the past 2 years.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;For the (currently) correct method of &lt;a href=&quot;https://www.webucator.com/blog/2015/11/show-line-numbers-by-default-in-ipython-notebook/&quot; target=&quot;_blank&quot;&gt;adding line numbers to Jupyter Notebook by default&lt;/a&gt;, please see &lt;a href=&quot;https://www.webucator.com/blog/2015/11/show-line-numbers-by-default-in-ipython-notebook/&quot; target=&quot;_blank&quot;&gt;Nat’s post&lt;/a&gt; with the correct instructions on modifying the custom.js file.&lt;/em&gt;&lt;/p&gt;</content>
      </item>
      
      
    
      
      
      
    
      
      
      
    
      
      
      <item>
        <title>Fun With Just-In-Time Compiling: Julia, Python, R and pqR</title>
        
          <description>&lt;p&gt;Recently I’ve been spending a lot of time trying to learn &lt;a title=&quot;Julia language&quot; href=&quot;http://julialang.org/&quot; target=&quot;_blank&quot;&gt;Julia&lt;/a&gt; by doing the problems at &lt;a title=&quot;Project Euler&quot; href=&quot;http://projecteuler.net/&quot; target=&quot;_blank&quot;&gt;Project Euler&lt;/a&gt;. What’s great about these problems is that they get me out of my normal design patterns, since I don’t generally think about prime numbers, factorials and other number theory problems during my normal workday. These problems have also given me the opportunity to really think about how computers work, since Julia allows the programmer to pass type declarations to the just-in-time compiler (JIT).&lt;/p&gt;

&lt;p&gt;As I’ve been working on optimizing my Julia code, I decided to figure out how fast this problem can be solved using any of the languages/techniques I know. So I decided to benchmark one of the Project Euler problems using &lt;a title=&quot;Julia language&quot; href=&quot;http://julialang.org/&quot; target=&quot;_blank&quot;&gt;Julia&lt;/a&gt;, &lt;a title=&quot;Python language&quot; href=&quot;http://python.org/&quot; target=&quot;_blank&quot;&gt;Python&lt;/a&gt;, &lt;a title=&quot;Numba&quot; href=&quot;http://numba.pydata.org/&quot; target=&quot;_blank&quot;&gt;Python with Numba&lt;/a&gt;, &lt;a title=&quot;Pypy&quot; href=&quot;http://pypy.org/&quot; target=&quot;_blank&quot;&gt;PyPy&lt;/a&gt;, &lt;a title=&quot;R&quot; href=&quot;http://cran.us.r-project.org/&quot; target=&quot;_blank&quot;&gt;R&lt;/a&gt;, R using the &lt;a title=&quot;R compiler&quot; href=&quot;http://stat.ethz.ch/R-manual/R-devel/library/compiler/html/compile.html&quot; target=&quot;_blank&quot;&gt;compiler&lt;/a&gt; package, &lt;a title=&quot;pqR&quot; href=&quot;http://radfordneal.wordpress.com/2013/06/22/announcing-pqr-a-faster-version-of-r/&quot; target=&quot;_blank&quot;&gt;pqR&lt;/a&gt; and pqR using the compiler package. Here’s what I found…&lt;/p&gt;

&lt;h2 id=&quot;problem&quot;&gt;Problem&lt;/h2&gt;

&lt;p&gt;The problem I’m using for the benchmark is calculating the smallest number that is divisible by all of the numbers in a factorial. For example, for the numbers in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;5!&lt;/code&gt;, 60 is the smallest number that is divisible by 2, 3, 4 and 5. Here’s the Julia code:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
3
4
5
6
7
8
9
10
11
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;k&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt; smallestdivisall&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Int64&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;factorial&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;break&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;elseif&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;
                &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;All code versions follow this same pattern: the outer loop runs from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1&lt;/code&gt; up to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;n!&lt;/code&gt;, since by definition the last value in the loop is divisible by all of the numbers in the factorial, so the search is guaranteed to find an answer. The inner loop does a modulo calculation for each value of j, checking to see if there is a remainder after division. If there is a remainder, we break out of the inner loop and move to the next candidate number. Once there is no remainder on the modulo calculation and the inner loop value of j equals the last number in the factorial (i.e. the candidate is divisible by all of the factorial numbers), we have found the minimum number.&lt;/p&gt;
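
&lt;p&gt;As a quick sanity check on that logic, here is a minimal sketch (my own addition for illustration, not part of the benchmark code, and assuming a current Julia 1.x). It restates the function so the snippet runs on its own, then compares the brute-force answer against &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;reduce(lcm, 1:n)&lt;/code&gt;, since the smallest number divisible by everything from 1 to n is the least common multiple of that range:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;# Same brute-force search as above, restated so this snippet is self-contained
function smallestdivisall(n::Int64)
    for i = 1:factorial(n)
        for j = 1:n
            if i % j != 0
                break          # i is not divisible by j, try the next candidate
            elseif j == n
                return i       # i survived every divisor from 1 to n
            end
        end
    end
end

# Cross-check (an assumption added for illustration): the result should equal
# the least common multiple of 1:n, computed with Base's lcm
for n in (5, 10)
    result = smallestdivisall(n)
    @assert result == reduce(lcm, 1:n)
    println(&quot;n = &quot;, n, &quot;: &quot;, result)
end
&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;For n = 5 and n = 10 this prints 60 and 2520, and the assertion passes in both cases.&lt;/p&gt;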

&lt;h2 id=&quot;benchmarking---overall&quot;&gt;Benchmarking - Overall&lt;/h2&gt;

&lt;p&gt;Here are the results of the eight permutations of languages/techniques (see &lt;a title=&quot;GitHub Gist for JIT test&quot; href=&quot;https://gist.github.com/randyzwitch/6341926&quot; target=&quot;_blank&quot;&gt;this&lt;/a&gt; GitHub Gist for the actual code used, &lt;a title=&quot;compiler results&quot; href=&quot;http://randyzwitch.com/wp-content/uploads/2013/09/jit.csv&quot; target=&quot;_blank&quot;&gt;this link&lt;/a&gt; for results file, and &lt;a title=&quot;ggplot2 code&quot; href=&quot;https://gist.github.com/randyzwitch/6414244&quot; target=&quot;_blank&quot;&gt;this&lt;/a&gt; GitHub Gist for the ggplot2 code):&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/wp-content/uploads/2013/08/jit-comparison.png&quot; alt=&quot;jit-comparison&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Across the range of tests from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;5!&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;20!&lt;/code&gt;, Julia is the fastest to find the minimum number. Python with Numba is second and PyPy is third. pqR fares better than R in general, but using the compiler package can narrow the gap.&lt;/p&gt;

&lt;p&gt;To make more useful comparisons, in the next section I’ll compare each language to its “compiled” function state.&lt;/p&gt;

&lt;h2 id=&quot;benchmarking---individual&quot;&gt;Benchmarking - Individual&lt;/h2&gt;

&lt;h3 id=&quot;python&quot;&gt;Python&lt;/h3&gt;

&lt;p&gt;&lt;img src=&quot;/wp-content/uploads/2013/09/JITpython-e1378131849775.png&quot; alt=&quot;JITpython&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Amongst the native Python code options, I saw a 16x speedup by using PyPy instead of Python 2.7.6 (10.62s vs. 172.06s at &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;20!&lt;/code&gt;). Using Numba with Python instead of PyPy nets an &lt;em&gt;incremental&lt;/em&gt; ~40% speedup using the &lt;a title=&quot;autojit example&quot; href=&quot;http://numba.pydata.org/&quot; target=&quot;_blank&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@autojit&lt;/code&gt;&lt;/a&gt; decorator (7.63s vs. 10.63s at &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;20!&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;So in the case of Python, using two lines of code with the Numba JIT compiler you can get substantial improvements in performance without needing to do any code re-writes. This is a great benefit given that you can stay in native Python, since PyPy doesn’t support all existing packages within the Python ecosystem.&lt;/p&gt;

&lt;h3 id=&quot;rpqr&quot;&gt;R/pqR&lt;/h3&gt;

&lt;p&gt;&lt;img src=&quot;/wp-content/uploads/2013/09/JITr-e1378132951124.png&quot; alt=&quot;JITr&quot; /&gt;&lt;/p&gt;

&lt;p&gt;It’s understood in the R community that &lt;a title=&quot;Why are R loops slow?&quot; href=&quot;http://stackoverflow.com/questions/7142767/why-are-loops-slow-in-r&quot; target=&quot;_blank&quot;&gt;loops are not a strong point&lt;/a&gt; of the language. In the case of this problem, I decided to use loops because 1) it keeps the code pattern similar across languages and 2) I hoped I’d see the max benefit from the compiler package by not trying any funky R optimizations up front.&lt;/p&gt;

&lt;p&gt;As expected, pqR is generally faster than R and using the compiler package is faster than not using the compiler. I saw ~30% improvement using pqR relative to R and ~20% &lt;em&gt;incremental&lt;/em&gt; improvement using the compiler package with pqR. Using the compiler package within R showed ~35% improvement.&lt;/p&gt;

&lt;p&gt;So unlike the case with Python, where you could just use Python with Numba and stay within the same language/environment, if you can use pqR &lt;em&gt;and&lt;/em&gt; the compiler package, you can get a performance benefit from using both.&lt;/p&gt;

&lt;h2 id=&quot;summary&quot;&gt;Summary&lt;/h2&gt;

&lt;p&gt;For a comparison like I’ve done above, it’s easy to get carried away and extrapolate the results from one simple test to all programming problems ever. “&lt;em&gt;Julia is the best language for all cases ever!!!11111eleventy!&lt;/em&gt;” would be easy to proclaim, but not all problems are looping problems using simple division. Once you get into writing longer programs that involve other tasks such as string manipulation and accessing APIs, or that need a technique from a package only available in one ecosystem but not another, which tool is “best” for solving a problem becomes a much more difficult decision. The only way to know how much improvement you can see from different techniques &amp;amp; tools is to profile your program(s) and experiment.&lt;/p&gt;

&lt;p&gt;The main thing that I took away from this exercise is that no matter which tool you are comfortable with to do analysis, there are potentially large performance improvements that can be made &lt;em&gt;just&lt;/em&gt; by using a JIT without needing to dramatically re-write your code. For those of us who don’t know C (and/or are too lazy to re-write our code several times to wring out a little extra performance), that’s a great thing.&lt;/p&gt;
</description>
        
        <pubDate>Mon, 02 Sep 2013 19:57:45 +0000</pubDate>
        <link>
        http://randyzwitch.com/python-pypy-julia-r-pqr-jit-just-in-time-compiler/</link>
        <guid isPermaLink="true">http://randyzwitch.com/python-pypy-julia-r-pqr-jit-just-in-time-compiler/</guid>
        <content type="html" xml:base="/python-pypy-julia-r-pqr-jit-just-in-time-compiler/">&lt;p&gt;Recently I’ve been spending a lot of time trying to learn &lt;a title=&quot;Julia language&quot; href=&quot;http://julialang.org/&quot; target=&quot;_blank&quot;&gt;Julia&lt;/a&gt; by doing the problems at &lt;a title=&quot;Project Euler&quot; href=&quot;http://projecteuler.net/&quot; target=&quot;_blank&quot;&gt;Project Euler&lt;/a&gt;. What’s great about these problems is that they get me out of my normal design patterns, since I don’t generally think about prime numbers, factorials and other number theory problems during my normal workday. These problems have also given me the opportunity to really think about how computers work, since Julia allows the programmer to pass type declarations to the just-in-time compiler (JIT).&lt;/p&gt;

&lt;p&gt;As I’ve been working on optimizing my Julia code, I decided to figure out how fast this problem can be solved using any of the languages/techniques I know. So I decided to benchmark one of the Project Euler problems using &lt;a title=&quot;Julia language&quot; href=&quot;http://julialang.org/&quot; target=&quot;_blank&quot;&gt;Julia&lt;/a&gt;, &lt;a title=&quot;Python language&quot; href=&quot;http://python.org/&quot; target=&quot;_blank&quot;&gt;Python&lt;/a&gt;, &lt;a title=&quot;Numba&quot; href=&quot;http://numba.pydata.org/&quot; target=&quot;_blank&quot;&gt;Python with Numba&lt;/a&gt;, &lt;a title=&quot;Pypy&quot; href=&quot;http://pypy.org/&quot; target=&quot;_blank&quot;&gt;PyPy&lt;/a&gt;, &lt;a title=&quot;R&quot; href=&quot;http://cran.us.r-project.org/&quot; target=&quot;_blank&quot;&gt;R&lt;/a&gt;, R using the &lt;a title=&quot;R compiler&quot; href=&quot;http://stat.ethz.ch/R-manual/R-devel/library/compiler/html/compile.html&quot; target=&quot;_blank&quot;&gt;compiler&lt;/a&gt; package, &lt;a title=&quot;pqR&quot; href=&quot;http://radfordneal.wordpress.com/2013/06/22/announcing-pqr-a-faster-version-of-r/&quot; target=&quot;_blank&quot;&gt;pqR&lt;/a&gt; and pqR using the compiler package. Here’s what I found…&lt;/p&gt;

&lt;h2 id=&quot;problem&quot;&gt;Problem&lt;/h2&gt;

&lt;p&gt;The problem I’m using for the benchmark is calculating the smallest number that is divisible by all of the numbers in a factorial. For example, for the numbers in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;5!&lt;/code&gt;, 60 is the smallest number that is divisible by 2, 3, 4 and 5. Here’s the Julia code:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
3
4
5
6
7
8
9
10
11
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;k&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt; smallestdivisall&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Int64&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;factorial&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;break&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;elseif&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;
                &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;All code versions follow this same pattern: the outer loop runs from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1&lt;/code&gt; up to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;n!&lt;/code&gt;, since by definition the last value in the loop is divisible by all of the numbers in the factorial, so the search is guaranteed to find an answer. The inner loop does a modulo calculation for each value of j, checking to see if there is a remainder after division. If there is a remainder, we break out of the inner loop and move to the next candidate number. Once there is no remainder on the modulo calculation and the inner loop value of j equals the last number in the factorial (i.e. the candidate is divisible by all of the factorial numbers), we have found the minimum number.&lt;/p&gt;
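
&lt;p&gt;As a quick sanity check on that logic, here is a minimal sketch (my own addition for illustration, not part of the benchmark code, and assuming a current Julia 1.x). It restates the function so the snippet runs on its own, then compares the brute-force answer against &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;reduce(lcm, 1:n)&lt;/code&gt;, since the smallest number divisible by everything from 1 to n is the least common multiple of that range:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;# Same brute-force search as above, restated so this snippet is self-contained
function smallestdivisall(n::Int64)
    for i = 1:factorial(n)
        for j = 1:n
            if i % j != 0
                break          # i is not divisible by j, try the next candidate
            elseif j == n
                return i       # i survived every divisor from 1 to n
            end
        end
    end
end

# Cross-check (an assumption added for illustration): the result should equal
# the least common multiple of 1:n, computed with Base's lcm
for n in (5, 10)
    result = smallestdivisall(n)
    @assert result == reduce(lcm, 1:n)
    println(&quot;n = &quot;, n, &quot;: &quot;, result)
end
&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;For n = 5 and n = 10 this prints 60 and 2520, and the assertion passes in both cases.&lt;/p&gt;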

&lt;h2 id=&quot;benchmarking---overall&quot;&gt;Benchmarking - Overall&lt;/h2&gt;

&lt;p&gt;Here are the results of the eight permutations of languages/techniques (see &lt;a title=&quot;GitHub Gist for JIT test&quot; href=&quot;https://gist.github.com/randyzwitch/6341926&quot; target=&quot;_blank&quot;&gt;this&lt;/a&gt; GitHub Gist for the actual code used, &lt;a title=&quot;compiler results&quot; href=&quot;http://randyzwitch.com/wp-content/uploads/2013/09/jit.csv&quot; target=&quot;_blank&quot;&gt;this link&lt;/a&gt; for results file, and &lt;a title=&quot;ggplot2 code&quot; href=&quot;https://gist.github.com/randyzwitch/6414244&quot; target=&quot;_blank&quot;&gt;this&lt;/a&gt; GitHub Gist for the ggplot2 code):&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/wp-content/uploads/2013/08/jit-comparison.png&quot; alt=&quot;jit-comparison&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Across the range of tests from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;5!&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;20!&lt;/code&gt;, Julia is the fastest to find the minimum number. Python with Numba is second and PyPy is third. pqR fares better than R in general, but using the compiler package can narrow the gap.&lt;/p&gt;

&lt;p&gt;To make more useful comparisons, in the next section I’ll compare each language to its “compiled” function state.&lt;/p&gt;

&lt;h2 id=&quot;benchmarking---individual&quot;&gt;Benchmarking - Individual&lt;/h2&gt;

&lt;h3 id=&quot;python&quot;&gt;Python&lt;/h3&gt;

&lt;p&gt;&lt;img src=&quot;/wp-content/uploads/2013/09/JITpython-e1378131849775.png&quot; alt=&quot;JITpython&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Amongst the native Python code options, I saw a 16x speedup by using PyPy instead of Python 2.7.6 (10.62s vs. 172.06s at &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;20!&lt;/code&gt;). Using Numba with Python instead of PyPy nets an &lt;em&gt;incremental&lt;/em&gt; ~40% speedup using the &lt;a title=&quot;autojit example&quot; href=&quot;http://numba.pydata.org/&quot; target=&quot;_blank&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@autojit&lt;/code&gt;&lt;/a&gt; decorator (7.63s vs. 10.63s at &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;20!&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;So in the case of Python, using two lines of code with the Numba JIT compiler you can get substantial improvements in performance without needing to do any code re-writes. This is a great benefit given that you can stay in native Python, since PyPy doesn’t support all existing packages within the Python ecosystem.&lt;/p&gt;

&lt;h3 id=&quot;rpqr&quot;&gt;R/pqR&lt;/h3&gt;

&lt;p&gt;&lt;img src=&quot;/wp-content/uploads/2013/09/JITr-e1378132951124.png&quot; alt=&quot;JITr&quot; /&gt;&lt;/p&gt;

&lt;p&gt;It’s understood in the R community that &lt;a title=&quot;Why are R loops slow?&quot; href=&quot;http://stackoverflow.com/questions/7142767/why-are-loops-slow-in-r&quot; target=&quot;_blank&quot;&gt;loops are not a strong point&lt;/a&gt; of the language. In the case of this problem, I decided to use loops because 1) it keeps the code pattern similar across languages and 2) I hoped I’d see the max benefit from the compiler package by not trying any funky R optimizations up front.&lt;/p&gt;

&lt;p&gt;As expected, pqR is generally faster than R and using the compiler package is faster than not using the compiler. I saw ~30% improvement using pqR relative to R and ~20% &lt;em&gt;incremental&lt;/em&gt; improvement using the compiler package with pqR. Using the compiler package within R showed ~35% improvement.&lt;/p&gt;

&lt;p&gt;So unlike the case with Python, where you could just use Python with Numba and stay within the same language/environment, if you can use pqR &lt;em&gt;and&lt;/em&gt; the compiler package, you can get a performance benefit from using both.&lt;/p&gt;

&lt;h2 id=&quot;summary&quot;&gt;Summary&lt;/h2&gt;

&lt;p&gt;For a comparison like I’ve done above, it’s easy to get carried away and extrapolate the results from one simple test to all programming problems ever. “&lt;em&gt;Julia is the best language for all cases ever!!!11111eleventy!&lt;/em&gt;” would be easy to proclaim, but not all problems are looping problems using simple division. Once you get into writing longer programs that involve other tasks such as string manipulation and accessing APIs, or that need a technique from a package only available in one ecosystem but not another, which tool is “best” for solving a problem becomes a much more difficult decision. The only way to know how much improvement you can see from different techniques &amp;amp; tools is to profile your program(s) and experiment.&lt;/p&gt;

&lt;p&gt;The main thing that I took away from this exercise is that no matter which tool you are comfortable with to do analysis, there are potentially large performance improvements that can be made &lt;em&gt;just&lt;/em&gt; by using a JIT without needing to dramatically re-write your code. For those of us who don’t know C (and/or are too lazy to re-write our code several times to wring out a little extra performance), that’s a great thing.&lt;/p&gt;</content>
      </item>
      
      
    
      
      
      
    
      
      
      
    
      
      
      
    
      
      
      <item>
        <title>Tabular Data I/O in Julia</title>
        
          <description>&lt;p&gt;Importing tabular data into Julia can be done in (at least) three ways: reading a delimited file into an array, reading a delimited file into a DataFrame and accessing databases using ODBC.&lt;/p&gt;

&lt;h3 id=&quot;reading-a-file-into-an-array-using-readdlm&quot;&gt;Reading a file into an array using readdlm&lt;/h3&gt;

&lt;p&gt;The most basic way to read data into Julia is through the use of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;readdlm&lt;/code&gt; function, which will create an array:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;n&quot;&gt;readdlm&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;source&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delim&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Char&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Type&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;options&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
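
&lt;p&gt;To experiment with this signature without a large data file, here is a minimal sketch. It assumes Julia 0.7 or later, where &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;readdlm&lt;/code&gt; lives in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DelimitedFiles&lt;/code&gt; standard library and has to be loaded first; the temporary file and its contents are made up purely for illustration:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;using DelimitedFiles  # home of readdlm since Julia 0.7

# Write a tiny, all-integer CSV to a temporary file (hypothetical data)
path, io = mktemp()
write(io, &quot;1,2,3\n4,5,6\n&quot;)
close(io)

# Two positional arguments: Julia decides the element type of the array
inferred = readdlm(path, ',')

# Three positional arguments: request a specific element type
ints = readdlm(path, ',', Int64)

println(size(ints))
println(eltype(inferred), &quot; vs. &quot;, eltype(ints))

rm(path)  # clean up the temporary file
&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;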

&lt;p&gt;If you are reading in a fairly normal delimited file, you can get away with just using the first two arguments, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;source&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delim&lt;/code&gt;:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
3
4
5
6
7
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;airline_array&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;readdlm&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;/Users/randyzwitch/airline/1987.csv&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;sc&quot;&gt;','&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;size&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;airline_array&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1311827&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;29&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;typeof&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;airline_array&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;Array&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Any&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;It’s important to note that by only specifying the first two arguments, you leave it up to Julia to determine the type of array to return. In the code example above, an array of type &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Any&lt;/code&gt; is returned, as the .csv file I read in was not of a homogeneous type such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Int64&lt;/code&gt; or &lt;del&gt;ASCII&lt;/del&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;String&lt;/code&gt;. If you know for certain which type of array you want, you can specify the element type using the third argument, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;T&lt;/code&gt;:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
3
4
5
6
7
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;airline_array&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;readdlm&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;/Users/randyzwitch/airline/1987.csv&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;sc&quot;&gt;','&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;size&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;airline_array&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1311827&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;29&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;typeof&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;airline_array&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;Array&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Unless you are doing linear algebra or other specifically mathematical work, you’ll likely find a DataFrame more comfortable to work with than an array (especially if you are coming from an R, Python/pandas or even spreadsheet tradition).&lt;/p&gt;

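&lt;p&gt;To write an array out to a file, you can use the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;writedlm&lt;/code&gt; function (in current Julia versions the default delimiter is a tab, so pass &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;','&lt;/code&gt; explicitly for comma-separated output):&lt;/p&gt;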

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;n&quot;&gt;writedlm&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;filename&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;array&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delim&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Char&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;h3 id=&quot;reading-a-file-into-a-dataframe-using-readtable&quot;&gt;Reading a file into a DataFrame using readtable&lt;/h3&gt;

&lt;p&gt;As I covered in my prior blog post about Julia, you can also &lt;a title=&quot;Julia for Beginners&quot; href=&quot;http://randyzwitch.com/julia-language-beginners/&quot; target=&quot;_blank&quot;&gt;read delimited files into Julia using the DataFrames package&lt;/a&gt;, which returns a DataFrame instead of an array. Besides reading delimited files, the DataFrames package also supports reading gzipped files on the fly:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
3
4
5
6
7
8
9
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;using&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataFrames&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;airline_df&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;readtable&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;/Users/randyzwitch/airline/1987.csv.gz&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;size&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;airline_df&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1311826&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;29&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;typeof&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;airline_df&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DataFrame&lt;/span&gt;  &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;use&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;methods&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DataFrame&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;to&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;see&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;constructors&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;From what I understand, in the future you will be able to read files directly from Amazon S3 into a DataFrame (this is already supported in the &lt;a title=&quot;Julia Amazon S3&quot; href=&quot;https://github.com/amitmurthy/AWS.jl&quot; target=&quot;_blank&quot;&gt;AWS package&lt;/a&gt;), but for now, the DataFrames package works only on local files. Writing a DataFrame to file can be done with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;writetable&lt;/code&gt; function:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;n&quot;&gt;writetable&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;filename&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DataFrame&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;By default, the &lt;a title=&quot;Julia DataFrames&quot; href=&quot;http://juliastats.github.io/DataFrames.jl/io.html&quot; target=&quot;_blank&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;writetable&lt;/code&gt; function&lt;/a&gt; will use the delimiter specified by the filename extension and default to printing the column names as a header.&lt;/p&gt;

&lt;h3 id=&quot;accessing-databases-using-odbc&quot;&gt;Accessing Databases using ODBC&lt;/h3&gt;

&lt;p&gt;The third major way of importing tabular data into Julia is through the use of ODBC access to various databases such as MySQL and PostgreSQL.&lt;/p&gt;

&lt;h4 id=&quot;using-a-dsn&quot;&gt;Using a DSN&lt;/h4&gt;

&lt;p&gt;The &lt;a title=&quot;Julia ODBC package&quot; href=&quot;https://github.com/karbarcca/ODBC.jl&quot; target=&quot;_blank&quot;&gt;Julia ODBC package&lt;/a&gt; provides functionality to connect to a database using a Data Source Name (DSN). Assuming you store all the credentials in your DSN (server name, username, password, etc.), connecting to a database is as easy as:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
3
4
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;using&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ODBC&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ODBC&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;connect&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;MySQL&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Connection&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;to&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MySQL&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;successful&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Of course, if you don’t want to store your password in your DSN (especially in the case where there are multiple users for a computer), you can pass the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;usr&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pwd&lt;/code&gt; arguments to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ODBC.connect&lt;/code&gt; function:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;n&quot;&gt;ODBC&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;connect&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dsn&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;usr&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pwd&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;h4 id=&quot;using-a-connection-string&quot;&gt;Using a connection string&lt;/h4&gt;

&lt;p&gt;Alternatively, you can build your own connection strings within a Julia session using the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;advancedconnect&lt;/code&gt; function:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
3
4
5
6
7
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;c&quot;&gt;#Amazon Redshift/Postgres connection string&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;red&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;advancedconnect&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Driver={psqlODBC};ServerName=reporting.XXXXX.us-east-1.redshift.amazonaws.com;Username=XXXX;Password=XXXX;Database=XXXX;Port=XXXX&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Connection&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;to&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Driver&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;psqlODBC&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;};&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ServerName&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;reporting&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;XXXXX&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;us&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;east&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;1.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;redshift&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;amazonaws&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;com&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Username&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;XXXX&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Password&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;XXXX&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Database&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;XXXX&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Port&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;XXXX&lt;/span&gt; &lt;span 
class=&quot;n&quot;&gt;successful&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#MySQL connection string&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;my&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;advancedconnect&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Driver={MySQL};user=root;server=localhost;database=airline;&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Connection&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;to&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Driver&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MySQL&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;};&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;user&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;root&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;server&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;localhost&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;database&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;airline&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;successful&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Regardless of which way you connect, you can query data using the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query&lt;/code&gt; function. If you want your output as a DataFrame, you can assign the result of the function to an object. If you want to save the results to a file, you specify the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;file&lt;/code&gt; argument:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
3
4
5
6
7
8
9
10
11
12
13
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;using&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ODBC&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ODBC&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;connect&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;MySQL&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Connection&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;to&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MySQL&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;successful&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Save query results into a DataFrame called 'results'&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;results&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Select * from a1987;&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;typeof&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;results&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DataFrame&lt;/span&gt;  &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;use&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;methods&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DataFrame&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;to&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;see&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;constructors&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Save query results to a file, tab-delimited (default)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Select * from a1987;&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;file&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;output.tab&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delim&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;sc&quot;&gt;'\t'&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;);&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;h3 id=&quot;summary&quot;&gt;Summary&lt;/h3&gt;

&lt;p&gt;Overall, importing data into Julia is no easier or harder than in any other language. The biggest thing I’ve noticed thus far is that Julia is a bit less efficient than Python/pandas or R in terms of the amount of RAM needed to store data. In my experience, this is really only an issue once you are working with 1GB+ files (depending, of course, on the resources available on your machine).&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Edit 3/25/2016: A much more up-to-date method of &lt;a href=&quot;https://cbrownley.wordpress.com/2015/05/29/reading_writing_csv_with_r_python_julia/&quot; target=&quot;_blank&quot;&gt;reading CSV data into Julia&lt;/a&gt; can be found at this &lt;a href=&quot;https://cbrownley.wordpress.com/2015/05/29/reading_writing_csv_with_r_python_julia/&quot; target=&quot;_blank&quot;&gt;blog post by Clinton Brownley&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
</description>
        
        <pubDate>Tue, 06 Aug 2013 10:05:38 +0000</pubDate>
        <link>
        http://randyzwitch.com/julia-import-data/</link>
        <guid isPermaLink="true">http://randyzwitch.com/julia-import-data/</guid>
        <content type="html" xml:base="/julia-import-data/">&lt;p&gt;Importing tabular data into Julia can be done in (at least) three ways: reading a delimited file into an array, reading a delimited file into a DataFrame and accessing databases using ODBC.&lt;/p&gt;

&lt;h3 id=&quot;reading-a-file-into-an-array-using-readdlm&quot;&gt;Reading a file into an array using readdlm&lt;/h3&gt;

&lt;p&gt;The most basic way to read data into Julia is through the use of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;readdlm&lt;/code&gt; function, which will create an array:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;n&quot;&gt;readdlm&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;source&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delim&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Char&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Type&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;options&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;If you are reading in a fairly normal delimited file, you can get away with just using the first two arguments, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;source&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delim&lt;/code&gt;:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
3
4
5
6
7
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;airline_array&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;readdlm&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;/Users/randyzwitch/airline/1987.csv&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;sc&quot;&gt;','&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;size&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;airline_array&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1311827&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;29&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;typeof&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;airline_array&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;Array&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Any&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;It’s important to note that by only specifying the first two arguments, you leave it up to Julia to determine the type of array to return. In the code example above, an array of type &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Any&lt;/code&gt; is returned, as the .csv file I read in was not of a homogeneous type such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Int64&lt;/code&gt; or &lt;del&gt;ASCII&lt;/del&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;String&lt;/code&gt;. If you know for certain which type of array you want, you can specify the element type using the third argument, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;T&lt;/code&gt;:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
3
4
5
6
7
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;airline_array&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;readdlm&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;/Users/randyzwitch/airline/1987.csv&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;sc&quot;&gt;','&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;size&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;airline_array&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1311827&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;29&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;typeof&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;airline_array&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;Array&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Unless you are doing linear algebra or other specifically mathematical work, you’ll likely find a DataFrame more comfortable to work with than an array (especially if you are coming from an R, Python/pandas or even spreadsheet tradition).&lt;/p&gt;

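&lt;p&gt;To write an array out to a file, you can use the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;writedlm&lt;/code&gt; function (in current Julia versions the default delimiter is a tab, so pass &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;','&lt;/code&gt; explicitly for comma-separated output):&lt;/p&gt;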

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;n&quot;&gt;writedlm&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;filename&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;array&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delim&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Char&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
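
&lt;p&gt;As a minimal sketch of that call, the round trip below writes a small array and reads it back, passing the delimiter explicitly so the result does not depend on the default. The array contents and the temporary file path are made up for illustration, and the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;using DelimitedFiles&lt;/code&gt; line assumes Julia 0.7 or later, where these functions live in a standard library module:&lt;/p&gt;

```julia
using DelimitedFiles  # standard library module on Julia 0.7 and later

# Hypothetical 2x2 array of strings standing in for airline_array
carriers = ["PS" "1451"; "UA" "903"]

path = tempname()               # temporary file path, an assumption for this sketch
writedlm(path, carriers, ',')   # write with an explicit comma delimiter

# Read the file back, asking for every cell as a String
roundtrip = readdlm(path, ',', String)
```

&lt;p&gt;Reading with an explicit element type keeps values such as flight numbers as strings instead of letting them be parsed as numbers.&lt;/p&gt;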

&lt;h3 id=&quot;reading-a-file-into-a-dataframe-using-readtable&quot;&gt;Reading a file into a DataFrame using readtable&lt;/h3&gt;

&lt;p&gt;As I covered in my prior blog post about Julia, you can also &lt;a title=&quot;Julia for Beginners&quot; href=&quot;http://randyzwitch.com/julia-language-beginners/&quot; target=&quot;_blank&quot;&gt;read delimited files into Julia using the DataFrames package&lt;/a&gt;, which returns a DataFrame instead of an array. Besides reading delimited files, the DataFrames package also supports reading gzipped files on the fly:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
3
4
5
6
7
8
9
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;using&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataFrames&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;airline_df&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;readtable&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;/Users/randyzwitch/airline/1987.csv.gz&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;size&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;airline_df&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1311826&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;29&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;typeof&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;airline_df&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DataFrame&lt;/span&gt;  &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;use&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;methods&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DataFrame&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;to&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;see&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;constructors&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;From what I understand, in the future you will be able to read files directly from Amazon S3 into a DataFrame (this is already supported in the &lt;a title=&quot;Julia Amazon S3&quot; href=&quot;https://github.com/amitmurthy/AWS.jl&quot; target=&quot;_blank&quot;&gt;AWS package&lt;/a&gt;), but for now, the DataFrames package works only on local files. Writing a DataFrame to file can be done with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;writetable&lt;/code&gt; function:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;n&quot;&gt;writetable&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;filename&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DataFrame&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;By default, the &lt;a title=&quot;Julia DataFrames&quot; href=&quot;http://juliastats.github.io/DataFrames.jl/io.html&quot; target=&quot;_blank&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;writetable&lt;/code&gt; function&lt;/a&gt; will use the delimiter specified by the filename extension and default to printing the column names as a header.&lt;/p&gt;
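
&lt;p&gt;For readers on a current Julia release: later versions of DataFrames dropped &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;readtable&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;writetable&lt;/code&gt; in favor of the separate CSV.jl package. The sketch below shows the equivalent round trip under that assumption. It requires the third-party CSV and DataFrames packages to be installed, and the two-row table and temporary path are made up for illustration:&lt;/p&gt;

```julia
using CSV, DataFrames  # third-party packages, assumed to be installed

# Small hypothetical table standing in for the airline data
df = DataFrame(Year = [1987, 1987], UniqueCarrier = ["PS", "UA"])

path = tempname() * ".csv"               # temporary file path for this sketch
CSV.write(path, df)                      # comma-delimited with a header row by default
airline_df = CSV.read(path, DataFrame)   # read the file back into a DataFrame

size(airline_df)
```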

&lt;h3 id=&quot;accessing-databases-using-odbc&quot;&gt;Accessing Databases using ODBC&lt;/h3&gt;

&lt;p&gt;The third major way of importing tabular data into Julia is through the use of ODBC access to various databases such as MySQL and PostgreSQL.&lt;/p&gt;

&lt;h4 id=&quot;using-a-dsn&quot;&gt;Using a DSN&lt;/h4&gt;

&lt;p&gt;The &lt;a title=&quot;Julia ODBC package&quot; href=&quot;https://github.com/karbarcca/ODBC.jl&quot; target=&quot;_blank&quot;&gt;Julia ODBC package&lt;/a&gt; provides functionality to connect to a database using a Data Source Name (DSN). Assuming you store all the credentials in your DSN (server name, username, password, etc.), connecting to a database is as easy as:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
3
4
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;using&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ODBC&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ODBC&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;connect&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;MySQL&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Connection&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;to&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MySQL&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;successful&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Of course, if you don’t want to store your password in your DSN (especially in the case where there are multiple users for a computer), you can pass the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;usr&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pwd&lt;/code&gt; arguments to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ODBC.connect&lt;/code&gt; function:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;n&quot;&gt;ODBC&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;connect&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dsn&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;usr&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pwd&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;h4 id=&quot;using-a-connection-string&quot;&gt;Using a connection string&lt;/h4&gt;

&lt;p&gt;Alternatively, you can build your own connection strings within a Julia session using the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;advancedconnect&lt;/code&gt; function:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
3
4
5
6
7
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;c&quot;&gt;#Amazon Redshift/Postgres connection string&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;red&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;advancedconnect&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Driver={psqlODBC};ServerName=reporting.XXXXX.us-east-1.redshift.amazonaws.com;Username=XXXX;Password=XXXX;Database=XXXX;Port=XXXX&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Connection&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;to&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Driver&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;psqlODBC&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;};&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ServerName&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;reporting&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;XXXXX&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;us&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;east&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;1.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;redshift&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;amazonaws&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;com&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Username&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;XXXX&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Password&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;XXXX&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Database&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;XXXX&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Port&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;XXXX&lt;/span&gt; &lt;span 
class=&quot;n&quot;&gt;successful&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#MySQL connection string&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;my&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;advancedconnect&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Driver={MySQL};user=root;server=localhost;database=airline;&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Connection&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;to&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Driver&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MySQL&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;};&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;user&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;root&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;server&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;localhost&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;database&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;airline&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;successful&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
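
&lt;p&gt;Since a connection string is just a series of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;key=value;&lt;/code&gt; pairs, it can be assembled from its parts rather than typed as one long literal. The helper below is hypothetical (it is not part of the ODBC package) and is only a sketch of that idea, using the MySQL values from the example above:&lt;/p&gt;

```julia
# Hypothetical helper, not part of ODBC.jl: join key => value pairs
# into the "key=value;" form used by ODBC connection strings.
function build_connection_string(pairs::Pair{String,String}...)
    return join("$(k)=$(v);" for (k, v) in pairs)
end

conn_str = build_connection_string(
    "Driver" => "{MySQL}",
    "user" => "root",
    "server" => "localhost",
    "database" => "airline",
)
```

&lt;p&gt;The resulting string can then be passed to the connection function in place of a hand-written literal, which keeps credentials and server names in variables instead of scattered through the code.&lt;/p&gt;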

&lt;p&gt;Regardless of which way you connect, you can query data using the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query&lt;/code&gt; function. If you want your output as a DataFrame, you can assign the result of the function to an object. If you want to save the results to a file, you specify the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;file&lt;/code&gt; argument:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
3
4
5
6
7
8
9
10
11
12
13
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;using&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ODBC&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ODBC&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;connect&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;MySQL&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Connection&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;to&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MySQL&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;successful&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Save query results into a DataFrame called 'results'&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;results&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Select * from a1987;&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;typeof&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;results&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DataFrame&lt;/span&gt;  &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;use&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;methods&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DataFrame&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;to&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;see&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;constructors&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Save query results to a file, tab-delimited (default)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;julia&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Select * from a1987;&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;file&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;output.tab&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delim&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;sc&quot;&gt;'\t'&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;);&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;h3 id=&quot;summary&quot;&gt;Summary&lt;/h3&gt;

&lt;p&gt;Overall, importing data into Julia is no easier or more difficult than in any other language. The biggest thing I’ve noticed thus far is that Julia is a bit less efficient than Python/pandas or R in terms of the amount of RAM needed to store data. In my experience, this is really only an issue once you are working with 1GB+ files (depending, of course, on the resources available on your machine).&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Edit 3/25/2016: A much more up-to-date method of &lt;a href=&quot;https://cbrownley.wordpress.com/2015/05/29/reading_writing_csv_with_r_python_julia/&quot; target=&quot;_blank&quot;&gt;reading CSV data into Julia&lt;/a&gt; can be found at this &lt;a href=&quot;https://cbrownley.wordpress.com/2015/05/29/reading_writing_csv_with_r_python_julia/&quot; target=&quot;_blank&quot;&gt;blog post by Clinton Brownley&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;</content>
      </item>
      
      
    
      
      
      
    
      
      
      <item>
        <title>A Beginner's Look at Julia</title>
        
          <description>&lt;p&gt;Over the past month or so, I’ve been playing with a new scientific programming language called ‘&lt;a title=&quot;Julia language&quot; href=&quot;http://julialang.org/&quot; target=&quot;_blank&quot;&gt;Julia&lt;/a&gt;’, which aims to be a high-level language with performance approaching that of C. With that goal in mind, Julia could be a solution to the ‘multi-language’ problem of needing to move between R, Python, MATLAB, C, Fortran, Scala, etc. within a single scientific programming project. Here are some observations that might be helpful for others looking to get started with Julia.&lt;/p&gt;

&lt;h3 id=&quot;get-used-to-git-and-make&quot;&gt;Get used to ‘Git’ and ‘make’&lt;/h3&gt;

&lt;p&gt;While there are &lt;a title=&quot;Julia language downloads&quot; href=&quot;http://julialang.org/downloads/&quot; target=&quot;_blank&quot;&gt;pre-built binaries&lt;/a&gt; for Julia, due to the rapid pace of development, it’s best to build Julia from source. To keep up with the literally dozens of code changes per day, you can clone the &lt;a title=&quot;Julia GitHub repo&quot; href=&quot;https://github.com/JuliaLang/julia&quot; target=&quot;_blank&quot;&gt;Julia GitHub repository&lt;/a&gt; to your local machine. If you use one of the &lt;a title=&quot;GitHub GUI downloads&quot; href=&quot;http://git-scm.com/downloads/guis&quot; target=&quot;_blank&quot;&gt;GitHub GUIs&lt;/a&gt;, this is as easy as hitting the ‘Sync Branch’ button to receive all of the newest code updates.&lt;/p&gt;

&lt;p&gt;To install Julia, you need to compile the code. The instructions for each supported operating system are listed on the &lt;a title=&quot;Julia GitHub repo&quot; href=&quot;https://github.com/JuliaLang/julia&quot; target=&quot;_blank&quot;&gt;Julia GitHub page&lt;/a&gt;. For Mac users, use Terminal to navigate to the directory where you cloned Julia, then run the following command, where ‘n’ refers to the number of concurrent processes you want the compiler to use:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;n&quot;&gt;make&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;j&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt; 
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;I use 8 concurrent processes on a 2013 MacBook Pro and it works pretty well, certainly much faster than a single process. Note that the first time you run the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;make&lt;/code&gt; command, the build process will take much longer than successive builds, as the build downloads all of the required libraries. After the first build, you can just run the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;make&lt;/code&gt; command with a single process, as the code updates don’t take very long to build.&lt;/p&gt;

&lt;p&gt;Package management is also done via GitHub. To add &lt;a title=&quot;Julia packages&quot; href=&quot;http://pkg.julialang.org/&quot; target=&quot;_blank&quot;&gt;Julia packages&lt;/a&gt; to your install, you use the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Pkg.add()&lt;/code&gt; function, with the package name in double-quotes.&lt;/p&gt;

&lt;h3 id=&quot;julia-code-feels-very-familiar&quot;&gt;Julia code feels very familiar&lt;/h3&gt;

&lt;h4 id=&quot;text-file-import&quot;&gt;Text file import&lt;/h4&gt;

&lt;p&gt;Although the &lt;a title=&quot;Julia documentation&quot; href=&quot;http://docs.julialang.org/en/latest/manual/introduction.html#man-introduction-1&quot; target=&quot;_blank&quot;&gt;Julia documentation&lt;/a&gt; makes numerous references to MATLAB in terms of code similarity, Julia feels very familiar to me as an R and Python user. Take reading a .csv file into a dataframe and finding the dimensions of the resulting object:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;c&quot;&gt;#R: Read in 1987.csv from airline dataset into a dataframe&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;#No import statement needed to create a dataframe in R&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;airline1987&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;read&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;csv&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;~/airline/1987.csv&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;dim&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;airline1987&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1311826&lt;/span&gt;      &lt;span class=&quot;mi&quot;&gt;29&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Python: use pandas to create a dataframe&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pandas&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pd&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;airline1987&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pd&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;read_csv&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;/Users/randyzwitch/airline/1987.csv&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;airline1987&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shape&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Out&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;7&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1311826&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;29&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Julia: use DataFrames to create a dataframe&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;using&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataFrames&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;airline1987&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;readtable&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;/Users/randyzwitch/airline/1987.csv&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;size&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;airline1987&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1311826&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;29&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;In each language, the basic pattern is the same: call a ‘read’ function and pass it the .csv filename, and the function’s defaults take care of reading a basic file. I could also have specified other keyword arguments, but for the purposes of this example I kept it simple.&lt;/p&gt;

&lt;h4 id=&quot;looping&quot;&gt;Looping&lt;/h4&gt;

&lt;p&gt;Looping in Julia is similar to other languages. Python requires consistent indentation for each level of a loop, with a colon ending each loop or conditional statement. And although you generally don’t use many loops in R, doing so requires parentheses and curly braces.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;c&quot;&gt;#Python looping to create a term-frequency dictionary&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;collections&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Counter&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;term_freq&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Counter&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;word&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;english_dictionary&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;url&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;url_list&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;word&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;url&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;term_freq&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;word&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Julia looping to create a term-frequency dictionary&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;term_freq&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Dict&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Int64&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;}()&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;word&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;english_dictionary&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;url&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;url_list&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;search&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;url&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;word&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;term_freq&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;word&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;term_freq&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;word&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;If you’re coming from a Python background, you can see that there’s not a ton of difference between Python looping into a dictionary vs. Julia. The biggest differences are the use of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;end&lt;/code&gt; control-flow word and that Julia doesn’t currently have the convenience “Counter” object type. R doesn’t natively have a dictionary type, but you can add a similar concept using the &lt;a title=&quot;CRAN hash package&quot; href=&quot;http://cran.r-project.org/web/packages/hash/&quot; target=&quot;_blank&quot;&gt;hash&lt;/a&gt; package.&lt;/p&gt;

&lt;h4 id=&quot;vectorization&quot;&gt;Vectorization&lt;/h4&gt;

&lt;p&gt;While not required to achieve high performance, Julia also provides the &lt;a title=&quot;Is looping as a programming construct bad?&quot; href=&quot;http://slendrmeans.wordpress.com/2013/05/11/julia-loops/&quot; target=&quot;_blank&quot;&gt;functional programming construct of vectorization and list comprehensions&lt;/a&gt;. In R, you use the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;*apply&lt;/code&gt; family of functions instead of loops in order to &lt;a title=&quot;Functional programming in R&quot; href=&quot;https://github.com/hadley/devtools/wiki/Functional-programming&quot; target=&quot;_blank&quot;&gt;apply a function to multiple elements in a list&lt;/a&gt;. In Python, there are the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;map&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;reduce&lt;/code&gt; functions, but there is also the concept of list comprehensions. In Julia, both of the aforementioned functionalities are possible.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;c&quot;&gt;#Cube every number from 1 to 100&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Python map function&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;cubes&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lambda&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;range&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;101&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;))&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Python list comprehension&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;cubes&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;range&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;101&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)]&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#R sapply function&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;cubes&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sapply&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;seq&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;100&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Julia map function&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;cubes&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;100&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Julia list comprehension&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;cubes&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;100&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;In each case, the syntax is &lt;em&gt;just about&lt;/em&gt; the same to apply a function across a list/array of numbers.&lt;/p&gt;

&lt;h3 id=&quot;a-small-but-intense-community&quot;&gt;A small, but intense community&lt;/h3&gt;

&lt;p&gt;One thing that’s important to note about Julia at this stage is that it’s very early. If you’re going to be messing around with Julia, there’s going to be a lot of alone-time experimenting and reading the &lt;a title=&quot;Julia documentation&quot; href=&quot;http://docs.julialang.org/en/latest/&quot; target=&quot;_blank&quot;&gt;Julia documentation&lt;/a&gt;. There are also several other resources including a &lt;a title=&quot;Julia users Google group&quot; href=&quot;https://groups.google.com/forum/?fromgroups=#!forum/julia-users&quot; target=&quot;_blank&quot;&gt;Julia-Users Google group&lt;/a&gt;, &lt;a title=&quot;Julia for R programmers&quot; href=&quot;http://www.stat.wisc.edu/~bates/JuliaForRProgrammers.pdf&quot; target=&quot;_blank&quot;&gt;Julia for R programmers&lt;/a&gt;, individual discussions on GitHub in the ‘Issues’ section of each Julia package, and a few tutorials floating around (&lt;a title=&quot;Julia tutorials&quot; href=&quot;http://forio.com/julia/tutorials-list&quot; target=&quot;_blank&quot;&gt;here&lt;/a&gt; and &lt;a title=&quot;Julia meta tutorial&quot; href=&quot;http://datacommunitydc.org/blog/2013/07/a-julia-meta-tutorial/&quot; target=&quot;_blank&quot;&gt;here&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;Beyond just the written examples though, I’ve found that the budding Julia community is very helpful and willing to answer questions. I’ve been bugging the hell out of &lt;a title=&quot;John Myles White&quot; href=&quot;http://www.johnmyleswhite.com/&quot; target=&quot;_blank&quot;&gt;John Myles White&lt;/a&gt; and he hasn’t complained (yet!), and even when code issues are raised through the users group or on GitHub, ultimately everyone has been very respectful and eager to help. So don’t be intimidated by the fact that Julia has a very MIT and Ph.D-ness to it…jump right in and migrate some of your favorite code over from other languages.&lt;/p&gt;

&lt;p&gt;While I haven’t moved to using Julia for my everyday workload, I’m becoming comfortable enough with it that I’m starting to consider using Julia for selected projects. Once the language matures a bit more, &lt;del&gt;&lt;a title=&quot;Julia Studio&quot; href=&quot;http://forio.com/julia/&quot; target=&quot;_blank&quot;&gt;JuliaStudio&lt;/a&gt; starts to approach &lt;a title=&quot;RStudio&quot; href=&quot;http://www.rstudio.com/&quot; target=&quot;_blank&quot;&gt;RStudio&lt;/a&gt; in terms of functionality&lt;/del&gt;, and I get more familiar with the language in general, I can see Julia taking over for at least one if not all of my scientific programming languages.&lt;/p&gt;
</description>
        
        <pubDate>Tue, 23 Jul 2013 12:16:34 +0000</pubDate>
        <link>
        http://randyzwitch.com/julia-language-beginners/</link>
        <guid isPermaLink="true">http://randyzwitch.com/julia-language-beginners/</guid>
        <content type="html" xml:base="/julia-language-beginners/">&lt;p&gt;Over the past month or so, I’ve been playing with a new scientific programming language called ‘&lt;a title=&quot;Julia language&quot; href=&quot;http://julialang.org/&quot; target=&quot;_blank&quot;&gt;Julia&lt;/a&gt;’, which aims to be a high-level language with performance approaching that of C. With that goal in mind, Julia could be a solution to the ‘multi-language’ problem of needing to move between R, Python, MATLAB, C, Fortran, Scala, etc. within a single scientific programming project. Here are some observations that might be helpful for others looking to get started with Julia.&lt;/p&gt;

&lt;h3 id=&quot;get-used-to-git-and-make&quot;&gt;Get used to ‘Git’ and ‘make’&lt;/h3&gt;

&lt;p&gt;While there are &lt;a title=&quot;Julia language downloads&quot; href=&quot;http://julialang.org/downloads/&quot; target=&quot;_blank&quot;&gt;pre-built binaries&lt;/a&gt; for Julia, the rapid pace of development makes it best to build Julia from source. To keep up with the dozen or so code changes that land each day, you can clone the &lt;a title=&quot;Julia GitHub repo&quot; href=&quot;https://github.com/JuliaLang/julia&quot; target=&quot;_blank&quot;&gt;Julia GitHub repository&lt;/a&gt; to your local machine. If you use one of the &lt;a title=&quot;GitHub GUI downloads&quot; href=&quot;http://git-scm.com/downloads/guis&quot; target=&quot;_blank&quot;&gt;GitHub GUIs&lt;/a&gt;, this is as easy as hitting the ‘Sync Branch’ button to receive all of the newest code updates.&lt;/p&gt;

&lt;p&gt;To install Julia, you need to compile the code. The instructions for each supported operating system are listed on the &lt;a title=&quot;Julia GitHub repo&quot; href=&quot;https://github.com/JuliaLang/julia&quot; target=&quot;_blank&quot;&gt;Julia GitHub page&lt;/a&gt;. For Mac users, use Terminal to navigate to the directory where you cloned Julia, then run the following command, where ‘n’ refers to the number of concurrent processes you want the compiler to use:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;n&quot;&gt;make&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;j&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt; 
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;I use 8 concurrent processes on a 2013 MacBook Pro and it works pretty well. Certainly much faster than a single process. Note that the first time you run the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;make&lt;/code&gt; command, the build process will take much longer than successive builds, as Julia downloads all the required libraries needed. After the first build, you can just run the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;make&lt;/code&gt; command with a single process, as the code updates don’t take very long to build.&lt;/p&gt;

&lt;p&gt;Package management is also done via GitHub. To add &lt;a title=&quot;Julia packages&quot; href=&quot;http://pkg.julialang.org/&quot; target=&quot;_blank&quot;&gt;Julia packages&lt;/a&gt; to your install, you use the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Pkg.add()&lt;/code&gt; function, with the package name in double-quotes.&lt;/p&gt;
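
&lt;p&gt;As a minimal sketch of that call, here is what adding the DataFrames package (used in the next section) might look like. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;using Pkg&lt;/code&gt; line is an assumption for Julia 1.x releases, where the package manager has to be loaded before it can be called.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;# Assumes Julia 1.x, where Pkg is a standard library that must be loaded first
using Pkg

# The package name is passed as a double-quoted string
Pkg.add(&quot;DataFrames&quot;)
&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;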

&lt;h3 id=&quot;julia-code-feels-very-familiar&quot;&gt;Julia code feels very familiar&lt;/h3&gt;

&lt;h4 id=&quot;text-file-import&quot;&gt;Text file import&lt;/h4&gt;

&lt;p&gt;Although the &lt;a title=&quot;Julia documentation&quot; href=&quot;http://docs.julialang.org/en/latest/manual/introduction.html#man-introduction-1&quot; target=&quot;_blank&quot;&gt;Julia documentation&lt;/a&gt; makes numerous references to MATLAB in terms of code similarity, Julia feels very familiar to me as an R and Python user. Take reading a .csv file into a dataframe and finding the dimensions of the resulting object:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;c&quot;&gt;#R: Read in 1987.csv from airline dataset into a dataframe&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;#No import statement needed to create a dataframe in R&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;airline1987&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;read&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;csv&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;~/airline/1987.csv&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;dim&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;airline1987&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1311826&lt;/span&gt;      &lt;span class=&quot;mi&quot;&gt;29&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Python: use pandas to create a dataframe&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pandas&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pd&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;airline1987&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pd&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;read_csv&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;/Users/randyzwitch/airline/1987.csv&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;airline1987&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shape&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Out&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;7&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1311826&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;29&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Julia: use DataFrames to create a dataframe&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;using&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataFrames&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;airline1987&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;readtable&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;/Users/randyzwitch/airline/1987.csv&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;size&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;airline1987&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1311826&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;29&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;In each language, the basic syntax is to call a ‘read’ function, specify the .csv filename, then the defaults of the function read in a basic file. I also could’ve specified other keyword arguments, but for purposes of this example I kept it simple.&lt;/p&gt;

&lt;h4 id=&quot;looping&quot;&gt;Looping&lt;/h4&gt;

&lt;p&gt;Looping in Julia is similar to other languages. Python requires proper spacing for each level of a loop, with a colon for each evaluated expression. And although you generally don’t use many loops in R, to do so requires using parentheses and braces.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;c&quot;&gt;#Python looping to create a term-frequency dictionary&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;collections&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Counter&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;term_freq&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Counter&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;word&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;english_dictionary&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;url&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;url_list&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;word&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;url&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;term_freq&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;word&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Julia looping to create a term-frequency dictionary&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;term_freq&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Dict&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Int64&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;}()&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;word&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;english_dictionary&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;url&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;url_list&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;search&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;url&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;word&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;term_freq&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;word&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;term_freq&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;word&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;If you’re coming from a Python background, you can see that there’s not a ton of difference between Python looping into a dictionary vs. Julia. The biggest differences are the use of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;end&lt;/code&gt; control-flow word and that Julia doesn’t currently have the convenience “Counter” object type. R doesn’t natively have a dictionary type, but you can add a similar concept using the &lt;a title=&quot;CRAN hash package&quot; href=&quot;http://cran.r-project.org/web/packages/hash/&quot; target=&quot;_blank&quot;&gt;hash&lt;/a&gt; package.&lt;/p&gt;
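
&lt;p&gt;The Julia loop above relies on the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;search&lt;/code&gt; function as it existed when this was written. As a rough sketch only, assuming a Julia 1.x release (where &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;occursin&lt;/code&gt; is the substring test), the same counting pattern might look like the following. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;count_terms&lt;/code&gt; helper and the sample words and URLs are made up for illustration and are not part of the original example.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;# Sketch for Julia 1.x; count_terms and the sample data below are hypothetical
function count_terms(words, urls)
    term_freq = Dict{String, Int}()
    for word in words
        for url in urls
            # occursin(needle, haystack) is true when word appears inside url
            if occursin(word, url)
                # get() supplies 0 the first time a word is seen
                term_freq[word] = get(term_freq, word, 0) + 1
            end
        end
    end
    return term_freq
end

english_dictionary = [&quot;julia&quot;, &quot;python&quot;, &quot;data&quot;]
url_list = [&quot;http://example.com/julia-data&quot;,
            &quot;http://example.com/python&quot;,
            &quot;http://example.com/julia&quot;]

term_freq = count_terms(english_dictionary, url_list)
&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Wrapping the loop in a function is a design choice of this sketch rather than something the original code does; it keeps the dictionary local and makes the counts easy to check for a small input.&lt;/p&gt;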

&lt;h4 id=&quot;vectorization&quot;&gt;Vectorization&lt;/h4&gt;

&lt;p&gt;While not required to achieve high performance, Julia also provides the &lt;a title=&quot;Is looping as a programming construct bad?&quot; href=&quot;http://slendrmeans.wordpress.com/2013/05/11/julia-loops/&quot; target=&quot;_blank&quot;&gt;functional programming construct of vectorization and list comprehensions&lt;/a&gt;. In R, you use the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;*apply&lt;/code&gt; family of functions instead of loops in order to &lt;a title=&quot;Functional programming in R&quot; href=&quot;https://github.com/hadley/devtools/wiki/Functional-programming&quot; target=&quot;_blank&quot;&gt;apply a function to multiple elements in a list&lt;/a&gt;. In Python, there are the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;map&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;reduce&lt;/code&gt; functions, but there is also the concept of list comprehensions. In Julia, both of the aforementioned functionalities are possible.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;c&quot;&gt;#Cube every number from 1 to 100&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Python map function&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;cubes&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lambda&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;range&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;101&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;))&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Python list comprehension&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;cubes&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;range&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;101&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)]&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#R sapply function&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;cubes&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sapply&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;seq&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;100&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Julia map function&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;cubes&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;100&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#Julia list comprehension&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;cubes&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;100&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;In each case, the syntax is &lt;em&gt;just about&lt;/em&gt; the same to apply a function across a list/array of numbers.&lt;/p&gt;
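
&lt;p&gt;To make the comparison concrete on the Julia side, here is a small sketch, assuming a Julia 1.x release, that cubes the numbers 1 through 100 three ways. The dot-broadcast form is an addition not covered above, and the variable names are made up for illustration.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-julia&quot; data-lang=&quot;julia&quot;&gt;# Sketch for Julia 1.x; variable names are hypothetical

# map applies an anonymous function to each element of the range
cubes_map = map(x -&amp;gt; x^3, 1:100)

# a comprehension builds the same vector with loop-like syntax
cubes_comprehension = [x^3 for x in 1:100]

# dot-broadcasting applies the ^ operator elementwise over the range
cubes_broadcast = (1:100) .^ 3
&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;All three forms should produce the same 100-element vector, so which one to use is mostly a question of readability.&lt;/p&gt;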

&lt;h3 id=&quot;a-small-but-intense-community&quot;&gt;A small, but intense community&lt;/h3&gt;

&lt;p&gt;One thing that’s important to note about Julia at this stage is that it’s very early. If you’re going to be messing around with Julia, there’s going to be a lot of alone-time experimenting and reading the &lt;a title=&quot;Julia documentation&quot; href=&quot;http://docs.julialang.org/en/latest/&quot; target=&quot;_blank&quot;&gt;Julia documentation&lt;/a&gt;. There are also several other resources including a &lt;a title=&quot;Julia users Google group&quot; href=&quot;https://groups.google.com/forum/?fromgroups=#!forum/julia-users&quot; target=&quot;_blank&quot;&gt;Julia-Users Google group&lt;/a&gt;, &lt;a title=&quot;Julia for R programmers&quot; href=&quot;http://www.stat.wisc.edu/~bates/JuliaForRProgrammers.pdf&quot; target=&quot;_blank&quot;&gt;Julia for R programmers&lt;/a&gt;, individual discussions on GitHub in the ‘Issues’ section of each Julia package, and a few tutorials floating around (&lt;a title=&quot;Julia tutorials&quot; href=&quot;http://forio.com/julia/tutorials-list&quot; target=&quot;_blank&quot;&gt;here&lt;/a&gt; and &lt;a title=&quot;Julia meta tutorial&quot; href=&quot;http://datacommunitydc.org/blog/2013/07/a-julia-meta-tutorial/&quot; target=&quot;_blank&quot;&gt;here&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;Beyond just the written examples though, I’ve found that the budding Julia community is very helpful and willing to answer questions. I’ve been bugging the hell out of &lt;a title=&quot;John Myles White&quot; href=&quot;http://www.johnmyleswhite.com/&quot; target=&quot;_blank&quot;&gt;John Myles White&lt;/a&gt; and he hasn’t complained (yet!), and even when code issues are raised through the users group or on GitHub, ultimately everyone has been very respectful and eager to help. So don’t be intimidated by the fact that Julia has a very MIT and Ph.D-ness to it…jump right in and migrate some of your favorite code over from other languages.&lt;/p&gt;

&lt;p&gt;While I haven’t moved to using Julia for my everyday workload, I’m becoming comfortable enough with it that I’m starting to consider using Julia for selected projects. Once the language matures a bit more, &lt;del&gt;&lt;a title=&quot;Julia Studio&quot; href=&quot;http://forio.com/julia/&quot; target=&quot;_blank&quot;&gt;JuliaStudio&lt;/a&gt; starts to approach &lt;a title=&quot;RStudio&quot; href=&quot;http://www.rstudio.com/&quot; target=&quot;_blank&quot;&gt;RStudio&lt;/a&gt; in terms of functionality&lt;/del&gt;, and I get more familiar with the language in general, I can see Julia taking over for at least one if not all of my scientific programming languages.&lt;/p&gt;</content>
      </item>
  </channel>
</rss>
