Fun With Just-In-Time Compiling: Julia, Python, R and pqR

Recently I’ve been spending a lot of time trying to learn Julia by doing the problems at Project Euler. What’s great about these problems is that it gets me out of my normal design patterns, since I don’t generally think about prime numbers, factorials and other number theory problems during my normal workday. These problems have also given me the opportunity to really think about how computers work, since Julia allows the programmer to pass type declarations to the just-in-time compiler (JIT).

As I’ve been working on optimizing my Julia code, I decided to figure out how fast this problem can be solved using any of the languages/techniques I know. So I decided to benchmark one of the Project Euler problems using Julia, Python, Python with NumbaPyPy, R, R using the compiler package, pqR and pqR using the compiler package. Here’s what I found…

Problem

The problem I’m using for the benchmark is calculating the smallest number that is divisible by all of the numbers in a factorial. For example, for the numbers in 5!, 60 is the smallest number that is divisible by 2, 3, 4 and 5. Here’s the Julia code:

All code versions follow this same pattern: the outside loop will run from 1 up to n!, since by definition the last value in the loop will be divisible by all of the numbers in the factorial. The inner loops go through and do a modulo calculation, checking to see if there is a remainder after division. If there is a remainder, break out of the loop and move to the next number. Once the state occurs where there is no remainder on the modulo calculation and the inner loop value of j equals the last number in the factorial (i.e. it is divisible by all of the factorial numbers), we have found the minimum number.


Benchmarking – Overall

Here are the results of the eight permutations of languages/techniques (see this GitHub Gist for the actual code used, this link for results file, and this GitHub Gist for the ggplot2 code):

jit-comparison

Across the range of tests from 5! to 20!, Julia is the fastest to find the minimum number. Python with Numba is second and PyPy is third. pqR fares better than R in general, but using the compiler package can narrow the gap.

To make more useful comparisons, in the next section I’ll compare each language to its “compiled” function state.

Benchmarking – Individual

Python

JITpython

Amongst the native Python code options, I saw a 16x speedup by using PyPy instead of Python 2.7.6 (10.62s vs. 172.06s at 20!). Using Numba with Python instead of PyPy nets an incremental ~40% speedup using the @autojit decorator (7.63s vs. 10.63 at 20!).

So in the case of Python, using two lines of code with the Numba JIT compiler you can get substantial improvements in performance without needing to do any code re-writes. This is a great benefit given that you can stay in native Python, since PyPy doesn’t support all existing packages within the Python ecosystem.

R/pqR

JITr

It’s understood in the R community that loops are not a strong point of the language. In the case of this problem, I decided to use loops because 1) it keeps the code pattern similar across languages and 2) I hoped I’d see the max benefit from the compiler package by not trying any funky R optimizations up front.

As expected, pqR is generally faster than R and using the compiler package is faster than not using the compiler. I saw ~30% improvement using pqR relative to R and ~20% incremental improvement using the compiler package with pqR. Using the compiler package within R showed ~35% improvement.

So unlike the case with Python, where you could just use Python with Numba and stay within the same language/environment, if you can use pqR and the compiler package, you can get a performance benefit from using both.

Summary

For a comparison like I’ve done above, it’s easy to get carried away and extrapolate the results from one simple test to all programming problems ever. “Julia is the best language for all cases ever!!!11111eleventy!” would be easy to proclaim, but all problems aren’t looping problems using simple division. Once you get into writing longer programs, other tasks such string manipulation and accessing APIs, using a technique from a package only available in one ecosystem but not another, etc., which tool is “best” for solving a problem becomes a much more difficult decision. The only way to know how much improvement you can see from different techniques & tools is to profile your program(s) and experiment.

The main thing that I took away from this exercise is that no matter which tool you are comfortable with to do analysis, there are potentially large performance improvements that can be made just by using a JIT without needing to dramatically re-write your code. For those of us who don’t know C (and/or are too lazy to re-write our code several times to wring out a little extra performance), that’s a great thing.

Comments

  1. I wonder to what extent the results you’re getting are a result of Python and R’s decision to automatically switch over to arbitrary-precision integers when arithmetic operations overflow. In that case, the big performances gains you’re seeing are partly the result of a decision on Julia’s part to obey a 64-bit machine’s real abilities, whereas Python and R are paying a penalty for their decision to pretend that machines can behave like mathematical abstractions.

    • Randy Zwitch says:

      If I’m thinking about this correctly, all of the arithmetic should be staying within the bounds of Int64, since the smallest divisible number for 20! is only 232,792,560. So there shouldn’t be any performance penalty I don’t think, since the outer loop never hits the Int64 max.

      julia> typemax(Int64)
      9223372036854775807

      julia> @time smallestdivisall(20)
      elapsed time: 7.043304011 seconds (64 bytes allocated)
      232792560

      • I think there are two issues:

        (1) The outer loop is literally just at the Int64 typemax bound because factorial(20) is within bounds, but factorial(21) is out of bounds and causes overflow.

        (2) The problem of comparability isn’t really the Julia code: it’s the fact that Python isn’t using a real Int64 right from the start, but this weird magically wrapped data type that dynamically changes type from Int64 to BigInt when overflow happens. I suspect a big part of the performance penalty you see in the Python code is the payment for making that magic conversion happen.

        • Randy Zwitch says:

          That’s a good point in the comparison between Julia and standard Python that the “magic” is inherently a penalty. Numba explicitly allows for setting Int64 as an input/return type (I didn’t show the results, but I did try it), so I assume that Numba is bypassing ‘standard’ Python?

          • I don’t know enough about Numba to say, but I’d guess that Numba is using machine integers in some places. But using machine integers everywhere is really hard to do correctly since replacing Python’s default integer with Int64 will break any code that doesn’t check for overflows. This is part of the reason why the Python compiler projects are so complex: Python’s default semantics are harder to make fast than Julia’s, so you need a smarter compiler to handle code like yours correctly.

  2. I think you should mention that you used a different type
    of code in R vs the code shown – in R you used iterators (because R doesn’t support 64 bit ints). I think it would have been better to use the lowest common denominator of all languages, and use 32 bit ints for all.

    And, almost unrelated comments….

    1. Using the regular R code is actually slower than using iterators I think, so your comparison is conservative with respect to R. It could be that the gain from compilation will be bigger, though.

    2. The R code as written doesn’t really solve the 64 bit problem, because nextElem(m) will return a double or an integer (I’m not sure which) when big enough, and still won’t have enough precision to do check i %%j == 0, or the comparison with factorial(n).

    3. Looking at the source of icount() shows that the iteration is simply done over integers, so it is still 32 bit.

    • Randy Zwitch says:

      That’s a fair point Michael, I didn’t realize that the error message for R that I was seeing was due to R not supporting 64-bit ints, I thought it was complaining that I was trying to make a massive vector (since seq(1:factorial(n)) would be gigantic). So that’s why I ended up using the iterator package instead of native R code.

      I wonder if I ran the code again if I would see any difference, since even though 20! is on the boundary of Int64 overflow, the iterator never actually crosses the Int32 threshold when getting the minimum number divisible by the components of 20!

  3. Hi Randy,
    Nice post, thank you.

    Just a tiny plug for an older post I wrote:
    http://www.r-statistics.com/2012/04/speed-up-your-r-code-using-a-just-in-time-jit-compiler/

    Cheers,
    Tal

  4. I know that talking about using a different algorithm misses the point of this kind of post, but FWIW this is how I’d tackle it in R:

    library(gmp)
    n = 20
    system.time(print(prod((1:n) ^ apply(sapply(1:n, function(x) table(factor(as.integer(factorize(x)), 1:n))), 1, max))))
    % [1] 232792560
    % user system elapsed
    % 0 0 0

  5. Thinking about this more, maybe you should try running the factorial loop up to 21 to see which languages overflow and which don’t?

Trackbacks

  1. […] Programming News: Fun With Just-In-Time Compiling: Julia, Python, R and pqR Recently I’ve been spending a lot of time trying to learn Julia by doing the problems at Project Euler. What’s great about these problems is that it gets me out of my normal design patterns, since I don’t generally think about prime numbers, factorials and other number theory problems during my normal workday. These problems have also given me the opportunity to really think about how computers work, since Julia allows the programmer to pass type declarations to the just-in-time compiler (JIT). As I’ve been working on optimizing my Julia code, I decided to figure out how fast this problem can be solved using any of the languages/techniques I know. So I decided to benchmark one of the Project Euler problems using Julia, Python, Python with Numba, PyPy, R, R using the compiler package, pqR and pqR using the compiler package. Here’s what I found… Read full story => RandyZwitch […]

Leave a Reply