Thursday, April 1, 2010

cython - easier optimisations than writing C

For this Bit Of Cheese I thought I’d present a bit of an example using cython since I just learned how to use it myself for PyWeek.

For the unaware, cython is a neat little extension to Python that makes it much easier to write C-optimised modules for your Python code, including using 3rd-party C libraries.

For my PyWeek entry I need to generate a lot of cubic bézier curves. I also wanted to experiment with mutating them - which basically means re-generating them many times over. The core curve generation code therefore had to be as fast as I could possibly make it.

This code looks something like this in Python:

def generate(p1x, p1y, p2x, p2y, cp1x, cp1y, cp2x,
         cp2y, step):
    '''Given the two points p1 and p2 and their control
     points cp1 and cp2 generate a cubic bezier curve with
     steps of "step".

    Return the list of (x, y) points on the curve.
    '''
    l = []
    # generate the cubic bezier points
    x1 = cp1x * 3; x2 = cp2x * 3
    y1 = cp1y * 3; y2 = cp2y * 3
    t = 0
    while t <= (1 + step):
        a = t; a2 = a**2; a3 = a**3
        b = 1 - t; b2 = b**2; b3 = b**3
        px = p1x*b3 + x1*b2*a + x2*b*a2 + p2x*a3
        py = p1y*b3 + y1*b2*a + y2*b*a2 + p2y*a3
        l.append((px, py))
        t += step
    return l

def speed_test():
    generate(0, 0, 5, 0, 0, 1, 5, -1, .0001)

For generating a curve with quite small steps on my MacBook the result is about 17.5 milliseconds per curve:

$ python -m timeit -s 'import curve' 'curve.speed_test()'
100 loops, best of 3: 17.5 msec per loop

The first thing I tried is simply copying my curve.py to _curve.pyx (the .pyx denoting it being a cython module) and adding some code to the end of the original curve.py to use the cython version:

try:
    import pyximport; pyximport.install()
    import _curve
    generate = _curve.generate
except Exception, e:
    print '_curve not available:', e
    pass

This is only one way to generate cython modules. The other is to use code in your setup.py file – but for this project I don’t actually have one. The pyximport module will compile the “pyrex” format cython files with the .pyx suffix on import. It detects changes to the original file and recompiles, so it’s incredibly convenient!

Re-running the test after that change I found the result was surprisingly similar:

$ python -m timeit -s 'import curve' 'curve.speed_test()'
100 loops, best of 3: 17.5 msec per loop

Hmm.

OK, well, cython has a bunch of hints I can give it like function arguments. These are just C types that I can add to the Python function signature.

Let’s try those to start with. Note that Python floats are actually C doubles:

def generate(double p1x, double p1y, double p2x,
        double p2y, double cp1x, double cp1y,
        double cp2x, double cp2y, double step):

This saves me a couple of seconds:

$ python -m timeit -s 'import curve' 'curve.speed_test()'
100 loops, best of 3: 15.5 msec per loop

So let’s declare all of my variable types (except the Python list):

cdef double x1, x2, y1, y2, t
cdef double a, a2, a3, b, b2, b3, px, py

This change bought me the improvement I was after - the function is now way faster than the original Python version:

$ python -m timeit -s 'import curve' 'curve.speed_test()'
100 loops, best of 3: 2.08 msec per loop

Cool eh?

The final code in _curve.pyx looks like this:

def generate(double p1x, double p1y, double p2x,
        double p2y, double cp1x, double cp1y,
        double cp2x, double cp2y, double step): 
    '''Given the two points p1 and p2 and their control 
    points cp1 and cp2 generate a cubic bezier curve 
    with steps of "step".

    Return the list of (x, y) points on the curve. 
    ''' 
    cdef double x1, x2, y1, y2, t 
    cdef double a, a2, a3, b, b2, b3, px, py 
    l = [] 
    # generate the cubic bezier points 
    x1 = cp1x * 3; x2 = cp2x * 3 
    y1 = cp1y * 3; y2 = cp2y * 3 
    t = 0 
    while t <= (1 + step): 
        a = t; a2 = a**2; a3 = a**3 
        b = 1 - t; b2 = b**2; b3 = b**3 
        px = p1x*b3 + x1*b2*a + x2*b*a2 + p2x*a3 
        py = p1y*b3 + y1*b2*a + y2*b*a2 + p2y*a3 
        l.append((px, py)) 
        t += step 
    return l

The important thing to note here is that the bulk of the function is untouched, pure Python. cython does all the smarts now it knows enough about the types of the variables. Clever cython!

OK, now the downside. It took me a lot of trial-and-error and banging my head against the cython docs to even do something as simple as that.

One of my early mistakes was to declare the function "cdef" like I saw in all the examples. This results in a function that's not actually exported to Python from the module though. It needs to remain a standard "def" (or, if it's to be used in both C and Python it may be a "cpdef"'ed function). This stumble cost me quite a large amount of time.

Eventually I found most of the useful information in a section marked “Old Cython Users Guide” which, though broken in places, still contains the most useful information for a newbie like me.

ps. apologies for the horizontally smooshed code - I've only just found a wider template. Future posts won't be quite as restricted.

2 comments:

  1. For what it's worth, on your original Python version, I get 27msec per loop. If I run it with PyPy, unmodified, I get 8msec per loop.

    Just about a 3.5x speedup - not quite the 8x improvement you saw with Cython, but a bit easier. :)

    ReplyDelete
  2. I might try running my game under PyPy when PyWeek is over. I believe the ctypes implementation is up to the task.

    ReplyDelete