Psychic Origami - Packing Python scripts for the 5K app

So far for the 5K App I've written two example apps - one in Java and one in JavaScript. This is partly so that I can figure out how the rules for the competition will need to work. In a lot of 4K competitions a single language is chosen, so things are a bit more straightforward. However, Brighton seems like quite a varied place, and allowing multiple languages for the competition seems like a nice, friendly, inclusive thing to do.

So the next example app I'll be writing will be in Python. As Python source is the executable code, there's no compilation step (unlike Java), and unlike JavaScript there aren't really many tools for reducing code size. Python (like Ruby, Perl and PHP) tends to result in fairly concise code, so normally this isn't too much of a problem. However, 5K is quite a tough constraint and we really want to squeeze the most out of our code by compressing and packing it if possible.

So here's a little script for compressing a Python script to make it much smaller:

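# pack.py - wrap Python scripts in a self-extracting bz2 + base64 stub.
# Note: this is Python 2 code (the generated stub uses the exec statement).
import sys
import bz2
import base64

write = sys.stdout.write

for name in sys.argv[1:]:
    # Read the script as raw bytes and compress it
    contents = bz2.compress(open(name, 'rb').read())
    # Emit a stub that decodes, decompresses and runs the original source
    write('import bz2, base64\n')
    write('exec bz2.decompress(base64.b64decode(\'')
    write(base64.b64encode(contents))
    write('\'))\n')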

To use it, save that code into a file (e.g. pack.py). It writes the compressed code to standard output (i.e. the screen), so you can redirect it to whatever file you want:

python pack.py my_script.py > my_script_packed.py

The compressed code looks like this:

import bz2, base64
exec bz2.decompress(base64.b64decode('<base64 encoded compressed data>'))

Hopefully it's fairly clear what this is doing, but to summarize:

The script data is base64-decoded into bytes
The bytes are then decompressed (bz2) into the text of the original script
exec is then called to run the original script
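
The three steps above can be traced one at a time. This is a minimal sketch that assumes Python 3 (where exec is a function and the data has to be handled as bytes) rather than the Python 2 form the packer emits; the one-line sample script is made up purely for illustration.

```python
import base64
import bz2

# Hypothetical one-line script standing in for a real app
source = "print('hello from the packed script')\n"

# What the packer would embed in the stub: compress, then base64-encode
encoded = base64.b64encode(bz2.compress(source.encode("utf-8")))

# Step 1: the script data is base64-decoded into bytes
compressed = base64.b64decode(encoded)

# Step 2: the bytes are decompressed into the text of the original script
recovered = bz2.decompress(compressed)

# Step 3: exec runs the original script (exec accepts bytes in Python 3)
exec(recovered)
```

The recovered bytes are identical to the original source, so nothing is lost in the round trip.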

One nice benefit of compressing the script this way is that the final module namespace (after decompression) will look essentially the same as it would if the script had not been compressed. The only difference is that the bz2 and base64 modules will also be present. This should mean that you can compress multiple Python files and importing from them should still work. Though of course, as is always the case when adding an extra layer of complexity, your mileage may vary...
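
The namespace claim can be checked by running a plain and a packed copy of the same source in separate dictionaries and comparing the names each one defines. Again this is only a sketch under the assumption of Python 3, so the stub uses exec(...) with parentheses; the area function is a made-up example.

```python
import base64
import bz2

# Hypothetical module source defining a single function
source = "def area(w, h):\n    return w * h\n"

# Build a Python 3 flavoured stub around the compressed source
payload = base64.b64encode(bz2.compress(source.encode("utf-8"))).decode("ascii")
packed = "import bz2, base64\nexec(bz2.decompress(base64.b64decode('%s')))\n" % payload

# Run each version in its own namespace dictionary
plain_ns = {}
packed_ns = {}
exec(source, plain_ns)
exec(packed, packed_ns)

# Names that only exist in the packed version's namespace
extra = set(packed_ns) - set(plain_ns)
print(sorted(extra))  # prints ['base64', 'bz2']
```

The function defined by the original source is available in both namespaces, and the only additions are the two modules the stub imports.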

For an idea of how effective this compression can be, I took the Python script for calculating pi on Wikipedia and ran it through the script. After confirming that it ran the same, a quick comparison revealed that the script had gone from 12658 bytes to 3421 bytes once packed - less than a third of the original size.
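
To estimate the saving for your own script before packing it, something like the following sketch could be used. It assumes Python 3, and the names LIMIT, WRAPPER and packed_size are made up for illustration. Bear in mind that base64 turns every 3 bytes into 4 characters, and that bz2 and the stub both add fixed overhead, so very small scripts can actually get bigger.

```python
import base64
import bz2

LIMIT = 5120  # the 5K limit, in bytes

# The stub text around the payload, in the Python 2 form the packer emits
WRAPPER = "import bz2, base64\nexec bz2.decompress(base64.b64decode(''))\n"

def packed_size(source_bytes):
    """Return the size in bytes of the packed script for the given source."""
    encoded = base64.b64encode(bz2.compress(source_bytes))
    return len(WRAPPER) + len(encoded)

# Hypothetical, highly repetitive source used only to exercise the function
sample = b"total = total + 1  # filler line\n" * 200
tiny = b"print(1)\n"

for label, data in (("sample", sample), ("tiny", tiny)):
    size = packed_size(data)
    print("%s: %d bytes -> %d bytes packed (fits in 5K: %s)"
          % (label, len(data), size, size < LIMIT))
```

Real code is far less repetitive than the filler sample, so expect a much more modest ratio in practice.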

It should be possible to create similar scripts for Ruby (using zlib and base64), Perl (using Compress::Zlib and MIME::Base64) and PHP (using bzdecompress and base64_decode).

As I work on the example Python 5K app I should hopefully get a good feel for how the competition rules might need changing to allow "scripting" languages like Python, Ruby, Perl and PHP. I think the plan will be that the app must be contained in a single file (less than 5120 bytes in size), with any resources embedded in the file. This is the norm for Java and Flash, but will probably require an extra packaging step for most other languages/runtimes. That single file can then be run either via a GUI (double-clicking) or via a standard invocation from the command line (e.g. python my_script_packed.py) using only a "standard" version of the language runtime. Note that the standard installed version of Python on Mac OS X 10.5 (Leopard) includes quite a few extra libraries (e.g. wxPython), so these libraries would be eligible for use in the 5K app. The same will be true for Ruby and Perl, so that should hopefully help open things out. Otherwise Java's large standard library might give it too much of an advantage...