For those that aren't aware, SQLObject is an Object-Relational Mapping (ORM) library for Python. I use it in chrss (my chess by rss web app) as part of Turbogears. Ian and Kyran also use it as part of the ShowMeDo site.
Chrss and ShowMeDo have quite different levels of traffic. ShowMeDo has a lot more traffic than chrss, so performance might seem like more of an issue for ShowMeDo. However as chrss is a game that requires more interaction from the user this is not necessarily the case. If moving a piece takes even a second the site would seem sluggish. Whereas for a content rich site such as ShowMeDo user expectation can be a bit more forgiving.
Until recently Ian and Kyran have not needed to worry about performance and (rightly so) got on with the things that mattered (e.g. creating more screen-casts and building their community).
However the other day Ian asked me to help him out speed the site up. They were having some issues with a page taking too long to render. When creating chrss I'd spent a bit of extra time worrying about the performance of SQLObject, so I already knew what to look out for in their code. Luckily it mostly only required a few small tweaks and things ran a good deal quicker.
So what can you do to speed up SQLObject?
Enable Query Logging
Obviously don't do this for your production server (it'll only slow things down), but by adding ?debug=1 to your database connection URI, you can enable debug query logging. This will simply make SQLObject print out the details of every SQL statement that is ran against the database.
When developing this can give you a good idea of when you aren't using SQLObject in an appropriate fashion. If you see pages of SQL statements flying past in your console window you should probably have a look to see why!
Enabling query logging is only going to help if you actually understand the SQL that you are looking at. Make sure you do some research if you aren't familiar with SQL. SQLObject makes dealing with a relational database easier, but you still need to understand what it is actually doing to make the most of it.
An example
I quick-started a project with tg-admin and created two model classes using RelatedJoin to link them:
class Entry(SQLObject): title=StringCol(length=255) body=StringCol() tags=RelatedJoin('Tag') class Tag(SQLObject): name=StringCol(length=255, alternateID=True, alternateMethodName="by_tag_name") entries=RelatedJoin('Entry')
Pretty simple stuff. We can define an Entry and add Tag objects to it.
Then I ran tg-admin sql create to populate the (SQLite) database.
Next I ran tg-admin shell so I could create some objects in the database:
entry=Entry(title='a title',body='entry body') test_tag=Tag(name='test_tag') tag2=Tag(name='tag2') entry.addTag(test_tag) entry.addTag(tag2)
I then added ?debug=1 to the database URI:
sqlobject.dburi="sqlite://%(current_dir_uri)s/devdata.sqlite?debug=1"
Then I restarted tg-admin shell (with the IPython shell) and ran the following:
In [1]: entry=Entry.get(1) 1/QueryOne: SELECT title, body FROM entry WHERE id = (1) 1/QueryR : SELECT title, body FROM entry WHERE id = (1) In [2]: for tag in entry.tags: ...: print "tag.name=%s" % tag.name ...: 1/QueryAll: SELECT tag_id FROM entry_tag WHERE entry_id = (1) 1/QueryR : SELECT tag_id FROM entry_tag WHERE entry_id = (1) 1/QueryOne: SELECT name FROM tag WHERE id = (1) 1/QueryR : SELECT name FROM tag WHERE id = (1) 1/QueryOne: SELECT name FROM tag WHERE id = (2) 1/QueryR : SELECT name FROM tag WHERE id = (2) tag.name=test_tag tag.name=tag2
As you can see with a RelatedJoin printing the two tags on the Entry requires the following three queries:
SELECT tag_id FROM entry_tag WHERE entry_id = (1) SELECT name FROM tag WHERE id = (1) SELECT name FROM tag WHERE id = (2)
A minor change
Simply changing RelatedJoin to SQLRelatedJoin in the models and running that same code yields:
In [1]: entry=Entry.get(1) 1/QueryOne: SELECT title, body FROM entry WHERE id = (1) 1/QueryR : SELECT title, body FROM entry WHERE id = (1) In [2]: for tag in entry.tags: ...: print "tag.name=%s" % tag.name ...: 1/Select : SELECT tag.id, tag.name FROM entry, tag, entry_tag WHERE ((tag.id = entry_tag.tag_id) AND ((entry_tag.entry_id = entry.id) AND (entry.id = 1))) 1/QueryR : SELECT tag.id, tag.name FROM entry, tag, entry_tag WHERE ((tag.id = entry_tag.tag_id) AND ((entry_tag.entry_id = entry.id) AND (entry.id = 1))) tag.name=test_tag tag.name=tag2
Printing out the tag names for the entry now only requires one query:
SELECT tag.id, tag.name FROM entry, tag, entry_tag WHERE ((tag.id = entry_tag.tag_id) AND ((entry_tag.entry_id = entry.id) AND (entry.id = 1)))
This is a big improvement - the number of queries we will run now no longer depends on the number of objects being returned.
Some caveats and notes
It's not always this simple, so here are some issues you may encounter:
- RelatedJoin returns a list, whereas SQLRelatedJoin returns a SelectResults object (the same kind of object returned when calling select())
- Large columns (text/binary blobs) won't get lazily loaded with SQLRelatedJoin
- Fewer database queries doesn't always mean your code will run faster - understand what each query is doing
- Make sure you properly index your database
- You need to understand the SQL that SQLObject generates to get the most out of SQLObject
- SQLObject may not be as slow as you think - you might not be using it right