Sunday, December 30, 2007

Using raw SQL in django

For a django project I'm working on, I need to run a simple query with COUNT and GROUP BY statements to collect some summary data. I tried working with the model API, and eventually fell back to using raw SQL instead.

Here's the query I wanted to run:

SELECT
(year/10)*10 as decade,
count(*) as case_count
FROM
docket_case
GROUP BY
decade


One problem I encountered was there was no way (that I could find) to do a GROUP BY. I found one reference to a group_by() method, but I don't think the patch has been submitted or approved.

I spent some time trying to work out the right way to do it using the model API, before realizing I was banging my head on the wrong brick wall. I was using values() and select() extra(select='...') together, which should have clued me in right away. Well, that and the fact that the data I was retrieving wasn't one of my model classes.

Using a tip from page 325 of the new django book (see my earlier post for a review), I decided to just go ahead and run the query directly and work with the result set. The results need to go into a template, so I wanted to have dictionaries instead of the tuples I get from cursor.fetchall(). This is what I came up with:

from itertools import *
from django.db import connection

def query_to_dicts(query_string, *query_args):
"""Run a simple query and produce a generator
that returns the results as a bunch of dictionaries
with keys for the column values selected.
"""
cursor = connection.cursor()
cursor.execute(query_string, query_args)
col_names = [desc[0] for desc in cursor.description]
while True:
row = cursor.fetchone()
if row is None:
break
row_dict = dict(izip(col_names, row))
yield row_dict
return


You call it like:

    results = query_to_dicts("""
SELECT
(year/10)*10 as decade,
count(*) as case_count
FROM
docket_case
GROUP BY
decade
""")


or, with arguments (this query counts the cases in each year of a particular decade):

    results = query_to_dicts("""
SELECT
year,
count(*) as case_count
FROM
docket_case
WHERE
year/10 = %s
GROUP BY
year
""", decade/10)


I don't expect a large result set, but I figured this was an excuse to experiment with generators and iterators. I'm pretty happy with the results, but surprised I had to write something like this myself. I didn't see a cursor method to fetch the rows as dictionaries instead of tuples, and the search I ran didn't return anything that looked useful.

Am I missing something?

[Updated 31 Dec: One correction: The method is extra(), the argument is select. I was adding a count field using extra(), and then asking for the value with values(). There does not seem to be a way to structure the query I wanted using the models API methods.]

8 comments:

David Avraamides said...

This is one of the limitations of Django's ORM that eventually led to me moving away from Django. The problem is that arbitrarty SQL in general will not return rows that map to a model class so you can't leverage all the other functionality that the model classes provide, like accessing related objects.

I know it's not an easy thing to do and I'm not surprised that it's not supported directly in Django, but in my domain (financial services) the norm is to do lots of aggregation in SQL so I was always running into this limitation.

The query_to_dicts function you wrote is pretty common - and a little surprising it's not part of the DB API. (Try searching for "fetchalldict"). One thing you may want to change on your function is to change desc[0] to desc[0].lower().replace(' ', '_'). Eventually, you might use it to call a view or SP that you didn't write and you won't be able to use the .member syntax in templates if they have spaces in them.

Doug Hellmann said...

Thanks for the tip, David, I'll add those changes to my function just to be safe.

Once I figured it out, it was clear to me that trying to use the model interface for these particular queries was just wrong. I don't mind writing SQL (I actually like it, since I can express what I want very clearly), and in this particular case I'm not loading model data, I'm computing meta-data about those models. So, it makes sense to do it directly instead of trying to use a QuerySet() on the models.

What was confusing or surprising was that there didn't seem to be a fetchalldict() method already available.

Anonymous said...

custom sql django does it help?

Doug Hellmann said...

@Anonymous - That's an interesting link, and I'll consider adding a custom manager with some methods instead of embedding the query in my view function. Thanks!

JohnLenton said...

did you try importing dictfetchall (or one, or many) from django.db.backend ?

Doug Hellmann said...

Aha! No, I did not know about dictfetch*(). A quick google shows up some patches, but nothing in the documentation.

Can I ask how you knew about them?

JohnLenton said...

dictfecth* has been part of psycopg's dbapi for a long time (at least as long as I have been using it). I therefore guessed you were using a different backend, so I googled for "MySQLdb dictfetchall", and one of the first results was from django.db.backends.mysql.base. Then, I checked that it was actually in all the backends, and it is -- even in dummy, which makes me think it's ok to use it in my code. You should probably ask if it really is public api, however :)

Doug Hellmann said...

Ok, that makes sense. I've used psycopg for a while on another project and I thought there was such a function, but couldn't seem to discover the name. We use it at work with our own ORM, which has the feature built in but it works somewhat differently.

I came to the opposite conclusion that you did: if I couldn't find anything about it in the (usually) excellent documentation, it either didn't exist, wasn't portable across backends, or wasn't intended to be used publicly.