28th August 2015 | By: marcinkawa
Efficient SQL queries with Django
Creating/fetching/saving/deleting data from database is one of the most common tasks for the the web apps. Most of the time the web app is based on database and can not work without it. Therefore developers need to remember how important is to make efficient queries and not to slow down the application as it might become really frustrating for the end user to wait ages for the page to load.
The common problem during the development is that the database contains not enough data in to notice the problem. When the app goes live and more data is being saved into the database, everything becomes slower, when that happens it’s really hard to find quick solution to the problem. The best approach is to prevent this kind of situation from the development stage, by checking how many queries particular endpoint generates and limiting them to minimum. This approach will definitely save a lot of time and money when the app goes live.
Django provides really powerful and agile way to make queries to the database by using the models module. Although if we don’t know how to use it in the proper way the queries might become messy and inefficient. To demonstrate how Django database backend works, we’ll create a simple app with different type of relations between the models.
from django.db import models class Gender(models.Model): name = models.CharField(max_length=100) def __unicode__(self): return self.name class Color(models.Model): name = models.CharField(max_length=100) def __unicode__(self): return self.color class Info(models.Model): gender = models.ForeignKey('app.Gender') color = models.ForeignKey('app.Color') age = models.PositiveIntegerField(default=0) class Owner(models.Model): name = models.CharField(max_length=100) info = models.OneToOneField('app.Info') kittens = models.ManyToManyField('app.Kitten') def __unicode__(self): return self.name class Kitten(models.Model): name = models.CharField(max_length=100) info = models.OneToOneField('app.Info') def __unicode__(self): return self.name
Efficient object creation
If we’ve got to add many objects at once it it highly recommended to add them all at once. Let’s see how to add many `Color` type objects at once.
color_names = ['Black', 'White', 'Other'] color_list =  for c in color_names: color_list.append(Color(name=c)) Color.objects.bulk_create(color_list)
By using `bulk_create` method, the database will receive only one query.
This code will execute 1 SQL QUERY
Efficient object updating
To update many objects at once we can use `update` method. Let’s change colors of all `Color` objects to be the same.
This code will execute 1 SQL QUERY
Efficient object deleting
This code will execute 3 SQL QUERIES
Efficient object filtering/fetching
These were the simplest examples. Whenever the models are more complicated it becomes harder to keep track about the amount of queries. To demonstrate how tricky it might be we need to populate the database with some test data.
from app.models import * import random gender_names = ['Male', 'Female', 'Other'] color_names = ['Black', 'White', 'Other'] owner_names = ['Mick', 'John', 'Paul', 'Jenny', 'Betty', 'Sarah'] kitten_names = ['Flappy', 'Fluffy', 'Naughty', 'Peele', 'Koko', 'Panny'] # prepare `Gender` objects g_list =  for g in gender_names: g_list.append(Gender(name=g)) # save objects in bulk Gender.objects.bulk_create(g_list) # prepare `Color` objects c_list =  for c in color_names: c_list.append(Color(name=c)) # save `Color` objects Color.objects.bulk_create(c_list) # prepare 'Kitten' objects k_list =  for k in kitten_names: i = Info.objects.create( gender=Gender.objects.order_by('?').first(), color=Color.objects.order_by('?').first(), age=random.randrange(1,10) ) k_list.appned(Kitten( name=k, info=i )) # save 'Kitten' objects Kitten.objects.bulk_create(k_list) # create `Owner` objects o_list =  for o in owner_names: i = Info.objects.create( gender=Gender.objects.order_by('?').first(), color=Color.objects.order_by('?').first(), age=random.randrange(15,70) ) o_obj = Owner.objects.create( name=o, info=i ) # assign kittens to the owner s_s = random.randrange(0, 2) s_f = s_s + random.randrange(0, 4) o_obj.kittens.add(*[k for k in Kitten.objects.filter()[s_s:s_f]])
Fetch object with related models
Now we will try to fetch the `Owner` object with all `Kitten’s`, `Gender` and `Info` objects that belongs to it.
We can define helper function to print all the properties of the owner and all the kittens he owns
def printOwner(owner): print owner.name, owner.info.gender, owner.info.color, owner.info.age for kitten in owner.kittens.all(): print kitten.name, kitten.info.gender, kitten.info.color, kitten.info.age
Now select the owner object and print it’s properties
owner = Owner.objects.get(name='John') printOwner(owner)
This code will execute 11 SQL QUERIES
By accessing the model object and it’s properties like that we did a LOT of queries even for a one object. If we would like to list this way all owners and the kittens belongs to them we could hit database thousands of times. Luckily Django provides two functions to perform JOIN on the related tables.
- `select_related` is used to perform JOIN on ForeignKeys and OneToOne type of fields
- `prefetch_related` is used to preform JOIN on ManyToMany fields and reverse relations(like reverse ForeignKey or reverse ManyToMany field)
It’s also possible to nest the JOINs and prefetch more nested objects.
owner = Owner.objects.select_related( 'info', # JOIN `Info` object 'info__gender', # JOIN `Gender` object for `Info` 'info__color' # JOIN `Color` objects for `Info` ).prefetch_related( 'kittens', # JOIN `Kitten` objects 'kittens__info', # JOIN `Info` for `Kitten` objects 'kittens__info__gender', # JOIN `Gender` for `Info` for `Kitten` objects 'kittens__info__color' # JOIN `Color` for `Info` for `Kitten` objects ).get(name='John') printOwner(owner)
This code will execute 5 SQL QUERIES
This is a huge improvement even if we look for a single object case, however it becomes even more important when we take a look on listing the objects. The first case will hit database at least 10 times for each object when the second one will hit database only 5 times, no matter how many objects will be retrieved.
Django provides reverse relations for the developers convenience, by default it’s name of the related model class plus `_set`. In case of `Kitten` and `Owner` model it would be `owner_set`. It’s easy to access reverse ForeignKey or reverse ManyToMany relation. As same as previous, to limit amount of queries we can specify to join object from reverse relation as well.
kitten = Kitten.objects.select_related( 'info', 'info__gender', 'info__color' ).prefetch_related( 'owner_set', # JOIN `Owner` for `Kitten` 'owner_set__info', # JOIN `Info` for `Owner` for `Kitten` 'owner_set__info__gender', # JOIN `Gender` for `Info` for `Owner` for `Kitten` 'owner_set__info__color' # JOIN `Color` for `Info` for `Owner` for `Kitten` ).get(name="Fluffy") print kitten.name, kitten.info.gender.name for owner in kitten.owner_set.all(): print owner.name, owner.info.color
This code will execute 5 SQL QUERIES
Sometimes it’s hard to keep track, how many queries each endpoint is doing but it’s really worth to always pay attention and limit the database hits to minimum.