Software
12th December 2015 | By:

How to export data as a CSV – Django model

I recently faced a new challenge where I had to generate large CSV files from the backend. The whole process can take a while and if done improperly can cause the server to stop responding or even crash in worst-case scenarios.

Let’s assume we have a database full of kittens, and the following queryset to retrieve them:

kittens_qs = Kitten.objects.all() # Assume >50,000 objects inside

In order to export a Queryset as a CSV we need to know a little bit more about the given model, such as the fields and the relations it holds, to find out more about the model and the dependencies, we can use the “_meta” property:

model = kittens_qs.model
model_fields = model._meta.fields + model._meta.many_to_many

Next let’s define a utility function which will generate a row for each object passed from the queryset. It is very important to use unicode as much as you can, as this will ensure we can export data which is outside of the ASCII capabilities, hence the encoding in “utf-8”. Take a closer look at how we are handling the ManyToMany field, those entries can have one or more results,  so we will be showing them as a comma separated list:

def get_row(obj):
	row = []
	for field in model_fields:
		if type(field) == models.ForeignKey:
			val = getattr(obj, field.name)
			if val:
				val = val.__unicode__()
		elif type(field) == models.ManyToManyField:
			val = u', '.join([item.__unicode__() for item in getattr(obj, field.name).all()])
		elif field.choices:
			val = getattr(obj, 'get_%s_display'%field.name)()
		else:
			val = getattr(obj, field.name)
		row.append(unicode(val).encode("utf-8"))
	return row

When generating a huge CSV file you DON’T want to generate the file and only then send it to the client, what you should do is actually “stream” the data chunk by chunk to the client once you have enough data generated. This can be accomplished by using the “StreamingHttpResponse” rather than the normal HttpResponse/render. You can read more about this in the Django documenation.

class Echo(object):
	"""An object that implements just the write method of the file-like interface."""
	def write(self, value):
		"""Write the value by returning it, instead of storing in a buffer."""
		return value
class KittensExportToCsvView(View):
	def get(self, request, *args, **kwargs):
		pseudo_buffer = Echo()
		writer = csv.writer(pseudo_buffer)
		response = StreamingHttpResponse((writer.writerow(get_row(obj)) for obj in kittens_qs), content_type="text/csv")
		response['Content-Disposition'] = 'attachment; filename="somefilename.csv"'
		return response

If you run the previous code you will see that it works quite well, but we are missing a very important part in the CSV, the headers! Let’s add this functionality with another helper function named “stream”. The final code will look like this:

from django.db import models
from django.http import StreamingHttpResponse
from django.views.generic import View
import csv
class Echo(object):
	"""An object that implements just the write method of the file-like interface.
	"""
	def write(self, value):
		"""Write the value by returning it, instead of storing in a buffer."""
		return value
class ContactLogExportCsvView(View):
	def get(self, request, *args, **kwargs):
		kittens_qs = Kitten.objects.all() # Assume 50,000 objects inside
		model = kittens_qs.model
		model_fields = model._meta.fields + model._meta.many_to_many
		headers = [field.name for field in model_fields] # Create CSV headers
		def get_row(obj):
			row = []
			for field in model_fields:
				if type(field) == models.ForeignKey:
					val = getattr(obj, field.name)
					if val:
						val = val.__unicode__()
				elif type(field) == models.ManyToManyField:
					val = u', '.join([item.__unicode__() for item in getattr(obj, field.name).all()])
				elif field.choices:
					val = getattr(obj, 'get_%s_display'%field.name)()
				else:
					val = getattr(obj, field.name)
				row.append(unicode(val).encode("utf-8"))
			return row
		def stream(headers, data): # Helper function to inject headers
			if headers:
				yield headers
			for obj in data:
				yield get_row(obj)
		pseudo_buffer = Echo()
		writer = csv.writer(pseudo_buffer)
		response = StreamingHttpResponse(
			(writer.writerow(row) for row in stream(headers, kittens_qs)),
			content_type="text/csv")
		response['Content-Disposition'] = 'attachment; filename="all_kittens.csv"'
		return response

Hope this helps and happy Djangoing!

Tags: , , ,

1 Comment

  1. Tim
    13th December 2016 @ 10:38

    I only ever get a type error from your code…

    “TypeError at /website/export

    __init__() takes 1 positional argument but 2 were given”

    Any suggestions?