Optimize Django memory usage

Recently I ran into a memory usage problem in Django: when I accessed an apparently innocent view, I saw the memory usage of my server grow without rest. The problem turned out to be trivial to solve, but I think the process I used to find the leak is worth a blog post. 😉

This is the apparently innocent view, shown here as a minimal sketch (the view name and the exact line written for each record are illustrative):
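
from django.http import HttpResponse

from leaky_app.models import FirstModel  # app and model names as in the logs below

def dump_records(request):
    # Write one line per record to the dump file. With ~100k rows and
    # several ForeignKey fields per row, this loop is where the memory goes.
    with open('dumb.dump', 'w') as dump_file:
        for record in FirstModel.objects.all():
            dump_file.write('%s\n' % record.pk)
    return HttpResponse('dump completed')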

As you can see it’s very simple; the peculiarity here is the size of the queryset, because the table contains about 100k records, and each record has several ForeignKey fields to other models. An experienced Django developer would immediately see the problem, but I’m not one of them ;), so I had to investigate.

If you are impatient, go directly to my solution and have a good time! 🙂

When you want to profile memory usage in Python you’ll find several useful tools; after reading this good article I chose to use objgraph. But there is a problem: objgraph is designed to be used in a Python console, while my code runs inside a Django-powered website.

So I put together some code to redirect the standard output used by objgraph to my beloved Django logging system, and here is the result. It is a small sketch of the approach: objgraph.show_growth() prints to standard output, so the idea is to capture that output and re-emit each line through a Django logger.
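
import logging
import sys
from io import StringIO  # on Python 2 use: from StringIO import StringIO

import objgraph

logger = logging.getLogger(__name__)

def log_objgraph_growth():
    # objgraph.show_growth() prints the delta of object counts since its
    # previous call to standard output; temporarily swap sys.stdout with an
    # in-memory buffer and forward every captured line to the Django log.
    buffer = StringIO()
    old_stdout = sys.stdout
    sys.stdout = buffer
    try:
        objgraph.show_growth()
    finally:
        sys.stdout = old_stdout
    for line in buffer.getvalue().splitlines():
        logger.debug(line)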

In the Django log I saw something like this (I cut some uninteresting internal objects from the output):

[24/09/2014 10:35:31] DEBUG [leaky_app.views:31] dict 106524 +106524
[24/09/2014 10:35:31] DEBUG [leaky_app.views:31] ModelState 101328 +101328
[24/09/2014 10:35:31] DEBUG [leaky_app.views:31] FirstModel 98327 +98327

[24/09/2014 10:36:23] DEBUG [leaky_app.views:31] dict 109526 +3002
[24/09/2014 10:36:23] DEBUG [leaky_app.views:31] ModelState 104329 +3001
[24/09/2014 10:36:23] DEBUG [leaky_app.views:31] SecondModel 1999 +1000
[24/09/2014 10:36:23] DEBUG [leaky_app.views:31] ThirdModel 1999 +1000
[24/09/2014 10:36:23] DEBUG [leaky_app.views:31] FourthModel 2000 +1000

[24/09/2014 10:37:17] DEBUG [leaky_app.views:31] dict 112874 +3348
[24/09/2014 10:37:17] DEBUG [leaky_app.views:31] ModelState 107330 +3001
[24/09/2014 10:37:17] DEBUG [leaky_app.views:31] FourthModel 3000 +1000
[24/09/2014 10:37:17] DEBUG [leaky_app.views:31] SecondModel 2999 +1000
[24/09/2014 10:37:17] DEBUG [leaky_app.views:31] ThirdModel 2999 +1000

FirstModel has some ForeignKey fields to SecondModel, ThirdModel and so on, so with every chunk of 1000 rows, a thousand instances of each related model appear in memory. But why is Django keeping all those objects in memory? After all, I only want to write a line for each record to my dumb.dump file.

It turns out that Django caches the results of each queryset when you iterate over it.
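
For instance, iterating over the same queryset twice hits the database only once; the second pass is served entirely from the result cache that was filling up my memory here:

queryset = FirstModel.objects.all()

for record in queryset:
    pass  # first iteration: all rows are fetched and stored in the cache

for record in queryset:
    pass  # no new query: the records come from the in-memory cache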

That is the default behavior, but you can avoid caching and simply iterate over a queryset using its iterator() method. The Django documentation clearly states:

“For a QuerySet which returns a large number of objects that you only need to access once, this can result in better performance and a significant reduction in memory.”

So the solution to my problem was simply adding .iterator() to my queryset, like this (the same sketch as above, with a one-line change):
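
with open('dumb.dump', 'w') as dump_file:
    for record in FirstModel.objects.all().iterator():
        dump_file.write('%s\n' % record.pk)

With .iterator() the queryset skips its result cache, so each record can be garbage collected as soon as the loop moves on.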

Fortunately that was a trivial fix, but it was fun to track down the issue and solve it! 😉

Update:

It happened to me that even when using .iterator(), the data was first fetched entirely on the client side by the database driver (watch the Python process memory), occupying a lot of memory. This happens because Django doesn’t yet support server-side cursors.

A workaround for this issue is using a utility function like the one below. It is a sketch based on the widely circulated queryset_iterator recipe: it walks the queryset in primary-key order, fetching a limited chunk of rows per query (the default chunk size of 1000 is an assumption):
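
import gc

def queryset_iterator(queryset, chunksize=1000):
    # Walk the queryset in primary-key order, one chunk of `chunksize` rows
    # per query, so the database driver never has to materialize the whole
    # result set on the client side.
    # Assumes an auto-incrementing integer primary key.
    if not queryset.exists():
        return
    pk = 0
    last_pk = queryset.order_by('-pk')[0].pk
    queryset = queryset.order_by('pk')
    while pk < last_pk:
        for row in queryset.filter(pk__gt=pk)[:chunksize]:
            pk = row.pk
            yield row
        # free the chunk we just consumed before fetching the next one
        gc.collect()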

You can use it simply by passing a normal Django queryset as its single parameter, like this:

for obj in queryset_iterator(HugeQueryset.objects.all()):
    # do something with obj
    pass

This can be useful even when you want to delete a high number of objects that have ForeignKeys pointing to them. In that case the QuerySet .delete() method doesn’t help, because Django has to fetch each object into memory to handle the cascade policy defined for the objects being deleted. You can do something like this to mitigate that effect:

for obj in queryset_iterator(HugeQueryset.objects.all()):
    obj.delete()

Happy coding! 😉

3 thoughts on “Optimize Django memory usage”

  1. Hi, regarding the last point (i.e. deleting a high number of objects), it is better to use HugeQueryset.objects.filter(pk__in=ids_collected_in_list).delete()

    • Beware that if ids_collected_in_list is a big list of ids, and HugeQueryset objects have foreign keys pointing to them, a large amount of memory will be allocated to collect the objects that must be deleted in cascade. The proposed method is not efficient in terms of database queries executed, but it is in terms of memory allocated.

  2. Oh my! I love you. Thanks! This solved a RAM leak issue I was having with one of my queries.

    Django just decided to cache it EVERY SINGLE TIME it was executed. And it was a rather big query, so my server was running out of RAM rapidly.

    Using .iterator() has resolved it!
