We can chain querysets together with the chain(…) function [python-doc] of the itertools module [python-doc]. For example, if we have two querysets of Post objects, we can chain the posts with a publish_date and then the ones where the publish_date is NULL:
from itertools import chain
qs1 = Post.objects.filter(publish_date__isnull=False).order_by('publish_date')
qs2 = Post.objects.filter(publish_date=None)
result = chain(qs1, qs2)
Why is it a problem?
The main problem is that the result is not a QuerySet, but a chain object. This means that the methods offered by a QuerySet can no longer be used. Indeed, say that we want to filter the Post objects with:
result.filter(author=some_author)
then this will raise an AttributeError, since a chain object has no .filter(…) method. Often such filtering is not done explicitly in the view, but for example by a FilterSet the developer wants to use.
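The failure can be reproduced without a database. The sketch below is only an illustration: plain lists stand in for the two querysets (an assumption, not part of the original example), which is enough to show that the object returned by chain(…) does not expose QuerySet methods.

```python
from itertools import chain

# Plain lists stand in for qs1 and qs2 (hypothetical data for illustration).
published = ["post-1", "post-2"]
unpublished = ["post-3"]

result = chain(published, unpublished)

# A chain object is a bare iterator: it has no QuerySet API.
has_filter = hasattr(result, "filter")

try:
    result.filter(author="some_author")
except AttributeError as exc:
    error = exc
    print(type(exc).__name__)  # AttributeError
```

Any code that expects a QuerySet, such as a FilterSet or a paginator that calls .count(), will likely fail in a similar way.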
Another problem is that a chain cannot be iterated over multiple times. Indeed:
>>> c = chain([1,4], [2,5])
>>> list(c)
[1, 4, 2, 5]
>>> list(c)
[]
This means that if multiple for loops are used, only the first will iterate over the elements. We can work with list(…), and thus use result = list(chain(qs1, qs2)), to prevent this effect.
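As a small sketch of that workaround, again with plain lists standing in for the querysets (an assumption for illustration), materializing the chain once makes the result reusable:

```python
from itertools import chain

# Plain lists stand in for qs1 and qs2 (hypothetical data for illustration).
first = [1, 4]
second = [2, 5]

# Materialize the chain once; a list can be iterated as often as needed.
result = list(chain(first, second))

first_pass = [item for item in result]
second_pass = [item for item in result]

print(first_pass == second_pass)  # True
```

With real querysets, list(…) evaluates all of them immediately and keeps every row in memory, so this trades laziness for reusability.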
Another problem is that result will eventually perform multiple queries. In this example there will be two queries. If we chain five querysets together, however, it results in (at least) five queries, which makes it more expensive.
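The one-evaluation-per-source behaviour can be made visible with a stand-in class. CountingSource below is hypothetical and not part of Django; it merely counts how often it is iterated, which is the point where a real QuerySet would query the database.

```python
from itertools import chain


class CountingSource:
    """Hypothetical stand-in for a queryset that counts its evaluations."""

    evaluations = 0

    def __init__(self, rows):
        self.rows = rows

    def __iter__(self):
        # An unevaluated QuerySet would run its query at this point.
        CountingSource.evaluations += 1
        return iter(self.rows)


# Five separate sources, each holding a single row.
sources = [CountingSource([i]) for i in range(5)]

combined = list(chain(*sources))

print(CountingSource.evaluations)  # 5
```

Each chained source is evaluated separately, so the cost grows with the number of querysets that are chained.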
What can be done to resolve the problem?
Group the queries together into a single queryset. If the order is of no importance, we can make use of the | operator:
result = qs1 | qs2
If the order is of importance, we can make use of .union(…) [Django-doc], which combines both querysets into a single query:
qs1.union(qs2, all=True)
Extra tips
We can use chain(…) when we query, for example, different models:
from itertools import chain
qs1 = Post.objects.all()
qs2 = Author.objects.all()
result = list(chain(qs1, qs2))
But a collection seldom contains elements of different types, especially since processing Post objects will very often be different from processing Author objects.