Monday, May 14, 2012

Re: Tree hierarchy with prefetch_related()?

On Tue, May 15, 2012 at 1:20 AM, Carsten Fuchs <carsten.fuchs@cafu.de> wrote:
> Dear Django group,
>
> I've never worked with hierarchic database models before, and would be very
> grateful for your advice:
>
> In my application I need a hierarchy of objects, e.g.
>
>
> class Account(models.Model):
>    parent = models.ForeignKey('self', null=True, blank=True)
>    name   = models.CharField(max_length=60, blank=True)
>
>
> In our application, there are only few instances of Account (about 150,
> never changing much). So they would easily all fit into memory at once, and
> we (possibly) have to traverse the entire hierarchy multiple times for each
> view anyway.
>
>
> My research and reading so far revealed that for modelling trees, usually
>     django-mptt or django-treebeard are suggested.
>
> Would you still recommend them even for trees with small number of instances
> as outlined above?
>
> Compared to mptt, the above model looks appealingly lightweight and simple
> to me, and my original plan was, for each view request, to iterate once over
> Account.objects.all() in order to assemble the tree "manually" as python
> objects in memory (with a reference to the related Account instance at each
> node), then use that for all further processing.
>
> Does that sound like good Django practice?

The answer, as with all interesting questions, is "It Depends" :-)

The issue with hierarchical data in a database is always the cost of
querying and traversal. The simple approach that you've described
works fine -- but requires either O(N^2) queries to retrieve an N-deep
subtree, or requires in-view processing after retrieving the entire
tree. The benefit of MPTT and other tree-strategies is that they can
cut the number of queries for an arbitrary subtree down to something
closer to O(1).

However, the catch with all O(N) calculations is that it depends on
the size of N, and the number of times you're going to perform the
calculation. If, as you say, know you you will only have 150 items in
your tree, and this isn't going to change, and your site isn't going
to be under extreme load (so a little extra view processing won't
matter *too* much), the benefit of using MPTT (or a similar strategy)
may well be overwhelmed by the engineering cost. You may well be
better served using a simpler database model, and maybe looking into
caching specific subtrees as a way to mitigate the extra load you'll
be putting on the web server.

> Then, doing further research, I found prefetch_related()
> (https://docs.djangoproject.com/en/dev/ref/models/querysets/#prefetch-related)
>
> Well, I have read this over and over again, but... is
>
>        Account.objects.all().prefetch_related('parent')
>
> of any help here?

Not if you use it like that :-)

If you're operating on 'parent', you should be looking at
select_related(), not prefetch_related(). prefetch_related() works on
the "N" end of a 1-N relation (so,
Account.objects.prefetch_related('account_set') would make more
sense).

select_related returns a single row, but does a join over the 1 end of
a FK relationship so the single related object can be pre-populated.
This allows you to populate 2 objects with a single query (which,
assuming you have a database that does the join efficiently, should be
faster than 2 queries).

prefetch_related performs 2 queries -- one to retrieve the original
query, and one to retrieve the set of related objects; the results of
the second query are then used to populate the N objects related to
the objects retrieved by the first query.

That said -- if you're retrieving Account.objects.all(), and account
is related to self, then you've already retrieved everything. Adding
select_related or prefetch_related just means you'll be retrieving the
same objects twice (either through a join, or with a second query).
In this case you'd be better served by doing a single query for all
accounts, then building an in-memory index (a dict of pk->object), and
using that for subsequent lookups for tree traversal.

Yours,
Russ Magee %-)

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To post to this group, send email to django-users@googlegroups.com.
To unsubscribe from this group, send email to django-users+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/django-users?hl=en.

0 Comments:

Post a Comment

Subscribe to Post Comments [Atom]

<< Home


Real Estate