Re: or-ing to QuerySets turns INNER into LEFT OUTER joins?
Hi Tom,
thank you very much for your detailed reply, it helped at lot!
Am 25.07.2013 19:00, schrieb Tom Evans:
> [...]
> As an example you can play around with, pick one user who has log
> entries, and remove all their departments from STAFF_BEREICHE. This
> user should then be found with the slow query with LEFT OUTER joins,
> but not with the fast query with INNER joins [...]
Yes, I can reproduce this, and also for the opposite case, where a STAFF
member has no entries in the related ERFASST table.
And although I've clipped most of your text in the quote above for
brevity, your text really made me understand the background and
reasoning, too! :-D
Now with correctness established, this leaves me with the performance
problem. The formulation with the OR and the LEFT OUTER joins is indeed
what we semantically need (also see below, about Erfasst and logging ;) ).
The best that I could think of so far is to run the individual queries
individually, combine the results in Python code (the "OR" operation),
then (as we need a QuerySet at this place in the code for further
filtering) run another query with `.filter(id__in=ManuallyCombinedResults)`:
S_a = set(Staff.objects. \
filter(bereiche=SomeDep).values_list('id', flat=True))
S_b = set(Staff.objects. \
filter(erfasst__datum__gte=FirstDay,
erfasst__datum__lte=LastDay,
erfasst__bereich=SomeDep).values_list('id', flat=True))
Q_a_or_b = Staff.objects.filter(id__in=S_a | S_b).distinct()
This is still about 50 times better than the form with Q objects and
LEFT OUTER joins. (Do you see a way to optimize this even further (in
the Django ORM)?)
My initial assumption was that the OR in the all-in-one query would not
be much slower than the individual queries, and was really surprised to
learn otherwise...
> * Erfasst is meaning logging here, right? Reaching the limits of my German :)
Translated verbatim, "Erfasst" means "committed" or "recorded" -- which
obviously is a really stupid name for a database table. Alas, this is
the legacy that I got when I overtook the original PHP project, and
backwards-compatibility has so far kept me from updating the name.
What we really are recording here are "workdays". In my attempt to keep
the examples in my initial post both easy to understand and correct, I
only translated the table name for STAFF, but felt that I'd lose
consistency and thus correctness if things got more involved if I
translated ERFASST as well (especially with backwards relation names etc.)
So the above is really an attempt to find all STAFF members who are
assigned to a certain department regularly (via STAFF_BEREICHE), "OR"
those who have spent a workday in that department in the given time range.
(So new staff members who have not yet spent a single workday anywhere
would promptly be missing from any report and form with the
INNER-joins-only query...)
Tom, many thanks for your help!
Best regards,
Carsten
--
Dipl.-Inf. Carsten Fuchs
Carsten Fuchs Software
Industriegebiet 3, c/o Rofu, 55768 Hoppstädten-Weiersbach, Germany
Internet: http://www.cafu.de | E-Mail: info@cafu.de
Cafu - the open-source game and graphics engine for multiplayer 3D action
--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-users+unsubscribe@googlegroups.com.
To post to this group, send email to django-users@googlegroups.com.
Visit this group at http://groups.google.com/group/django-users.
For more options, visit https://groups.google.com/groups/opt_out.
0 Comments:
Post a Comment
Subscribe to Post Comments [Atom]
<< Home