Tuesday, November 22, 2011

Re: Large Queryset Calculation In Background?

You will definitely need to look into caching those results, perhaps "permanently" in a database.

My recommendation is redis[1] and possibly tools like sebleier's django-redis-cache[2]. Cache invalidation is a pain, I know, but it's pretty much the only way to go.
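To make the caching idea concrete, here is a minimal sketch of a "compute on miss, delete on change" cache. The DictCache class, the key format, and the get_calculated/invalidate helpers are all made up for illustration; DictCache is only a stand-in for a real cache client such as Redis, not anyone's actual API.

```python
import json


class DictCache:
    """Hypothetical in-memory stand-in for a cache client such as Redis."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)


def cache_key(thing_id):
    # Assumed key naming scheme: one entry per Thing.
    return "thing:%d:calculated" % thing_id


def get_calculated(cache, thing_id, compute):
    """Return the cached result, computing and storing it on a miss."""
    raw = cache.get(cache_key(thing_id))
    if raw is not None:
        return json.loads(raw)
    result = compute(thing_id)
    cache.set(cache_key(thing_id), json.dumps(result))
    return result


def invalidate(cache, thing_id):
    """Drop the cached result, e.g. after the Thing or its relations change."""
    cache.delete(cache_key(thing_id))
```

The painful part is calling invalidate() from every code path that changes a Thing or anything it serializes; the nightly job then only has to recompute the entries that were dropped.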

Long term, you will need to profile the bottlenecks and dive into the Django-generated SQL to see whether you can tune it by refactoring, possibly switching to .raw() or executing custom SQL directly in some cases.
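As a starting point for the profiling step, something like the following sketch uses the standard library's cProfile to see where the time goes. calculation_on_thing here is a hypothetical stand-in for the real serialization work, and profile_call is just a helper name I made up.

```python
import cProfile
import io
import pstats


def calculation_on_thing(n):
    # Hypothetical stand-in for the expensive per-object calculation.
    return sum(i * i for i in range(n))


def profile_call(func, *args):
    """Run func under cProfile; return (result, text report of the top entries)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    # Sort by cumulative time and print at most 10 entries.
    stats.sort_stats("cumulative").print_stats(10)
    return result, buf.getvalue()


if __name__ == "__main__":
    result, report = profile_call(calculation_on_thing, 100000)
    print(report)
```

If the report shows most of the time inside database calls rather than your own code, that is the cue to look at the generated SQL.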

There are plenty of presentations from python/django conferences out there that touch on the subject of ORM optimization, so don't be afraid to google.

Good luck!


Cheers,
AT


[2] https://github.com/sebleier/django-redis-cache

On Tue, Nov 22, 2011 at 9:04 PM, Nikolas Stevenson-Molnar <nik.molnar@consbio.org> wrote:
I wouldn't expect it to lock the database (though someone with more database expertise should address that). I would expect it to consume significant CPU. If you're on UNIX, you could address this issue by making your process 'nice': http://docs.python.org/library/os.html#os.nice The nicer a process (the higher the value), the less CPU it will hog when other processes are competing for it. IIRC, nice values default to 0 for processes and, on Linux, range from -20 (highest priority) to +19 (lowest priority).
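A minimal sketch of what that could look like at the top of the cron script. lower_priority is a hypothetical helper name and the increment of 10 is an arbitrary choice; os.nice() itself only exists on UNIX-like systems, hence the guard.

```python
import os


def lower_priority(increment=10):
    """Add `increment` to this process's niceness and return the new value.

    Returns None where os.nice() is unavailable (e.g. Windows).
    """
    if not hasattr(os, "nice"):
        return None
    return os.nice(increment)


if __name__ == "__main__":
    print("niceness is now:", lower_priority())
    # ... run the nightly calculation here ...
```

Note that os.nice() takes an increment, not an absolute value, so calling it twice lowers the priority twice. An unprivileged process can raise its niceness but cannot lower it again afterwards.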

_Nik


On 11/22/2011 2:37 PM, Nan wrote:
Hi folks --

I need to run a fairly CPU-intensive calculation nightly over a dataset that's already large and growing quickly. I'm planning to run this via a cron job, but would like to make sure that it neither eats up the entire CPU nor locks the database, so that my site can continue functioning in the meantime.

The rough outline of what it needs to do is as follows:

    class OtherThing(models.Model):
        anotherthing = models.ManyToManyField(Whatever)
        ...

    class Thing(models.Model):
        other_things = models.ManyToManyField(OtherThing, through='SomethingElse')
        ...

    for thing in Thing.objects.select_related('other_things', 'other_things__anotherthing__etc'):
        calculated = calculation_on_thing_and_its_otherthings(thing) # this mainly involves serialization to a great depth
        thing.calculated_data = calculated
        thing.save()

Will the above approach lock the database for a while or eat tons of CPU? Any suggestions? I'm using Django 1.2, btw.

Thanks,
-Nan

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To post to this group, send email to django-users@googlegroups.com.
To unsubscribe from this group, send email to django-users+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/django-users?hl=en.

