Thursday, January 7, 2010

comp.lang.python - 25 new messages in 13 topics - digest

comp.lang.python
http://groups.google.com/group/comp.lang.python?hl=en

comp.lang.python@googlegroups.com

Today's topics:

* buffer interface problem - 1 message, 1 author
http://groups.google.com/group/comp.lang.python/t/f6c819c5b45333e0?hl=en
* Pass multidimensional array (matrix) to c function using ctypes - 1 message,
1 author
http://groups.google.com/group/comp.lang.python/t/eb94b01ce057cbe5?hl=en
* Where's a DOM builder that uses the Builder Pattern to ... build DOMs? - 2
messages, 2 authors
http://groups.google.com/group/comp.lang.python/t/043f14346941ca72?hl=en
* please help shrink this each_with_index() implementation - 1 message, 1
author
http://groups.google.com/group/comp.lang.python/t/7f172b994506a26e?hl=en
* Python books, literature etc - 3 messages, 3 authors
http://groups.google.com/group/comp.lang.python/t/6ad7a20513065f9f?hl=en
* Exception as the primary error handling mechanism? - 2 messages, 2 authors
http://groups.google.com/group/comp.lang.python/t/7d6191ecba652daf?hl=en
* GUI for multiplatform multimedia project - 1 message, 1 author
http://groups.google.com/group/comp.lang.python/t/84ea03c77f6b681a?hl=en
* Recommended "new" way for config files - 6 messages, 5 authors
http://groups.google.com/group/comp.lang.python/t/d0042aff58886724?hl=en
* How to execute a script from another script and other script does not do
busy wait. - 1 message, 1 author
http://groups.google.com/group/comp.lang.python/t/98f0c614770304b7?hl=en
* Do I have to use threads? - 3 messages, 3 authors
http://groups.google.com/group/comp.lang.python/t/0c04059bd243a38b?hl=en
* ANN: Pymazon 0.1.0 released! - 1 message, 1 author
http://groups.google.com/group/comp.lang.python/t/34514d8cea667ca0?hl=en
* Dictionary used to build a Triple Store - 2 messages, 2 authors
http://groups.google.com/group/comp.lang.python/t/3542b270637779cc?hl=en
* How to reduce the memory size of python - 1 message, 1 author
http://groups.google.com/group/comp.lang.python/t/d2de46379f176e9a?hl=en

==============================================================================
TOPIC: buffer interface problem
http://groups.google.com/group/comp.lang.python/t/f6c819c5b45333e0?hl=en
==============================================================================

== 1 of 1 ==
Date: Thurs, Jan 7 2010 5:03 am
From: Chris Rebert


On Thu, Jan 7, 2010 at 4:47 AM, Andrew Gillanders
<andrew.gillanders@uqconnect.edu.au> wrote:
> On 07/01/2010, at 7:13 PM, Chris Rebert wrote:
>> On Thu, Jan 7, 2010 at 12:19 AM, Andrew Gillanders
>> <andrew.gillanders@uqconnect.edu.au> wrote:
>>>
>>> I have run into a problem running a Python script that is part of the
>>> TerraGear suite for building scenery for FlightGear. I am using Mac OS X
>>> 10.4, running Python (version 3.0.1) in a Unix terminal.
>>>
>>> The purpose of the script is to walk a directory tree, unzipping files,
>>> and
>>> passing the contents to an executable C program. The problem occurs here:
>>>
>>>   gzin = GzipFile(fname, 'rb')
>>>   data = gzin.readline()
>>>   min_x,min_y = map(atoi,data.split()[:2])
>>>
>>> The input file, when uncompressed, is an ASCII file with a line with two
>>> numbers, then a line of four numbers, then many long lines of numbers. I
>>> can
>>> see what the last is trying to do: split the string into two words,
>>> convert
>>> them to integers, and assign them to min_x and min_y.
>>>
>>> At the third line, I get the message "expected an object with the buffer
>>> interface". Which object is it referring to?
>>
>> The elements of the list produced by `data.split()[:2]`, which are
>> either Unicode strings or bytestrings, neither of which are buffers.
>>
>>> Have some functions been
>>> changed to pass buffer objects instead of strings? How can I fix the
>>> source
>>> code to make it run?
>>
>> The error is being raised by the atoi() function (in the future,
>> please post the full Traceback, not just the final error message).
>> What module/library does your atoi() function come from (look for an
>> `import` statement mentioning it)?
>> The only functions by that name in the Python standard library both
>> operate on strings, not buffers, and thus can't be the same one your
>> code is using.
>>
>> In any case, replacing `atoi` with `int` in your code will likely
>> solve the problem. The built-in int() function can convert strings to
>> integers.

> Thanks Chris. The atoi function was coming from the locale library (from
> locale import atoi). I changed it to int and now it works.

Hm, that's odd since it was one of the 2 functions in the std lib
which the docs say operates on strings...

> The next hurdle is this:
> gzin = GzipFile(fname, 'rb')
>
> data = gzin.readline()
> # min_x,min_y = map(atoi,data.split()[:2])
> min_x,min_y = map(int,data.split()[:2])
>
> data = gzin.readline()
> # span_x,step_x,span_y,step_y = map(atoi,data.split()[:4])
> span_x,step_x,span_y,step_y = map(int,data.split()[:4])
>
> data = gzin.read().split('\n')
>
> The last line is a problem, giving me this message: Type str doesn't support
> the buffer API (I am guessing a conflict between split and read?)

Ah, looking at the 3.0 docs on buffers, I'd surmise gzin.read()
returns bytes (http://docs.python.org/3.1/library/functions.html#bytes)
rather than a string.
You'll want to decode the bytes into characters first, and then you
can operate on the resulting string normally.
Try:

data = gzin.read().decode('ascii').split('\n')
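
A minimal sketch pulling the two fixes together (int() in place of atoi(),
and decoding before splitting), with fname as in the original script, would
be something like:

from gzip import GzipFile

gzin = GzipFile(fname, 'rb')   # read()/readline() return bytes in Python 3
min_x, min_y = map(int, gzin.readline().split()[:2])
span_x, step_x, span_y, step_y = map(int, gzin.readline().split()[:4])
data = gzin.read().decode('ascii').split('\n')   # decode before using a str separator
gzin.close()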

> Sorry, I am new to Python, so how do I get a Traceback?

You should get one by default. Are you running the script in some
environment other than the command line?

Here's what a traceback looks like:

Traceback (most recent call last):
File "foo", line 161, in <module>
main()
File "foo.py", line 157, in main
bot.run()
File "foo.py", line 68, in bar
self.baz("Enter number: ")
File "foo.py", line 112, in baz
choice = int(raw_input(prompt))-1
ValueError: invalid literal for int() with base 10: 'y'

Cheers,
Chris
--
http://blog.rebertia.com

==============================================================================
TOPIC: Pass multidimensional array (matrix) to c function using ctypes
http://groups.google.com/group/comp.lang.python/t/eb94b01ce057cbe5?hl=en
==============================================================================

== 1 of 1 ==
Date: Thurs, Jan 7 2010 5:23 am
From: Daniel Platz


Thanks a lot. This solves my problem and I understand now much better
what is going on.

Best regards,

Daniel

==============================================================================
TOPIC: Where's a DOM builder that uses the Builder Pattern to ... build DOMs?
http://groups.google.com/group/comp.lang.python/t/043f14346941ca72?hl=en
==============================================================================

== 1 of 2 ==
Date: Thurs, Jan 7 2010 5:36 am
From: Stefan Behnel


Phlip, 05.01.2010 18:00:
> On Jan 5, 12:16 am, Stefan Behnel <stefan...@behnel.de> wrote:
>
>> Note that there are tons of ways to generate HTML with Python.
>
> Forgot to note - I'm generating schematic XML, and I'm trying to find
> a way better than the Django template I started with!

Well, then note that there are tons of ways to generate XML with Python,
including the one I pointed you to.

Stefan


== 2 of 2 ==
Date: Thurs, Jan 7 2010 8:44 am
From: Phlip


On Jan 7, 5:36 am, Stefan Behnel <stefan...@behnel.de> wrote:

> Well, then note that there are tons of ways to generate XML with Python,
> including the one I pointed you to.

from lxml.html import builder as E
xml = E.foo()

All I want is "<foo/>", but I get "AttributeError: 'module' object has
no attribute 'foo'".

A peek at dir(E) shows it only has HTML tags, all hard coded.

So how to get it to generate any random XML tag my clients think of?

I will write this myself with __getattr__ etc, if I can't find it,
because the permissive & expressive builder pattern I'm after would be
very ... permissive & expressive.
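
A minimal sketch of that kind of __getattr__-based builder, using only the
standard library's xml.etree.ElementTree (the Builder class name and the
foo/bar tags below are just illustrative), might look like:

import xml.etree.ElementTree as ET

class Builder(object):
    def __getattr__(self, tag):
        def make(*children, **attrib):
            # ElementTree wants attribute values as strings
            elem = ET.Element(tag, dict((k, str(v)) for k, v in attrib.items()))
            for child in children:
                if isinstance(child, ET.Element):
                    elem.append(child)
                else:
                    elem.text = (elem.text or '') + str(child)
            return elem
        return make

E = Builder()
print(ET.tostring(E.foo()))                     # serializes to <foo />
print(ET.tostring(E.foo(E.bar('hi'), id='1')))  # <foo id="1"><bar>hi</bar></foo>

And if I remember right, lxml itself ships a generic version of this as
lxml.builder.ElementMaker, which is what lxml.html.builder specializes.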

All I want is a library that reads my mind!!! Is that too much to
ask??? (Unless if the library insists on throwing a NullMind
exception, on principle...)

--
Phlip
http://twitter.com/Pen_Bird

==============================================================================
TOPIC: please help shrink this each_with_index() implementation
http://groups.google.com/group/comp.lang.python/t/7f172b994506a26e?hl=en
==============================================================================

== 1 of 1 ==
Date: Thurs, Jan 7 2010 5:37 am
From: Roel Schroeven


Phlip schreef:
> Nobody wrote:
>> Writing robust software from the outset puts you at a competitive
>> disadvantage to those who understand how the system works.
>
> And I, not my language, should pick and chose how to be rigorous. The language
> should not make the decision for me.

You can always easily make your program less rigorous than the language,
but the reverse is generally very difficult. So a rigorous language
gives you the choice, while a non-rigorous language does not.

--
The saddest aspect of life right now is that science gathers knowledge
faster than society gathers wisdom.
-- Isaac Asimov

Roel Schroeven

==============================================================================
TOPIC: Python books, literature etc
http://groups.google.com/group/comp.lang.python/t/6ad7a20513065f9f?hl=en
==============================================================================

== 1 of 3 ==
Date: Thurs, Jan 7 2010 6:17 am
From: Stuart Murray-Smith


> Have a look at the Getting Started section of the wiki:
>
> http://wiki.python.org/moin/
>
> specially the PythonBooks section

Perfect! Exactly what I'm looking for :)

Thanks Gabriel!


== 2 of 3 ==
Date: Thurs, Jan 7 2010 9:22 am
From: Jorgen Grahn


On Thu, 2010-01-07, Stuart Murray-Smith wrote:
...
> [...] ESR's guide to
> smart questions [1] helps set the pace of list culture.

It's good, if you can ignore the "These People Are Very Important
Hacker Gods, Not Mere Mortals" subtext.

...
> Anyways, to rephrase, could someone kindly mention any of their
> preferred Python books, websites, tutorials etc to help me get to an
> intermediate/advanced level? Something that would help me add
> functionality to Ubiquity, say.

I may be alone in this, but Alex Martelli's book ("Python in a
nutshell"?) on Python 2.2 and a bit of 2.3, plus the official
documentation, plus this group, is all I think I need.
But I had a lot of Unix, C, C++ and Perl experience to help me.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .


== 3 of 3 ==
Date: Thurs, Jan 7 2010 10:34 am
From: Peter

>> Anyways, to rephrase, could someone kindly mention any of their
>> preferred Python books, websites, tutorials etc to help me get to an
>> intermediate/advanced level? Something that would help me add
>> functionality to Ubiquity, say.
>>
> I may be alone in this, but Alex Martelli's book ("Python in a
> nutshell"?) on Python 2.2 and a bit of 2.3, plus the official
> documentation, plus this group, is all I think I need.
> But I had a lot of Unix, C, C++ and Perl experience to help me.
>
> /Jorgen
>
>
I find Alex Martelli's "Python Cookbook" excellent/invaluable (and also
his Nutshell book mentioned above), and depending on your application
domain, I liked:

1) Hans Petter Langtangen: Python Scripting for Computational Science
A truly excellent book, not only with respect to Python scripting, but
also on how to avoid paying license fees by using open-source tools as
an engineer (plotting, graphing, GUI dev etc.). A very good, practical
introduction to Python with careful and non-trivial examples and exercises.

2) There is a book from Apress on using Python and matplotlib (amongst
others), "Beginning Python Visualization", which is not as comprehensive
as reference 1) but useful, especially for beginners from an engineering
background who want to visualize data.

3) "Programming for the semantic web" Oreilly is a very pratical and
interesting guide to things like OWL, triplestore, logic, reasoning,
data mining and it is amongst the very few books on these topics I have
seen that has working code examples

4) "Natural language priocessing with Python " Oreilly is also a
pratical book with lots of working code if you are interested in data
mining, text searching and natural language tasks. It is based on a
rather large opensource library for natural language processing ( sorry
forgot the exact name,but easy to find on the net)

All these books make you feel warm and comfortable if you have ever tried
to do these things in Perl, C++ or Java.

Peter

==============================================================================
TOPIC: Exception as the primary error handling mechanism?
http://groups.google.com/group/comp.lang.python/t/7d6191ecba652daf?hl=en
==============================================================================

== 1 of 2 ==
Date: Thurs, Jan 7 2010 6:37 am
From: Lie Ryan


On 1/7/2010 10:43 PM, Roel Schroeven wrote:
> - I tend to think that not following that practice trains me to be
> careful in all cases, whereas I'm afraid that following the practice
> will make me careless, which is dangerous in all the cases where the
> practice won't protect me.
>

That's a sign of a gotcha... a well-designed language makes you think
about your problem at hand and less about the language's syntax.


== 2 of 2 ==
Date: Thurs, Jan 7 2010 6:56 am
From: Dave McCormick


Lie Ryan wrote:
> That's a sign of a gotcha... a well-designed language makes you think
> about your problem at hand and less about the language's syntax.
Not until you learn the language, that is.
From a Python newbie.... ;-)


==============================================================================
TOPIC: GUI for multiplatform multimedia project
http://groups.google.com/group/comp.lang.python/t/84ea03c77f6b681a?hl=en
==============================================================================

== 1 of 1 ==
Date: Thurs, Jan 7 2010 7:26 am
From: CM


On Jan 6, 4:53 pm, <trzewic...@trzewiczek.info> wrote:
> Hi everyone,
>
> I posted that question on a Python forum, but got no answer, so I am asking here.
>
> I'm working on an artistic project and I'm looking for the best
> cross-platform GUI solution. The problem is that it's gonna be a tool that
> will have to be double-click installable/runnable, and pre-installation of
> any libraries by end-users is very much an evil. It really has to be a
> double-click tool.
>
> My first thought was PyQt, because it's a real framework with a lot of
> stuff inside (including Phonon) and I know some cross-platform media
> software written in C++ QT (like VLC). But on the other hand I've heard
> that it's not that easy to make it "double-clicky" multi-platform. Is that
> true?
>
> Another thing that matters for me is ease of integration with libraries
> like OpenCV.
>
> I will be VERY thankful for any help. I'm SO tired of googling the problem
> (it's been weeks now!!)
>
> Best from Poland,
> trzewiczek

I don't know this for sure, but I would be surprised if any of the widget
toolkits gave you much more trouble than any other when making your app
into a bundled executable.

I have made wxPython apps into a Windows .exe file easily using GUI2Exe,
which is an excellent GUI interface (written in wxPython by Andrea Gavana)
to a number of the executable bundlers: py2exe, PyInstaller, py2app,
cx_Freeze, bbFreeze. Some of these are for Windows, some for Mac, some
for Linux.
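
For what it's worth, the py2exe route only needs a tiny setup script; a
minimal sketch for a GUI app (myapp.py is just a placeholder), run with
"python setup.py py2exe":

from distutils.core import setup
import py2exe  # importing it registers the 'py2exe' command with distutils

setup(
    windows=[{'script': 'myapp.py'}],          # GUI app: no console window
    options={'py2exe': {'compressed': True}},  # optional: compress the library archive
)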

wxPython apparently works with OpenCV:
http://opencv.willowgarage.com/wiki/wxpython

While you're Googling you might want to be aware of any legal
concerns with py2exe and distributing dll files (if there are
any).

==============================================================================
TOPIC: Recommended "new" way for config files
http://groups.google.com/group/comp.lang.python/t/d0042aff58886724?hl=en
==============================================================================

== 1 of 6 ==
Date: Thurs, Jan 7 2010 8:10 am
From: Peter


Hi
There seem to be several strategies to enhance the old ini-style config
files with real python code, for example:

1) the documentation tool sphinx uses a python file conf.py that is
execfile(d), but execfile was removed in Python 3
2) there is a module cfgparse on sourceforge that supports a hybrid style
3) modern tools like ipython seem to favor a new style based on python
code config files but also support a hybrid style mixing .ini files and
python code files.
4) I could use __import__ to import modules based on some command line
options


Is there a strategy that should be preferred for new projects?

thanks
peter


== 2 of 6 ==
Date: Thurs, Jan 7 2010 8:30 am
From: Lie Ryan


On 1/8/2010 3:10 AM, Peter wrote:
> Is there a strategy that should be prefered for new projects ?

The answer is, it depends.


== 3 of 6 ==
Date: Thurs, Jan 7 2010 8:32 am
From: Jean-Michel Pichavant


Peter wrote:
> Hi
> There seems to be several strategies to enhance the old ini-style
> config files with real python code, for example:
>
> 1) the documentation tool sphinx uses a python file conf.py that is
> execfile(d), but execfile was removed in Python 3
> 2) there is a module cfgparse on sourceforge that supports a hybrid style
> 3) modern tools like ipython seems to favor a new style based on
> python code config files but also support a hybrid style mixing .ini
> files and python code files.
> 4) I could use __import__ to import modules based on some command line
> options
>
>
> Is there a strategy that should be prefered for new projects ?
>
> thanks
> peter
I would add the standard module ConfigParser
http://docs.python.org/library/configparser.html to your list.
I don't know exactly what you intend to do with point 4/, but I would
exclude it if any other point may fit. Imports can become tricky when
used in unusual ways. Anyway, hacking the import statement for
managing configuration files does not sound very appropriate.
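
For reference, basic ConfigParser usage is only a few lines; a sketch, with
made-up section and option names:

import ConfigParser  # spelled 'configparser' in Python 3

config = ConfigParser.SafeConfigParser()
config.read('settings.ini')
host = config.get('server', 'host')
port = config.getint('server', 'port')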

The .ini file is the simplest solution, at least from the user's point of
view, no need to learn any Python syntax.
However, speaking for myself, I am using Python-coded configuration
files, but: we all worship Python in the team and thus are familiar with
it.

JM


== 4 of 6 ==
Date: Thurs, Jan 7 2010 8:38 am
From: Robert Kern


On 2010-01-07 10:10 AM, Peter wrote:
> Hi
> There seems to be several strategies to enhance the old ini-style config
> files with real python code, for example:
>
> 1) the documentation tool sphinx uses a python file conf.py that is
> execfile(d), but execfile was removed in Python 3

Only because it is redundant, not because it is a discouraged approach. You can
still read the file and exec() the resulting string.
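
Something along these lines (a sketch; conf.py and the 'title' option are
just illustrative):

namespace = {}
with open('conf.py') as f:
    exec(f.read(), namespace)   # roughly what execfile('conf.py', namespace) did

title = namespace.get('title', 'untitled')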

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

== 5 of 6 ==
Date: Thurs, Jan 7 2010 10:19 am
From: Peter


Thanks for your answer, let me be more precise:

> I would add the standard module ConfigParser
> http://docs.python.org/library/configparser.html to your list.
of course, that was the implicit starting point of my request, when
talking about .ini files.
> I don't know exactly what you intend to do with point 4/,
It would allow me to select different conf.py files with command line
switches, like for example a -c <alternative conf file> option.
> but I would exclude it if any other point may fit. Imports can become
> tricky when used out of the common way. Anyway, hacking the import
> statement for managing configuration files does not sound very
> appropriate.
>
Would this be considered a hack ?

#!/usr/bin/env python

import sys

# parse command line options here

if option == 'standard':
    const = __import__('consts')
else:
    const = __import__('alternative_consts')

> The .ini file is the simplest solution, at least from the user's point
> of view, no need to learn any Python syntax.
I am speaking from the point of view of a Python programmer, and I find
the .ini restrictions not necessarily simple, for example when dealing
with structured data (I suppose it is trivial to specify a dictionary
or a list for the purpose of my request). For example, configuration
files for the logging module get unwieldy when you specify several
loggers, handlers, formatters etc., because you have to break down
structured data (objects) into name,value pairs.
> However, speaking for myself, I am using Python-coded configuration
> files, but: we all worship Python in the team and thus are familiar
> with it.
>
so do I.
> JM
>
>
So what is the "worshipped" approach, when you need more than name=value
pairs ?

Peter


== 6 of 6 ==
Date: Thurs, Jan 7 2010 10:23 am
From: Chris Rebert


On Thu, Jan 7, 2010 at 10:19 AM, Peter <vmail@mycircuit.org> wrote:
<snip>
>> The .ini file is the simplest solution, at least from the user's point of
>> view, no need to learn any python syntax.
>
> I am speaking from the point of view of a Python programmer, and I find the
> .ini restrictions not necessarily simple, for example when dealing with
> structured data (I suppose it is trivial to specify a dictionary or a list
> for the purpose of my request). For example, configuration files for the
> logging module get unwieldy when you specify several loggers, handlers,
> formatters etc., because you have to break down structured data (objects)
> into name,value pairs.
<snip>
> So what is the "worshipped" approach, when you need more than name=value
> pairs ?

JSON is one option: http://docs.python.org/library/json.html
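
For structured data like the logging example, a sketch (file name and keys
are made up):

import json

with open('settings.json') as f:
    cfg = json.load(f)

for name, handler in cfg['handlers'].items():
    print(name + ': ' + handler['level'])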

Cheers,
Chris
--
http://blog.rebertia.com

==============================================================================
TOPIC: How to execute a script from another script and other script does not
do busy wait.
http://groups.google.com/group/comp.lang.python/t/98f0c614770304b7?hl=en
==============================================================================

== 1 of 1 ==
Date: Thurs, Jan 7 2010 8:18 am
From: Jorgen Grahn


On Thu, 2010-01-07, Rajat wrote:
> I want to run a Python script (aka script2) from another Python script
> (aka script1). While script1 executes script2, it waits for script2 to
> complete and in doing so it also does some other useful work (it does not
> do a busy wait).
>
> My intention is to update a third party through script1 that script2
> is going to take longer.

I do not understand that sentence.
What are you trying to do, more exactly? The best solution can be
threads, os.popen, os.system or something different -- depending on
the details of what you want to do.

> Please suggest how should I go about implementing it.
>
> I'm currently executing it as:
>
> from script2 import main
> ret_code = main()
> return ret_code
>
> which surely is not going to achieve me what I intend.
>
>
> Thanks,
> Rajat.
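
As one concrete direction, a minimal subprocess-based sketch (the script
name, the one-second poll and the status message are all illustrative):

import subprocess, sys, time

proc = subprocess.Popen([sys.executable, 'script2.py'])
while proc.poll() is None:                # None means script2 is still running
    print('script2 is taking longer...')  # stand-in for updating the third party
    time.sleep(1)                         # sleep between checks instead of spinning
ret_code = proc.returncode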

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

==============================================================================
TOPIC: Do I have to use threads?
http://groups.google.com/group/comp.lang.python/t/0c04059bd243a38b?hl=en
==============================================================================

== 1 of 3 ==
Date: Thurs, Jan 7 2010 8:32 am
From: Jorgen Grahn


On Thu, 2010-01-07, Marco Salden wrote:
> On Jan 6, 5:36 am, Philip Semanchuk <phi...@semanchuk.com> wrote:
>> On Jan 5, 2010, at 11:26 PM, aditya shukla wrote:
>>
>> > Hello people,
>>
>> > I have 5 directories corresponding 5 different urls .I want to
>> > download
>> > images from those urls and place them in the respective
>> > directories.I have
>> > to extract the contents and download them simultaneously.I can
>> > extract the
>> > contents and do then one by one. My questions is for doing it
>> > simultaneously
>> > do I have to use threads?
>>
>> No. You could spawn 5 copies of wget (or curl or a Python program that
>> you've written). Whether or not that will perform better or be easier
>> to code, debug and maintain depends on the other aspects of your
>> program(s).
>>
>> bye
>> Philip
>
> Yep, the more easier and straightforward the approach, the better:
> threads are always (programmers')-error-prone by nature.
> But my question would be: does it REALLY need to be simultaneously:
> the CPU/OS only has more overhead doing this in parallel with
> processess. Measuring sequential processing and then trying to
> optimize (e.g. for user response or whatever) would be my prefered way
> to go. Less=More.

Normally when you do HTTP in parallel over several TCP sockets, it
has nothing to do with CPU overhead. You just don't want every GET to
be delayed just because the server(s) are lazy responding to the first
few ones; or you might want to read the text of a web page and the CSS
before a few huge pictures have been downloaded.

His "I have to [do them] simultaneously" makes me want to ask "Why?".

If he's expecting *many* pictures, I doubt that the parallel download
will buy him much. Reusing the same TCP socket for all of them is
more likely to help, especially if the pictures aren't tiny. One
long-lived TCP connection is much more efficient than dozens of
short-lived ones.

Personally, I'd popen() wget and let it do the job for me.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .


== 2 of 3 ==
Date: Thurs, Jan 7 2010 9:38 am
From: MRAB


Jorgen Grahn wrote:
> On Thu, 2010-01-07, Marco Salden wrote:
>> On Jan 6, 5:36 am, Philip Semanchuk <phi...@semanchuk.com> wrote:
>>> On Jan 5, 2010, at 11:26 PM, aditya shukla wrote:
>>>
>>>> Hello people,
>>>> I have 5 directories corresponding 5 different urls .I want to
>>>> download
>>>> images from those urls and place them in the respective
>>>> directories.I have
>>>> to extract the contents and download them simultaneously.I can
>>>> extract the
>>>> contents and do then one by one. My questions is for doing it
>>>> simultaneously
>>>> do I have to use threads?
>>> No. You could spawn 5 copies of wget (or curl or a Python program that
>>> you've written). Whether or not that will perform better or be easier
>>> to code, debug and maintain depends on the other aspects of your
>>> program(s).
>>>
>>> bye
>>> Philip
>> Yep, the more easier and straightforward the approach, the better:
>> threads are always (programmers')-error-prone by nature.
>> But my question would be: does it REALLY need to be simultaneously:
>> the CPU/OS only has more overhead doing this in parallel with
>> processess. Measuring sequential processing and then trying to
>> optimize (e.g. for user response or whatever) would be my prefered way
>> to go. Less=More.
>
> Normally when you do HTTP in parallell over several TCP sockets, it
> has nothing to do with CPU overhead. You just don't want every GET to
> be delayed just because the server(s) are lazy responding to the first
> few ones; or you might want to read the text of a web page and the CSS
> before a few huge pictures have been downloaded.
>
> His "I have to [do them] simultaneously" makes me want to ask "Why?".
>
> If he's expecting *many* pictures, I doubt that the parallel download
> will buy him much. Reusing the same TCP socket for all of them is
> more likely to help, especially if the pictures aren't tiny. One
> long-lived TCP connection is much more efficient than dozens of
> short-lived ones.
>
> Personally, I'd popen() wget and let it do the job for me.
>
From my own experience:

I wanted to download a number of webpages.

I noticed that there was a significant delay before it would reply, and
an especially long delay for one of them, so I used a number of threads,
each one reading a URL from a queue, performing the download, and then
reading the next URL, until there were none left (actually, until it
read the sentinel None, which it put back for the other threads).

The result?

Shorter total download time because it could be downloading one webpage
while waiting for another to reply.

(Of course, I had to make sure that I didn't have too many threads,
because that might've put too many demands on the website, not a nice
thing to do!)
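
In code, that pattern is roughly the following (a sketch; the URLs and the
thread count are made up):

import threading, urllib2, Queue   # urllib.request / queue in Python 3

url_queue = Queue.Queue()
for url in ['http://example.com/a.jpg', 'http://example.com/b.jpg']:
    url_queue.put(url)
url_queue.put(None)                # sentinel: no more work

def worker():
    while True:
        url = url_queue.get()
        if url is None:            # put the sentinel back for the other threads
            url_queue.put(None)
            return
        data = urllib2.urlopen(url).read()
        # ... save data to the right directory here ...

threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()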


== 3 of 3 ==
Date: Thurs, Jan 7 2010 9:53 am
From: Philip Semanchuk

On Jan 7, 2010, at 11:32 AM, Jorgen Grahn wrote:

> On Thu, 2010-01-07, Marco Salden wrote:
>> On Jan 6, 5:36 am, Philip Semanchuk <phi...@semanchuk.com> wrote:
>>> On Jan 5, 2010, at 11:26 PM, aditya shukla wrote:
>>>
>>>> Hello people,
>>>
>>>> I have 5 directories corresponding 5 different urls .I want to
>>>> download
>>>> images from those urls and place them in the respective
>>>> directories.I have
>>>> to extract the contents and download them simultaneously.I can
>>>> extract the
>>>> contents and do then one by one. My questions is for doing it
>>>> simultaneously
>>>> do I have to use threads?
>>>
>>> No. You could spawn 5 copies of wget (or curl or a Python program
>>> that
>>> you've written). Whether or not that will perform better or be
>>> easier
>>> to code, debug and maintain depends on the other aspects of your
>>> program(s).
>>>
>>> bye
>>> Philip
>>
>> Yep, the more easier and straightforward the approach, the better:
>> threads are always (programmers')-error-prone by nature.
>> But my question would be: does it REALLY need to be simultaneously:
>> the CPU/OS only has more overhead doing this in parallel with
>> processess. Measuring sequential processing and then trying to
>> optimize (e.g. for user response or whatever) would be my prefered
>> way
>> to go. Less=More.
>
> Normally when you do HTTP in parallell over several TCP sockets, it
> has nothing to do with CPU overhead. You just don't want every GET to
> be delayed just because the server(s) are lazy responding to the first
> few ones; or you might want to read the text of a web page and the CSS
> before a few huge pictures have been downloaded.
>
> His "I have to [do them] simultaneously" makes me want to ask "Why?".

Exactly what I was thinking. He's surely doing something more
complicated than his post suggests, and without that detail it's
impossible to say whether threads, processes, asynch or voodoo is the
best approach.


bye
P

==============================================================================
TOPIC: ANN: Pymazon 0.1.0 released!
http://groups.google.com/group/comp.lang.python/t/34514d8cea667ca0?hl=en
==============================================================================

== 1 of 1 ==
Date: Thurs, Jan 7 2010 10:13 am
From: "S. Chris Colbert"


Hello,

I'm happy to announce the first non-beta release of Pymazon: a
Python-implemented downloader for the Amazon MP3 store.

Improvements from the beta:
- Running download status indicator
- Various fixes for Windows
- Some code cleanup

Pymazon was created to be a simple and easy alternative to the Linux version
of the Amazon downloader, and to alleviate the pain of getting it to work on
64-bit Linux.

You can read about Pymazon at http://pymazon.googlecode.com

You can download from googlecode or the cheeseshop:

$ pip install pymazon

or

$ easy_install pymazon

It also works on Windows.

Dependencies:
PyCrypto (it's in the ubuntu repos and the cheeseshop)
PyQt4 >= 4.5 (optional, only needed for GUI)

GPLv3 License

Cheers!

SCC

==============================================================================
TOPIC: Dictionary used to build a Triple Store
http://groups.google.com/group/comp.lang.python/t/3542b270637779cc?hl=en
==============================================================================

== 1 of 2 ==
Date: Thurs, Jan 7 2010 10:22 am
From: Lee


Definitely a newbie question, so please bear with me.

I'm reading "Programming the Semantic Web" by Segaran, Evans, and Tayor.

It's about the Semantic Web BUT it uses python to build a "toy" triple
store claimed to have good performance in the "tens of thousands" of
triples.

Just in case anybody doesn't know what an RDF triple is (not that it
matters for my question), think of it as an ordered 3-tuple representing
a Subject, a Predicate, and an Object, e.g.: (John, loves, Mary), (Mary,
has-a, lamb), (theSky, has-color, blue)

To build the triple store entirely in Python, the authors recommend
using the Python hash. Three hashes actually (I get that. You
want to have a hash with the major index being the Subject in one hash,
the Predicate in another hash, and the Object in the third hash)

He creates a class SimpleGraph which initializes itself by setting the
three hashes named _spo, _pos, and _osp thus:

class SimpleGraph:
    def __init__(self):
        self._spo = {}
        self._pos = {}
        self._osp = {}

So far so good. I get the convention with the double underbars for the
initializer but

Q1: Not the main question but while I'm here....I'm a little fuzzy on
the convention about the use of the single underbar in the definition of
the hashes. Is the idea to "underbar" all objects and methods that
belong to the class? Why do that?

But now the good stuff:

Our authors define the hashes thus: (showing only one of the three
hashes because they're all the same idea)

self._pos = {predicate:{object:set( [subject] ) }}

Q2: Wha? Two surprises ...
1) Why not {predicate:{object:subject}} i.e.
pos[predicate][object]=subject....why the set( [subject] ) construct?
putting the subject into a list and turning the list into a set to be the
"value" part of a name:value pair. Why not just use the naked subject
for the value?

2) Why not something like pos[predicate][object][subject] = 1
.....or any constant. The idea being to create the set of three indexes.
If the triple exists in the hash, it's "in" your triple store. If not,
then there's no such triple.


== 2 of 2 ==
Date: Thurs, Jan 7 2010 10:39 am
From: "S. Chris Colbert"


> Definitely a newbie question, so please bear with me.
>
> I'm reading "Programming the Semantic Web" by Segaran, Evans, and Tayor.
>
> It's about the Semantic Web BUT it uses python to build a "toy" triple
> store claimed to have good performance in the "tens of thousands" of
> triples.
>
> Just in case anybody doesnt know what an RDF triple is (not that it
> matters for my question) think of it as an ordered 3 tuple representing
> a Subject, a Predicate, and an Object eg: (John, loves, Mary) (Mary,
> has-a, lamb) {theSky, has-color,blue}
>
> To build the triple store entirely in Python, the authors recommend
> using the Python hash. Three hashes actually (I get that. You
> want to have a hash with the major index being the Subject in one hash,
> the Predicate in another hash, or the Object for the third hash)
>
> He creates a class SimpleGraph which initializes itself by setting the
> three hashes names _spo, _pos, and _osp thus
>
> class SimpleGraph:
>     def __init__(self):
>         self._spo = {}
>         self._pos = {}
>         self._osp = {}
>
> So far so good. I get the convention with the double underbars for the
> initializer but
>
> Q1: Not the main question but while I'm here....I'm a little fuzzy on
> the convention about the use of the single underbar in the definition of
> the hashes. Id the idea to "underbar" all objects and methods that
> belong to the class? Why do that?
>
> But now the good stuff:
>
> Our authors define the hashes thus: (showing only one of the three
> hashes because they're all the same idea)
>
> self._pos = {predicate:{object:set( [subject] ) }}
>
> Q2: Wha? Two surprises ...
> 1) Why not {predicate:{object:subject}} i.e.
> pos[predicate][object]=subject....why the set( [object] ) construct?
> putting the object into a list and turning the list into a set to be the
> "value" part of a name:value pair. Why not just use the naked subject
> for the value?
>
because the argument has to be iterable.

In [1]: set(1)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)

/home/brucewayne/<ipython console> in <module>()

TypeError: 'int' object is not iterable

> 2) Why not something like pos[predicate][object][subject] = 1
> .....or any constant. The idea being to create the set of three indexes.
> If the triple exists in the hash, its "in" your tripple store. If not,
> then there's no such triple.
>
I can't really answer that, I imagine there is a better way to code what is
trying to be accomplished. But I'm no Steven D'Aprano and I'm already a few
beers in ;)
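
For what it's worth, the set seems to be there because several triples can
share the same (predicate, object) pair, so the innermost value has to hold
all the matching subjects rather than just one. A sketch in the spirit of
the book's class (the method names here are made up, not the book's code):

class SimpleGraph(object):
    def __init__(self):
        self._pos = {}   # predicate -> object -> set of subjects

    def add(self, sub, pred, obj):
        self._pos.setdefault(pred, {}).setdefault(obj, set()).add(sub)

    def subjects(self, pred, obj):
        return self._pos.get(pred, {}).get(obj, set())

g = SimpleGraph()
g.add('John', 'loves', 'Mary')
g.add('Peter', 'loves', 'Mary')
print(g.subjects('loves', 'Mary'))   # both John and Peter are kept

The pos[predicate][object][subject] = 1 idea from Q2 would presumably work
as well; the set just expresses the same thing with one less level of nesting.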

==============================================================================
TOPIC: How to reduce the memory size of python
http://groups.google.com/group/comp.lang.python/t/d2de46379f176e9a?hl=en
==============================================================================

== 1 of 1 ==
Date: Thurs, Jan 7 2010 10:31 am
From: Terry Reedy


On 1/7/2010 3:34 AM, Mishra Gopal-QBX634 wrote:
>

> Like import logging takes 1MB of memory.
> We only use one function, getLogger, by 'from logging import getLogger'
>
> But it still takes the same 1 MB of memory.
>
> Instead of loading the whole logging module, only load the getLogger
> function.

from x import y

causes creation of module x and binding of the module to sys.modules['x'].
It then binds name 'y' in the current namespace to the corresponding
object in x. Functions in general need a reference to the module
namespace to resolve module-level variables.
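
A quick way to see that (a sketch):

import sys
from logging import getLogger

print('logging' in sys.modules)   # True: the whole module object was still created
# the function keeps a reference to the module's namespace:
print(getLogger.__globals__ is vars(sys.modules['logging']))   # True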

To save anything, you must cut the function out of the module and verify
that it works in isolation. But I presume 'getLogger' refers to other
stuff in the logging module and would not work in isolation.

Terry Jan Reedy

==============================================================================

You received this message because you are subscribed to the Google Groups "comp.lang.python"
group.

To post to this group, visit http://groups.google.com/group/comp.lang.python?hl=en

To unsubscribe from this group, send email to comp.lang.python+unsubscribe@googlegroups.com

To change the way you get mail from this group, visit:
http://groups.google.com/group/comp.lang.python/subscribe?hl=en

To report abuse, send email explaining the problem to abuse@googlegroups.com

==============================================================================
Google Groups: http://groups.google.com/?hl=en
