Wednesday, April 7, 2010

comp.lang.python - 25 new messages in 8 topics - digest

comp.lang.python
http://groups.google.com/group/comp.lang.python?hl=en

comp.lang.python@googlegroups.com

Today's topics:

* Striving for PEP-8 compliance - 3 messages, 2 authors
http://groups.google.com/group/comp.lang.python/t/08f3e64e10fdf47e?hl=en
* Regex driving me crazy... - 13 messages, 5 authors
http://groups.google.com/group/comp.lang.python/t/6511effbbbfc5584?hl=en
* ftp and python - 1 message, 1 author
http://groups.google.com/group/comp.lang.python/t/6183a96d88f4420b?hl=en
* Python and Regular Expressions - 1 message, 1 author
http://groups.google.com/group/comp.lang.python/t/888b3fe934e2c5e2?hl=en
* Performance of list vs. set equality operations - 3 messages, 3 authors
http://groups.google.com/group/comp.lang.python/t/818d143c7e9550bc?hl=en
* Tkinter inheritance mess? - 1 message, 1 author
http://groups.google.com/group/comp.lang.python/t/a0146ef7d08ffea0?hl=en
* raise exception with fake filename and linenumber - 1 message, 1 author
http://groups.google.com/group/comp.lang.python/t/187e8a1d660e5a25?hl=en
* Profiling: Interpreting tottime - 2 messages, 1 author
http://groups.google.com/group/comp.lang.python/t/31995629b8111cd0?hl=en

==============================================================================
TOPIC: Striving for PEP-8 compliance
http://groups.google.com/group/comp.lang.python/t/08f3e64e10fdf47e?hl=en
==============================================================================

== 1 of 3 ==
Date: Wed, Apr 7 2010 5:33 pm
From: Lawrence D'Oliveiro


In message <mailman.1599.1270652040.23598.python-list@python.org>, Tom Evans
wrote:

> I've written a bunch of internal libraries for my company, and they
> all use two space indents, and I'd like to be more consistent and
> conform to PEP-8 as much as I can.

"A foolish consistency is the hobgoblin of little minds"
— Ralph Waldo Emerson


== 2 of 3 ==
Date: Wed, Apr 7 2010 5:35 pm
From: Lawrence D'Oliveiro


In message <mailman.1610.1270655932.23598.python-list@python.org>, Gabriel
Genellina wrote:

> If you only reindent the code (without adding/removing lines) then you can
> compare the compiled .pyc files (excluding the first 8 bytes that contain
> a magic number and the source file timestamp). Remember that code objects
> contain line number information.

Anybody who ever creates another indentation-controlled language should be
beaten to death with a Guido van Rossum voodoo doll.


== 3 of 3 ==
Date: Wed, Apr 7 2010 6:06 pm
From: Chris Rebert


On Wed, Apr 7, 2010 at 5:35 PM, Lawrence D'Oliveiro <@> wrote:
> In message <mailman.1610.1270655932.23598.python-list@python.org>, Gabriel
> Genellina wrote:
>
>> If you only reindent the code (without adding/removing lines) then you can
>> compare the compiled .pyc files (excluding the first 8 bytes that contain
>> a magic number and the source file timestamp). Remember that code objects
>> contain line number information.
>
> Anybody who ever creates another indentation-controlled language should be
> beaten to death with a Guido van Rossum voodoo doll.

I'll go warn Don Syme. :P I wonder how Microsoft will react.
http://blogs.msdn.com/dsyme/archive/2006/08/24/715626.aspx

Cheers,
Chris
--
http://blog.rebertia.com/2010/01/24/of-braces-and-semicolons/

==============================================================================
TOPIC: Regex driving me crazy...
http://groups.google.com/group/comp.lang.python/t/6511effbbbfc5584?hl=en
==============================================================================

== 1 of 13 ==
Date: Wed, Apr 7 2010 5:49 pm
From: Patrick Maupin


On Apr 7, 4:40 pm, J <dreadpiratej...@gmail.com> wrote:
> Can someone make me un-crazy?
>
> I have a bit of code that right now, looks like this:
>
> status = getoutput('smartctl -l selftest /dev/sda').splitlines()[6]
> status = re.sub(' (?= )(?=([^"]*"[^"]*")*[^"]*$)', ":", status)
> print status
>
> Basically, it pulls the first actual line of data from the return you
> get when you use smartctl to look at a hard disk's selftest log.
>
> The raw data looks like this:
>
> # 1  Short offline       Completed without error       00%       679         -
>
> Unfortunately, all that whitespace is arbitrary single space
> characters.  And I am interested in the string that appears in the
> third column, which changes as the test runs and then completes.  So
> in the example, "Completed without error"
>
> The regex I have up there doesn't quite work, as it seems to be
> subbing EVERY space (or at least in instances of more than one space)
> to a ':' like this:
>
> # 1: Short offline:::::: Completed without error:::::: 00%:::::: 679:::::::: -
>
> Ultimately, what I'm trying to do is either replace any run of more
> than one space with a delimiter, then split the result into a list and
> get the third item.
>
> OR, if there's a smarter, shorter, or better way of doing it, I'd love to know.
>
> The end result should pull the whole string in the middle of that
> output line, and then I can use that to compare to a list of possible
> output strings to determine if the test is still running, has
> completed successfully, or failed.
>
> Unfortunately, my google-fu fails right now, and my Regex powers were
> always rather weak anyway...
>
> So any ideas on what the best way to proceed with this would be?

You mean like this?

>>> import re
>>> re.split(' {2,}', '# 1  Short offline       Completed without error       00%')
['# 1', 'Short offline', 'Completed without error', '00%']
>>>

Regards,
Pat


== 2 of 13 ==
Date: Wed, Apr 7 2010 5:50 pm
From: Patrick Maupin


On Apr 7, 4:47 pm, Grant Edwards <inva...@invalid.invalid> wrote:
> On 2010-04-07, J <dreadpiratej...@gmail.com> wrote:
>
> > Can someone make me un-crazy?
>
> Definitely.  Regex is driving you crazy, so don't use a regex.
>
>   inputString = "# 1  Short offline       Completed without error     00%       679         -"
>
>   print ' '.join(inputString.split()[4:-3])
>
> > So any ideas on what the best way to proceed with this would be?
>
> Anytime you have a problem with a regex, the first thing you should
> ask yourself:  "do I really, _really_ need a regex?
>
> Hint: the answer is usually "no".
>
> --
> Grant Edwards               grant.b.edwards        Yow! I'm continually AMAZED
>                                   at               at th'breathtaking effects
>                               gmail.com            of WIND EROSION!!

OK, fine. Post a better solution to this problem than:

>>> import re
>>> re.split(' {2,}', '# 1  Short offline       Completed without error       00%')
['# 1', 'Short offline', 'Completed without error', '00%']
>>>

Regards,
Pat


== 3 of 13 ==
Date: Wed, Apr 7 2010 6:03 pm
From: Patrick Maupin


On Apr 7, 7:49 pm, Patrick Maupin <pmau...@gmail.com> wrote:
> On Apr 7, 4:40 pm, J <dreadpiratej...@gmail.com> wrote:
>
>
>
> > Can someone make me un-crazy?
>
> > I have a bit of code that right now, looks like this:
>
> > status = getoutput('smartctl -l selftest /dev/sda').splitlines()[6]
> > status = re.sub(' (?= )(?=([^"]*"[^"]*")*[^"]*$)', ":", status)
> > print status
>
> > Basically, it pulls the first actual line of data from the return you
> > get when you use smartctl to look at a hard disk's selftest log.
>
> > The raw data looks like this:
>
> > # 1  Short offline       Completed without error       00%       679         -
>
> > Unfortunately, all that whitespace is arbitrary single space
> > characters.  And I am interested in the string that appears in the
> > third column, which changes as the test runs and then completes.  So
> > in the example, "Completed without error"
>
> > The regex I have up there doesn't quite work, as it seems to be
> > subbing EVERY space (or at least in instances of more than one space)
> > to a ':' like this:
>
> > # 1: Short offline:::::: Completed without error:::::: 00%:::::: 679:::::::: -
>
> > Ultimately, what I'm trying to do is either replace any run of more
> > than one space with a delimiter, then split the result into a list
> > and get the third item.
>
> > OR, if there's a smarter, shorter, or better way of doing it, I'd love to know.
>
> > The end result should pull the whole string in the middle of that
> > output line, and then I can use that to compare to a list of possible
> > output strings to determine if the test is still running, has
> > completed successfully, or failed.
>
> > Unfortunately, my google-fu fails right now, and my Regex powers were
> > always rather weak anyway...
>
> > So any ideas on what the best way to proceed with this would be?
>
> You mean like this?
>
> >>> import re
> >>> re.split(' {2,}', '# 1  Short offline       Completed without error       00%')
>
> ['# 1', 'Short offline', 'Completed without error', '00%']
>
>
>
> Regards,
> Pat

BTW, although I find it annoying when people say "don't do that" when
"that" is a perfectly good thing to do, and although I also find it
annoying when people tell you what not to do without telling you what
*to* do, and although I find the regex solution to this problem to be
quite clean, the equivalent non-regex solution is not terrible, so I
will present it as well, for your viewing pleasure:

>>> [x for x in '# 1  Short offline       Completed without error       00%'.split('  ') if x.strip()]
['# 1', 'Short offline', ' Completed without error', ' 00%']

Regards,
Pat
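For reference, a runnable sketch showing that the two approaches above agree on the sample smartctl line from the original post; the assumption that the status is the third field (index 2) comes from that sample line:

```python
import re

# Sample line from earlier in the thread; the column widths are arbitrary.
line = '# 1  Short offline       Completed without error       00%       679         -'

fields_re = re.split(' {2,}', line)                        # split on runs of 2+ spaces
fields_split = [x for x in line.split('  ') if x.strip()]  # non-regex equivalent

# Both approaches yield the same fields once stray spaces are stripped.
assert [f.strip() for f in fields_re] == [f.strip() for f in fields_split]

status = fields_re[2]   # third column: the self-test status
print(status)           # Completed without error
```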


== 4 of 13 ==
Date: Wed, Apr 7 2010 7:02 pm
From: James Stroud


Patrick Maupin wrote:
> BTW, although I find it annoying when people say "don't do that" when
> "that" is a perfectly good thing to do, and although I also find it
> annoying when people tell you what not to do without telling you what
> *to* do, and although I find the regex solution to this problem to be
> quite clean, the equivalent non-regex solution is not terrible

I propose a new way to answer questions on c.l.python that will (1) give respondents the pleasure of vague admonishment and (2) actually answer the question. The way I propose utilizes the double negative. For example:

"You are doing it wrong! Don't not do <code>re.split('\s{2,}', s[2])</code>."

Please answer this way in the future.

Thank you,
James


== 5 of 13 ==
Date: Wed, Apr 7 2010 7:10 pm
From: Patrick Maupin


On Apr 7, 9:02 pm, James Stroud <nospamjstroudmap...@mbi.ucla.edu>
wrote:
> Patrick Maupin wrote:
> > BTW, although I find it annoying when people say "don't do that" when
> > "that" is a perfectly good thing to do, and although I also find it
> > annoying when people tell you what not to do without telling you what
> > *to* do, and although I find the regex solution to this problem to be
> > quite clean, the equivalent non-regex solution is not terrible
>
> I propose a new way to answer questions on c.l.python that will (1) give respondents the pleasure of vague admonishment and (2) actually answer the question. The way I propose utilizes the double negative. For example:
>
> "You are doing it wrong! Don't not do <code>re.split('\s{2,}', s[2])</code>."
>
> Please answer this way in the future.

I most certainly will not consider when that isn't warranted!

OTOH, in general I am more interested in admonishing the authors of
the pseudo-answers than I am the authors of the questions, despite the
fact that I find this hilarious:

http://despair.com/cluelessness.html

Regards,
Pat


== 6 of 13 ==
Date: Wed, Apr 7 2010 7:36 pm
From: Grant Edwards


On 2010-04-08, Patrick Maupin <pmaupin@gmail.com> wrote:
> On Apr 7, 4:47?pm, Grant Edwards <inva...@invalid.invalid> wrote:
>> On 2010-04-07, J <dreadpiratej...@gmail.com> wrote:
>>
>> > Can someone make me un-crazy?
>>
>> Definitely.  Regex is driving you crazy, so don't use a regex.
>>
>>   inputString = "# 1  Short offline       Completed without error     00%       679         -"
>>
>>   print ' '.join(inputString.split()[4:-3])
[...]

> OK, fine. Post a better solution to this problem than:
>
>>>> import re
>>>> re.split(' {2,}', '# 1  Short offline       Completed without error       00%')
> ['# 1', 'Short offline', 'Completed without error', '00%']

OK, I'll bite: what's wrong with the solution I already posted?

--
Grant

== 7 of 13 ==
Date: Wed, Apr 7 2010 7:38 pm
From: Grant Edwards


On 2010-04-08, James Stroud <nospamjstroudmapson@mbi.ucla.edu> wrote:
> Patrick Maupin wrote:
>> BTW, although I find it annoying when people say "don't do that" when
>> "that" is a perfectly good thing to do, and although I also find it
>> annoying when people tell you what not to do without telling you what
>> *to* do, and although I find the regex solution to this problem to be
>> quite clean, the equivalent non-regex solution is not terrible
>
> I propose a new way to answer questions on c.l.python that will (1) give respondents the pleasure of vague admonishment and (2) actually answer the question. The way I propose utilizes the double negative. For example:
>
> "You are doing it wrong! Don't not do <code>re.split('\s{2,}', s[2])</code>."
>
> Please answer this way in the future.

I will certainly try to avoid not answering in a manner not unlike that.

--
Grant


== 8 of 13 ==
Date: Wed, Apr 7 2010 7:45 pm
From: Patrick Maupin


On Apr 7, 9:36 pm, Grant Edwards <inva...@invalid.invalid> wrote:
> On 2010-04-08, Patrick Maupin <pmau...@gmail.com> wrote:
> > On Apr 7, 4:47 pm, Grant Edwards <inva...@invalid.invalid> wrote:
> >> On 2010-04-07, J <dreadpiratej...@gmail.com> wrote:
>
> >> > Can someone make me un-crazy?
>
> >> Definitely.  Regex is driving you crazy, so don't use a regex.
>
> >>   inputString = "# 1  Short offline       Completed without error     00%       679         -"
>
> >>   print ' '.join(inputString.split()[4:-3])
>
> [...]
>
> > OK, fine.  Post a better solution to this problem than:
>
> >>>> import re
> >>>> re.split(' {2,}', '# 1  Short offline       Completed without error       00%')
> > ['# 1', 'Short offline', 'Completed without error', '00%']
>
> OK, I'll bite: what's wrong with the solution I already posted?
>
> --
> Grant

Sorry, my eyes completely missed your one-liner, so my criticism about
not posting a solution was unwarranted. I don't think you and I read
the problem the same way (which is probably why I didn't notice your
solution -- because it wasn't solving the problem I thought I saw).

When I saw "And I am interested in the string that appears in the
third column, which changes as the test runs and then completes" I
assumed that, not only could that string change, but so could the one
before it.

I guess my base assumption was that anything with words in it could
change. I was looking at the OP's attempt at a solution, and he
obviously felt he needed to see two or more spaces as an item
delimiter.

(And I got testy because of seeing other IMO unwarranted denigration
of re on the list lately.)

Regards,
Pat


== 9 of 13 ==
Date: Wed, Apr 7 2010 7:51 pm
From: Steven D'Aprano


On Wed, 07 Apr 2010 18:03:47 -0700, Patrick Maupin wrote:

> BTW, although I find it annoying when people say "don't do that" when
> "that" is a perfectly good thing to do, and although I also find it
> annoying when people tell you what not to do without telling you what
> *to* do,

Grant did give a perfectly good solution.


> and although I find the regex solution to this problem to be
> quite clean, the equivalent non-regex solution is not terrible, so I
> will present it as well, for your viewing pleasure:
>
> >>> [x for x in '# 1  Short offline       Completed without error
>       00%'.split('  ') if x.strip()]
> ['# 1', 'Short offline', ' Completed without error', ' 00%']


This is one of the reasons we're so often suspicious of re solutions:


>>> s = '# 1  Short offline       Completed without error       00%'
>>> tre = Timer("re.split(' {2,}', s)",
... "import re; from __main__ import s")
>>> tsplit = Timer("[x for x in s.split('  ') if x.strip()]",
... "from __main__ import s")
>>>
>>> re.split(' {2,}', s) == [x for x in s.split('  ') if x.strip()]
True
>>>
>>>
>>> min(tre.repeat(repeat=5))
6.1224789619445801
>>> min(tsplit.repeat(repeat=5))
1.8338048458099365


Even when they are correct and not unreadable line-noise, regexes tend to
be slow. And they get worse as the size of the input increases:

>>> s *= 1000
>>> min(tre.repeat(repeat=5, number=1000))
2.3496899604797363
>>> min(tsplit.repeat(repeat=5, number=1000))
0.41538596153259277
>>>
>>> s *= 10
>>> min(tre.repeat(repeat=5, number=1000))
23.739185094833374
>>> min(tsplit.repeat(repeat=5, number=1000))
4.6444299221038818


And this isn't even one of the pathological O(N**2) or O(2**N) regexes.

Don't get me wrong -- regexes are a useful tool. But if your first
instinct is to write a regex, you're doing it wrong.


[quote]
A related problem is Perl's over-reliance on regular expressions
that is exaggerated by advocating regex-based solution in almost
all O'Reilly books. The latter until recently were the most
authoritative source of published information about Perl.

While simple regular expression is a beautiful thing and can
simplify operations with string considerably, overcomplexity in
regular expressions is extremly dangerous: it cannot serve a basis
for serious, professional programming, it is fraught with pitfalls,
a big semantic mess as a result of outgrowing its primary purpose.
Diagnostic for errors in regular expressions is even weaker then
for the language itself and here many things are just go unnoticed.
[end quote]

http://www.softpanorama.org/Scripting/Perlbook/Ch01/place_of_perl_among_other_lang.shtml

Even Larry Wall has criticised Perl's regex culture:

http://dev.perl.org/perl6/doc/design/apo/A05.html


--
Steven


== 10 of 13 ==
Date: Wed, Apr 7 2010 8:01 pm
From: J


On Wed, Apr 7, 2010 at 22:45, Patrick Maupin <pmaupin@gmail.com> wrote:

> When I saw "And I am interested in the string that appears in the
> third column, which changes as the test runs and then completes" I
> assumed that, not only could that string change, but so could the one
> before it.
>
> I guess my base assumption was that anything with words in it could
> change.  I was looking at the OP's attempt at a solution, and he
> obviously felt he needed to see two or more spaces as an item
> delimiter.

I apologize for the confusion, Pat...

I could have worded that better, but at that point I was A:
Frustrated, B: starving, and C: had my wife nagging me to stop working
to come get something to eat ;-)

What I meant was, in that output string, the phrase in the middle
could change in length...
After looking at the source code for smartctl (part of the
smartmontools package for you linux people) I found the switch that
creates those status messages.... they vary in character length, some
with non-text characters like ( and ) and /, and have either 3 or 4
words...

The spaces between each column, instead of being a fixed number of
spaces each, were seemingly arbitrarily created... there may be 4
spaces between two columns or there may be 9, or 7 or who knows what,
and since they were all being treated as individual spaces instead of
tabs or something, I was having trouble splitting the output into
something that was easy to parse (at least in my mind it seemed that
way).

Anyway, that's that... and I do apologize if my original post was
confusing at all...

Cheers
Jeff


== 11 of 13 ==
Date: Wed, Apr 7 2010 8:04 pm
From: Patrick Maupin


On Apr 7, 9:51 pm, Steven D'Aprano
<ste...@REMOVE.THIS.cybersource.com.au> wrote:
> On Wed, 07 Apr 2010 18:03:47 -0700, Patrick Maupin wrote:
> > BTW, although I find it annoying when people say "don't do that" when
> > "that" is a perfectly good thing to do, and although I also find it
> > annoying when people tell you what not to do without telling you what
> > *to* do,
>
> Grant did give a perfectly good solution.

Yeah, I noticed later and apologized for that. What he gave will work
perfectly if the only data that changes the number of words is the
data the OP is looking for. This may or may not be true. I don't
know anything about the program generating the data, but I did notice
that the OP's attempt at an answer indicated that the OP felt (rightly
or wrongly) he needed to split on two or more spaces.

>
> > and although I find the regex solution to this problem to be
> > quite clean, the equivalent non-regex solution is not terrible, so I
> > will present it as well, for your viewing pleasure:
>
> > >>> [x for x in '# 1  Short offline       Completed without error
> >       00%'.split('  ') if x.strip()]
> > ['# 1', 'Short offline', ' Completed without error', ' 00%']
>
> This is one of the reasons we're so often suspicious of re solutions:
>
> >>> s = '# 1  Short offline       Completed without error       00%'
> >>> tre = Timer("re.split(' {2,}', s)",
>
> ... "import re; from __main__ import s")
> >>> tsplit = Timer("[x for x in s.split('  ') if x.strip()]",
> ... "from __main__ import s")
>
> >>> re.split(' {2,}', s) == [x for x in s.split('  ') if x.strip()]
> True
>
> >>> min(tre.repeat(repeat=5))
> 6.1224789619445801
> >>> min(tsplit.repeat(repeat=5))
>
> 1.8338048458099365
>
> Even when they are correct and not unreadable line-noise, regexes tend to
> be slow. And they get worse as the size of the input increases:
>
> >>> s *= 1000
> >>> min(tre.repeat(repeat=5, number=1000))
> 2.3496899604797363
> >>> min(tsplit.repeat(repeat=5, number=1000))
> 0.41538596153259277
>
> >>> s *= 10
> >>> min(tre.repeat(repeat=5, number=1000))
> 23.739185094833374
> >>> min(tsplit.repeat(repeat=5, number=1000))
>
> 4.6444299221038818
>
> And this isn't even one of the pathological O(N**2) or O(2**N) regexes.
>
> Don't get me wrong -- regexes are a useful tool. But if your first
> instinct is to write a regex, you're doing it wrong.
>
>     [quote]
>     A related problem is Perl's over-reliance on regular expressions
>     that is exaggerated by advocating regex-based solution in almost
>     all O'Reilly books. The latter until recently were the most
>     authoritative source of published information about Perl.
>
>     While simple regular expression is a beautiful thing and can
>     simplify operations with string considerably, overcomplexity in
>     regular expressions is extremly dangerous: it cannot serve a basis
>     for serious, professional programming, it is fraught with pitfalls,
>     a big semantic mess as a result of outgrowing its primary purpose.
>     Diagnostic for errors in regular expressions is even weaker then
>     for the language itself and here many things are just go unnoticed.
>     [end quote]
>
> http://www.softpanorama.org/Scripting/Perlbook/Ch01/place_of_perl_among_other_lang.shtml
>
> Even Larry Wall has criticised Perl's regex culture:
>
> http://dev.perl.org/perl6/doc/design/apo/A05.html

Bravo!!! Good data, quotes, references, all good stuff!

I absolutely agree that regex shouldn't always be the first thing you
reach for, but I was reading way too much unsubstantiated "this is
bad. Don't do it." on the subject recently. In particular, when
people say "Don't use regex. Use PyParsing!" It may be good advice
in the right context, but it's a bit disingenuous not to mention that
PyParsing will use regex under the covers...

Regards,
Pat

== 12 of 13 ==
Date: Wed, Apr 7 2010 8:10 pm
From: Grant Edwards


On 2010-04-08, Patrick Maupin <pmaupin@gmail.com> wrote:

> Sorry, my eyes completely missed your one-liner, so my criticism about
> not posting a solution was unwarranted. I don't think you and I read
> the problem the same way (which is probably why I didn't notice your
> solution -- because it wasn't solving the problem I thought I saw).

No worries.

> When I saw "And I am interested in the string that appears in the
> third column, which changes as the test runs and then completes" I
> assumed that, not only could that string change, but so could the one
> before it.

If that's the case, my solution won't work right.

> I guess my base assumption was that anything with words in it could
> change. I was looking at the OP's attempt at a solution, and he
> obviously felt he needed to see two or more spaces as an item
> delimiter.

If the requirement is indeed two or more spaces as a delimiter with
spaces allowed in any field, then a regular expression split is
probably the best solution.

--
Grant
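As a concrete sketch of that case, here are two lines whose third field differs in word count; the second status string is invented for illustration, not taken from smartctl's source:

```python
import re

# Two sample lines delimited by runs of 2+ spaces; the second status
# string is a made-up longer status, not real smartctl output.
lines = [
    '# 1  Short offline       Completed without error        00%   679   -',
    '# 2  Extended offline    Self-test routine in progress  90%   679   -',
]

# Splitting on two-or-more spaces keeps single spaces inside each field.
statuses = [re.split(' {2,}', line)[2] for line in lines]
print(statuses)   # ['Completed without error', 'Self-test routine in progress']
```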

== 13 of 13 ==
Date: Wed, Apr 7 2010 8:26 pm
From: Patrick Maupin


On Apr 7, 9:51 pm, Steven D'Aprano
<ste...@REMOVE.THIS.cybersource.com.au> wrote:

> This is one of the reasons we're so often suspicious of re solutions:
>
> >>> s = '# 1  Short offline       Completed without error       00%'
> >>> tre = Timer("re.split(' {2,}', s)",
>
> ... "import re; from __main__ import s")
> >>> tsplit = Timer("[x for x in s.split('  ') if x.strip()]",
> ... "from __main__ import s")
>
> >>> re.split(' {2,}', s) == [x for x in s.split('  ') if x.strip()]
> True
>
> >>> min(tre.repeat(repeat=5))
> 6.1224789619445801
> >>> min(tsplit.repeat(repeat=5))
>
> 1.8338048458099365

I will confess that, in my zeal to defend re, I gave a simple one-
liner, rather than the more optimized version:

>>> from timeit import Timer
>>> s = '# 1  Short offline       Completed without error       00%'
>>> tre = Timer("splitter(s)",
... "import re; from __main__ import s; splitter = re.compile(' {2,}').split")
>>> tsplit = Timer("[x for x in s.split('  ') if x.strip()]",
... "from __main__ import s")
>>> min(tre.repeat(repeat=5))
1.893190860748291
>>> min(tsplit.repeat(repeat=5))
2.0661051273345947

You're right that if you have an 800K byte string, re doesn't perform
as well as split, but the delta is only a few percent.

>>> s *= 10000
>>> min(tre.repeat(repeat=5, number=1000))
15.331652164459229
>>> min(tsplit.repeat(repeat=5, number=1000))
14.596404075622559

Regards,
Pat

==============================================================================
TOPIC: ftp and python
http://groups.google.com/group/comp.lang.python/t/6183a96d88f4420b?hl=en
==============================================================================

== 1 of 1 ==
Date: Wed, Apr 7 2010 6:25 pm
From: Tim Chase


Matjaz Pfefferer wrote:
> What would be the easiest way to copy files from one ftp
> folder to another without downloading them to local system?

As best I can tell, this isn't well-supported by FTP[1] which
doesn't seem to have a native "copy this file from
server-location to server-location bypassing the client".
There's a pair of RNFR/RNTO commands that allow you to rename (or
perhaps move as well) a file, which ftplib.FTP.rename() supports,
but it sounds like you want two copies.

When I've wanted to do this, I've used a non-FTP method, usually
SSH'ing into the server and just using "cp". This could work for
you if you have pycrypto/paramiko installed.

Your last hope would be that your particular FTP server has some
COPY extension that falls outside of RFC parameters -- something
that's not a portable solution, but if you're doing a one-off
script or something in a controlled environment, could work.

Otherwise, you'll likely be stuck slurping the file down just to
send it back up.

-tkc


[1]
http://en.wikipedia.org/wiki/List_of_FTP_commands
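The "slurp it down just to send it back up" fallback can be sketched with ftplib and an in-memory buffer; the server, credentials, and paths below are placeholders, not details from the original question:

```python
import io
from ftplib import FTP

def ftp_copy(ftp, src_path, dst_path):
    """Copy a file between two folders on the same FTP server by
    buffering it through the client (download, then re-upload)."""
    buf = io.BytesIO()
    ftp.retrbinary('RETR ' + src_path, buf.write)  # download into memory
    buf.seek(0)
    ftp.storbinary('STOR ' + dst_path, buf)        # upload to the new location

# Usage against a hypothetical server:
# ftp = FTP('ftp.example.com')
# ftp.login('user', 'password')
# ftp_copy(ftp, '/folder_a/file.txt', '/folder_b/file.txt')
# ftp.quit()
```

For large files, spooling through a temporary file instead of BytesIO would avoid holding the whole file in memory.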

==============================================================================
TOPIC: Python and Regular Expressions
http://groups.google.com/group/comp.lang.python/t/888b3fe934e2c5e2?hl=en
==============================================================================

== 1 of 1 ==
Date: Wed, Apr 7 2010 6:25 pm
From: Patrick Maupin


On Apr 7, 3:52 am, Chris Rebert <c...@rebertia.com> wrote:

> Regular expressions != Parsers

True, but lots of parsers *use* regular expressions in their
tokenizers. In fact, if you have a pure Python parser, you can often
get huge performance gains by rearranging your code slightly so that
you can use regular expressions in your tokenizer, because that
effectively gives you access to a fast, specialized C library that is
built into practically every Python interpreter on the planet.
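A minimal sketch of the kind of regex-based tokenizer described here; the token set is invented for illustration, not taken from any particular parser:

```python
import re

# One alternation of named groups; re.finditer does the scanning in C.
TOKEN_RE = re.compile(r'''
    (?P<NUMBER> \d+)
  | (?P<NAME>   [A-Za-z_]\w*)
  | (?P<OP>     [+\-*/()=])
  | (?P<SKIP>   \s+)
''', re.VERBOSE)

def tokenize(text):
    for m in TOKEN_RE.finditer(text):
        kind = m.lastgroup     # name of the group that matched
        if kind != 'SKIP':     # drop whitespace tokens
            yield kind, m.group()

print(list(tokenize('x = 2 + 40')))
# [('NAME', 'x'), ('OP', '='), ('NUMBER', '2'), ('OP', '+'), ('NUMBER', '40')]
```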

> Every time someone tries to parse nested structures using regular
> expressions, Jamie Zawinski kills a puppy.

And yet, if you are parsing stuff in Python, and your parser doesn't
use some specialized C code for tokenization (which will probably be
regular expressions unless you are using mxtexttools or some other
specialized C tokenizer code), your nested structure parser will be
dog slow.

Now, for some applications, the speed just doesn't matter, and for
people who don't yet know the difference between regexps and parsing,
pointing them at PyParsing is certainly doing them a valuable service.

But that's twice today when I've seen people warned off regular
expressions without a cogent explanation that, while the re module is
good at what it does, it really only handles the very lowest level of
a parsing problem.

My 2 cents is that something like PyParsing is absolutely great for
people who want a simple parser without a lot of work. But if people
use PyParsing, and then find out that (for their particular
application) it isn't fast enough, and then wonder what to do about
it, if all they remember is that somebody told them not to use regular
expressions, they will just come to the false conclusion that pure
Python is too painfully slow for any real world task.

Regards,
Pat

==============================================================================
TOPIC: Performance of list vs. set equality operations
http://groups.google.com/group/comp.lang.python/t/818d143c7e9550bc?hl=en
==============================================================================

== 1 of 3 ==
Date: Wed, Apr 7 2010 6:41 pm
From: Steven D'Aprano


On Wed, 07 Apr 2010 10:55:10 -0700, Raymond Hettinger wrote:

> [Gustavo Nare]
>> In other words: The more different elements two collections have, the
>> faster it is to compare them as sets. And as a consequence, the more
>> equivalent elements two collections have, the faster it is to compare
>> them as lists.
>>
>> Is this correct?
>
> If two collections are equal, then comparing them as a set is always
> slower than comparing them as a list. Both have to call __eq__ for
> every element, but sets have to search for each element while lists can
> just iterate over consecutive pointers.
>
> If the two collections have unequal sizes, then both ways immediately
> return unequal.


Perhaps I'm misinterpreting what you are saying, but I can't confirm that
behaviour, at least not for subclasses of list:

>>> class MyList(list):
...     def __len__(self):
...         return self.n
...
>>> L1 = MyList(range(10))
>>> L2 = MyList(range(10))
>>> L1.n = 9
>>> L2.n = 10
>>> L1 == L2
True
>>> len(L1) == len(L2)
False


--
Steven


== 2 of 3 ==
Date: Wed, Apr 7 2010 6:53 pm
From: Patrick Maupin


On Apr 7, 8:41 pm, Steven D'Aprano
<ste...@REMOVE.THIS.cybersource.com.au> wrote:
> On Wed, 07 Apr 2010 10:55:10 -0700, Raymond Hettinger wrote:
> > [Gustavo Nare]
> >> In other words: The more different elements two collections have, the
> >> faster it is to compare them as sets. And as a consequence, the more
> >> equivalent elements two collections have, the faster it is to compare
> >> them as lists.
>
> >> Is this correct?
>
> > If two collections are equal, then comparing them as a set is always
> > slower than comparing them as a list.  Both have to call __eq__ for
> > every element, but sets have to search for each element while lists can
> > just iterate over consecutive pointers.
>
> > If the two collections have unequal sizes, then both ways immediately
> > return unequal.
>
> Perhaps I'm misinterpreting what you are saying, but I can't confirm that
> behaviour, at least not for subclasses of list:
>
> >>> class MyList(list):
> ...     def __len__(self):
> ...             return self.n
> ...
> >>> L1 = MyList(range(10))
> >>> L2 = MyList(range(10))
> >>> L1.n = 9
> >>> L2.n = 10
> >>> L1 == L2
> True
> >>> len(L1) == len(L2)
> False
>
> --
> Steven

I think what he is saying is that the list __eq__ method will look at
the list lengths first. This may or may not be considered a subtle
bug for the edge case you are showing.

If I do the following:

>>> L1 = range(10000000)
>>> L2 = range(10000000)
>>> L3 = range(10000001)
>>> L1 == L2
True
>>> L1 == L3
False
>>>

I don't even need to run timeit -- the "True" takes a while to print
out, while the "False" prints out immediately.
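
The length shortcut can also be observed without timing anything, by
counting __eq__ calls directly. A minimal sketch (CountingInt is a name
made up here, not from the thread); it assumes CPython's behavior of
skipping element comparisons entirely when list lengths differ:

```python
class CountingInt:
    """Wraps an int and counts how many times __eq__ is invoked."""
    calls = 0

    def __init__(self, n):
        self.n = n

    def __eq__(self, other):
        CountingInt.calls += 1
        return self.n == other.n

a = [CountingInt(i) for i in range(100)]
b = [CountingInt(i) for i in range(100)]
c = [CountingInt(i) for i in range(101)]

CountingInt.calls = 0
a == c                       # lengths differ: the shortcut fires
unequal_calls = CountingInt.calls

CountingInt.calls = 0
a == b                       # equal lengths: every element is compared
equal_calls = CountingInt.calls

print(unequal_calls, equal_calls)   # -> 0 100
```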

Regards,
Pat


== 3 of 3 ==
Date: Wed, Apr 7 2010 8:14 pm
From: Raymond Hettinger


[Raymond Hettinger]
> > If the two collections have unequal sizes, then both ways immediately
> > return unequal.

[Steven D'Aprano]
> Perhaps I'm misinterpreting what you are saying, but I can't confirm that
> behaviour, at least not for subclasses of list:

For doubters, see list_richcompare() in
http://svn.python.org/view/python/trunk/Objects/listobject.c?revision=78522&view=markup

    if (Py_SIZE(vl) != Py_SIZE(wl) && (op == Py_EQ || op == Py_NE)) {
        /* Shortcut: if the lengths differ, the lists differ */
        PyObject *res;
        if (op == Py_EQ)
            res = Py_False;
        else
            res = Py_True;
        Py_INCREF(res);
        return res;
    }

And see set_richcompare() in
http://svn.python.org/view/python/trunk/Objects/setobject.c?revision=78886&view=markup

    case Py_EQ:
        if (PySet_GET_SIZE(v) != PySet_GET_SIZE(w))
            Py_RETURN_FALSE;
        if (v->hash != -1 &&
            ((PySetObject *)w)->hash != -1 &&
            v->hash != ((PySetObject *)w)->hash)
            Py_RETURN_FALSE;
        return set_issubset(v, w);
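
In pure Python, that Py_EQ branch amounts to a size check followed by a
subset test (the hash shortcut is not observable from Python, so it is
omitted). A rough transcription; sets_equal is a name invented here:

```python
def sets_equal(v, w):
    # Mirrors set_richcompare()'s Py_EQ branch: unequal sizes mean
    # unequal sets; equal sizes plus a subset relation imply equality.
    if len(v) != len(w):
        return False
    return v <= w   # issubset test

print(sets_equal({1, 2, 3}, {3, 2, 1}))   # -> True
print(sets_equal({1, 2}, {1, 2, 3}))      # -> False
```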


Raymond

==============================================================================
TOPIC: Tkinter inheritance mess?
http://groups.google.com/group/comp.lang.python/t/a0146ef7d08ffea0?hl=en
==============================================================================

== 1 of 1 ==
Date: Wed, Apr 7 2010 8:05 pm
From: ejetzer


On 5 avr, 22:32, Lie Ryan <lie.1...@gmail.com> wrote:
> On 04/06/10 02:38, ejetzer wrote:
>
>
>
> > On 5 avr, 12:36, ejetzer <ejet...@gmail.com> wrote:
> >> For a school project, I'm trying to make a minimalist web browser, and
> >> I chose to use Tk as the rendering toolkit. I made my parser classes
> >> into Tkinter canvases, so that I would only have to call pack and
> >> mainloop functions in order to display the rendering. Right now, two
> >> bugs are affecting the program :
> >> 1) When running the full app¹, which fetches a document and then
> >> attempts to display it, I get a TclError :
> >>                  _tkinter.TclError: bad window path name "{Extensible
> >> Markup Language (XML) 1.0 (Fifth Edition)}"
> >> 2) When running only the parsing and rendering test², I get a big
> >> window to open, with nothing displayed. I am not quite familiar with
> >> Tk, so I have no idea of why it acts that way.
>
> >> 1: webbrowser.py
> >> 2: xmlparser.py
>
> > I just realized I haven't included the Google Code project url :
> >http://code.google.com/p/smally-browser/source/browse/#svn/trunk
>
> Check your indentation xmlparser.py in line 63 to 236, are they supposed
> to be correct?

Yes, those are functions that are used exclusively inside the feed
function, so I decided to restrict their namespace. I just realized it
could be confusing, so I placed them in the global namespace.
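
A minimal sketch of the trade-off being discussed (class and method
names are invented here): helpers nested inside a method are invisible
elsewhere, but the extra indentation level is easy to misread:

```python
class Parser:
    def feed(self, data):
        # Local helper: only reachable inside feed(), which keeps the
        # module namespace small but adds an indentation level.
        def handle_token(tok):
            return tok.lower()

        return [handle_token(t) for t in data.split()]

print(Parser().feed("HTML Body"))   # -> ['html', 'body']
```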

==============================================================================
TOPIC: raise exception with fake filename and linenumber
http://groups.google.com/group/comp.lang.python/t/187e8a1d660e5a25?hl=en
==============================================================================

== 1 of 1 ==
Date: Wed, Apr 7 2010 8:52 pm
From: "Gabriel Genellina"


En Wed, 07 Apr 2010 17:23:22 -0300, kwatch <kwatch@gmail.com> escribió:

> Is it possible to raise exception with custom traceback to specify
> file and line?
> I'm creating a certain parser.
> I want to report syntax error with the same format as other exception.
> -------------------------
> 1:  def parse(filename):
> 2:      if something_is_wrong():
> 3:          linenum = 123
> 4:          raise Exception("syntax error on %s, line %s" % (filename, linenum))
> 5:
> 6:  parse('example.file')
> -------------------------
>
> my hope is:
> -------------------------
> Traceback (most recent call last):
>   File "/tmp/parser.py", line 6, in <module>
>     parse('example.file')
>   File "/tmp/parser.py", line 4, in parse
>     raise Exception("syntax error on %s, line %s" % (filename, linenum))
>   File "/tmp/example.file", line 123
>     foreach item in items    # wrong syntax line
> Exception: syntax error

The built-in SyntaxError exception does what you want. Constructor
parameters are undocumented, but they're as follows:

raise SyntaxError("A descriptive error message",
                  (filename, linenum, colnum, source_line))

colnum is used to place the ^ symbol (10 in this fake example). Output:

Traceback (most recent call last):
  File "1.py", line 9, in <module>
    foo()
  File "1.py", line 7, in foo
    raise SyntaxError("A descriptive error message",
                      (filename, linenum, colnum,
                       "this is line 123 in example.file"))
  File "example.file", line 123
    this is line 123 in example.file
             ^
SyntaxError: A descriptive error message
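
The same trick as a self-contained snippet (the fake filename and line
number are the ones used above): the location tuple ends up as
attributes on the exception object itself.

```python
filename = 'example.file'
linenum = 123
colnum = 10
source_line = 'this is line 123 in example.file'

err = None
try:
    raise SyntaxError("A descriptive error message",
                      (filename, linenum, colnum, source_line))
except SyntaxError as e:
    err = e   # keep a reference; 'e' itself is unbound after the block

# The fake location is carried by the exception object.
print(err.filename, err.lineno, err.offset)   # -> example.file 123 10
```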

--
Gabriel Genellina


==============================================================================
TOPIC: Profiling: Interpreting tottime
http://groups.google.com/group/comp.lang.python/t/31995629b8111cd0?hl=en
==============================================================================

== 1 of 2 ==
Date: Wed, Apr 7 2010 8:58 pm
From: "Gabriel Genellina"


En Wed, 07 Apr 2010 18:44:39 -0300, Nikolaus Rath <Nikolaus@rath.org>
escribió:

> def check_s3_refcounts():
>     """Check s3 object reference counts"""
>
>     global found_errors
>     log.info('Checking S3 object reference counts...')
>
>     for (key, refcount) in conn.query("SELECT id, refcount FROM s3_objects"):
>
>         refcount2 = conn.get_val("SELECT COUNT(inode) FROM blocks WHERE s3key=?",
>                                  (key,))
>         if refcount != refcount2:
>             log_error("S3 object %s has invalid refcount, setting from %d to %d",
>                       key, refcount, refcount2)
>             found_errors = True
>             if refcount2 != 0:
>                 conn.execute("UPDATE s3_objects SET refcount=? WHERE id=?",
>                              (refcount2, key))
>             else:
>                 # Orphaned object will be picked up by check_keylist
>                 conn.execute('DELETE FROM s3_objects WHERE id=?', (key,))
>
> When I ran cProfile.Profile().runcall() on it, I got the following
> result:
>
> ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>      1 7639.962 7639.962 7640.269 7640.269 fsck.py:270(check_s3_refcounts)
>
> So according to the profiler, the entire 7639 seconds were spent
> executing the function itself.
>
> How is this possible? I really don't see how the above function can
> consume any CPU time without spending it in one of the called
> sub-functions.

Is the conn object implemented as a C extension? The profiler does not
detect calls to C functions, I think.
You may be interested in this package by Robert Kern:
http://pypi.python.org/pypi/line_profiler
"Line-by-line profiler.
line_profiler will profile the time individual lines of code take to
execute."
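
For pure-Python callees the tottime/cumtime split does show up as
expected; a small sketch (function names invented here) where the
caller has a large cumtime but almost no tottime of its own:

```python
import cProfile
import io
import pstats

def inner():
    # Does the actual work, so it accumulates tottime.
    return sum(i * i for i in range(50000))

def outer():
    # Only delegates to inner(): large cumtime, small tottime.
    return [inner() for _ in range(20)]

prof = cProfile.Profile()
prof.runcall(outer)

buf = io.StringIO()
pstats.Stats(prof, stream=buf).sort_stats('cumulative').print_stats()
report = buf.getvalue()
print(report)
```

If the time instead disappears into a C extension that the profiler
cannot see into, it all gets charged to the caller's own line, which is
consistent with the result quoted above.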

--
Gabriel Genellina


==============================================================================

You received this message because you are subscribed to the Google Groups "comp.lang.python"
group.

To post to this group, visit http://groups.google.com/group/comp.lang.python?hl=en

To unsubscribe from this group, send email to comp.lang.python+unsubscribe@googlegroups.com

To change the way you get mail from this group, visit:
http://groups.google.com/group/comp.lang.python/subscribe?hl=en

To report abuse, send email explaining the problem to abuse@googlegroups.com

==============================================================================
Google Groups: http://groups.google.com/?hl=en
