Tuesday, January 26, 2010

comp.lang.python - 15 new messages in 6 topics - digest

comp.lang.python
http://groups.google.com/group/comp.lang.python?hl=en

comp.lang.python@googlegroups.com

Today's topics:

* list.pop(0) vs. collections.dequeue - 10 messages, 6 authors
http://groups.google.com/group/comp.lang.python/t/9221d87f93748b3f?hl=en
* ctypes for AIX - 1 messages, 1 author
http://groups.google.com/group/comp.lang.python/t/a93f969410d0d086?hl=en
* Sikuli: the coolest Python project I have yet seen... - 1 messages, 1 author
http://groups.google.com/group/comp.lang.python/t/766e5530f706ed52?hl=en
* easy_install error ... - 1 messages, 1 author
http://groups.google.com/group/comp.lang.python/t/958414df621ec1a5?hl=en
* Splitting text at whitespace but keeping the whitespace in the returned list - 1 messages, 1 author
http://groups.google.com/group/comp.lang.python/t/22533b4383d301bb?hl=en
* Authenticated encryption with PyCrypto - 1 messages, 1 author
http://groups.google.com/group/comp.lang.python/t/26ef2de83c5a0337?hl=en

==============================================================================
TOPIC: list.pop(0) vs. collections.dequeue
http://groups.google.com/group/comp.lang.python/t/9221d87f93748b3f?hl=en
==============================================================================

== 1 of 10 ==
Date: Mon, Jan 25 2010 3:00 pm
From: Steve Howell


On Jan 25, 1:00 pm, Paul Rubin <no.em...@nospam.invalid> wrote:
> Steve Howell <showel...@yahoo.com> writes:
> > These are the reasons I am not using deque:
>
> Thanks for these.  Now we are getting somewhere.
>
> >   1) I want to use native lists, so that downstream methods can use
> > them as lists.
>
> It sounds like that could be fixed by making the deque API a proper
> superset of the list API.

That is probably a good idea.

> >   2) Lists are faster for accessing elements.
>
> It sounds like that could be fixed by optimizing deque somewhat.  Also,
> have you profiled your application to show that accessing list elements
> is actually using a significant fraction of its runtime and that it
> would be slowed down noticeably by deque?  If not, it's a red herring.

I haven't profiled deque vs. list, but I think you are correct about
pop() possibly being a red herring.

It appears that the main bottleneck might still be the processing I do
on each line of text, which in my case is regexes.

For really large lists, I suppose memmove() would eventually start to
become a bottleneck, but it's brutally fast when it just moves a
couple kilobytes of data around.

> >   3) I want to be able to insert elements into the middle of the list.
>
> I just checked, and was surprised to find that deque doesn't support
> this.  I'd say go ahead and file a feature request to add it to deque.
>

It might be a good thing to add just for consistency's sake. If
somebody first implements an algorithm with lists, then discovers it
has overhead relating to inserting/appending at the end of the list,
then the more deque behaves like a list, the more easily they could
switch over their code to deque. Not knowing much about deque's
internals, I assume its performance for insert() would be O(N) just like
list, although maybe a tiny bit slower.
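
Note for readers on a current interpreter: collections.deque gained an insert() method in Python 3.5, so the gap discussed above no longer exists there. A minimal sketch, with made-up sample values:

```python
from collections import deque

# Sample data is hypothetical; only deque.insert() itself is the point here.
items = deque(["a", "b", "d"])

# Same call shape as list.insert(index, value): insert before position 2.
items.insert(2, "c")

as_list = list(items)
print(as_list)
```

Middle insertion into a deque is still not a constant-time operation, so this helps with API compatibility rather than speed.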

> >   4) I have no need for rotating elements.
>
> That's unpersuasive since you're advocating adding a feature to list
> that many others have no need for.  
>

To be precise, I wasn't really advocating for a new feature but an
internal optimization of a feature that already exists.

> > Adding a word or two to a list is an O(1) addition to a data structure
> > that takes O(N) memory to begin with.
>
> Yes, as mentioned, additive constants matter.
>
> > Another way of looking at it is that you would need to have 250 or so
> > lists in memory at the same time before the extra pointer was even
> > costing you kilobytes of memory.
>
> I've often run applications with millions of lists, maybe tens of
> millions.  Of course it would be 100's of millions if the machines were
> big enough.
>

I bet even in your application, the amount of memory consumed by the
PyListObjects themselves is greatly dwarfed by other objects, notably
the list elements themselves, not to mention any dictionaries that
your app uses.

> > My consumer laptop has 3027908k of memory.
>
> I thought the idea of buying bigger machines was to solve bigger
> problems, not to solve the same problems more wastefully.

Well, I am not trying to solve problems wastefully here. CPU cycles
are also scarce, so it seems wasteful to do an O(N) memmove that could
be avoided by storing an extra pointer per list. I also think that
encouraging the use of pop(0) would actually make many programs more
memory efficient, in the sense that you can garbage collect list
elements earlier.

Thanks for your patience in responding to me, despite the needlessly
abrasive tone of my earlier postings. I am coming around to this
thinking:

1) Summarize all this discussion and my lessons learned in some kind
of document. It does not have to be a PEP per se, but I could provide
a useful service to the community by listing pros/cons/etc.

2) I would still advocate for removing the warning against
list.pop(0) from the tutorial. I agree with Steven D'Aprano that docs really
should avoid describing implementation details in many instances
(although I do not know what he thinks about this particular case). I
also think that the performance penalty for pop(0) is negligible for
most medium-sized programs. For large-sized programs where you really
want to swap in deque, I think most authors are beyond reading the
tutorial and are looking elsewhere for insight on Python data
structures.

3) I am gonna try to implement the patch anyway for my own
edification.

4) I do think that there are ways that deque could be improved, but
it is not high on my priority list. I will try to mention it in the
PEP, though.

== 2 of 10 ==
Date: Mon, Jan 25 2010 3:30 pm
From: Steve Howell


On Jan 25, 1:32 pm, Arnaud Delobelle <arno...@googlemail.com> wrote:
> Steve Howell <showel...@yahoo.com> writes:
>
> [...]
>
> > My algorithm does exactly N pops and roughly N list accesses, so I
> > would be going from N*N + N to N + N log N if switched to blist.
>
> Can you post your algorithm?  It would be interesting to have a concrete
> use case to base this discussion on.
>

These are the profile results for an admittedly very large file
(430,000 lines), which shows that pop() consumes more time than any
other low level method. So pop() is not a total red herring. But I
have to be honest and admit that I grossly overestimated the penalty
for smaller files. Typical files are a couple hundred lines, and for
that use case, pop()'s expense gets totally drowned out by regex
handling. In other words, it's a lot cheaper to move a couple hundred
pointers per list element pop than it is to apply a series of regexes
to them, which shouldn't be surprising.

         ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
       230001/1  149.508    0.001  222.432  222.432  /home/showell/workspace/shpaml_website/shpaml.py:192(recurse)
         429999   17.667    0.000   17.667    0.000  {method 'pop' of 'list' objects}
         530000    8.428    0.000   14.125    0.000  /home/showell/workspace/shpaml_website/shpaml.py:143(get_indented_block)
        3700008    7.877    0.000    7.877    0.000  {built-in method match}
5410125/5410121    5.697    0.000    5.697    0.000  {len}
         300000    3.938    0.000   22.286    0.000  /home/showell/workspace/shpaml_website/shpaml.py:96(convert_line)
         959999    3.847    0.000    6.759    0.000  /home/showell/workspace/shpaml_website/shpaml.py:29(INDENT)
         959999    3.717    0.000   12.547    0.000  /home/showell/workspace/shpaml_website/shpaml.py:138(find_indentation)
         370000    3.495    0.000   20.204    0.000  /home/showell/workspace/shpaml_website/shpaml.py:109(apply_jquery)
         370000    3.322    0.000    6.528    0.000  {built-in method sub}
        1469999    2.575    0.000    2.575    0.000  {built-in method groups}

As an aside, I am a little surprised by how often I call len() and
that it also takes a large chunk of time, but that's my problem to
fix.
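
A table in this format can be produced with the standard cProfile and pstats modules. The sketch below is not the shpaml code: consume_from_front() is a hypothetical stand-in workload that drains a list with pop(0), and the list size is an arbitrary assumption.

```python
import cProfile
import io
import pstats


def consume_from_front(n):
    """Hypothetical stand-in workload: drain a list from the front."""
    lines = list(range(n))
    total = 0
    while lines:
        total += lines.pop(0)  # each pop shifts the remaining pointers
    return total


profiler = cProfile.Profile()
profiler.enable()
result = consume_from_front(20000)
profiler.disable()

# Sort by time spent inside each function itself and show the top entries.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("tottime").print_stats(5)
report = buffer.getvalue()
print(report)
```

Sorting by "tottime" matches the ordering of the table above; sorting by "cumulative" instead would highlight the callers that dominate overall runtime.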

== 3 of 10 ==
Date: Mon, Jan 25 2010 3:40 pm
From: Steve Howell


--- On Mon, 1/25/10, Chris Colbert <sccolbert@gmail.com> wrote:

>
> looking at that code, i think you could solve
> your whole problem with a single call to reversed() (which
> is NOT the same as list.reverse()) 
>

I do not think that's actually true. It does no good to pop elements off a copy of the list if there is still code that refers to the original list. So I think you really do want list.reverse().

The problem with reversing the lists is that it gets sliced and diced and passed around to other methods, one of which, html_block_tag, recursively calls back to the main method. So you could say that everybody just has to work with a reversed list, but in my mind, that would be just backward and overly complicated.

I am not completely ruling out the approach, though. The idea of modelling the program essentially as a stack has some validity, and it probably would run faster.

https://bitbucket.org/showell/shpaml_website/src/tip/shpaml.py
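
The distinction being argued here can be shown in a few lines. This is only an illustration with made-up data, not code from shpaml.py: reversed() yields an iterator and leaves the original list alone, while list.reverse() mutates the list that every other reference also sees.

```python
lines = ["first", "second", "third"]
alias = lines  # a second reference, as held by other code

# reversed() builds nothing in place; the original list is untouched.
snapshot = list(reversed(lines))
unchanged = list(lines)

# list.reverse() works in place, so the alias sees the new order too.
lines.reverse()
alias_after_reverse = list(alias)

# Popping from the end is cheap and yields the original front-to-back order.
order = []
while lines:
    order.append(lines.pop())

print(order, alias)
```

After the loop the alias is empty as well, which is the behaviour wanted when other code still refers to the original list.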

== 4 of 10 ==
Date: Mon, Jan 25 2010 3:45 pm
From: geremy condra


On Sat, Jan 23, 2010 at 4:38 AM, Alf P. Steinbach <alfps@start.no> wrote:

<snip>

> Hm, it would be nice if the Python docs offered complexity (time)
> guarantees in general...
>
> Cheers,
>
> - Alf

This would be a very welcome improvement IMHO- especially
in collections.

Geremy Condra


== 5 of 10 ==
Date: Mon, Jan 25 2010 4:25 pm
From: Ethan Furman


Steve Howell wrote:
>> On Sat, 23 Jan 2010 09:57:04 -0500, Roy Smith wrote:
>>> So, we're right back to my statement earlier in this thread that the
>>> docs are deficient in that they describe behavior with no hint about
>>> cost. Given that, it should be no surprise that users make incorrect
>>> assumptions about cost.

No hint? Looking at the below snippet of docs -- "not efficient" and
"slow" sound like pretty good hints to me.

> Bringing this thread full circle, does it make sense to strike this
> passage from the tutorial?:
>
> '''
> It is also possible to use a list as a queue, where the first element
> added is the first element retrieved ("first-in, first-out"); however,
> lists are not efficient for this purpose. While appends and pops from
> the end of list are fast, doing inserts or pops from the beginning of
> a list is slow (because all of the other elements have to be shifted
> by one).
> '''
>
> I think points #3 and #6 possibly apply. Regarding points #2 and #4,
> the tutorial is at least not overly technical or specific; it just
> explains the requirement to shift other elements one by one in simple
> layman's terms.
>

I think the paragraph is fine. Instead of waiting for the (hundreds
of?) posts wondering why making a FIFO queue from a list is so slow, and
what's wrong with Python, etc, etc, it points out up front that yes you
can, and here's why you don't want to. This does not strike me as too
much knowledge.

~Ethan~


== 6 of 10 ==
Date: Mon, Jan 25 2010 5:13 pm
From: "Alf P. Steinbach"


* Ethan Furman:
> Steve Howell wrote:
>>> On Sat, 23 Jan 2010 09:57:04 -0500, Roy Smith wrote:
>>>> So, we're right back to my statement earlier in this thread that the
>>>> docs are deficient in that they describe behavior with no hint about
>>>> cost. Given that, it should be no surprise that users make incorrect
>>>> assumptions about cost.
>
> No hint? Looking at the below snippet of docs -- "not efficient" and
> "slow" sound like pretty good hints to me.
>
>> Bringing this thread full circle, does it make sense to strike this
>> passage from the tutorial?:
>>
>> '''
>> It is also possible to use a list as a queue, where the first element
>> added is the first element retrieved ("first-in, first-out"); however,
>> lists are not efficient for this purpose. While appends and pops from
>> the end of list are fast, doing inserts or pops from the beginning of
>> a list is slow (because all of the other elements have to be shifted
>> by one).
>> '''
>>
>> I think points #3 and #6 possibly apply. Regarding points #2 and #4,
>> the tutorial is at least not overly technical or specific; it just
>> explains the requirement to shift other elements one by one in simple
>> layman's terms.
>>
>
> I think the paragraph is fine. Instead of waiting for the (hundreds
> of?) posts wondering why making a FIFO queue from a list is so slow, and
> what's wrong with Python, etc, etc, it points out up front that yes you
> can, and here's why you don't want to. This does not strike me as too
> much knowledge.

Is the tutorial regarded as part of the language specification?

I understand that the standard library docs are part (e.g. 'object' is only
described there), and that at least some PEPs are.


Cheers,

- Alf


== 7 of 10 ==
Date: Mon, Jan 25 2010 5:58 pm
From: Jerry Hill


On Sat, Jan 23, 2010 at 4:38 AM, Alf P. Steinbach <alfps@start.no> wrote:
> Hm, it would be nice if
> the Python docs offered complexity (time) guarantees in general...

Last time it came up, I don't think there was any core developer
interest in putting complexity guarantees in the Python Language
Reference. Some folks did document the behavior of most of the common
CPython containers though: http://wiki.python.org/moin/TimeComplexity

--
Jerry


== 8 of 10 ==
Date: Mon, Jan 25 2010 8:31 pm
From: Paul Rubin


Steve Howell <showell30@yahoo.com> writes:
> I haven't profiled deque vs. list, but I think you are correct about
> pop() possibly being a red herring....
> For really large lists, I suppose memmove() would eventually start to
> become a bottleneck, but it's brutally fast when it just moves a
> couple kilobytes of data around.

One way to think of Python is as a scripting wrapper around a bunch of C
functions, rather than as a full-fledged programming language. Viewed
that way, list operations like pop(0) are essentially constant time
unless the list is quite large. By that I mean you can implement
classic structures like doubly-linked lists using Python tuples, but
even though inserting into the middle of them is theoretically O(1), the
memmove's of the native list operations will be much faster in practice.
Programs dealing with large lists (more than a few thousand elements)
are obviously different and if your program is using such large lists,
you have to plan a little differently when writing the code.

>> I've often run applications with millions of lists
> I bet even in your application, the amount of memory consumed by the
> PyListObjects themselves is greatly dwarfed by other objects, notably
> the list elements themselves

Such lists would often have just one element or even be empty. For example,
you might have a dictionary mapping names to addresses. Most people
have just one address, but some might have no address, and a few might
have more than one address, so you would have a list of addresses for
each name. Of course the dictionary slots in that example would also
use space.

> Well, I am not trying to solve problems wastefully here. CPU cycles
> are also scarce, so it seems wasteful to do an O(N) memmove that could
> be avoided by storing an extra pointer per list.

Realistically the CPython interpreter is so slow that the memmove is
unnoticeable, and Python (at least CPython) just isn't all that
conducive to writing fast code. It makes up for this in programmer
productivity for the many sorts of problems in which moderate speed
is acceptable.

> Thanks for your patience in responding to me, despite the needlessly
> abrasive tone of my earlier postings.

I wondered whether you might have come over from the Lisp newsgroups,
which are pretty brutal. We try to be friendlier here (not that we're
always successful). Anyway, welcome.

> 1) Summarize all this discussion and my lessons learned in some kind
> of document. It does not have to be a PEP per se, but I could provide
> a useful service to the community by listing pros/cons/etc.

I suppose that can't hurt, but there are probably other areas (multicore
parallelism is a perennial one) of much higher community interest.

http://wiki.python.org/moin/ is probably a good place to put such
a document.

> 2) I would still advocate for removing the warning against
> list.pop(0) from the tutorial. I agree with Steven D'Aprano that docs really
> should avoid describing implementation details in many instances

On general principles I agree with Alex Stepanov that the running time
of a function should be part of its interface (nobody wants to use a
stack if popping an element takes quadratic time) and therefore should
be stated in the docs. Python just has a weird incongruence between the
interpreter layer and the C layer, combined with a library well-evolved
for everyday problem sizes, so the traditional asymptotic approach to
algorithm selection often doesn't give the best practical choice.

I don't feel like looking up what the tutorial says about pop(0), but if
it just warns against it without qualification, it should probably
be updated.


== 9 of 10 ==
Date: Mon, Jan 25 2010 9:00 pm
From: Steve Howell


On Jan 24, 11:28 am, a...@pythoncraft.com (Aahz) wrote:
> In article <b4440231-f33f-49e1-9d6f-5fbce0a63...@b2g2000yqi.googlegroups.com>,
> Steve Howell  <showel...@yahoo.com> wrote:
>
>
>
> >Even with realloc()'s brokenness, you could improve pop(0) in a way
> >that does not impact list access at all, and the patch would not change
> >the time complexity of any operation; it would just add negligible
> >extra bookkeeping within list_resize() and a few other places.
>
> Again, your responsibility is to provide a patch and a spectrum of
> benchmarking tests to prove it.  Then you would still have to deal with
> the objection that extensions use the list internals -- that might be an
> okay sell given the effort otherwise required to port extensions to
> Python 3, but that's not the way to bet.
>

Ok, I just submitted a patch to python-dev that illustrates a 100x
speedup on an admittedly artificial program. It still has a long way
to go, but it demonstrates proof of concept. I'm done for the day,
but tomorrow I will try to polish it up and improve it, even if it's
doomed for rejection. Apologies to all I have offended in this
thread. I frankly found some of the pushback to be a bit hasty and
disrespectful, but I certainly overreacted to some of the criticism.
And now I'm in the awkward position of asking the people I offended to
help me with the patch. If anybody can offer me a hand in
understanding some of CPython's internals, particularly with regard to
memory management, it would be greatly appreciated.

(Sorry I don't have a link to the python-dev posting; it is not
showing up in the archives yet for some reason.)


== 10 of 10 ==
Date: Mon, Jan 25 2010 9:13 pm
From: Steve Howell


On Jan 25, 8:31 pm, Paul Rubin <no.em...@nospam.invalid> wrote:
> Steve Howell <showel...@yahoo.com> writes:
> > I haven't profiled deque vs. list, but I think you are correct about
> > pop() possibly being a red herring....
> > For really large lists, I suppose memmove() would eventually start to
> > become a bottleneck, but it's brutally fast when it just moves a
> > couple kilobytes of data around.
>
> One way to think of Python is as a scripting wrapper around a bunch of C
> functions, rather than as a full-fledged programming language.  Viewed
> that way, list operations like pop(0) are essentially constant time
> unless the list is quite large.  By that I mean you can implement
> classic structures like doubly-linked lists using Python tuples, but
> even though inserting into the middle of them is theoretically O(1), the
> memmove's of the native list operations will be much faster in practice.
> Programs dealing with large lists (more than a few thousand elements)
> are obviously different and if your program is using such large lists,
> you have to plan a little differently when writing the code.

Thanks. That is a good way of looking at it.

>
> Realistically the CPython interpreter is so slow that the memmove is
> unnoticeable, and Python (at least CPython) just isn't all that
> conducive to writing fast code.  It makes up for this in programmer
> productivity for the many sorts of problems in which moderate speed
> is acceptable.
>

Definitely, and moderate speed is enough in a surprisingly large
number of applications.


> > Thanks for your patience in responding to me, despite the needlessly
> > abrasive tone of my earlier postings.  
>
> I wondered whether you might have come over from the Lisp newsgroups,
> which are pretty brutal.  We try to be friendlier here (not that we're
> always successful).  Anyway, welcome.
>

:)

> >   1) Summarize all this discussion and my lessons learned in some kind
> > of document.  It does not have to be a PEP per se, but I could provide
> > a useful service to the community by listing pros/cons/etc.
>
> I suppose that can't hurt, but there are probably other areas (multicore
> parallelism is a perennial one) of much higher community interest.
>
> http://wiki.python.org/moin/ is probably a good place to put such
> a document.
>

Ok, that's where I'll start.

> >   2) I would still advocate for removing the warning against
> > list.pop(0) from the tutorial.  I agree with Steven D'Aprano that docs really
> > should avoid describing implementation details in many instances
>
> On general principles I agree with Alex Stepanov that the running time
> of a function should be part of its interface (nobody wants to use a
> stack if popping an element takes quadratic time) and therefore should
> be stated in the docs.  Python just has a weird incongruence between the
> interpreter layer and the C layer, combined with a library well-evolved
> for everyday problem sizes, so the traditional asymptotic approach to
> algorithm selection often doesn't give the best practical choice.
>
> I don't feel like looking up what the tutorial says about pop(0), but if
> it just warns against it without qualification, it should probably
> be updated.

Here it is:

http://docs.python.org/tutorial/datastructures.html#using-lists-as-queues

My opinion is that the warning should be either removed or qualified,
but it is probably fine as written.

'''
It is also possible to use a list as a queue, where the first element
added is the first element retrieved ("first-in, first-out"); however,
lists are not efficient for this purpose. While appends and pops from
the end of list are fast, doing inserts or pops from the beginning of
a list is slow (because all of the other elements have to be shifted
by one).
'''

The qualifications would be that deque lacks some features that list
has, and that the shift-by-one operation is actually a call to
memmove() and may not apply to all implementations.
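
For anyone who wants to see where the crossover lies on their own machine, the comparison can be timed directly. This is a rough sketch, not a rigorous benchmark: the two sizes (200 and 20000 elements) are assumptions chosen to mirror the "couple hundred lines" and "very large file" cases from this thread, and absolute numbers will differ between machines and interpreters.

```python
from collections import deque
from timeit import timeit


def drain_list(n):
    """Empty a list from the front; each pop(0) shifts the rest."""
    items = list(range(n))
    while items:
        items.pop(0)


def drain_deque(n):
    """Empty a deque from the front with popleft()."""
    items = deque(range(n))
    while items:
        items.popleft()


timings = {}
for n in (200, 20000):
    t_list = timeit(lambda: drain_list(n), number=3)
    t_deque = timeit(lambda: drain_deque(n), number=3)
    timings[n] = (t_list, t_deque)
    print(f"n={n}: list.pop(0) {t_list:.4f}s, deque.popleft() {t_deque:.4f}s")
```

The ratio between the two columns at each size is the interesting part; the thread's claim is that it stays small for short lists and grows with list length.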


==============================================================================
TOPIC: ctypes for AIX
http://groups.google.com/group/comp.lang.python/t/a93f969410d0d086?hl=en
==============================================================================

== 1 of 1 ==
Date: Mon, Jan 25 2010 3:36 pm
From: "Waddle, Jim"


Chris,
Thanks for responding to my email.
I apologize for the remark about python only being developed for windows. I got that impression when I was looking at the ActivePython web site and saw that the version of python that they had available was not supported on very many unix systems. I should not make general statements based on only one web site. After reading your email I decided to see for myself what the issue was about compiling python on AIX 5.3.

This is the error I saw the first time I tried to use ctypes.

Python 2.4.3 (#1, Jul 17 2006, 20:00:23) [C] on aix5
Type "help", "copyright", "credits" or "license" for more information.
>>> import ctypes
Traceback (most recent call last):
File "<stdin>", line 1, in ?
ImportError: No module named ctypes

This version of python was downloaded and installed from ActivePython and when I checked their webpage it states that ctypes is not available on AIX.
I then figured I would get a new copy of python and install it on AIX. I downloaded python.2.5.5c2 from http://www.python.org. I did the configure and make, which posted many errors in the ctypes function, which I guess is the reason that it does not get included in the final make.

An example of the build error I get when doing the make is:
xlc_r -q64 -DNDEBUG -O -I. -I/s/users/cz030a/xferjunk/python/Python-2.5.5c2/./Include -Ibuild/temp.aix-5.3-2.5/libffi/include -Ibuild/temp.aix-5.3-2.5/libffi -I/s/users/cz030a/xferjunk/python/Python-2.5.5c2/Modules/_ctypes/libffi/src -I/s/users/cz030a/xferjunk/ots/python2.5/include -I. -IInclude -I./Include -I/s/users/cz030a/xferjunk/python/Python-2.5.5c2/Include -I/s/users/cz030a/xferjunk/python/Python-2.5.5c2 -c /s/users/cz030a/xferjunk/python/Python-2.5.5c2/Modules/_ctypes/_ctypes.c -o build/temp.aix-5.3-2.5/s/users/cz030a/xferjunk/python/Python-2.5.5c2/Modules/_ctypes/_ctypes.o
"/s/users/cz030a/xferjunk/python/Python-2.5.5c2/Modules/_ctypes/_ctypes.c", line 2820.31: 1506-068 (W) Operation between types "void*" and "int(*)(void)" is not allowed.
"/s/users/cz030a/xferjunk/python/Python-2.5.5c2/Modules/_ctypes/_ctypes.c", line 3363.28: 1506-280 (W) Function argument assignment between types "int(*)(void)" and "void*" is not allowed.
"/s/users/cz030a/xferjunk/python/Python-2.5.5c2/Modules/_ctypes/_ctypes.c", line 4768.67: 1506-280 (W) Function argument assignment between types "void*" and "void*(*)(void*,const void*,unsigned long)" is not allowed.
"/s/users/cz030a/xferjunk/python/Python-2.5.5c2/Modules/_ctypes/_ctypes.c", line 4769.66: 1506-280 (W) Function argument assignment between types "void*" and "void*(*)(void*,int,unsigned long)" is not allowed.

I do not have sufficient knowledge to know how to fix this. I would think that this error is somehow related to compiling on AIX. If you have any suggestions on how to correct this problem, I would appreciate it.

Jim Waddle
KIT-D
425-785-5194

-----Original Message-----
From: chris@rebertia.com [mailto:chris@rebertia.com] On Behalf Of Chris Rebert
Sent: Sunday, January 24, 2010 7:31 AM
To: Waddle, Jim
Cc: python-list@python.org
Subject: Re: ctypes for AIX

On Sun, Jan 24, 2010 at 5:54 AM, Waddle, Jim <jim.waddle@boeing.com> wrote:
> I need to use ctypes with python running on AIX.

According to the ctypes readme, ctypes is based on libffi, which
according to its website, supports AIX for PowerPC64.
So, perhaps you could state what the actual error or problem you're
encountering is?
It is theoretically possible the ctypes-bundled libffi is either
outdated or had the AIX-specific bits removed; I don't know, I'm not a
CPython dev.

> It appears that python is being developed mostly for windows.

No, not really; your statement is especially ironic considering one of
Python's primary areas of use is for web applications as part of a
LAMP stack.

> Is there a policy concerning getting functions like ctypes working on AIX.

No idea. Someone will probably chime in though.

Cheers,
Chris
--
http://blog.rebertia.com

==============================================================================
TOPIC: Sikuli: the coolest Python project I have yet seen...
http://groups.google.com/group/comp.lang.python/t/766e5530f706ed52?hl=en
==============================================================================

== 1 of 1 ==
Date: Mon, Jan 25 2010 8:25 pm
From: Ron


OK, here's an idea. I used to do screen scraping scripts and run them
as CGI scripts with an HTML user interface. Why not run Sikuli on
Jython on a JVM running on my server, so that I can do my screen
scraping with Sikuli? I can take user inputs by using CGI forms from a
web client, process the requests using a Sikuli script on the server,
and send the results back to the web client.

This sounds like fun to me, and easier to highlight and capture the
appropriate screen information on targeted web sites using Sikuli than
to hand code location information or even using Beautiful Soup.

==============================================================================
TOPIC: easy_install error ...
http://groups.google.com/group/comp.lang.python/t/958414df621ec1a5?hl=en
==============================================================================

== 1 of 1 ==
Date: Mon, Jan 25 2010 8:31 pm
From: tekion

FYI,
I figured out what I was doing wrong. After reading the setuptools
docs, I took out the quotes around the package name and it
works; see details below:
python setup.py easy_install -m docutils==0.4
running easy_install
Searching for docutils==0.4
Best match: docutils 0.4
Processing docutils-0.4-py2.6.egg
Removing docutils 0.4 from easy-install.pth file
Installing rst2html.py script to C:\Python26\Scripts
Installing rst2latex.py script to C:\Python26\Scripts
Installing rst2newlatex.py script to C:\Python26\Scripts
Installing rst2pseudoxml.py script to C:\Python26\Scripts
Installing rst2s5.py script to C:\Python26\Scripts
Installing rst2xml.py script to C:\Python26\Scripts

Using c:\python26\lib\site-packages\docutils-0.4-py2.6.egg

Because this distribution was installed --multi-version, before you can
import modules from this package in an application, you will need to
'import pkg_resources' and then use a 'require()' call similar to one of
these examples, in order to select the desired version:

pkg_resources.require("docutils") # latest installed version
pkg_resources.require("docutils==0.4") # this exact version
pkg_resources.require("docutils>=0.4") # this version or higher

Processing dependencies for docutils==0.4
Finished processing dependencies for docutils==0.4

I hope this helps others having a similar issue.

==============================================================================
TOPIC: Splitting text at whitespace but keeping the whitespace in the returned list
http://groups.google.com/group/comp.lang.python/t/22533b4383d301bb?hl=en
==============================================================================

== 1 of 1 ==
Date: Mon, Jan 25 2010 9:47 am
From: "Tim Arnold"


"MRAB" <python@mrabarnett.plus.com> wrote in message
news:mailman.1362.1264353878.28905.python-list@python.org...
> python@bdurham.com wrote:
>> I need to parse some ASCII text into 'word' sized chunks of text AND
>> collect the whitespace that separates the split items. By 'word' I mean
>> any string of characters separated by whitespace (newlines, carriage
>> returns, tabs, spaces, soft-spaces, etc). This means that my split text
>> can contain punctuation and numbers - just not whitespace.
>> The split( None ) method works fine for returning the word sized chunks
>> of text, but destroys the whitespace separators that I need.
>> Is there a variation of split() that returns delimiters as well as
>> tokens?
>>
> I'd use the re module:
>
> >>> import re
> >>> re.split(r'(\s+)', "Hello world!")
> ['Hello', ' ', 'world!']

also, partition works though it returns a tuple instead of a list.
>>> s = 'hello world'
>>> s.partition(' ')
('hello', ' ', 'world')
>>>

--Tim Arnold
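
The two suggestions behave differently once the text has more than one separator. A short sketch with a made-up sample string: re.split() with a capturing group keeps every run of whitespace, while str.partition() splits only at the first occurrence of the one separator it is given.

```python
import re

# Sample text is hypothetical: two spaces, a tab and a newline as separators.
text = "Hello  world!\tfoo\nbar"

# The capturing group makes re.split() return the separators as list items.
parts = re.split(r"(\s+)", text)
print(parts)

# partition() stops at the first single space and leaves the rest unsplit.
first = text.partition(" ")
print(first)

# Because nothing is discarded, joining the pieces restores the input.
roundtrip = "".join(parts)
```

The round trip is a handy check that no whitespace was lost, which was the original poster's requirement.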

==============================================================================
TOPIC: Authenticated encryption with PyCrypto
http://groups.google.com/group/comp.lang.python/t/26ef2de83c5a0337?hl=en
==============================================================================

== 1 of 1 ==
Date: Mon, Jan 25 2010 9:26 pm
From: Daniel


Just got done reading this thread:

http://groups.google.com/group/comp.lang.python/browse_thread/thread/b31a5b5f58084f12/0e09f5f5542812c3

and I'd appreciate feedback on this recipe:

http://code.activestate.com/recipes/576980/

Of course, it does not meet all of the requirements set forth by the
OP in the referenced thread (the pycrypto dependency is a problem),
but it is an attempt to provide a simple interface for performing
strong, password-based encryption. Are there already modules out there
that provide such a simple interface? If there are, they seem to be
hiding somewhere out of Google's view.

I looked at ezPyCrypto, but it seemed to require public and private
keys, which was not convenient in my situation... maybe password-based
encryption is trivial to do with ezPyCrypto as well? In addition to
ezPyCrypto, I looked at Google's keyczar, but despite the claims of
the documentation, the API seemed overly complicated. Is it possible
to have a simple API for an industry-strength encryption module? If
not, is it possible to document that complicated API such that a
non-cryptographer could use it and feel confident that he hadn't made a
critical mistake?

Also, slightly related, is there an easy way to get the sha/md5
deprecation warnings emitted by PyCrypto in Python 2.6 to go away?

~ Daniel
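
On the last question, one general-purpose option is the standard warnings module, which can silence DeprecationWarning for a limited block of code. The sketch below does not import PyCrypto: legacy_import() is a hypothetical stand-in that emits the same kind of warning, and record=True is used only so the demonstration can show that nothing got through.

```python
import warnings


def legacy_import():
    """Hypothetical stand-in for an import that warns about sha/md5."""
    warnings.warn("the sha module is deprecated", DeprecationWarning)
    return "imported"


# Filters set inside the with-block are undone when the block exits.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("ignore", DeprecationWarning)
    value = legacy_import()

print(value, len(caught))
```

In real use the body of the with-block would be the import statement that triggers the warnings, so the filter does not hide deprecations elsewhere in the program.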


==============================================================================

You received this message because you are subscribed to the Google Groups "comp.lang.python"
group.

To post to this group, visit http://groups.google.com/group/comp.lang.python?hl=en

To unsubscribe from this group, send email to comp.lang.python+unsubscribe@googlegroups.com

To change the way you get mail from this group, visit:
http://groups.google.com/group/comp.lang.python/subscribe?hl=en

To report abuse, send email explaining the problem to abuse@googlegroups.com

==============================================================================
Google Groups: http://groups.google.com/?hl=en
