Thursday, March 25, 2010

comp.lang.python - 25 new messages in 12 topics - digest

comp.lang.python
http://groups.google.com/group/comp.lang.python?hl=en

comp.lang.python@googlegroups.com

Today's topics:

* from import and __init__.py - 1 messages, 1 author
http://groups.google.com/group/comp.lang.python/t/231a04c1e8782b22?hl=en
* Create a class at run-time - 2 messages, 2 authors
http://groups.google.com/group/comp.lang.python/t/c6539cb7e78776cc?hl=en
* Python database of plain text editable by notepad or vi - 3 messages, 3
authors
http://groups.google.com/group/comp.lang.python/t/f8888a80f2f98438?hl=en
* Sniffing encoding type by looking at file BOM header - 2 messages, 2 authors
http://groups.google.com/group/comp.lang.python/t/263475e5cd9d4756?hl=en
* Don't understand behavior; instance form a class in another class' instance -
5 messages, 3 authors
http://groups.google.com/group/comp.lang.python/t/974322ccba372b3e?hl=en
* python logging writes an empty file - 1 messages, 1 author
http://groups.google.com/group/comp.lang.python/t/d85aae7f2485a150?hl=en
* Revisiting Generators and Subgenerators - 4 messages, 4 authors
http://groups.google.com/group/comp.lang.python/t/8c38b742691fcbc8?hl=en
* Traversing through Dir() - 1 messages, 1 author
http://groups.google.com/group/comp.lang.python/t/e9c305645b4d3046?hl=en
* Represent object type as - 1 messages, 1 author
http://groups.google.com/group/comp.lang.python/t/6598773376e66591?hl=en
* Automatic import ? - 2 messages, 2 authors
http://groups.google.com/group/comp.lang.python/t/5981f9a71fd4cd21?hl=en
* Is there any library for indexing binary data? - 2 messages, 2 authors
http://groups.google.com/group/comp.lang.python/t/73b172b8c5c572ff?hl=en
* U Want to Earn More Money - Join Now - 1 messages, 1 author
http://groups.google.com/group/comp.lang.python/t/7a2efd127c2f81f6?hl=en

==============================================================================
TOPIC: from import and __init__.py
http://groups.google.com/group/comp.lang.python/t/231a04c1e8782b22?hl=en
==============================================================================

== 1 of 1 ==
Date: Thurs, Mar 25 2010 3:28 pm
From: egbert


On Thu, Mar 25, 2010 at 12:43:13PM -0400, Terry Reedy wrote:
> On 3/25/2010 6:16 AM, egbert wrote:
> >When I do 'from some_package import some_module'
> >the __init__.py of some_package will be run.
> >However, there will not be anything like a package-module,
> >and the effects of __init__.py seem all to be lost. Is that true ?
>
> No. If you do
>
> from sys import modules
> print(modules.keys())
>
> you will see both some_package and some_package.some_module among
> the entries.
Yes, you are right. And I can reach everything with
modules['some_package']
or variants thereof.
Thanks,
egbert

--
Egbert Bouwman - Keizersgracht 197 II - 1016 DS Amsterdam - 020 6257991
========================================================================
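
A minimal sketch of the behaviour confirmed above, written for Python 3 and using a throwaway package built in a temporary directory. The package name demo_pkg_initpy, the MARKER attribute and the helper function are invented for illustration; only sys.modules and the "from package import module" form come from the thread.

```python
import importlib
import os
import sys
import tempfile


def import_from_temp_package():
    """Build a tiny package on disk, then run 'from package import module'."""
    root = tempfile.mkdtemp()
    pkg_dir = os.path.join(root, "demo_pkg_initpy")  # hypothetical package name
    os.mkdir(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write("MARKER = 'set by __init__.py'\n")
    with open(os.path.join(pkg_dir, "some_module.py"), "w") as f:
        f.write("VALUE = 1\n")

    sys.path.insert(0, root)
    try:
        importlib.invalidate_caches()
        from demo_pkg_initpy import some_module
    finally:
        sys.path.remove(root)
    return some_module


some_module = import_from_temp_package()
# Both the package and the submodule are registered in sys.modules ...
print("demo_pkg_initpy" in sys.modules)
print("demo_pkg_initpy.some_module" in sys.modules)
# ... so whatever __init__.py defined stays reachable through the package.
print(sys.modules["demo_pkg_initpy"].MARKER)
```

The effects of __init__.py are not lost: they live on the package object, which is simply not bound to a name in the importing namespace.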

==============================================================================
TOPIC: Create a class at run-time
http://groups.google.com/group/comp.lang.python/t/c6539cb7e78776cc?hl=en
==============================================================================

== 1 of 2 ==
Date: Thurs, Mar 25 2010 3:34 pm
From: Patrick Maupin


On Mar 25, 5:00 pm, Michel <michel.metz...@gmail.com> wrote:
> Hi everyone,
>
> I'm trying to dynamically create a class. What I need is to define a
> class, add methods to it and later instantiate this class. Methods
> need to be bound to the instance though, and that's my problem. Here
> is what I have so far:

Well, you should just fill your empty dict with function definitions,
BEFORE you build the class. That's easiest. Also, you can just use
type:

def foo(*whatever):
    print(foo)

bar = type('MyDynamicClass', (object,), dict(foo=foo))

HTH,
Pat
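
Pat's suggestion can be sketched a little more fully as below. The function names, the "value" attribute and the class contents are illustrative assumptions, not Michel's actual code (which the digest does not include).

```python
def greet(self):
    """Plain function; it becomes a method once stored on the class."""
    return "hello from %s" % type(self).__name__


def describe(self, label):
    return "%s: %r" % (label, self.value)


# Collect the functions first, then hand the finished dict to type().
namespace = {"value": 42, "greet": greet, "describe": describe}
MyDynamicClass = type("MyDynamicClass", (object,), namespace)

obj = MyDynamicClass()
print(obj.greet())
print(obj.describe("value"))
```

Because the functions sit in the class namespace, ordinary attribute lookup binds them to each instance; no extra binding step is needed.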


== 2 of 2 ==
Date: Thurs, Mar 25 2010 6:53 pm
From: I V


On Thu, 25 Mar 2010 15:00:35 -0700, Michel wrote:
> I'm trying to dynamically create a class. What I need is to define a
> class, add methods to it and later instantiate this class. Methods need
> to be bound to the instance though, and that's my problem. Here is what
> I have so far:

I'm not entirely sure what you mean by binding methods to the instance.
Do you mean you need to dynamically add methods to a specific instance?
Or that you need to add methods to a class, such that they can be invoked
on specific instances? For the latter, just do:

TestClass.test_foo = test_foo

For the former, try:

tc = TestClass()
tc.test_foo = types.MethodType(test_foo, tc)

==============================================================================
TOPIC: Python database of plain text editable by notepad or vi
http://groups.google.com/group/comp.lang.python/t/f8888a80f2f98438?hl=en
==============================================================================

== 1 of 3 ==
Date: Thurs, Mar 25 2010 3:40 pm
From: James Harris


I am looking to store named pieces of text in a form that can be
edited by a standard editor such as notepad (under Windows) or vi
(under Unix) and then pulled into Python as needed. The usual record
locking and transactions of databases are not required.

Another way to look at it is to treat the separate files as entries in
a dictionary. The file name would be the key and the lines of the file
the value.

Anyone know of a database (with a Python interface) which will allow
text files to be treated as database fields? If not I can just write
it but I thought it best to ask if there was an existing solution
first.

James


== 2 of 3 ==
Date: Thurs, Mar 25 2010 3:55 pm
From: jkn


Kirbybase is one possibility.

http://pypi.python.org/pypi/KirbyBase/1.9


J^n

== 3 of 3 ==
Date: Thurs, Mar 25 2010 3:56 pm
From: Jon Clements


On 25 Mar, 22:40, James Harris <james.harri...@googlemail.com> wrote:
> I am looking to store named pieces of text in a form that can be
> edited by a standard editor such as notepad (under Windows) or vi
> (under Unix) and then pulled into Python as needed. The usual record
> locking and transactions of databases are not required.
>
> Another way to look at it is to treat the separate files as entries in
> a dictionary. The file name would be the key and the lines of the file
> the value.
>
> Anyone know of a database (with a Python interface) which will allow
> text files to be treated as database fields? If not I can just write
> it but I thought it best to ask if there was an existing solution
> first.
>
> James

I could be missing something here, but aren't you basically just
talking about an OS's filesystem?

A glob or listdir call somewhere, followed by building a dict from the
file contents, would meet your criteria in very few lines of code. I
would be interested to know the use case, though. Is everything read
completely at start-up? If each file contains a large number of lines
that aren't fixed width (or has no other indexing support without
maintenance), is a complete sequential scan required each time? Or do
you just tell the user not to update the files while the program is
running (unless, I s'pose, something along the lines of a SIGHUP for
config files is applicable)?

Sorry, just don't understand why you'd want this.

Jon.
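
A minimal sketch of the glob-plus-dict approach described above, assuming one entry per file with the file name as key and the list of lines as value, as James proposed. The function name load_text_dir, the "*.txt" pattern and the demo files are assumptions made for illustration.

```python
import glob
import os
import tempfile


def load_text_dir(directory, pattern="*.txt"):
    """Return {file name: list of lines} for every matching file."""
    entries = {}
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        with open(path, "r") as f:
            entries[os.path.basename(path)] = f.read().splitlines()
    return entries


# Demo with a throwaway directory standing in for the real data folder.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "greeting.txt"), "w") as f:
    f.write("hello\nworld\n")
with open(os.path.join(demo_dir, "empty.txt"), "w") as f:
    pass

entries = load_text_dir(demo_dir)
print(entries["greeting.txt"])
```

This reads everything eagerly; edits made in notepad or vi are only seen when load_text_dir is called again.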

==============================================================================
TOPIC: Sniffing encoding type by looking at file BOM header
http://groups.google.com/group/comp.lang.python/t/263475e5cd9d4756?hl=en
==============================================================================

== 1 of 2 ==
Date: Thurs, Mar 25 2010 4:16 pm
From: Lawrence D'Oliveiro


In message <mailman.1139.1269442366.23598.python-list@python.org>,
python@bdurham.com wrote:

> BOM_UTF8 = '\xef\xbb\xbf'

Since when does UTF-8 need a BOM?


== 2 of 2 ==
Date: Thurs, Mar 25 2010 4:21 pm
From: Irmen de Jong


On 26-3-2010 0:16, Lawrence D'Oliveiro wrote:
> In message<mailman.1139.1269442366.23598.python-list@python.org>,
> python@bdurham.com wrote:
>
>> BOM_UTF8 = '\xef\xbb\xbf'
>
> Since when does UTF-8 need a BOM?

It doesn't, but it is allowed. Not recommended though.
Unfortunately several tools, such as notepad.exe, have a tendency of
silently adding it when saving files.

-irmen
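
For reference, a hedged sketch of BOM sniffing using the constants from the standard codecs module. The function names sniff_bom and decode_with_bom, and the choice of "utf-8" as the fallback, are assumptions; the original poster's code is not shown in this digest.

```python
import codecs

# Longest BOMs first: the UTF-32 LE BOM begins with the UTF-16 LE BOM bytes.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data, default="utf-8"):
    """Return (encoding name, BOM length) guessed from the leading bytes."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return default, 0


def decode_with_bom(data):
    """Strip a recognised BOM, if any, and decode the remainder."""
    encoding, bom_len = sniff_bom(data)
    return data[bom_len:].decode(encoding)


print(decode_with_bom(codecs.BOM_UTF8 + b"abc"))
print(sniff_bom(b"plain ascii"))
```

A BOM is only a hint: a file without one may still be in any encoding, and UTF-16 LE text whose first character is NUL would be misread as UTF-32 LE by this check.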


==============================================================================
TOPIC: Don't understand behavior; instance form a class in another class'
instance
http://groups.google.com/group/comp.lang.python/t/974322ccba372b3e?hl=en
==============================================================================

== 1 of 5 ==
Date: Thurs, Mar 25 2010 4:29 pm
From: "Martin P. Hellwig"


Hi all,

When I run the following snippet (drastically simplified, to just show
what I mean):
>>
import platform, sys

class One(object):
    def __init__(self):
        self.one = True

    def change(self):
        self.one = False

class Two(object):
    def __init__(self):
        self._instance_one = One()
        self.one = self._instance_one.one
        self.change = self._instance_one.change

if __name__ == '__main__':
    print(sys.version)
    print(platform.machine())
    print(80*'#')
    TEST1 = One()
    print(TEST1.one)
    TEST1.change()
    print(TEST1.one)
    TEST1 = None
    print(80*'#')
    TEST2 = Two()
    print(TEST2.one)
    TEST2.change()
    print(TEST2.one)
>>

I get the following result:

<<
[GCC 4.2.1 20070719 [FreeBSD]]
amd64
################################################################################
True
False
################################################################################
True
True
################################################################################
<<

What I don't understand is why, in the second test, the last boolean is
True instead of (what I expect) False.
Could somebody enlighten me please as this has bitten me before and I am
confused by this behavior.

Thanks in advance

--
mph


== 2 of 5 ==
Date: Thurs, Mar 25 2010 4:41 pm
From: Christian Heimes


Martin P. Hellwig schrieb:
> What I don't understand is why, in the second test, the last boolean is
> True instead of (what I expect) False.
> Could somebody enlighten me please as this has bitten me before and I am
> confused by this behavior.

Hint: TEST2.one is not a reference to TEST2._instance_one.one. When you
alter TEST2._instance_one.one you don't magically change TEST2.one,
too. Python doesn't have variables like C pointers. Python's copy by
object (or share by object) behavior can be understood as labels. The
label TEST2.one references the same object as TEST2._instance_one.one
until you change where the label TEST2._instance_one.one points to.

Christian
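
One way to get the forwarding behaviour Martin expected is to look the value up on every access instead of copying the label once in __init__. The sketch below is one possible approach, not something proposed in the thread; it reuses the One/Two names from the original snippet and swaps the copied attribute for a property.

```python
class One(object):
    def __init__(self):
        self.one = True

    def change(self):
        self.one = False  # rebinds the name 'one' on this One instance


class Two(object):
    def __init__(self):
        self._instance_one = One()
        self.change = self._instance_one.change

    @property
    def one(self):
        # Looked up on every access, so it follows later rebinding.
        return self._instance_one.one


test2 = Two()
before = test2.one
test2.change()
after = test2.one
print(before, after)
```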

== 3 of 5 ==
Date: Thurs, Mar 25 2010 5:06 pm
From: "Martin P. Hellwig"


On 03/25/10 23:41, Christian Heimes wrote:
> Martin P. Hellwig schrieb:
>> What I don't understand is why, in the second test, the last boolean is
>> True instead of (what I expect) False.
>> Could somebody enlighten me please as this has bitten me before and I am
>> confused by this behavior.
>
> Hint: TEST2.one is not a reference to TEST2._instance_one.one. When you
> alter TEST2._instance_one.one you don't magically change TEST2.one,
> too. Python doesn't have variables like C pointers. Python's copy by
> object (or share by object) behavior can be understood as labels. The
> label TEST2.one references the same object as TEST2._instance_one.one
> until you change where the label TEST2._instance_one.one points to.
>
> Christian
>

Ah okay, thanks for the explanation. Am I correct in thinking (please
correct me if I mangle the terminology and/or am totally in the
wrong ballpark) that this is more or less because the label in the first
class points to an object (a boolean with value False),
and the label in the second class does not cascade to the first label
when looking something up; instead, during assignment, it sees that it is
a label to an object rather than the object itself and thus copies the
label content?

I probably expected classes namespaces to behave in about the same way
as lists and dictionaries do, don't know where I picked that up.

Thanks again,

--
mph


== 4 of 5 ==
Date: Thurs, Mar 25 2010 6:10 pm
From: "Rhodri James"


On Fri, 26 Mar 2010 00:06:06 -0000, Martin P. Hellwig
<martin.hellwig@dcuktec.org> wrote:

> On 03/25/10 23:41, Christian Heimes wrote:
>> Martin P. Hellwig schrieb:
>>> What I don't understand is why, in the second test, the last boolean is
>>> True
>>> instead of (what I expect) False.
>>> Could somebody enlighten me please as this has bitten me before and I
>>> am
>>> confused by this behavior.
>>
>> Hint: TEST2.one is not a reference to TEST2._instance_one.one. When you
>> alter TEST2._instance_one.one you don't magically change TEST2.one,
>> too. Python doesn't have variables like C pointers. Python's copy by
>> object (or share by object) behavior can be understood as labels. The
>> label TEST2.one references the same object as TEST2._instance_one.one
>> until you change where the label TEST2._instance_one.one points to.
>>
>> Christian
>>
>
> Ah okay, thanks for the explanation. Am I correct in thinking (please
> correct me if I mangle the terminology and/or am totally in the
> wrong ballpark) that this is more or less because the label in the first
> class points to an object (a boolean with value False),
> and the label in the second class does not cascade to the first label
> when looking something up; instead, during assignment, it sees that it is
> a label to an object rather than the object itself and thus copies the
> label content?

Pretty much. In the sense that you're thinking of, every assignment works
that way, even the initial "TEST1 = One()". Assignment binds names to
objects, though you have to be aware that names can be such exotic things
as "t", "a[15]" or "TEST2._instance_one.one".

> I probably expected classes namespaces to behave in about the same way
> as lists and dictionaries do, don't know where I picked that up.

They do, in fact, which isn't terribly surprising considering that class
namespaces are implemented with dictionaries. The distinction you're
missing is that lists and dictionaries are mutable, while booleans aren't;
you can change the contents of a dictionary, but you can't change the
'contents' of a boolean.

--
Rhodri James *-* Wildebeeste Herder to the Masses
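
The distinction between rebinding a name and mutating a shared object can be seen in a few lines. The variable names below are illustrative only.

```python
flag = True
alias = flag      # both names label the same bool object
flag = False      # rebinding 'flag' leaves 'alias' untouched

settings = {"one": True}
shared = settings          # both names label the same dict object
settings["one"] = False    # mutates the dict in place; 'shared' sees it

print(alias)
print(shared["one"])
```

In the original snippet, change() rebinds an attribute on the One instance, which is the first case: the copy held by the Two instance keeps pointing at True.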


== 5 of 5 ==
Date: Thurs, Mar 25 2010 6:56 pm
From: "Martin P. Hellwig"


On 03/26/10 01:10, Rhodri James wrote:
<cut>
>
> Pretty much. In the sense that you're thinking of, every assignment
> works that way, even the initial "TEST1 = One()". Assignment binds names
> to objects, though you have to be aware that names can be such exotic
> things as "t", "a[15]" or "TEST2._instance_one.one".
>
>> I probably expected classes namespaces to behave in about the same way
>> as lists and dictionaries do, don't know where I picked that up.
>
> They do, in fact, which isn't terribly surprising considering that class
> namespaces are implemented with dictionaries. The distinction you're
> missing is that lists and dictionaries are mutable, while booleans
> aren't; you can change the contents of a dictionary, but you can't
> change the 'contents' of a boolean.
>

All makes sense now, thanks Rhodri & Christian.

--
mph

==============================================================================
TOPIC: python logging writes an empty file
http://groups.google.com/group/comp.lang.python/t/d85aae7f2485a150?hl=en
==============================================================================

== 1 of 1 ==
Date: Thurs, Mar 25 2010 5:11 pm
From: Ovidiu Deac


Hi,

I have the following situation:

My application uses nosetests to discover and run the unit tests. I pass
the log configuration file as --logging-config=logging.conf
Everything works just fine: the logs are printed as required by the
configuration file, which makes me happy. I take this as a sign that my
logging.conf is correct.

Then in my main script, which starts the production application, I
have this line:
logging.config.fileConfig("logging.conf")

The logging module is configured without errors BUT my output.log is
EMPTY. It's like all the messages are filtered.

If I configure logging like this:
logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s %(name)-12s %(levelname)s: %(message)s',
                    datefmt='%m-%d %H:%M:%S',
                    filename=file,
                    filemode='w')
Then the logs are printed ok.

Then I tried this:
file = logging.FileHandler(logFileBasename, 'w')
file.setLevel(logging.INFO)
# set a format which is simpler for console use
formatter = logging.Formatter(
    '%(asctime)s %(name)-12s %(levelname)-8s %(message)s')
# tell the handler to use this format
file.setFormatter(formatter)
# add the handler to the root logger
logging.getLogger('').addHandler(file)
logging.getLogger('')

...which also leaves my output file EMPTY.

I'm out of ideas. Can anybody help me with this?

Thanks in advance!
ovidiu

Here is my logging.conf:

[formatters]
keys: detailed,simple

[handlers]
keys: console,file

[loggers]
keys: root

[formatter_simple]
format: %(name)s:%(levelname)s: %(message)s

[formatter_detailed]
format: %(name)s:%(levelname)s %(module)s:%(lineno)d: %(message)s

[handler_console]
class: StreamHandler
args: []
formatter: detailed

[handler_file]
class=FileHandler
level=DEBUG
formatter=detailed
args=('output.log', 'w')
filename=output.log
mode=w

[logger_root]
level: INFO
handlers: file
propagate: 1
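
One thing worth checking for the hand-built FileHandler attempt: a handler's level is only consulted after the logger itself has accepted the record, and the root logger defaults to WARNING. The sketch below illustrates that rule and is not a diagnosis of the post. It uses a private logger named demo_empty_file (a made-up name), set explicitly to WARNING to mimic the root default, so the demo does not disturb any real configuration.

```python
import logging
import os
import tempfile


def write_one_info(logger_level):
    """Log one INFO record to a fresh file and return the file's contents."""
    fd, path = tempfile.mkstemp(suffix=".log")
    os.close(fd)

    logger = logging.getLogger("demo_empty_file")  # hypothetical logger name
    logger.propagate = False
    logger.setLevel(logger_level)

    handler = logging.FileHandler(path, "w")
    handler.setLevel(logging.INFO)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    try:
        logger.info("hello")
    finally:
        logger.removeHandler(handler)
        handler.close()

    with open(path) as f:
        contents = f.read()
    os.remove(path)
    return contents


print(repr(write_one_info(logging.WARNING)))  # logger drops the INFO record
print(repr(write_one_info(logging.INFO)))     # record reaches the file
```

For the fileConfig case, a separate thing to rule out is that logging.config.fileConfig() disables loggers created before it is called unless disable_existing_loggers=False is passed. Whether either effect applies here cannot be told from the post alone.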

==============================================================================
TOPIC: Revisiting Generators and Subgenerators
http://groups.google.com/group/comp.lang.python/t/8c38b742691fcbc8?hl=en
==============================================================================

== 1 of 4 ==
Date: Thurs, Mar 25 2010 5:23 pm
From: Cameron Simpson


On 25Mar2010 14:39, Winston <winstonw@stratolab.com> wrote:
| Here's my proposal again, but hopefully with better formatting so you
| can read it easier.

Having quickly read the Abstract and Motivation, why is this any better
than a pair of threads and a pair of Queue objects? (Aside from
co-routines being more lightweight in terms of machine resources?)

On the flipside, given that generators were recently augmented to
support coroutines I can see your motivation within that framework.

Cheers,
--
Cameron Simpson <cs@zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/

C makes it easy for you to shoot yourself in the foot. C++ makes that
harder, but when you do, it blows away your whole leg.
- Bjarne Stroustrup


== 2 of 4 ==
Date: Thurs, Mar 25 2010 5:31 pm
From: Winston Wolff


Coroutines achieve very similar things to threads, but avoid problems resulting from the pre-emptive nature of threads. Specifically, a coroutine indicates where it will yield to the other coroutine. This avoids lots of problems related to synchronization. Also the lightweight aspect is apparently important for some simulations when they have many thousands of agents to simulate--this number of threads becomes a problem.

-Winston

Winston Wolff
Stratolab - Games for Learning
tel: (646) 827-2242
web: www.stratolab.com

On Mar 25, 2010, at 5:23 PM, Cameron Simpson wrote:

>
> Having quickly read the Abstract and Motivation, why is this any better
> than a pair of threads and a pair of Queue objects? (Aside from
> co-routines being more lightweight in terms of machine resources?)
>
> On the flipside, given that generators were recently augmented to
> support coroutines I can see your motivation within that framework.
>
> Cheers,
> --
> Cameron Simpson <cs@zip.com.au> DoD#743
> http://www.cskk.ezoshosting.com/cs/
>
> C makes it easy for you to shoot yourself in the foot. C++ makes that
> harder, but when you do, it blows away your whole leg.
> - Bjarne Stroustrup
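
Winston's point that a coroutine states exactly where it gives up control can be sketched with ordinary generators and a tiny round-robin scheduler. This is a generic illustration of cooperative switching, not the mechanism from his proposal (which this digest does not reproduce); agent and run_round_robin are invented names.

```python
from collections import deque


def agent(name, steps, trace):
    """A cooperative task: it only gives up control at each 'yield'."""
    for i in range(steps):
        trace.append("%s step %d" % (name, i))
        yield  # explicit switch point; nothing interrupts between yields


def run_round_robin(tasks):
    """Resume each generator in turn until all of them are exhausted."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)
        except StopIteration:
            continue  # finished tasks are simply dropped
        queue.append(task)


trace = []
run_round_robin([agent("a", 2, trace), agent("b", 3, trace)])
print(trace)
```

Since switches happen only at yield, the interleaving is deterministic and the shared trace list needs no lock, which is the contrast with pre-emptive threads drawn above.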

== 3 of 4 ==
Date: Thurs, Mar 25 2010 8:30 pm
From: Patrick Maupin


On Mar 25, 7:31 pm, Winston Wolff <winst...@stratolab.com> wrote:

(a bunch of stuff about coroutines)

There have been proposals in the past for more full-featured
generators, that would work as general purpose coroutines. Among
other things, there were issues with exception propagation, and the
design was deliberately simplified to what we have today. Before
proposing anything in this area you should carefully read PEPs 288,
325, and 342, and all the discussion about those PEPs in the python-
dev archives.

After reading all that, and still being convinced that you have the
greatest thing since sliced bread (and that you REALLY understand all
the concerns about exceptions and other things), you need to update
your document to address all the concerns raised in the discussions on
those PEPs, put on your asbestos suit (modern asbestos-free
replacements never work as advertised), and then re-post your
document.

Personally, I am very interested in co-routines, but I have very
little time right now, and am not at all interested in reading a
proposal from somebody who doesn't know the full history of how
generators got to be the way they are (the lack of coroutines is not
an accidental oversight). I suspect I am not alone in this opinion,
so there is probably some interest in a realistic proposal, but
perhaps also some skepticism about whether a realistic proposal can
actually be engineered...

Best regards and good luck!
Pat


== 4 of 4 ==
Date: Thurs, Mar 25 2010 9:51 pm
From: Stefan Behnel


Patrick Maupin, 26.03.2010 04:30:
> ... and then re-post your document.

... preferably to the python-ideas mailing list. Although it seems to me
that this is something that could be explored as a library first - which
usually means that people will tell you exactly that on python-ideas and
ask you to come back with an implementation that has proven to be useful in
practice.

Stefan


==============================================================================
TOPIC: Traversing through Dir()
http://groups.google.com/group/comp.lang.python/t/e9c305645b4d3046?hl=en
==============================================================================

== 1 of 1 ==
Date: Thurs, Mar 25 2010 5:25 pm
From: Andrej Mitrovic


I would like to traverse through the entire structure of dir(), and
write it to a file.

Now, if I try to write the contents of dir() to a file (via pickle), I
only get the top layer. So even if there are lists within the returned
list from dir(), they get written as a list of strings to the file.

Basically, I have an embedded and somewhat stripped version of Python.
I would like to find out just how much functionality it has (I have no
documentation for it), so I thought the best way to do that is
traverse thru the dir() call. Any clues as to how I could write the
whole structure to a file? I guess I'll need some kind of recursion
here. :)
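
A rough sketch of the kind of recursion this would need, assuming the goal is a plain-text listing of public attribute names and their types. The function name dump_dir, the depth limit and the underscore filter are all assumptions, and SimpleNamespace may well be missing from a stripped-down interpreter; it is used here only to build demo data.

```python
import io
import types


def dump_dir(obj, name, out, max_depth=2, _depth=0, _seen=None):
    """Write 'name: type' for obj, then recurse into its public attributes."""
    if _seen is None:
        _seen = {}
    out.write("%s%s: %s\n" % ("  " * _depth, name, type(obj).__name__))
    if _depth >= max_depth or id(obj) in _seen:
        return  # depth limit reached, or object already expanded (cycle guard)
    _seen[id(obj)] = obj  # keep a reference so the id cannot be reused
    for attr in dir(obj):
        if attr.startswith("_"):
            continue
        try:
            value = getattr(obj, attr)
        except Exception:
            continue  # some attributes raise on access; skip them
        dump_dir(value, attr, out, max_depth, _depth + 1, _seen)


demo = types.SimpleNamespace(x=1, inner=types.SimpleNamespace(y="a"))
out = io.StringIO()
dump_dir(demo, "root", out, max_depth=2)
print(out.getvalue())
```

To write the listing to a file, pass an open file object as out; to survey a whole interpreter, one could call dump_dir once per entry of sys.modules.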


==============================================================================
TOPIC: Represent object type as
http://groups.google.com/group/comp.lang.python/t/6598773376e66591?hl=en
==============================================================================

== 1 of 1 ==
Date: Thurs, Mar 25 2010 5:49 pm
From: Jason


On Mar 26, 12:00 am, Bruno Desthuilliers <bruno.
42.desthuilli...@websiteburo.invalid> wrote:
>            attrs['type'] = type(self)
>
> Do the same thing with less work !-)

Ah, silly me :P

>      attrs['__typename__'] = type(self).__name__

That's exactly what I needed — I was not aware of the "__name__"
attribute.

> Warning: won't be very useful if your code still uses old-style classes.

No, all the objects are new-style classes so that's fine.

> Depends on what you do with this dict, DBUS etc. And of your definition
> of "better", of course.

Simplest possible :P But this looks like it, so thanks very much :)

Cheers,
Jason
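
For reference, a small sketch of the pattern settled on above. The Widget class and its to_dict method are hypothetical; only the type(self).__name__ line comes from the thread.

```python
class Widget(object):
    def __init__(self, size):
        self.size = size

    def to_dict(self):
        """Copy the instance attributes and record the class name."""
        attrs = dict(vars(self))
        attrs["__typename__"] = type(self).__name__
        return attrs


print(Widget(3).to_dict())
```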

==============================================================================
TOPIC: Automatic import ?
http://groups.google.com/group/comp.lang.python/t/5981f9a71fd4cd21?hl=en
==============================================================================

== 1 of 2 ==
Date: Thurs, Mar 25 2010 6:03 pm
From: "C. B."


Hi everyone,

I'm currently coding a C library which provides several modules and
objects.

Let's say that some of these objects are classes called AAA and BBB.
The constructor of AAA needs to get BBB as argument.

So I can run the following code :

from mymodule import AAA
from mymodule import BBB

a = AAA(BBB())

But, as there is no case where AAA can be used without BBB, I would
like to avoid importing BBB in my Python scripts when I already import
AAA.

For now, I think that reproducing the behavior of the __init__.py file
could be a good way to do this, but how can I code that using only the
C API ?

Are there any other solutions ? Is this kind of importation a good
idea ?

Greetings,
Cyrille Bagard


== 2 of 2 ==
Date: Thurs, Mar 25 2010 9:20 pm
From: Steven D'Aprano


On Thu, 25 Mar 2010 18:03:58 -0700, C. B. wrote:

> Hi everyone,
>
> I'm currently coding a C library which provides several modules and
> objects.
>
> Let's say that some of these objects are classes called AAA and BBB. The
> constructor of AAA needs to get BBB as argument.
>
> So I can run the following code :
>
> from mymodule import AAA
> from mymodule import BBB
>
> a = AAA(BBB())
>
> But, as there is no case where AAA can be used without BBB, I would like
> to avoid importing BBB in my Python scripts when I already import AAA.

Since AAA must take an argument of BBB, then give it a default:

# in mymodule
def AAA(arg=BBB()):
    ...

# in another module
from mymodule import AAA
a = AAA()


Or do this:

from mymodule import AAA, BBB
a = AAA(BBB())


> For now, I think that reproducing the behavior of the __init__.py file
> could be a good way to do this, but how can I code that using only the C
> API ?

What is the behaviour of the __init__.py file?

--
Steven
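
One caveat with a default written as arg=BBB(): Python evaluates default values once, when the def statement runs, so every call that omits the argument shares a single BBB instance. If that sharing is unwanted, a common alternative is a None sentinel, sketched below in pure Python with stand-in classes (the real AAA and BBB live in the C library and are not shown in the thread).

```python
class BBB(object):
    pass


class AAA(object):
    def __init__(self, arg=None):
        # Create the default lazily so each AAA gets its own BBB.
        self.arg = BBB() if arg is None else arg


first = AAA()
second = AAA()
explicit = AAA(first.arg)

print(first.arg is second.arg)
print(explicit.arg is first.arg)
```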

==============================================================================
TOPIC: Is there any library for indexing binary data?
http://groups.google.com/group/comp.lang.python/t/73b172b8c5c572ff?hl=en
==============================================================================

== 1 of 2 ==
Date: Thurs, Mar 25 2010 6:45 pm
From: 甜瓜


Many thanks for your kind reply. As you mentioned, a sparse array may
be the best choice.
Storing the offset rather than the payload itself can save a great deal
of memory.

1e7 queries per second is my ideal aim, but 1e6 must be achieved.
Currently I have reached 5e6 on one PC (without incremental
indexing, and with all incoming queries coming from a local data stream).
Since the table is very big and response time is critical, the
final system will definitely be distributed. I hope that the
Judy algorithm can simplify indexing, so I can focus on implementing
data persistence and the distributed computing side.

--
ShenLei

On 26 March 2010 at 2:55 AM, Irmen de Jong <irmen-NOSPAM-@xs4all.nl> wrote:
> On 25-3-2010 10:55, 甜瓜 wrote:
>> Thank you irmen. I will take a look at pytable.
>> FYI, let me explain the case clearly.
>>
>> Originally, my big data table is simply array of Item:
>> struct Item
>> {
>>     long id;            // used as key
>>     BYTE payload[LEN];  // corresponding value with fixed length
>> };
>>
>> All items are stored in one file by using "stdio.h" function:
>> fwrite(itemarray, sizeof(Item), num_of_items, fp);
>>
>> Note that "id" is randomly unique without any order. To speed up
>> searching I regrouped / sorted them into two-level hash tables (in
>> the form of files). I want to employ certain library to help me index
>> this table.
>>
>> Since the table contains about 10^9 items and LEN is about 2KB, it is
>> impossible to hold all data in memory. Furthermore, some new item may
>> be inserted into the array. Therefore incremental indexing feature is
>> needed.
>
> I see, I thought the payload data was small as well. What about this idea:
> Build a hash table where the keys are the id from your Item structs and
> the value is the file seek offset of the Item 'record' in your original
> datafile. (Although that might generate values of type long, which take
> more memory than int, so maybe we should use file_offset/sizeof(Item).)
> This way you can just keep your original data file (you only have to
> scan it to build the hash table) and you will avoid a lengthy conversion
> process.
>
> If this hashtable still doesn't fit in memory use a sparse array
> implementation of some sort that is more efficient at storing simple
> integers, or just put it into a database solution mentioned in earlier
> responses.
>
> Another thing: I think that your requirement of 1e7 lookups per second
> is a bit steep for any solution where the dataset is not in core memory
> at once though.
>
> Irmen.
> --
> http://mail.python.org/mailman/listinfo/python-list
>
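
Irmen's idea of mapping each id to a record number (file offset divided by the record size) can be sketched as follows. The record layout is an assumption: a little-endian 64-bit id followed by a fixed-length payload with no padding, and an 8-byte payload standing in for the roughly 2KB one. A real C struct may be laid out differently, and a plain dict would not scale to 10^9 keys; the sketch only shows the lookup mechanics.

```python
import io
import struct

PAYLOAD_LEN = 8  # stand-in for the ~2KB LEN in the post
ITEM = struct.Struct("<q%ds" % PAYLOAD_LEN)  # assumed layout: 64-bit id + payload


def build_index(fp):
    """Scan the file once and map each id to its record number."""
    index = {}
    fp.seek(0)
    record = 0
    while True:
        chunk = fp.read(ITEM.size)
        if len(chunk) < ITEM.size:
            break
        item_id, _payload = ITEM.unpack(chunk)
        index[item_id] = record  # record number, i.e. offset / sizeof(Item)
        record += 1
    return index


def lookup(fp, index, item_id):
    """Seek straight to the record for item_id and return its payload."""
    fp.seek(index[item_id] * ITEM.size)
    _id, payload = ITEM.unpack(fp.read(ITEM.size))
    return payload


# In-memory stand-in for the data file, with ids in no particular order.
data = io.BytesIO()
for item_id, payload in [(42, b"payloadA"), (7, b"payloadB"), (99, b"payloadC")]:
    data.write(ITEM.pack(item_id, payload))

index = build_index(data)
print(index[7], lookup(data, index, 7))
```

Appending a new item only requires writing the record at the end of the file and adding one entry to the index, which covers the incremental-indexing requirement in the simplest case.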


== 2 of 2 ==
Date: Thurs, Mar 25 2010 8:28 pm
From: John Nagle


甜瓜 wrote:
> Well, Database is not proper because 1. the table is very big (~10^9
> rows) 2. we should support very fast *simple* query that is to get
> value corresponding to single key (~10^7 queries / second).

Ah, crypto rainbow tables.

John Nagle

==============================================================================
TOPIC: U Want to Earn More Money - Join Now
http://groups.google.com/group/comp.lang.python/t/7a2efd127c2f81f6?hl=en
==============================================================================

== 1 of 1 ==
Date: Thurs, Mar 25 2010 11:15 pm
From: priya a


http://123maza.com/78/bookmark/


==============================================================================

You received this message because you are subscribed to the Google Groups "comp.lang.python"
group.

To post to this group, visit http://groups.google.com/group/comp.lang.python?hl=en

To unsubscribe from this group, send email to comp.lang.python+unsubscribe@googlegroups.com

To change the way you get mail from this group, visit:
http://groups.google.com/group/comp.lang.python/subscribe?hl=en

To report abuse, send email explaining the problem to abuse@googlegroups.com

==============================================================================
Google Groups: http://groups.google.com/?hl=en
