Re: Reading xpath value error - Lxml
On Sun, Apr 14, 2013 at 10:29 AM, <bubufff@gmail.com> wrote:
> Hi all,
>
> I am trying to crawl the information from this link
>
> http://muaban.net/mua-ban-nha-quan-thu-duc-l5924-c32/quan-thu-duc-ban-nha1lau-2mt-truoc-sau-dg-ng-cong-tru-p-hiep-phu-q9-dt-4x21-5m--id15946781
>
> and this is the code I use
>
>> link = "http://muaban.net/mua-ban-nha-quan-thu-duc-l5924-c32/quan-thu-duc-ban-nha1lau-2mt-truoc-sau-dg-ng-cong-tru-p-hiep-phu-q9-dt-4x21-5m--id15946781"
>> xPath = "id('pC_DV_tableHeader')/x:tbody/x:tr[4]/x:td[3]"
>> namespace = {'x': 'http://www.w3.org/1999/xhtml'}
>>
>> tree = lxml.html.parse(link)
>> arrayContent = tree.xpath(xPath + "/text()", namespaces=namespace)
>>
>> if len(arrayContent):
>>     content = cgi.escape(arrayContent[0].encode("utf-8"))
>
>
> I used the XPath Checker add-on for Firefox to get the XPath value and the
> namespace. However, when I run the code, the content is always empty.
> How can I solve this?
>
Are you sure your xpath is correct? I'm not sure about that "id()" syntax. Try:
//x:table[@id="pC_DV_tableHeader"]//x:tr[4]/x:td[3]
Another thing to note: the DOM presented by Firefox is the result of
Firefox parsing and potentially fixing up the HTML code. For instance,
there is no <tbody> in the actual HTML for that table; Firefox always
inserts a <tbody> when parsing a table that lacks one. Does lxml
also insert a <tbody> if there is not one? If it doesn't, then your
xpath would never work.
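One way to check the <tbody> question is to parse a small table that has none and look at the tree lxml builds. This is only a rough sketch, assuming lxml is installed: the SAMPLE markup is a made-up stand-in for the real page (not its actual HTML), and the row and cell numbers are picked for the sample. It compares a browser-style XPath (with <tbody> and the x: namespace prefix) against one written for the plain tree.

```python
import lxml.html

# Hypothetical stand-in for the page: a table with no <tbody> in the markup.
SAMPLE = """
<html><body>
<table id="pC_DV_tableHeader">
  <tr><td>r1c1</td><td>r1c2</td><td>r1c3</td></tr>
  <tr><td>r2c1</td><td>r2c2</td><td>r2c3</td></tr>
</table>
</body></html>
"""

doc = lxml.html.fromstring(SAMPLE)

# XPath in the style copied from a browser tool: it assumes a <tbody>
# element and elements in the XHTML namespace.
ns = {"x": "http://www.w3.org/1999/xhtml"}
browser_style = doc.xpath(
    '//x:table[@id="pC_DV_tableHeader"]/x:tbody/x:tr[2]/x:td[3]/text()',
    namespaces=ns,
)

# XPath written against the tree lxml actually built: no namespace
# prefix, and no <tbody> step between the table and its rows.
lxml_style = doc.xpath('//table[@id="pC_DV_tableHeader"]//tr[2]/td[3]/text()')

# Did the parser add a <tbody> on its own?
has_tbody = bool(doc.xpath("//tbody"))

print("browser-style result:", browser_style)
print("lxml-style result:", lxml_style)
print("tbody present:", has_tbody)
```

If the browser-style list comes back empty while the second one finds the cell, the <tbody> step and the x: prefix are the likely culprits, and dropping both from the original xPath would be the first thing to try.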
Cheers
Tom
--
You received this message because you are subscribed to the Google Groups "Django users" group.