Tag: lxml

Parsing an XML child element back as a string

I’m trying to parse a complex XML document, and XPath isn’t behaving the way I thought it would. Here’s my sample XML: Here’s my Python code: I get the output: when I expected: What am I doing wrong? Answer: This XPath will get text and elements as expected. Printing the found nodes, as the OP requested: Result: The with_tail argument prevents tail text from being

Problems installing lxml on M1 Mac

So, I’m having the classic trouble installing lxml. Initially I was just installing it with pip, but when I tried to free up memory using Element.clear() I got the following error: I thought this must be because lxml is using the system’s libxml2, which is probably out of date. So I used Homebrew to install libxml2 and libxslt, and I force
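One way to test the "outdated system libxml2" theory is to ask lxml which libxml2 and libxslt versions it was compiled against and which it is actually running with. This is a minimal sketch using the version constants that `lxml.etree` exposes. The helper name `lxml_library_versions` is hypothetical.

```python
from lxml import etree


def lxml_library_versions():
    """Collect lxml's own version plus the libxml2/libxslt versions it uses."""
    return {
        "lxml": etree.LXML_VERSION,
        "libxml2 (runtime)": etree.LIBXML_VERSION,
        "libxml2 (compiled)": etree.LIBXML_COMPILED_VERSION,
        "libxslt (runtime)": etree.LIBXSLT_VERSION,
        "libxslt (compiled)": etree.LIBXSLT_COMPILED_VERSION,
    }


if __name__ == "__main__":
    # Each value is a tuple of ints, e.g. (2, 9, 14); print it dotted
    for name, version in lxml_library_versions().items():
        print(f"{name}: {'.'.join(map(str, version))}")
```

If the runtime and compiled libxml2 versions differ, or neither matches the version Homebrew installed, lxml is probably linking against a different copy of the library than intended.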

Scrapy Python can’t extract links with a more stable XPath

I’m building a scraper for this website. I’m using Python and the Scrapy shell to extract the data I want. The XPath would be: //a[@class="sb-card sb-card-company site-1x1 with-hover"]/@href, but response.xpath('//a[@class="sb-card sb-card-company site-1x1 with-hover"]/@href') returns []. I tried using contains(@class, "sb-card-company") with the same result. Using other containers in the same way changed nothing. Using a different page also had no effect. Using
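An exact `@class="..."` match breaks as soon as the site adds, removes, or reorders a class. `contains(@class, "sb-card-company")` also matches longer class names such as `sb-card-company-extra`. A common, more stable XPath 1.0 idiom pads the normalized class attribute with spaces and looks for the whole token. The sketch below runs that expression with lxml (which Scrapy's selectors are built on) against hypothetical markup standing in for the real page. The same expression string could be passed to `response.xpath(...).getall()` in the Scrapy shell. It only addresses class matching. If the live response genuinely lacks these elements, no XPath will find them.

```python
from lxml import html

# Match "sb-card-company" as a whole class token, regardless of order or extra classes
COMPANY_LINKS = (
    "//a[contains(concat(' ', normalize-space(@class), ' '), "
    "' sb-card-company ')]/@href"
)


def company_links(page_html):
    """Return href values of <a> elements carrying the sb-card-company class."""
    tree = html.fromstring(page_html)
    return [str(href) for href in tree.xpath(COMPANY_LINKS)]


if __name__ == "__main__":
    # Hypothetical markup; the real page's structure may differ
    page = """
    <div>
      <a class="sb-card sb-card-company site-1x1 with-hover" href="/company/1">One</a>
      <a class="with-hover  sb-card-company" href="/company/2">Two</a>
      <a class="sb-card sb-card-company-extra" href="/other">Other</a>
    </div>
    """
    print(company_links(page))
```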

XPath: How to check if a tag comes before text or after text?

Assume I have the following two example pieces of HTML: <p>This is some text: <b>ABCD12345</b></p> and <p><b>Name:</b> John Doe</p>. I’m able to separate the <b> and non-<b> parts, but I also want to determine whether the <b> part is at the start or at the end of the text (in other words, whether it has text before or
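One way to answer this in XPath 1.0 is to test whether the `<b>` element has any non-blank sibling text node before it (`preceding-sibling::text()`) or after it (`following-sibling::text()`). `normalize-space()` is used as a predicate so that whitespace-only text nodes don't count. This is a sketch with lxml. The helper name and its return labels are illustrative. It only inspects text that is a direct sibling of the first `<b>`, not text nested inside other elements.

```python
from lxml import html


def bold_position(fragment):
    """Report where the first <b> sits relative to the paragraph's non-blank text."""
    p = html.fragment_fromstring(fragment)
    # boolean(...) yields True if at least one non-whitespace sibling text node exists
    text_before = p.xpath("boolean(b[1]/preceding-sibling::text()[normalize-space()])")
    text_after = p.xpath("boolean(b[1]/following-sibling::text()[normalize-space()])")
    if text_before and not text_after:
        return "end"
    if text_after and not text_before:
        return "start"
    return "middle" if text_before else "alone"


if __name__ == "__main__":
    print(bold_position("<p>This is some text: <b>ABCD12345</b></p>"))
    print(bold_position("<p><b>Name:</b> John Doe</p>"))
```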

How to solve a module import problem on Windows 10?

I have the code below. When I run it with Python 3.7 IDLE, it runs successfully. But when I save it as […] and run it from […] using cmd, it raises an import error. My code: Error: cannot import name 'html' from 'lxml'. I can’t understand why, since both are running on the […]
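Because the same code works in IDLE but fails from cmd, one hedged way to narrow this down is to run a small diagnostic in both places and compare the output. It reports which Python interpreter is running, where the imported `lxml` package lives, and whether `from lxml import html` succeeds. The helper name `describe_environment` is hypothetical. Differing interpreter paths or `lxml` locations between the two runs would point at an environment mismatch rather than a bug in the code.

```python
import sys


def describe_environment():
    """Report the running interpreter and whether lxml.html can be imported."""
    info = {"python": sys.executable, "version": sys.version.split()[0]}
    try:
        import lxml
        # Location of the package actually imported (a local lxml.py would show up here)
        info["lxml_location"] = getattr(lxml, "__file__", None)
        from lxml import html  # noqa: F401  (the import that fails in the question)
        info["lxml_html_ok"] = True
    except ImportError as exc:
        info["lxml_html_ok"] = False
        info["error"] = str(exc)
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```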