
Tag: lxml

Python and LXML: extremely slow, more efficient code?

I’m processing XML documents like the following. I’m using the following code to change the attributes of certain elements whenever certain conditions are met. The code works as expected and I’m getting the output I want. However the time it takes to process all the files seems way too much. If you notice, I have some print statements that allow

Get NA for empty slots in lxml xpath()

I have a big xml (that one): of which I am providing a sample here: I now want to pull out all biospecimen and concentration_value and be able to associate them with each other in the end. I am trying to do it like this: The output csv should look like this: In reality I also pull out many other
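A minimal sketch of one way to keep the two fields associated: iterate over each parent record and evaluate relative XPath expressions per record, substituting "NA" when a field is missing. Only `biospecimen` and `concentration_value` come from the question; the `concentration` parent element, the sample data, and the `first_text` helper are assumptions for illustration.

```python
from lxml import etree

# Hypothetical sample; the real document structure is not shown in the excerpt.
XML = b"""<metabolite>
  <concentration>
    <biospecimen>Blood</biospecimen>
    <concentration_value>2.8</concentration_value>
  </concentration>
  <concentration>
    <biospecimen>Urine</biospecimen>
  </concentration>
</metabolite>"""


def first_text(node, path, default="NA"):
    """Return the first non-empty text hit of a relative XPath, else the default."""
    hits = node.xpath(path)
    if hits and hits[0].strip():
        return hits[0].strip()
    return default


root = etree.fromstring(XML)

# One row per parent record, so missing children cannot shift the columns.
rows = [
    (
        first_text(rec, "biospecimen/text()"),
        first_text(rec, "concentration_value/text()"),
    )
    for rec in root.xpath("//concentration")
]
print(rows)
```

Querying each field globally (for example `//concentration_value/text()`) returns flat lists of different lengths, which is why the lookup here is done relative to each record.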

Parsing a XML child element back as a string

I’m trying to parse a complex XML document, and XPath isn’t behaving as I expected. Here’s my sample XML: Here’s my Python code: I get the output: when I expected: What am I doing wrong? Answer: This XPath will get text and elements as expected. Printing found nodes as the OP requested. Result: the with_tail argument prevents tail text from being

Problems installing lxml on M1 mac

So, I’m having the classic trouble installing lxml. Initially I was just pip installing, but when I tried to free up memory using Element.clear() I was getting the following error: I thought this must be because lxml is using the system’s libxml2, which is probably out of date. So I used Homebrew to install libxml2 and libxslt, and I force

Scrapy Python can’t extract links with more stable XPath

I’m building a scraper for this website. I’m using Python and the Scrapy shell to extract the data that I want. The XPath would be: //a[@class="sb-card sb-card-company site-1x1 with-hover"]/@href Using response.xpath('//a[@class="sb-card sb-card-company site-1x1 with-hover"]/@href') returns []. I tried using contains(@class,"sb-card-company") with the same result. Using other containers in the same way changed nothing. Using a different page also had no effect. Using
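As a sketch of a more stable class match, the expression below tests for a single class token rather than the full attribute string. It is run here with lxml on hypothetical markup; the class names come from the question, while the `href` values and surrounding HTML are assumptions. If the links are absent from the downloaded response itself (for instance because the page builds them with JavaScript, which the excerpt does not confirm), no XPath would match.

```python
from lxml import html

# Hypothetical markup using the class names quoted in the question.
HTML = (
    '<div>'
    '<a class="sb-card sb-card-company site-1x1 with-hover" href="/company/1">A</a>'
    '<a class="sb-card" href="/other">B</a>'
    '</div>'
)

# Match one whitespace-delimited class token, independent of order or extras.
TOKEN_XPATH = (
    "//a[contains(concat(' ', normalize-space(@class), ' '), "
    "' sb-card-company ')]/@href"
)

tree = html.fromstring(HTML)
links = tree.xpath(TOKEN_XPATH)
print(links)
```

The same expression string can be passed to `response.xpath(...)` in the Scrapy shell.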

Xpath: How to check if a tag comes before text or after text?

Assume I have the following two example pieces of HTML: <p>This is some text: <b>ABCD12345</b></p> <p><b>Name:</b> John Doe</p> I’m able to separate the <b> and non-<b> parts, but I (also) want to know how to determine whether the <b> part is at the start or at the end of the text (in other words; whether it has text before or
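One possible approach, sketched with lxml on the two snippets above: text before the `<b>` is stored in the parent's `.text`, and text after it is stored in the `<b>` element's `.tail`. The helper name `b_position` and its return labels are made up for illustration.

```python
from lxml import html


def b_position(p):
    """Classify where the first <b> child sits relative to the text of p."""
    b = p.find("b")
    if b is None:
        return "none"
    before = (p.text or "").strip()   # text preceding <b>
    after = (b.tail or "").strip()    # text following </b>
    if before and after:
        return "middle"
    if before:
        return "end"
    if after:
        return "start"
    return "only"


first = html.fragment_fromstring("<p>This is some text: <b>ABCD12345</b></p>")
second = html.fragment_fromstring("<p><b>Name:</b> John Doe</p>")

print(b_position(first))
print(b_position(second))
```

This sketch only inspects the first `<b>` and assumes it is a direct child of the `<p>`.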

Python lxml find text efficiently

Using Python lxml I want to test if an XML document contains EXPERIMENT_TYPE, and if it exists, extract the <VALUE>. Example: Is there a faster way than iterating through all elements? That attempt is also getting messy when I want to extract the <VALUE>. Answer: Preferably you do this with XPath, which is bound to be incredibly fast. My suggestion
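A minimal sketch of the XPath approach. The question's example document is not included in the excerpt, so the layout below (a `<TAG>`/`<VALUE>` pair inside an `EXPERIMENT_ATTRIBUTE` element) and the sample value are assumptions; adjust the expression to the real structure.

```python
from lxml import etree

# Assumed layout: the tag name and its value are sibling elements.
XML = b"""<EXPERIMENT_SET>
  <EXPERIMENT_ATTRIBUTE>
    <TAG>LIBRARY_STRATEGY</TAG>
    <VALUE>other</VALUE>
  </EXPERIMENT_ATTRIBUTE>
  <EXPERIMENT_ATTRIBUTE>
    <TAG>EXPERIMENT_TYPE</TAG>
    <VALUE>ChIP-Seq</VALUE>
  </EXPERIMENT_ATTRIBUTE>
</EXPERIMENT_SET>"""

root = etree.fromstring(XML)

# A single expression both tests for the tag and selects its value.
values = root.xpath("//EXPERIMENT_ATTRIBUTE[TAG='EXPERIMENT_TYPE']/VALUE/text()")
value = values[0] if values else None
print(value)
```

An empty result list means the document has no matching EXPERIMENT_TYPE entry, so no separate existence check is needed.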

Getting error about bad escape during start of Arelle

I am trying to get Arelle working on Ubuntu Linux 18.04 with Python 3.6.9. Step-1 (Download Arelle software): git clone https://github.com/Arelle/Arelle.git -b lxml Step-2 (Install Python LXML): apt-get install -y python-lxml Step-3 (Install Python tk): Due to the error ‘No module named tkinter’, I install: apt-get install python3-tk When it’s time to start Arelle from the terminal, I use: I then get

I want to replace the html code with my own

I am using the lxml and BeautifulSoup libraries. My goal is to translate the text of specific tags within the whole HTML code, and to replace the text of those tags with the translated text. I want to set up a loop over the specific XPath in which all the translated text should be inserted
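A rough sketch of such a loop using lxml alone. The `translate` function is a placeholder (it only upper-cases the text) standing in for whatever translation call is actually used, and the sample HTML and the XPath `//h1 | //p` are assumptions.

```python
from lxml import html


def translate(text):
    """Placeholder for a real translation call (hypothetical)."""
    return text.upper()


doc = html.fromstring(
    "<div><h1>hello</h1><p>keep <b>this</b> markup</p><p>world</p></div>"
)

# Replace only the leading text of each matched node; child elements stay intact.
for node in doc.xpath("//h1 | //p"):
    if node.text and node.text.strip():
        node.text = translate(node.text)

result = html.tostring(doc, encoding="unicode")
print(result)
```

Only `node.text` is rewritten here; text that follows a child element lives in that child's `.tail` and would need the same treatment if it should be translated too.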
