Tag: lxml

Python and LXML: extremely slow, more efficient code?

I’m processing XML documents like the following. I’m using the following code to change the attributes of certain elements whenever certain conditions are met. The code works as expected and I’m getting the output I want. However, processing all the files takes far too long…

Get NA for empty slots in lxml xpath()

lxml python xml

I have a big XML file, of which I am providing a sample here: I now want to pull out all biospecimen and concentration_value entries and be able to associate them with each other in the end. I am trying to do it like this: The output CSV should look like this: In reality I also pull out many other…
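
One way to keep the two fields aligned is to walk the records one at a time and substitute "NA" when a child is missing, rather than running two independent xpath() calls whose result lists can differ in length. The sketch below is only an illustration: the sample document, the parent tag name `concentration`, and the helper `extract_rows` are assumptions, not taken from the original question.

```python
from lxml import etree

# Hypothetical sample: the second record has no <concentration_value>.
SAMPLE = b"""<records>
  <concentration>
    <biospecimen>Blood</biospecimen>
    <concentration_value>4.2</concentration_value>
  </concentration>
  <concentration>
    <biospecimen>Urine</biospecimen>
  </concentration>
</records>"""


def extract_rows(xml_bytes, record_tag="concentration"):
    """Return one (biospecimen, concentration_value) tuple per record."""
    root = etree.fromstring(xml_bytes)
    rows = []
    for record in root.iter(record_tag):
        # findtext() returns None for a missing child and "" for an empty one;
        # "or" maps both cases to the "NA" placeholder.
        rows.append((
            record.findtext("biospecimen") or "NA",
            record.findtext("concentration_value") or "NA",
        ))
    return rows


rows = extract_rows(SAMPLE)
```

Each tuple can then be passed to csv.writer.writerow() so that every record produces exactly one CSV line.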

Parsing a XML child element back as a string

elementtree lxml python xml xpath

I’m trying to parse a complex XML document and XPath isn’t behaving like I thought it would. Here’s my sample XML: Here’s my Python code: I get the output: when I expected: What am I doing wrong? Answer: This XPath will get text and elements as expected. Printing found nodes as OP requested: Resul…

Problems installing lxml on M1 mac

apple-m1 libxml2 lxml pip python

So, I’m having the classic trouble installing lxml. Initially I was just pip installing, but when I tried to free up memory using Element.clear() I was getting the following error: I thought this must be because lxml is using the system’s libxml2 which is probably out of date. So I used homebrew to i…

Scrapy Python can’t extract links with more stable xpath

lxml python scrapy xpath

I’m building a scraper for this website. I’m using Python and the Scrapy shell to extract the data that I want: xpath would be: //a[@class="sb-card sb-card-company site-1x1 with-hover]/@href" Using response.xpath('//a[@class="sb-card sb-card-company site-1x1 with-hover]/@href"' returns []. I tried using …
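
A common way to make a class-based XPath less brittle is to match a single class token instead of the full attribute string. The sketch below shows that technique with lxml.html; the sample markup and href values are invented, and choosing `sb-card-company` as the token is an assumption about which class is the stable one.

```python
from lxml import html

# Hypothetical markup; only the class string is taken from the question.
SAMPLE = (
    '<div>'
    '<a class="sb-card sb-card-company site-1x1 with-hover" href="/company/1">A</a>'
    '<a class="other-link" href="/elsewhere">B</a>'
    '</div>'
)

# Pad the class attribute with spaces so the token matches as a whole word,
# regardless of its position or of other classes on the element.
TOKEN_XPATH = (
    '//a[contains(concat(" ", normalize-space(@class), " "), '
    '" sb-card-company ")]/@href'
)

links = [str(href) for href in html.fromstring(SAMPLE).xpath(TOKEN_XPATH)]
```

The same expression string can be passed to response.xpath() in Scrapy, since it is plain XPath 1.0.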

Xpath: How to check if a tag comes before text or after text?

html lxml python xml xpath

Assume I have the following two example pieces of HTML: This is some text: ABCD12345 Name: John Doe I’m able to separate the and non- parts, but I (also) want to know how to determine whether the par…
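
In lxml, text that precedes the first child lives in the parent's `.text`, while text that follows a child lives in that child's `.tail`, so comparing the two tells which side the tag is on. The sketch below assumes the stripped tag was `<b>` inside a `<p>` and that there is a single such child; both the markup and the helper `tag_position` are hypothetical.

```python
from lxml import html


def tag_position(fragment, tag="b"):
    """Report whether the first <tag> child sits before or after the text."""
    parent = html.fragment_fromstring(fragment)
    child = parent.find(tag)
    if child is None:
        raise ValueError(f"no <{tag}> child found")
    # Text before the first child is stored on the parent.
    text_before = bool((parent.text or "").strip())
    # Text after the child is stored as the child's tail.
    text_after = bool((child.tail or "").strip())
    if text_before and not text_after:
        return "tag after text"
    if text_after and not text_before:
        return "tag before text"
    return "both" if text_before else "neither"


first = tag_position("<p>This is some text: <b>ABCD12345</b></p>")
second = tag_position("<p><b>Name:</b> John Doe</p>")
```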

Objectify xml string with dashes in tags and attributes names

lxml lxml.objectify python xml

I am using lxml to objectify an XML string with dashes in the tags. For example: After this step, the elements’ names come with dashes. I can’t access foo-foo due to dashes in the name. How can I remove dashes from tag names as well as from attribute names? Answer It’s hacky, but you could do s…
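
As an alternative to renaming anything, dashed names can be reached without dot syntax: `getattr()` for child elements and `.get()` for attributes. This is a different approach from the truncated answer above, shown on a made-up document that reuses the `foo-foo` name from the question; the attribute name `bar-bar` is an assumption.

```python
from lxml import objectify

# Hypothetical sample; only the tag name foo-foo comes from the question.
root = objectify.fromstring('<root><foo-foo bar-bar="1">hello</foo-foo></root>')

# root.foo-foo is not valid Python, but getattr() accepts any tag name.
child = getattr(root, "foo-foo")

text = child.text
# Attributes are read with the usual ElementTree-style get().
attr = child.get("bar-bar")
```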

Python lxml find text efficiently

lxml python xml xpath

Using Python lxml I want to test if an XML document contains EXPERIMENT_TYPE, and if it exists, extract the <VALUE>. Example: Is there a faster way than iterating through all elements? That attempt is also getting messy when I want to extract the <VALUE>. Answer Preferably you do this with XPath wh…
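
A minimal sketch of the XPath approach follows. The question's example XML is not shown, so the layout here (a `<TAG>` holding EXPERIMENT_TYPE next to a sibling `<VALUE>`) and the helper name `experiment_type` are assumptions; the expression would need adjusting to the real structure.

```python
from lxml import etree

# Hypothetical layout: TAG and VALUE are siblings under one parent.
WITH_TYPE = etree.fromstring(
    "<ATTRIBUTES>"
    "<ATTRIBUTE><TAG>EXPERIMENT_TYPE</TAG><VALUE>ChIP-Seq</VALUE></ATTRIBUTE>"
    "<ATTRIBUTE><TAG>OTHER</TAG><VALUE>x</VALUE></ATTRIBUTE>"
    "</ATTRIBUTES>"
)
WITHOUT_TYPE = etree.fromstring(
    "<ATTRIBUTES><ATTRIBUTE><TAG>OTHER</TAG><VALUE>x</VALUE></ATTRIBUTE></ATTRIBUTES>"
)


def experiment_type(root):
    """Return the VALUE paired with EXPERIMENT_TYPE, or None if absent."""
    # Select any element that has a TAG child equal to EXPERIMENT_TYPE,
    # then take the text of its VALUE child.
    values = root.xpath('//*[TAG="EXPERIMENT_TYPE"]/VALUE/text()')
    return str(values[0]) if values else None


found = experiment_type(WITH_TYPE)
missing = experiment_type(WITHOUT_TYPE)
```

A single expression covers both the existence test and the extraction, so no Python-level loop over all elements is needed.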

Getting error about bad escape during start of Arelle

arelle lxml python ubuntu ubuntu-18.04

I am trying to get Arelle working on Ubuntu Linux 18.04 with Python 3.6.9. Step-1: (Download Arelle software): git clone https://github.com/Arelle/Arelle.git -b lxml Step-2: (Install Python LXML): apt-get install -y python-lxml Step-3: (Install Python tk): Due to error: ‘No module named tkinter’ …

I want to replace the html code with my own

beautifulsoup lxml python tags

I am using the lxml and BeautifulSoup libraries. My goal is to translate the text of specific tags within the whole HTML code and replace the original text of those tags with the translated text. I want to loop over a specific XPath so that all the translated text is inserte…
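
The loop described here could look roughly like the sketch below, which uses lxml.html only. The `translate` function is a stand-in (it just upper-cases the text) for whatever translation call is actually used, and the sample markup, the `//p` expression, and the helper `replace_text` are assumptions for illustration.

```python
from lxml import html


def translate(text):
    # Placeholder for a real translation call (hypothetical).
    return text.upper()


def replace_text(source, xpath_expr, translate_fn):
    """Replace the leading text of every element matched by xpath_expr."""
    doc = html.fromstring(source)
    for element in doc.xpath(xpath_expr):
        # Skip elements that have no text of their own.
        if element.text and element.text.strip():
            element.text = translate_fn(element.text)
    return html.tostring(doc, encoding="unicode")


result = replace_text(
    "<div><p>hello</p><span>keep</span><p>world</p></div>",
    "//p",
    translate,
)
```

Only `element.text` is rewritten, so nested child tags and their tails stay untouched; text after a nested child would need the same treatment applied to `.tail`.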