I’m processing XML documents like the following. I’m using the following code to change the attributes of certain elements whenever certain conditions are met. The code works as expected and I’m getting the output I want. However the time it takes to process all the files seems way too much.…
Tag: lxml
Get NA for empty slots in lxml xpath()
I have a big xml (that one): of which I am providing a sample here: I now want to pull out all biospecimen and concentration_value and be able to associate them with each other in the end. I am trying to do it like this: The output csv should look like this: In reality I also pull out many other
Parsing a XML child element back as a string
I’m trying to parse a complex XML and xpath isn’t behaving like I thought it would. Here’s my sample xml: Here’s my python code: I get the output: when I expected: What am I doing wrong? Answer This XPath will get text and elements as expected Printing found nodes as OP requested Resul…
Problems installing lxml on M1 mac
So, I’m having the classic trouble install lxml. Initially I was just pip installing, but when I tried to free up memory using Element.clear() I was getting the following error: I thought this must be because lxml is using the system’s libxml2 which is probably out of date. So I used homebrew to i…
Scrapy Python can‘t extract links with more stable xpath
I‘m Building a scraper for this website. I‘m using Python and scrapy Shell to extract the data that I want: xpath would be: //a[@class=“sb-card sb-card-company site-1×1 with-hover]/@href“ Using response.xpath(‘//a[@class=“sb-card sb-card-company site-1×1 with-hover]/@href“‘ returns [] I tried using …
Xpath: How to check if a tag comes before text or after text?
Assume I have the following two example pieces of HTML: <p>This is some text: <b>ABCD12345</b></p> <p><b>Name:</b> John Doe</p> I’m able to separate the <b> and non-<b> parts, but I (also) want to know how to determine whether the <b> par…
Objectify xml string with dashes in tags and attributes names
I am using lxml to objectify xml string with dashes in the tags. For example: After this step, the elements’ names come with dashes. I can’t access foo-foo due to dashes in the name. How can I remove dashes from tags name as well as from attribute names? Answer It’s hacky, but you could do s…
Python lxml find text efficiently
Using python lxml I want to test if a XML document contains EXPERIMENT_TYPE, and if it exists, extract the <VALUE>. Example: Is there a faster way than iterating through all elements? That attempt is also getting messy when I want to extract the <VALUE>. Answer Preferably you do this with XPath wh…
Getting error about bad escape during start of Arelle
I am trying to get Arelle working on Ubuntu linux 18.04 with Python 3.6.9. Step-1: (Download Arelle software): git clone https://github.com/Arelle/Arelle.git -b lxml Step-2 Install Python LXML: apt-get install -y python-lxml Step-3 Install Python tk: Due to error: ‘No module named tkinter’ ……
I want to replace the html code with my own
I am using lxml and beautifulsoup library, actually my goal is to translate text of the specific tags out of the whole html code, what I want is, I want to replace the text of specific tags with the translated text. I want to set a loop for the specific xpath in which all the translated text should be inserte…