Python lxml find text efficiently

Question

Using Python lxml, I want to test whether an XML document contains EXPERIMENT_TYPE and, if it does, extract the corresponding value. Is there a faster way than iterating through all elements? That attempt also gets messy when I want to extract the value.

Accepted Answer

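Preferably you do this with XPath, which lxml evaluates in libxml2's C code, so it is typically much faster than looping over elements in Python. My suggestion (tested and working) is below. It returns a (possibly empty) list of VALUE elements from which you can extract the text.

PS: do not use names of built-ins such as all as variable names. Shadowing them is bad practice and may lead to unexpected bugs.

    import lxml.etree as ET
    from lxml.etree import Element
    from typing import List

    # The markup of the original sample was not preserved. EXPERIMENT_ATTRIBUTE,
    # TAG and VALUE follow from the XPath below; the other element names are assumed.
    xml_str = """<EXPERIMENT>
      <TITLE>WGBS (whole genome bisulfite sequencing) analysis of SomeSampleA (library: SomeLibraryA).</TITLE>
      <EXPERIMENT_ATTRIBUTES>
        <EXPERIMENT_ATTRIBUTE><TAG>EXPERIMENT_TYPE</TAG><VALUE>DNA Methylation</VALUE></EXPERIMENT_ATTRIBUTE>
        <EXPERIMENT_ATTRIBUTE><TAG>EXPERIMENT_ONTOLOGY_URI</TAG><VALUE>http://purl.obolibrary.org/obo/OBI_0001863</VALUE></EXPERIMENT_ATTRIBUTE>
        <EXPERIMENT_ATTRIBUTE><TAG>EXPERIMENT_ONTOLOGY_CURIE</TAG><VALUE>obi:0001863</VALUE></EXPERIMENT_ATTRIBUTE>
        <EXPERIMENT_ATTRIBUTE><TAG>MOLECULE</TAG><VALUE>genomic DNA</VALUE></EXPERIMENT_ATTRIBUTE>
      </EXPERIMENT_ATTRIBUTES>
    </EXPERIMENT>"""

    tree = ET.ElementTree(ET.fromstring(xml_str))
    # vals is an empty list if the tag is absent, so check it before indexing.
    vals: List[Element] = tree.xpath(".//EXPERIMENT_ATTRIBUTE/TAG[text()='EXPERIMENT_TYPE']/following-sibling::VALUE")
    print(vals[0].text)
    # DNA Methylation

An alternative XPath expression was provided below by Michael Kay, which is identical to the one in the answer by Martin Honnen:

    .//EXPERIMENT_ATTRIBUTE[TAG='EXPERIMENT_TYPE']/VALUE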
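To cover both halves of the question (test whether the tag is present, then extract its value), the predicate form of the path can be wrapped in a small helper. The following is only a sketch and not part of the original answer: the function name attribute_value, the sample document, and its wrapper elements EXPERIMENT and EXPERIMENT_ATTRIBUTES are assumptions made for illustration. It uses findtext, which returns None when nothing matches, and it restricts itself to the ElementPath subset of XPath so that it also runs on the standard-library ElementTree if lxml is not installed.

```python
try:
    import lxml.etree as ET  # preferred: libxml2-backed parser
except ImportError:
    import xml.etree.ElementTree as ET  # stdlib fallback with the same API here

# Hypothetical sample document; only EXPERIMENT_ATTRIBUTE, TAG and VALUE
# are taken from the XPath in the answer, the wrapper names are assumed.
XML = """<EXPERIMENT>
  <EXPERIMENT_ATTRIBUTES>
    <EXPERIMENT_ATTRIBUTE><TAG>EXPERIMENT_TYPE</TAG><VALUE>DNA Methylation</VALUE></EXPERIMENT_ATTRIBUTE>
    <EXPERIMENT_ATTRIBUTE><TAG>MOLECULE</TAG><VALUE>genomic DNA</VALUE></EXPERIMENT_ATTRIBUTE>
  </EXPERIMENT_ATTRIBUTES>
</EXPERIMENT>"""


def attribute_value(root, tag):
    """Return the VALUE text paired with the given TAG, or None if absent.

    The tag is interpolated into the path, so it must not contain
    a single quote.
    """
    path = f".//EXPERIMENT_ATTRIBUTE[TAG='{tag}']/VALUE"
    return root.findtext(path)


root = ET.fromstring(XML)

experiment_type = attribute_value(root, "EXPERIMENT_TYPE")
missing = attribute_value(root, "NO_SUCH_TAG")

if experiment_type is not None:
    print(experiment_type)
print(missing)
```

Returning None for a missing tag keeps the existence test and the extraction in one call, which avoids indexing into an empty result list.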

Answer