Using python lxml I want to test if a XML document contains EXPERIMENT_TYPE, and if it exists, extract the <VALUE>.
Example:
JavaScript
x
13
13
1
<EXPERIMENT_SET>
2
<EXPERIMENT center_name="BCCA" alias="Experiment-pass_2.0">
3
<TITLE>WGBS (whole genome bisulfite sequencing) analysis of SomeSampleA (library: SomeLibraryA).</TITLE>
4
<STUDY_REF accession="SomeStudy" refcenter="BCCA"/>
5
<EXPERIMENT_ATTRIBUTES>
6
<EXPERIMENT_ATTRIBUTE><TAG>EXPERIMENT_TYPE</TAG><VALUE>DNA Methylation</VALUE></EXPERIMENT_ATTRIBUTE>
7
<EXPERIMENT_ATTRIBUTE><TAG>EXPERIMENT_ONTOLOGY_URI</TAG><VALUE>http://purl.obolibrary.org/obo/OBI_0001863</VALUE></EXPERIMENT_ATTRIBUTE>
8
<EXPERIMENT_ATTRIBUTE><TAG>EXPERIMENT_ONTOLOGY_CURIE</TAG><VALUE>obi:0001863</VALUE></EXPERIMENT_ATTRIBUTE>
9
<EXPERIMENT_ATTRIBUTE><TAG>MOLECULE</TAG><VALUE>genomic DNA</VALUE></EXPERIMENT_ATTRIBUTE>
10
</EXPERIMENT_ATTRIBUTES>
11
</EXPERIMENT>
12
</EXPERIMENT_SET>
13
Is there a faster way than iterating through all elements?
JavaScript
1
6
1
all = etree.findall('EXPERIMENT/EXPERIMENT_ATTRIBUTES/EXPERIMENT_ATTRIBUTE/TAG')
2
3
for e in all:
4
if e.text == 'EXPERIMENT_TYPE':
5
print("Found")
6
That attempt is also getting messy when I want to extract the <VALUE>.
Advertisement
Answer
Preferably you do this with XPath which is bound to be incredibly fast. My sugestion (tested and working). It will return a (possible empty) list of VALUE elements from which you can extra the text
.
PS: do not use “special” words such as all
as variable names. Bad practice and may lead to unexpected bugs.
JavaScript
1
25
25
1
import lxml.etree as ET
2
from lxml.etree import Element
3
from typing import List
4
5
xml_str = """
6
<EXPERIMENT_SET>
7
<EXPERIMENT center_name="BCCA" alias="Experiment-pass_2.0">
8
<TITLE>WGBS (whole genome bisulfite sequencing) analysis of SomeSampleA (library: SomeLibraryA).</TITLE>
9
<STUDY_REF accession="SomeStudy" refcenter="BCCA"/>
10
<EXPERIMENT_ATTRIBUTES>
11
<EXPERIMENT_ATTRIBUTE><TAG>EXPERIMENT_TYPE</TAG><VALUE>DNA Methylation</VALUE></EXPERIMENT_ATTRIBUTE>
12
<EXPERIMENT_ATTRIBUTE><TAG>EXPERIMENT_ONTOLOGY_URI</TAG><VALUE>http://purl.obolibrary.org/obo/OBI_0001863</VALUE></EXPERIMENT_ATTRIBUTE>
13
<EXPERIMENT_ATTRIBUTE><TAG>EXPERIMENT_ONTOLOGY_CURIE</TAG><VALUE>obi:0001863</VALUE></EXPERIMENT_ATTRIBUTE>
14
<EXPERIMENT_ATTRIBUTE><TAG>MOLECULE</TAG><VALUE>genomic DNA</VALUE></EXPERIMENT_ATTRIBUTE>
15
</EXPERIMENT_ATTRIBUTES>
16
</EXPERIMENT>
17
</EXPERIMENT_SET>
18
"""
19
20
21
tree = ET.ElementTree(ET.fromstring(xml_str))
22
vals: List[Element] = tree.xpath(".//EXPERIMENT_ATTRIBUTE/TAG[text()='EXPERIMENT_TYPE']/following-sibling::VALUE")
23
print(vals[0].text)
24
# DNA Methylation
25
An alternative XPath declaration was provided below by Michael Kay, which is identical to the answer by Martin Honnen.
JavaScript
1
2
1
.//EXPERIMENT_ATTRIBUTE[TAG='EXPERIMENT_TYPE']/VALUE
2