I have data in xml file and I am reading 3 columns : price , name , calories
xml data
<?xml version='1.0' encoding='utf-8'?> <data> <row> <index>0</index> <price>$5.95</price> <name>Belgian Waffles</name> <desc>Two of our famous Belgian Waffles with plenty of real maple syrup</desc> <calories>650</calories> </row> <row> <index>1</index> <price>$7.95</price> <name>Strawberry Belgian Waffles</name> <desc>Light Belgian waffles covered with strawberries and whipped cream</desc> <calories>900</calories> </row> <row> <index>2</index> <price>$8.95</price> <name>Berry-Berry Belgian Waffles</name> <desc>Light Belgian waffles covered with an assortment of fresh berries and whipped cream</desc> <calories>900</calories> </row> <row> <index>3</index> <price>$4.50</price> <name>French Toast</name> <desc>Thick slices made from our homemade sourdough bread</desc> <calories>600</calories> </row> <row> <index>4</index> <price>$6.95</price> <name>Homestyle Breakfast</name> <desc>Two eggs, bacon or sausage, toast, and our ever-popular hash browns</desc> <calories>950</calories> </row> </data>
Code :
import xml.etree.ElementTree as ET parse_xml = ET.parse('/content/sample_data/xyz.xml') get_root_element = parse_xml.getroot() for data in get_root_element.findall('row'): prc = data.find('price') nm = data.find('name') cal = data.find('calories') temp = prc.text + ',' + nm.text + ',' + cal.text print(temp)
The above code gives me data but need to store this data to csv file
How do I need to write logic for this. Is it possible to do with pandas / csv
Need to add my headers as well to that csv file
Headers : price , name , calories
Advertisement
Answer
Solution of @kiric8494 is good enough, you can stay with it. You can also implement it using csv.DictWriter
which will be quite shorter:
import xml.etree.ElementTree as ET from csv import DictWriter parse_xml = ET.parse(r"/content/sample_data/xyz.xml") root = parse_xml.getroot() with open(r"/content/sample_data/abc.csv", "w", newline="") as f: writer = DictWriter(f, fieldnames=("price", "name", "calories"), extrasaction="ignore") writer.writeheader() writer.writerows({e.tag: e.text for e in row} for row in root)
Basically we set up DictWriter
to ignore all fields except price
, name
and calories
and then pass generator to .writerows()
which construct dictionary of all child nodes of <row>
where key is tag and value is text.