I am trying to parse an XML file that looks like this. I want to extract information about each kategorie, i.e. its id, parent_id, etc.:
<?xml version="1.0" encoding="UTF-8"?>
<test timestamp="20210113">
  <kategorien>
    <kategorie id="1" parent_id="0">
      Sprache
    </kategorie>
  </kategorien>
</test>
I am trying this
fields = ['id', 'parent_id']
with open('output.csv', 'wb') as fp:
    writer = csv.writer(fp)
    writer.writerow(fields)
    tree = ET.parse('./file.xml')
    # from your example Locations is the root and Location is the first level
    for elem in tree.getroot():
        writer.writerow([(elem.get(name) or '').encode('utf-8') for name in fields])
but I get this error:
in <module>
    writer.writerow(fields)
TypeError: a bytes-like object is required, not 'str'
even though I am already using encode('utf-8') in my code. How can I get rid of this error?
Answer
EDIT 2: If you want to read attributes from nested elements, there are two ways:
- You can use a nested loop:
root = tree.getroot()
for elem in root:
    for child in elem:
        print([(child.attrib.get(name) or 'c') for name in fields])
Output:
['1', '0']
Note that this prints a row for every second-level element, including elements that are not named kategorie. Any attribute an element lacks shows up as 'c'.
- If you only want the kategorie elements and would rather not write the loops by hand, root.iter() walks the tree lazily and filters by tag for you:
for elem in root.iter('kategorie'):
    print([(elem.attrib.get(name) or 'c') for name in fields])
Output:
['1', '0']
This method returns every element named kategorie, no matter how deeply it is nested.
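To make the difference between the two approaches concrete, here is a minimal, self-contained sketch that runs both against the XML from the question. The helper names rows_nested and rows_iter are made up for this illustration, and the XML is parsed from an inline bytes literal rather than from ./file.xml so the snippet runs on its own.

```python
import xml.etree.ElementTree as ET

# Sample document from the question, inlined so the sketch is self-contained.
XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<test timestamp="20210113">
  <kategorien>
    <kategorie id="1" parent_id="0">Sprache</kategorie>
  </kategorien>
</test>"""

fields = ['id', 'parent_id']
root = ET.fromstring(XML)


def rows_nested(root):
    """Hypothetical helper: read attributes from every second-level element."""
    return [[child.get(name, '') for name in fields]
            for elem in root
            for child in elem]


def rows_iter(root):
    """Hypothetical helper: read attributes from every <kategorie>, at any depth."""
    return [[elem.get(name, '') for name in fields]
            for elem in root.iter('kategorie')]


print(rows_nested(root))  # [['1', '0']]
print(rows_iter(root))    # [['1', '0']]
```

For this file both give the same result. They would differ if kategorie elements were nested more deeply than two levels, or if kategorien contained other tags.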
EDIT 1: For the issue in comments:
<?xml version="1.0"?>
<kategorien>
  <kategorie id="1" parent_id="0">
    Sprache
  </kategorie>
</kategorien>
For the above xml file, the code seems to work perfectly:
fields = ['id', 'parent_id']
for elem in tree.getroot():
    print([(elem.attrib.get(name) or 'c') for name in fields])
Output:
['1', '0']
Original Answer: Looks like you are looking at the wrong location for the error. The error is actually occurring at
writer.writerow(fields)
In Python 3, csv.writer always writes str to the file object, but the file was opened in binary mode (wb), which accepts only bytes. That mismatch is what raises the TypeError. Encoding the individual values does not help, because the csv module converts each value back to str before writing it. Open the file in text mode (w) instead and drop the encode('utf-8') calls:
with open('output.csv', 'w', newline='', encoding='utf-8') as fp:
encode() only converts a str to bytes. The encoding argument of open() already controls how the text is stored on disk.