
a bytes-like object is required, not 'str' while parsing XML files

I am trying to parse an XML file that looks like this. I want to extract information about each kategorie element, i.e. id, parent_id, etc.:

<?xml version="1.0" encoding="UTF-8"?>
<test timestamp="20210113">
<kategorien>
    <kategorie id="1" parent_id="0">
        Sprache
    </kategorie>
</kategorien>
</test>

I am trying this:

fields = ['id', 'parent_id']

with open('output.csv', 'wb') as fp:
    writer = csv.writer(fp)
    writer.writerow(fields)
    tree = ET.parse('./file.xml')
    # from your example Locations is the root and Location is the first level
    for elem in tree.getroot():
        writer.writerow([(elem.get(name) or '').encode('utf-8') 
            for name in fields])

but I get this error:

in <module>
    writer.writerow(fields)
TypeError: a bytes-like object is required, not 'str'

even though I am already using encode('utf-8') in my code. How can I get rid of this error?


Answer

EDIT 2: If you want to read attributes from nested elements, there are two ways:

  1. You can use a nested loop:
for elem in root:
    for child in elem:
        print([(child.attrib.get(name) or 'c') for name in fields])

Output:

['1', '0']

Note that this loop visits every grandchild of the root whatever its tag, so it also returns values for elements that have id and parent_id but are not named kategorie.

  2. If you want to avoid the nested loop, iter() walks the tree lazily and yields only the elements with the given tag:
for elem in root.iter('kategorie'):
    print([(elem.attrib.get(name) or 'c') for name in fields])

Output:

['1', '0']

This method returns values for every element named kategorie, at any depth of the tree.
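The difference between the two approaches can be sketched as follows. This is only an illustration: the extra gruppe element is a hypothetical addition to the question's XML, included to show an element that has the same attributes under a different tag, and the fallback is an empty string rather than 'c'.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the question's XML plus an extra <gruppe> element
# that carries the same attributes under a different tag.
XML = """<test timestamp="20210113">
<kategorien>
    <kategorie id="1" parent_id="0">Sprache</kategorie>
    <gruppe id="9" parent_id="1">Extra</gruppe>
</kategorien>
</test>"""

fields = ['id', 'parent_id']
root = ET.fromstring(XML)

# Nested loop: every grandchild of the root, whatever its tag.
nested = [[child.get(name) or '' for name in fields]
          for elem in root
          for child in elem]

# iter('kategorie'): only elements with that tag, at any depth.
by_tag = [[elem.get(name) or '' for name in fields]
          for elem in root.iter('kategorie')]

print(nested)  # [['1', '0'], ['9', '1']]
print(by_tag)  # [['1', '0']]
```

If the file only ever contains kategorie elements at one level, both approaches give the same rows.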

EDIT 1: For the issue in comments:

<?xml version="1.0"?>
<kategorien>
    <kategorie id="1" parent_id="0">
        Sprache
    </kategorie>
</kategorien>

For the above XML file, the code works as expected:

fields = ['id', 'parent_id']

for elem in tree.getroot():
    print([(elem.attrib.get(name) or 'c') for name in fields])

Output:

['1', '0']
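One likely reason this loop works here but not on the XML in the question is depth: iterating over the root only visits its direct children. A small sketch, assuming the two documents are reduced to one-line strings (SHORT, FULL and the helper direct_children are names made up for this illustration):

```python
import xml.etree.ElementTree as ET

fields = ['id', 'parent_id']

def direct_children(xml_text):
    """Read the fields from the direct children of the root only."""
    root = ET.fromstring(xml_text)
    return [[elem.get(name) or 'c' for name in fields] for elem in root]

# The shorter document: <kategorien> is the root.
SHORT = '<kategorien><kategorie id="1" parent_id="0">Sprache</kategorie></kategorien>'
# The question's document: <kategorien> is wrapped in a <test> root.
FULL = '<test timestamp="20210113">' + SHORT + '</test>'

short_rows = direct_children(SHORT)
full_rows = direct_children(FULL)

print(short_rows)  # [['1', '0']]
print(full_rows)   # [['c', 'c']]  (<kategorien> itself has no attributes)
```

With the extra test wrapper, the only direct child is kategorien, so the attributes are never found and the fallback value is returned.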

Original Answer: You are looking in the wrong place for the error. It actually occurs at

writer.writerow(fields)

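fields is a list of str, but the file was opened in binary mode (wb), so the csv writer cannot write to it; that is why you get the error. The encode('utf-8') call in your loop never runs, because the header row fails first. Encoding the values would not help either: in Python 3 the csv module writes str, not bytes, so the file has to be opened in text mode (w instead of wb), with open() handling the encoding:

with open('output.csv', 'w', newline='', encoding='utf-8') as fp:

Then drop the .encode('utf-8') call in the loop. Otherwise csv.writer converts each bytes object with str() and writes text such as b'1' into the file.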
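A minimal sketch of the text-mode route (opening the file with w rather than wb), assuming Python 3. The XML is inlined and the output goes to a temporary directory only so that the example is self-contained; in the question's setup you would use ET.parse('./file.xml') and 'output.csv' instead.

```python
import csv
import os
import tempfile
import xml.etree.ElementTree as ET

# The question's XML, inlined for the example.
XML = """<test timestamp="20210113">
<kategorien>
    <kategorie id="1" parent_id="0">
        Sprache
    </kategorie>
</kategorien>
</test>"""

fields = ['id', 'parent_id']
root = ET.fromstring(XML)

# Temporary location chosen for this sketch only.
out_path = os.path.join(tempfile.mkdtemp(), 'output.csv')

# Text mode; newline='' lets the csv module control line endings.
with open(out_path, 'w', newline='', encoding='utf-8') as fp:
    writer = csv.writer(fp)
    writer.writerow(fields)
    for elem in root.iter('kategorie'):
        # Values stay as str; open() applies the UTF-8 encoding.
        writer.writerow([elem.get(name) or '' for name in fields])

# Read the file back to check what was written.
with open(out_path, newline='', encoding='utf-8') as fp:
    rows = list(csv.reader(fp))

print(rows)  # [['id', 'parent_id'], ['1', '0']]
```

The encoding is given once in open(), so no per-value encode() call is needed.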

User contributions licensed under: CC BY-SA