Skip to content
Advertisement

Group branches in an XML tree with Python on a common field

I have a list of order details in a CSV, and want to join all items from the lines together on one order.

Example date is:

Order|Line|Item|Price
123456789|1|IK123456|199.99
654987321|1|MASGE12385|29.95
654987321|2|KLEAN458792|9.99
654987321|3|LP12489|1959.95

I want everything to be listed in an XML with the root as the Order Number, Child as the Line Number and Sub-Children as Item and Price.

I want the output to look like:

<Order number = "123456789">
    <Line number = "1">
        <Item>IK123456</Item>
        <Price>199.99</Price>
    </Line>
</Order>
<Order number = "654987321">
    <Line = "1">
        <Item>MASGE12385</Item>
        <Price>29.95</Price>
    </Line>
    <Line = "2">
        <Item>KLEAN458792</Item>
        <Price>9.99</Price>
    </Line>
    <Line = "3">
        <Item>LP12489</Item>
        <Price>1959.95</Price>
    </Line>
</Order>

Here is my code:

import csv
import xml.etree.ElementTree as ET

file = 'C:/github.txt'

with open (file, 'r') as f:
        reader = csv.reader(f, delimiter = '|')
        header = next(reader)
        
        order_num = reader[0]
        
        root = ET.Element("Order")   #BUILD A ROOT FOR THE XML TREE
        root.set('number', order_num)            #ADD ATTRIBUTE

        for row in reader:                     #ITERATE THROUGH EACH ROW AND POPULATE DATA IN BRANCHES OF XML TREE
            line = ET.SubElement(root, 'line', number= reader[1])
            item = ET.SubElement(line, 'item code')
            item.text = reader[2]
            price = ET.SubElement(line, 'price')
            price.text = reader[3]
            
            tree = ET.ElementTree(root)

tree.write('C:/github.xml', encoding = 'utf-8', xml_declaration = True)

(NOTE: I moved something and got an error, but not sure what happened)

Advertisement

Answer

During loop, consider keeping a tracker on Number to conditionally decide to create an element and keep related underlying items together. Additionally, consider csv.DictReader to iterate csv rows as a dictionary which takes first row headers as keys. Finally, use the built-in minidom to pretty print output. Below will incorporate all XML items under the single <Orders> root:

import csv
import xml.etree.ElementTree as ET
import xml.dom.minidom as mn

file = 'C:/github.txt'
curr_order = None

with open (file, 'r') as f:
    reader = csv.DictReader(f, delimiter = '|')

    # BUILD A ROOT FOR THE XML TREE
    root = ET.Element("Orders")                    
     
    # ITERATE THROUGH EACH ROW AS DICTIONARY
    for d in reader:                               
        # CONDITIONALLY BUILD ORDER ELEMENT
        if curr_order != str(d['Order']):
            orderElem = ET.SubElement(root, "Order")        
        curr_order = str(d['Order']) 
          
        # CREATE DESCENDANTS OF ORDER
        orderElem.set('number', str(d['Order']))    
        line = ET.SubElement(orderElem, 'line', number = str(d['Line']))
        ET.SubElement(line, 'item_code').text = str(d['Item'])
        ET.SubElement(line, 'price').text = str(d['Price'])

# PRETTY PRINT OUTPUT
dom = mn.parseString(ET.tostring(root, encoding = 'utf-8'))

with open('C:/github.xml', 'wb') as f:
    f.write(dom.toprettyxml(indent="   ", encoding = 'utf-8'))

Online Demo

User contributions licensed under: CC BY-SA
3 People found this is helpful
Advertisement