im new to XML and i want to know how to create a dataframe in python from this XML file.
<EXTENDEDPROPERTIES> <DEBTCONFIGURATION> <row Key="guid" Value="2018438038"/> <row Key="status" Value="0"/> <row Key="forma_pago" Value="DEBITO A CUENTA"/> <row Key="monto" Value="23699.1"/> <row Key="monto_abono" Value="360.55"/> <row Key="entidad" Value="BANCO CAPRICHOSO S.A."/> <row Key="tipo" Value="PREST. AUTO"/> <row Key="balance" Value="19617.5"/> <row Key="KIND_ID" Value="PRINCIPAL"/> <row Key="TYPE_ID" Value="CEDULA_IDENTIDAD"/> <row Key="CUSTOMER_ID" Value="777-555-888"/> <row Key="MEMBER_TYPE" Value="DEUDOR"/> </DEBTCONFIGURATION>
I have the following code, it creates the DataFrame but when i tried to append the value of the row, i dont know why it keeps coming “None”.
I dont know if i have to change de calling argument i.e Attrib.get.
Also i tried changing the attrib.get to find(“value”).text but it give me the error that it dosnt have the a text attribute.
import pandas as pd import xml.etree.ElementTree as ET xtree = ET.parse("davi_apc.xml") xroot = xtree.getroot() df_cols = ["guid", "status", "forma_pago", "monto", "monto_abono", "entidad", "tipo", "balance","KIND_ID", "TYPE_ID", "CUSTOMER_ID", "MEMBER_TYPE"] rows = [] for node in xroot: s_guid = node.attrib.get("guid") s_status = node.attrib.get("status") s_formapago = node.attrib.get("forma_pago") s_monto = node.attrib.get("monto") s_monto_abono = node.attrib.get("monto_abono") s_entidad = node.attrib.get("entidad") s_tipo = node.attrib.get("tipo") s_balance = node.attrib.get("balance") s_kind_id = node.attrib.get("KIND_ID") s_type_id = node.attrib.get("TYPE_ID") s_customer_id = node.attrib.get("CUSTOMER_ID") s_mebder_type = node.attrib.get("MEMBER_TYPE") rows.append({ "guid" : s_guid, "status" : s_status, "forma_pago" : s_formapago, "monto" : s_monto, "monto_abono" : s_monto_abono, "entidad" : s_entidad, "tipo" : s_tipo, "balance" : s_balance, "KIND_ID" : s_kind_id, "TYPE_ID" : s_type_id, "CUSTOMER_ID" : s_customer_id, "MEMBER_TYPE" : s_mebder_type }) out_df = pd.DataFrame(rows, columns = df_cols)
this is the printout of print(rows) [{‘guid’: None, ‘status’: None, ‘forma_pago’: None, ‘monto’: None, ‘monto_abono’: None, ‘entidad’: None, ‘tipo’: None, ‘balance’: None, ‘KIND_ID’: None, ‘TYPE_ID’: None, ‘CUSTOMER_ID’: None, ‘MEMBER_TYPE’: None}]
and this is the printout of the dataframe
guid status forma_pago monto monto_abono entidad tipo balance KIND_ID
0 None None None None None None None None None
TYPE_ID CUSTOMER_ID MEMBER_TYPE
0 None None None
Advertisement
Answer
Here is a working solution:
1/ remove top line from xml file, I am unsure if the first tag is xml compliant ?
<DEBTCONFIGURATION> <row Key="guid" Value="2018438038"/> <row Key="status" Value="0"/> <row Key="forma_pago" Value="DEBITO A CUENTA"/> <row Key="monto" Value="23699.1"/> <row Key="monto_abono" Value="360.55"/> <row Key="entidad" Value="BANCO CAPRICHOSO S.A."/> <row Key="tipo" Value="PREST. AUTO"/> <row Key="balance" Value="19617.5"/> <row Key="KIND_ID" Value="PRINCIPAL"/> <row Key="TYPE_ID" Value="CEDULA_IDENTIDAD"/> <row Key="CUSTOMER_ID" Value="777-555-888"/> <row Key="MEMBER_TYPE" Value="DEUDOR"/> </DEBTCONFIGURATION>
2/ code:
import pandas as pd import xml.etree.ElementTree as ET xtree = ET.parse("davi_apc.xml") xroot = xtree.getroot() rows = [{}] for node in xroot: print(node.attrib) rows[0].update({node.attrib['Key']:node.attrib['Value']}) out_df = pd.DataFrame(rows)
3/ output for out_df:
out_df.head(10) guid status ... CUSTOMER_ID MEMBER_TYPE 0 2018438038 0 ... 777-555-888 DEUDOR