I have an XML file autogenerated by Informatica BDM. I find it hard to edit its values programmatically: I made several attempts with xml.etree.ElementTree but did not get any results. This is an extract from the file:
<?xml version="1.0" encoding="UTF-8"?>
<root xmlns="http://www.informatica.com/Parameterization/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema" version="2.0">
  <!--Specify deployed application specific parameters here.-->
  <!--
  <application name="app_2">
    <mapping name="M_kafka_hdfs"/>
  </application>
  -->
  <project name="V2">
    <folder name="Streaming">
      <mapping name="M_kafka_hdfs">
        <parameter name="P_s_spark_executor_cores">4</parameter>
        <parameter name="P_s_spark_executor_memory">8G</parameter>
        <parameter name="P_s_spark_sql_shuffle_partitions">108</parameter>
        <parameter name="P_s_spark_network_timeout">180000</parameter>
        <parameter name="P_s_spark_executor_heartbeatInterval">6000</parameter>
        <parameter name="P_i_maximum_rows_read">0</parameter>
        <parameter name="P_s_checkpoint_directory">checkpoint</parameter>
      </mapping>
    </folder>
  </project>
</root>
I would like to change the parameter values, for example from
<parameter name="P_s_spark_executor_memory">8G</parameter>
to
<parameter name="P_s_spark_executor_memory">16G</parameter>
So far I can only read the attribute values (the names), not the text inside the elements, and I can't edit anything either:
import xml.etree.ElementTree as ET

treexml = ET.parse('autogenerated.xml')
for element in treexml.iter():
    dict_keys = {}
    if element.keys():
        for name, value in element.items():
            dict_keys[name] = value
            print(dict_keys[name])
Ideally I could overwrite any parameter like this:
xml["parameter"]["P_s_spark_sql_shuffle_partitions"] = 64
so that the file then contains <parameter name="P_s_spark_sql_shuffle_partitions">64</parameter>
Answer
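The elements are in the default namespace declared on the root element, so the tag name has to be qualified with that namespace when you search for it. Try this code: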
import xml.etree.ElementTree as ET

name_space = 'http://www.informatica.com/Parameterization/1.0'
ET.register_namespace('', name_space)
treexml = ET.parse(r'c:testtest.xml')

# get all elements with 'parameter' tags (it is necessary to specify the namespace prefix)
params = treexml.getroot().findall(f'.//{{{name_space}}}parameter')

# make the dict with names as keys and previously found elements as value
xml = {el.attrib['name']: el for el in params}

# set the text of the "P_s_spark_sql_shuffle_partitions"
xml["P_s_spark_sql_shuffle_partitions"].text = str(64)

# write out the xml
treexml.write(r'c:testtestOut.xml')
Output (c:testtestOut.xml):
<root xmlns="http://www.informatica.com/Parameterization/1.0" version="2.0"><project name="V2">
    <folder name="Streaming">
      <mapping name="M_kafka_hdfs">
        <parameter name="P_s_spark_executor_cores">4</parameter>
        <parameter name="P_s_spark_executor_memory">8G</parameter>
        <parameter name="P_s_spark_sql_shuffle_partitions">64</parameter>
        <parameter name="P_s_spark_network_timeout">180000</parameter>
        <parameter name="P_s_spark_executor_heartbeatInterval">6000</parameter>
        <parameter name="P_i_maximum_rows_read">0</parameter>
        <parameter name="P_s_checkpoint_directory">checkpoint</parameter>
      </mapping>
    </folder>
  </project>
</root>
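Note that the output above has lost the XML declaration and the comments. If you need to keep them and want to change several parameters in one call, the same steps can be wrapped in a small function. This is only a sketch, not a tested solution for every Informatica file: `set_parameters`, `NS` and `SAMPLE` are names made up for this example, the sample is a shortened copy of the file from the question, and `insert_comments=True` assumes Python 3.8 or newer.

```python
import io
import xml.etree.ElementTree as ET

NS = 'http://www.informatica.com/Parameterization/1.0'

# Shortened stand-in for the autogenerated file (assumption for the demo).
SAMPLE = '''<?xml version="1.0" encoding="UTF-8"?>
<root xmlns="http://www.informatica.com/Parameterization/1.0" version="2.0">
<!--Specify deployed application specific parameters here.-->
<project name="V2"><folder name="Streaming"><mapping name="M_kafka_hdfs">
<parameter name="P_s_spark_executor_memory">8G</parameter>
<parameter name="P_s_spark_sql_shuffle_partitions">108</parameter>
</mapping></folder></project>
</root>'''


def set_parameters(source, target, updates):
    """Hypothetical helper: overwrite the text of the named <parameter> elements.

    source and target may be file paths or binary file objects.
    """
    # Serialize the default namespace without an ns0: prefix.
    ET.register_namespace('', NS)
    # insert_comments=True (Python 3.8+) keeps comments that sit inside <root>.
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    tree = ET.parse(source, parser=parser)
    # Map each parameter name to its element.
    params = {el.get('name'): el
              for el in tree.getroot().iter(f'{{{NS}}}parameter')}
    for name, value in updates.items():
        if name not in params:
            raise KeyError(f'no parameter named {name!r}')
        params[name].text = str(value)
    # xml_declaration=True writes the <?xml ...?> header back out.
    tree.write(target, encoding='UTF-8', xml_declaration=True)


# Demo on in-memory buffers; real file paths can be passed the same way.
out = io.BytesIO()
set_parameters(io.BytesIO(SAMPLE.encode('utf-8')), out,
               {'P_s_spark_executor_memory': '16G',
                'P_s_spark_sql_shuffle_partitions': 64})
result = out.getvalue().decode('utf-8')
print(result)
```

Raising `KeyError` on an unknown name is a design choice for the sketch: a typo in a parameter name fails loudly instead of leaving the file silently unchanged.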