I have an XML file autogenerated by Informatica BDM. I find it very difficult to edit its values: I have made several attempts with xml.etree.ElementTree without success. This is an extract from the file:

<?xml version="1.0" encoding="UTF-8"?>
<root xmlns="http://www.informatica.com/Parameterization/1.0"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema"
      version="2.0"><!--Specify deployed application specific parameters here.--><!--
    <application name="app_2">
        <mapping name="M_kafka_hdfs"/>
    </application>--><project name="V2">
        <folder name="Streaming">
            <mapping name="M_kafka_hdfs">
                <parameter name="P_s_spark_executor_cores">4</parameter>
                <parameter name="P_s_spark_executor_memory">8G</parameter>
                <parameter name="P_s_spark_sql_shuffle_partitions">108</parameter>
                <parameter name="P_s_spark_network_timeout">180000</parameter>
                <parameter name="P_s_spark_executor_heartbeatInterval">6000</parameter>
                <parameter name="P_i_maximum_rows_read">0</parameter>
                <parameter name="P_s_checkpoint_directory">checkpoint</parameter>
            </mapping>
        </folder>
    </project>
</root>

I would like to be able to change the parameters, for example from <parameter name="P_s_spark_executor_memory">8G</parameter>
to <parameter name="P_s_spark_executor_memory">16G</parameter>.
With the code below I can only read the attribute values (such as the parameter names), not the element content, and I can't edit anything either:

import xml.etree.ElementTree as ET

treexml = ET.parse('autogenerated.xml')
for element in treexml.iter():
    dict_keys = {}
    if element.keys():
        for name, value in element.items():
            dict_keys[name] = value
            print(dict_keys[name])

Ideally, I would like to overwrite any parameter like this:

xml["parameter"]["P_s_spark_sql_shuffle_partitions"] = 64

and have it changed in the file to <parameter name="P_s_spark_sql_shuffle_partitions">64</parameter>
Answer
Try this code:

import xml.etree.ElementTree as ET

name_space = 'http://www.informatica.com/Parameterization/1.0'
ET.register_namespace('', name_space)
treexml = ET.parse(r'c:\test\test.xml')
# get all elements with 'parameter' tags (it is necessary to specify the namespace prefix)
params = treexml.getroot().findall(f'.//{{{name_space}}}parameter')

# make the dict with names as keys and previously found elements as value
xml = {el.attrib['name']: el for el in params}
# set the text of the "P_s_spark_sql_shuffle_partitions"
xml["P_s_spark_sql_shuffle_partitions"].text = str(64)
# write out the xml
treexml.write(r'c:\test\testOut.xml')

Output (c:\test\testOut.xml):

<root xmlns="http://www.informatica.com/Parameterization/1.0" version="2.0"><project name="V2">
        <folder name="Streaming">
            <mapping name="M_kafka_hdfs">
                <parameter name="P_s_spark_executor_cores">4</parameter>
                <parameter name="P_s_spark_executor_memory">8G</parameter>
                <parameter name="P_s_spark_sql_shuffle_partitions">64</parameter>
                <parameter name="P_s_spark_network_timeout">180000</parameter>
                <parameter name="P_s_spark_executor_heartbeatInterval">6000</parameter>
                <parameter name="P_i_maximum_rows_read">0</parameter>
                <parameter name="P_s_checkpoint_directory">checkpoint</parameter>
            </mapping>
        </folder>
    </project>
</root>
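
Note that the output no longer contains the XML declaration or the comments from the original file: ElementTree's default parser discards comments, and write() emits no declaration by default. The following is a minimal sketch of one way to wrap the same approach in a reusable function while keeping both. The function name set_parameters and the constant NS are hypothetical names chosen for illustration, and the sketch assumes Python 3.8 or later (for insert_comments) and that parameter names are unique across mappings. The unused xmlns:xsi declaration is still dropped, since ElementTree only writes namespaces that are actually used in the tree.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the xmlns attribute of the <root> element.
NS = "http://www.informatica.com/Parameterization/1.0"


def set_parameters(src_path, dst_path, updates):
    """Set the text of <parameter name="..."> elements and save the result.

    `updates` maps parameter names to new values, for example
    {"P_s_spark_executor_memory": "16G"}. (Hypothetical helper.)
    """
    # Serialize the Informatica namespace as the default one (no "ns0:" prefix).
    ET.register_namespace("", NS)

    # insert_comments=True (Python 3.8+) keeps comments found inside the root.
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    tree = ET.parse(src_path, parser=parser)

    # Index every <parameter> element by its "name" attribute.
    # Assumption: names are unique; a duplicate would overwrite an earlier entry.
    params = {el.get("name"): el for el in tree.iter(f"{{{NS}}}parameter")}

    for name, value in updates.items():
        if name not in params:
            raise KeyError(f"parameter {name!r} not found in {src_path}")
        # Element text must be a string, so convert numbers explicitly.
        params[name].text = str(value)

    # Ask for the XML declaration explicitly; write() omits it by default.
    tree.write(dst_path, encoding="UTF-8", xml_declaration=True)
```

For example, set_parameters("autogenerated.xml", "autogenerated_out.xml", {"P_s_spark_executor_memory": "16G"}) would write a copy of the file with the executor memory set to 16G; the output file name here is only a placeholder.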