I am having trouble parsing data in this manner for an XML file which is converted to a CSV:
For the first column, I would like to get the general name tag (recordingSystem, Ports, etc) and concatenate it with the subNames in the row tags (closedFileCount, processedFileCount, etc)
The tag where the subName is located keeps changing, it could be a “usage”, “lwGuage”, “hwGauge” and so on. I also need to collect those and put it in the column beside it.
Please see the sample XML below:
<?xml version="1.0" encoding="UTF-8"?> <omGroups xmlns="urn:nortel:namespaces:mcp:oms" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:nortel:namespaces:mcp:oms OMSchema.xsd" > <group> <name>RecordingSystem</name> <row> <package>com.nortelnetworks.mcp.ne.base.recsystem.fw.system</package> <class>RecSysFileOMRow</class> <usage name="closedFileCount" hasThresholds="true"> <measures> closed file count </measures> <description> This register counts the number of closed files in the spool directory of a particular stream and a particular system. Files in the spool directory store the raw OAM records where they are sent to the Element Manager for formatting. </description> <notes> Minor and major alarms when the value of closedFileCount exceeds certain thresholds. Configure the threshold values for minor and major alarms for this OM through engineering parameters for minorBackLogCount and majorBackLogCount, respectively. These engineering parameters are grouped under the parameter group of Log, OM, and Accounting for the logs’ corresponding system. </notes> </usage> <usage name="processedFileCount" hasThresholds="true"> <measures> Processed file count </measures> <description> The register counts the number of processed files in the spool directory of a particular stream and a particular system. Files in the spool directory store the raw OAM records and then send the records to the Element Manager for formatting. </description> </usage> </row> <documentation> <description> Rows of this OM group provide a count of the number of files contained within the directory (which is the OM row key value). </description> <rowKey> The full name of the directory containing the files counted by this row. </rowKey> </documentation> <generatedOn> <all/> </generatedOn> </group> <group traffic="true"> <name>Ports</name> <row> <package>com.nortelnetworks.ims.cap.mediaportal.host</package> <class>PortsOMRow</class> <usage name="rtpMpPortUsage"> <measures> BCP port usage </measures> <description> Meter showing number of ports in use. </description> </usage> <lwGauge name="connMapEntriesLWM"> <measures> Lowest simultaneous port usage </measures> <description> Lowest number of simultaneous ports detected to be in use during the collection interval </description> </lwGauge> <hwGauge name="connMapEntriesHWM"> <measures> Highest simultaneous port usage </measures> <description> Highest number of simultaneous ports detected to be in use during the collection interval. </description> </hwGauge> <waterMark name="connMapEntries"> <measures> Connections map entries </measures> <description> Meter showing the number of connections in the host CPU connection map. </description> <bwg lwref="connMapEntriesLWM" hwref="connMapEntriesHWM"/> </waterMark> <counter name="portUsageSampleCnt"> <measures> Usage sample count </measures> <description> The number of 100-second samples taken during the collection interval contributing to the average report. </description> </counter> <counter name="sampledRtpMpPortUsage"> <measures> In-use ports usage </measures> <description> Provides the sum of the in-use ports every 100 seconds. </description> </counter> <precollector> <package>com.nortelnetworks.ims.cap.mediaportal.host</package> <class>PortsOMCenturyPrecollector</class> <collector>centurySecond</collector> </precollector> </row> <documentation> <description> </description> <rowKey> </rowKey> </documentation> <generatedOn> <list> <ne>sessmgr</ne> <ne>rtpportal</ne> </list> </generatedOn> </group> </omGroups>
The code below is supposed to get the GeneralName and display it in the csv file the correct number of times but I can not get it to display anything.
from xml.etree import ElementTree import csv from copy import copy import lxml.etree tree = ElementTree.parse('OM.xml') sitescope_data = open('OMFileConverted.csv', 'w', newline='', encoding='utf-8') csvwriter = csv.writer(sitescope_data) #Create all needed columns here in order and writes them to excel file col_names = ['name', 'OMRegister'] csvwriter.writerow(col_names) def recurse(root, props): for child in root: if child.tag == '{urn:nortel:namespaces:mcp:oms}group': p2 = copy(props) for event in root.findall('{urn:nortel:namespaces:mcp:oms}group'): event_id = event.find('{urn:nortel:namespaces:mcp:oms}name') if event_id != None: p2['name'] = event_id.text recurse(child, p2) else: recurse(child, props) for event in root.findall('{urn:nortel:namespaces:mcp:oms}group'): event_data = [props.get('name')] csvwriter.writerow(event_data) root = tree.getroot() recurse(root,{}) #root + empty dictionary sitescope_data.close()
Advertisement
Answer
If xml_string
is your XML snippet from the question, then this script:
import csv from bs4 import BeautifulSoup soup = BeautifulSoup(xml_string, 'html.parser') with open('data.csv', 'w', newline='') as f_out: writer = csv.writer(f_out) writer.writerow(['General name:SpecificName', 'RegisterType']) for item in soup.select('row [name]'): writer.writerow([item.find_previous('name').text + ':' + item['name'], item.name])
Produces data.csv
(screenshot from LibreOffice):
Edit: To get measures
tag into a column, you can do:
import csv from bs4 import BeautifulSoup soup = BeautifulSoup(xml_string, 'html.parser') with open('data.csv', 'w', newline='') as f_out: writer = csv.writer(f_out) writer.writerow(['General name:SpecificName', 'RegisterType', 'Measures']) for item in soup.select('row [name]'): writer.writerow([item.find_previous('name').text + ':' + item['name'], item.name, item.find('measures').get_text(strip=True)])
Produces: