I have this string :
'<Section xml:space="preserve" HasTrailingParagraphBreakOnPaste="False" xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"><Paragraph FontSize="11" FontFamily="Portable User Interface" Foreground="#FF000000" FontWeight="Normal" FontStyle="Normal" FontStretch="Normal" CharacterSpacing="0" Typography.AnnotationAlternates="0" Typography.EastAsianExpertForms="False" Typography.EastAsianLanguage="Normal" Typography.EastAsianWidths="Normal" Typography.StandardLigatures="True" Typography.ContextualLigatures="True" Typography.DiscretionaryLigatures="False" Typography.HistoricalLigatures="False" Typography.StandardSwashes="0" Typography.ContextualSwashes="0" Typography.ContextualAlternates="True" Typography.StylisticAlternates="0" Typography.StylisticSet1="False" Typography.StylisticSet2="False" Typography.StylisticSet3="False" Typography.StylisticSet4="False" Typography.StylisticSet5="False" Typography.StylisticSet6="False" Typography.StylisticSet7="False" Typography.StylisticSet8="False" Typography.StylisticSet9="False" Typography.StylisticSet10="False" Typography.StylisticSet11="False" Typography.StylisticSet12="False" Typography.StylisticSet13="False" Typography.StylisticSet14="False" Typography.StylisticSet15="False" Typography.StylisticSet16="False" Typography.StylisticSet17="False" Typography.StylisticSet18="False" Typography.StylisticSet19="False" Typography.StylisticSet20="False" Typography.Capitals="Normal" Typography.CapitalSpacing="False" Typography.Kerning="True" Typography.CaseSensitiveForms="False" Typography.HistoricalForms="False" Typography.Fraction="Normal" Typography.NumeralStyle="Normal" Typography.NumeralAlignment="Normal" Typography.SlashedZero="False" Typography.MathematicalGreek="False" Typography.Variants="Normal" TextOptions.TextHintingMode="Fixed" TextOptions.TextFormattingMode="Ideal" TextOptions.TextRenderingMode="Auto" TextAlignment="Left" LineHeight="0" LineStackingStrategy="MaxHeight"><Run>Panneau de polystyrène expansé (PSE) ignifugé réalisant une fonction d’isolation thermique pour unnm² de surface en assurant la résistance thermique de R = 3.55 K.m².W-1.</Run></Paragraph></Section>'
My goal is to extract Panneau de polystyrène expansé (PSE) ignifugé réalisant une fonction d’isolation thermique pour unnm² de surface en assurant la résistance thermique de R = 3.55 K.m².W-1.
so the text between <run>
and </run>
I did it with regular expression but it doesn’t work with some xml string so I tried with xml.etree.ElementTree
but I didn’t succeed to access the sting nested in <run></run>
How can I extract this text using XML parser?
Advertisement
Answer
Here a simple way to get the data:
xmlstr = '<Section xml:space="preserve" HasTrailingParagraphBreakOnPaste="False" xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"><Paragraph FontSize="11" FontFamily="Portable User Interface" Foreground="#FF000000" FontWeight="Normal" FontStyle="Normal" FontStretch="Normal" CharacterSpacing="0" Typography.AnnotationAlternates="0" Typography.EastAsianExpertForms="False" Typography.EastAsianLanguage="Normal" Typography.EastAsianWidths="Normal" Typography.StandardLigatures="True" Typography.ContextualLigatures="True" Typography.DiscretionaryLigatures="False" Typography.HistoricalLigatures="False" Typography.StandardSwashes="0" Typography.ContextualSwashes="0" Typography.ContextualAlternates="True" Typography.StylisticAlternates="0" Typography.StylisticSet1="False" Typography.StylisticSet2="False" Typography.StylisticSet3="False" Typography.StylisticSet4="False" Typography.StylisticSet5="False" Typography.StylisticSet6="False" Typography.StylisticSet7="False" Typography.StylisticSet8="False" Typography.StylisticSet9="False" Typography.StylisticSet10="False" Typography.StylisticSet11="False" Typography.StylisticSet12="False" Typography.StylisticSet13="False" Typography.StylisticSet14="False" Typography.StylisticSet15="False" Typography.StylisticSet16="False" Typography.StylisticSet17="False" Typography.StylisticSet18="False" Typography.StylisticSet19="False" Typography.StylisticSet20="False" Typography.Capitals="Normal" Typography.CapitalSpacing="False" Typography.Kerning="True" Typography.CaseSensitiveForms="False" Typography.HistoricalForms="False" Typography.Fraction="Normal" Typography.NumeralStyle="Normal" Typography.NumeralAlignment="Normal" Typography.SlashedZero="False" Typography.MathematicalGreek="False" Typography.Variants="Normal" TextOptions.TextHintingMode="Fixed" TextOptions.TextFormattingMode="Ideal" TextOptions.TextRenderingMode="Auto" TextAlignment="Left" LineHeight="0" LineStackingStrategy="MaxHeight"><Run>Panneau de polystyrène expansé (PSE) ignifugé réalisant une fonction d’isolation thermique pour unnm² de surface en assurant la résistance thermique de R = 3.55 K.m².W-1.</Run></Paragraph></Section>' from xml.etree import cElementTree as ET results = [] root = ET.fromstring(xmlstr) for p in list(root): for r in list(p): print(r.text) results.append(r.text)
Result:
Panneau de polystyrène expansé (PSE) ignifugé réalisant une fonction d’isolation thermique pour un m² de surface en assurant la résistance thermique de R = 3.55 K.m².W-1.
If you run the code in a python interactive prompt, at the end you can use results:
>>> results ['Panneau de polystyrène expansé (PSE) ignifugé réalisant une fonction d’isolation thermique pour unnm² de surface en assurant la résistance thermique de R = 3.55 K.m².W-1.'] >>> results[0] 'Panneau de polystyrène expansé (PSE) ignifugé réalisant une fonction d’isolation thermique pour unnm² de surface en assurant la résistance thermique de R = 3.55 K.m².W-1.'