Replacing String Text That Contains Double Quotes

I have a number series contained in a string, and I want to remove everything but the number series. But the double quotes are giving me errors. Here are examples of the strings and a sample command that I have used. All I want is 127.60-02-15, 127.60-02-16, etc.

<span id="lblTaxMapNum">127.60-02-15</span>
<span id="lblTaxMapNum">127.60-02-16</span>

I have tried all sorts of methods (e.g., triple double quotes, single quotes, quotes with backslashes, etc.). Here is one inelegant attempt that still isn't working, because it leaves "> behind:

text = text.replace("<span id=", "")
text = text.replace("\"lblTaxMapNum\"", "")
text = text.replace("</span>", "")

Here is what I am working with (more specific code). I'm retrieving the data from a CSV and just trying to clean it up.

text = open("outputA.csv", "r")
text = ''.join([i for i in text])
text = text.replace("<span id=", "")
text = text.replace("\"lblTaxMapNum\"", "")
text = text.replace("</span>", "")
outputB = open("outputB.csv", "w")
outputB.writelines(text)
outputB.close()

Answer

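Your second replace removes the quoted id but not the > that closes the opening tag, which is why "> is left behind. If you add the > to the second replace, it is still not elegant, but it works: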

text = text.replace("<span id=", "")
text = text.replace("\"lblTaxMapNum\">", "")
text = text.replace("</span>", "")
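
The same idea can probably be written with less escaping. This is only a sketch: it assumes every opening tag is exactly <span id="lblTaxMapNum">, and the names OPEN_TAG, CLOSE_TAG and strip_span are made up for illustration. Wrapping the literal in single quotes means the inner double quotes need no backslashes:

```python
# Single quotes delimit the literal, so the inner double quotes need no escaping.
OPEN_TAG = '<span id="lblTaxMapNum">'   # assumed to be identical on every line
CLOSE_TAG = "</span>"


def strip_span(text):
    """Remove the opening and closing span tags, leaving only the inner text."""
    return text.replace(OPEN_TAG, "").replace(CLOSE_TAG, "")


sample = (
    '<span id="lblTaxMapNum">127.60-02-15</span>\n'
    '<span id="lblTaxMapNum">127.60-02-16</span>'
)
print(strip_span(sample))
```

Removing the whole opening tag in one replace also avoids the leftover "> from the question, since the > is part of the string being removed.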

Alternatively, you could use a regex:

import re

text = "<span id=\"lblTaxMapNum\">127.60-02-16</span>"

pattern = r".*>(\d*\.\d*-\d*-\d*)\D*"  # the group in parentheses captures the number
match = re.search(pattern, text)  # this searches for the pattern in the text

print(match.group(1))  # this prints out only the number
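
Since the question reads a whole CSV rather than a single string, a regex substitution over the full contents might fit better than one search. The following is a rough sketch, not a definitive solution: it assumes each span sits on one line with no nested tags, and SPAN_PATTERN, unwrap_spans and clean_file are hypothetical names. The file names are the ones from the question.

```python
import re

# Capture whatever sits between the opening tag and the closing tag.
# The non-greedy (.*?) stops at the first </span> it meets.
SPAN_PATTERN = re.compile(r'<span id="lblTaxMapNum">(.*?)</span>')


def unwrap_spans(text):
    """Replace each span with its inner text, leaving everything else untouched."""
    return SPAN_PATTERN.sub(r"\1", text)


def clean_file(src_path, dst_path):
    """Read src_path, unwrap the spans, and write the result to dst_path."""
    with open(src_path, "r") as src:
        cleaned = unwrap_spans(src.read())
    with open(dst_path, "w") as dst:
        dst.write(cleaned)


sample = (
    '<span id="lblTaxMapNum">127.60-02-15</span>\n'
    '<span id="lblTaxMapNum">127.60-02-16</span>\n'
)
print(unwrap_spans(sample), end="")
```

Calling clean_file("outputA.csv", "outputB.csv") would then do the whole cleanup, and any other columns in the CSV are left as they are because only the span markup is replaced.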
