I have a number series contained in a string, and I want to remove everything but the number series. But the double quotes are giving me errors. Here are examples of the strings and a sample command that I have used. All I want is 127.60-02-15, 127.60-02-16, etc.
    <span id="lblTaxMapNum">127.60-02-15</span>
    <span id="lblTaxMapNum">127.60-02-16</span>
I have tried all sorts of methods (e.g., triple double quotes, single quotes, quotes with backslashes, etc.). Here is one inelegant approach that still isn’t working, because it leaves a stray "> behind:
    text = text.replace('<span id=', '')
    text = text.replace('"lblTaxMapNum"', '')
    text = text.replace('</span>', '')
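For reference, Python accepts several equivalent ways to write a string literal whose content itself contains double quotes. This small sketch builds the same string three ways; the variable names are only illustrative. Note that triple double quotes fail here because the content ends with a double quote, which runs into the closing delimiter.

```python
# Each literal below produces the same string: "lblTaxMapNum" (quotes included)
single_quoted = '"lblTaxMapNum"'       # outer single quotes, inner double quotes
escaped = "\"lblTaxMapNum\""           # inner double quotes escaped with backslashes
triple_quoted = '''"lblTaxMapNum"'''   # triple single quotes also work

print(single_quoted == escaped == triple_quoted)  # → True
```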
Here is the more specific code I am working with. I’m retrieving the data from a CSV file and just trying to clean it up.
    with open("outputA.csv", "r") as infile:
        text = infile.read()  # read the whole CSV into one string
    text = text.replace('<span id=', '')
    text = text.replace('"lblTaxMapNum"', '')
    text = text.replace('</span>', '')
    with open("outputB.csv", "w") as outputB:
        outputB.write(text)
Answer
If you add a > to the string in the second replace, it is still not elegant, but it works:
    text = text.replace('<span id=', '')
    text = text.replace('"lblTaxMapNum">', '')  # the extra > removes the leftover ">
    text = text.replace('</span>', '')
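To check the replace-based fix against the sample line from the question, the three calls can be wrapped in a small function. The name strip_span_tags is hypothetical, and the sketch assumes every span has exactly the id="lblTaxMapNum" attribute shown in the question.

```python
def strip_span_tags(text):
    # Hypothetical helper: the three replace() calls from the answer, in order
    text = text.replace('<span id=', '')
    text = text.replace('"lblTaxMapNum">', '')  # includes the closing >
    text = text.replace('</span>', '')
    return text


sample = '<span id="lblTaxMapNum">127.60-02-15</span> <span id="lblTaxMapNum">127.60-02-16</span>'
print(strip_span_tags(sample))  # → 127.60-02-15 127.60-02-16
```

Any span with a different id or extra attributes would survive this chain, which is one reason the regex approach can be more robust.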
Alternatively, you could use a regex:
    import re

    text = '<span id="lblTaxMapNum">127.60-02-16</span>'
    pattern = r".*>(\d*\.\d*-\d*-\d*)\D*"  # the group in parentheses matches the number
    match = re.search(pattern, text)  # search for the pattern in the text
    print(match.group(1))  # print only the number
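Since a line (or the whole CSV) can contain several numbers, as in the sample with two spans, a variant using re.findall returns all of them at once instead of only one match. This is a sketch assuming every number has the shape digits.digits-digits-digits (like 127.60-02-15); the function and constant names are hypothetical.

```python
import re

# Assumed format: digits, a dot, digits, then two dash-separated digit groups,
# e.g. 127.60-02-15. Adjust the pattern if the real data differs.
TAX_MAP_PATTERN = re.compile(r"\d+\.\d+-\d+-\d+")


def extract_tax_map_numbers(text):
    """Return every tax map number found in text, in order of appearance."""
    # With no capturing groups, findall returns the full matched strings
    return TAX_MAP_PATTERN.findall(text)


sample = '<span id="lblTaxMapNum">127.60-02-15</span> <span id="lblTaxMapNum">127.60-02-16</span>'
numbers = extract_tax_map_numbers(sample)
print(numbers)  # → ['127.60-02-15', '127.60-02-16']
```

Applied to the text read from outputA.csv, the resulting list could then be written to outputB.csv, for example one number per line with "\n".join(numbers).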