I am replacing special characters with ASCII codes and ignoring HTML tags with the help of the regex below:
text_list = re.findall(r'>([\s\S]*?)<', html)
It ignores all HTML tags as intended, but it does not ignore the HTML comment closing tag "-->".
Any help is appreciated. What should I change in the regex?
Attached screenshot for your reference.
Answer
When reading the file, please try passing an explicit encoding parameter, and try more than one encoding if the first does not decode the file correctly.
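A minimal sketch of that suggestion, under some assumptions: the helper names `read_html` and `extract_text` and the candidate encoding list are hypothetical, not taken from the question. The first helper tries each encoding in turn. The second is one possible way to keep "-->" out of the results: it removes HTML comments before applying the regex from the question. Whether this matches the asker's real input is not confirmed, since the original file is not shown.

```python
import re

# Hypothetical candidate list; adjust to the encodings you actually expect.
CANDIDATE_ENCODINGS = ("utf-8", "cp1252")


def read_html(path, encodings=CANDIDATE_ENCODINGS):
    """Return the file's text using the first encoding that decodes cleanly."""
    for encoding in encodings:
        try:
            with open(path, encoding=encoding) as handle:
                return handle.read()
        except UnicodeDecodeError:
            continue  # this encoding does not fit, try the next one
    raise ValueError(f"none of {encodings} could decode {path}")


def extract_text(html):
    """Drop HTML comments, then collect the text found between tags."""
    without_comments = re.sub(r'<!--[\s\S]*?-->', '', html)
    return re.findall(r'>([\s\S]*?)<', without_comments)


# A comment that itself contains tags reproduces the stray "-->".
sample = '<!-- <b>x</b> --><p>hi</p>'
print(re.findall(r'>([\s\S]*?)<', sample))  # ['x', ' -->', 'hi']
print(extract_text(sample))                 # ['hi']
```

Regular expressions only approximate HTML parsing, so for messy markup a real parser such as the standard library's `html.parser` may be the more robust choice.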