Removing HTML tags while keeping the word stuck to the end

Question

I am working with text data and want to remove any HTML code, that is, anything between "<" and ">". For example: < HTML > < p style="text-align:justify" >Labour Solutions Australia (LSA) is a national labour hire and sourcing. So I use the following cod…

Accepted Answer

import re

def remove_html(data):
    return re.sub('<[^>]+>', '', data).strip()

test_case = '< HTML > < p style="text-align:justify" >Labour Solutions Australia (LSA) is a national labour hire and sourcing'
print(remove_html(test_case))

Output:

Labour Solutions Australia (LSA) is a national labour hire and sourcing
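
To see why the pattern uses [^>]+ rather than .*, here is a small sketch comparing the two on a string where text sits between two tags. The sample string and both function names are made up for illustration and are not taken from the question, whose code is cut off above.

```python
import re

# Hypothetical sample: ordinary text sits between an opening and a closing tag.
sample = '<p>Labour hire</p> and sourcing'

def strip_tags_greedy(text):
    # '.*' is greedy: it runs from the first '<' to the LAST '>' on the line,
    # so the words between the two tags are deleted along with the tags.
    return re.sub('<.*>', '', text).strip()

def strip_tags_bounded(text):
    # '[^>]+' cannot cross a '>', so each tag is matched and removed on its own.
    return re.sub('<[^>]+>', '', text).strip()

print(strip_tags_greedy(sample))
print(strip_tags_bounded(sample))
```

On this sample the greedy version prints "and sourcing", while the bounded version prints "Labour hire and sourcing". Neither pattern is a full HTML parser; a literal ">" inside an attribute value, for instance, would still confuse both.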

Answer