I am trying to modify a saved HTML webpage. More specifically, I would like to highlight specific sentences in the page and save the result as a new HTML page.
I thought the code below would work, but it does not:
    import re

    #download https://en.wikipedia.org/wiki/HTML to disk using chrome / save as complete html
    with open(r"C:UsersDownloadswebpage.html", mode='rt', encoding='utf-8') as f:
        mytext = f.read()

    #highlight "The HyperText Markup Language, or HTML" in red
    re.sub("The HyperText Markup Language, or HTML", mytext, '<span style="color: red">{}</span>'.format(r'/1'))

    mytext.write(r"C:UsersDownloadswebpage_modif.html")

      File "<ipython-input-9-f7f9195da80f>", line 5, in <module>
        mytext.write(r"C:UsersDownloadswebpage_modif.html")

    AttributeError: 'str' object has no attribute 'write'
Any ideas? Thanks!
Answer
Here is how you can open an HTML file, edit it using bs4, and write the result to a new file. I assume you are trying to add a style attribute to a span tag:
    import requests
    from bs4 import BeautifulSoup
    from xml.sax.saxutils import unescape

    url = 'https://en.wikipedia.org/wiki/HTML'
    res = requests.get(url).content
    soup = BeautifulSoup(res, 'html.parser')

    text_to_be_highlighted = "The HyperText Markup Language, or HTML"
    highlighted_text = f'<span style="color: red">{text_to_be_highlighted}</span>'

    # grab all tags with specified text
    tags = soup.find_all(lambda tag: text_to_be_highlighted in tag.text)
    # keep only the innermost matches; setting .string on <html> or <body> would discard the rest of the page
    tags = [tag for tag in tags
            if not any(text_to_be_highlighted in child.text for child in tag.find_all(True))]

    for tag in tags:
        new_text = tag.text.replace(text_to_be_highlighted, highlighted_text)
        # assigning .string replaces the tag's children with plain (escaped) text
        tag.string = new_text

    with open("new.html", "w", encoding='utf-8') as f:
        # unescape turns the escaped span back into real markup
        f.write(unescape(soup.prettify()))
Explanation: Grab the tags that contain the specified text with the help of the find_all method and a lambda function. Get the whole text of each tag and replace the specified text with a new span tag that highlights it. Finally, write the modified soup to the new file.
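As a variation on the steps above, here is a minimal sketch that wraps only the matching text node in a span, so that links and other nested markup in the same paragraph are kept and no unescape step is needed. The function name `highlight_text` and its parameters are my own for illustration, not part of bs4, and the sketch assumes the phrase sits inside a single text node.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment


def highlight_text(html, phrase, style="color: red"):
    """Return `html` with each occurrence of `phrase` wrapped in a styled <span>.

    Hypothetical helper: only occurrences contained in a single text node are matched.
    """
    soup = BeautifulSoup(html, "html.parser")
    # Snapshot the matching text nodes first, because the tree is modified below.
    nodes = [node for node in soup.find_all(string=True) if phrase in node]
    for node in nodes:
        # Leave comments and non-visible containers untouched.
        if isinstance(node, Comment) or node.parent.name in ("script", "style", "title"):
            continue
        parts = str(node).split(phrase)
        for i, part in enumerate(parts):
            if i > 0:
                # Between two split parts there was one occurrence of the phrase.
                span = soup.new_tag("span", attrs={"style": style})
                span.string = phrase
                node.insert_before(span)
            if part:
                node.insert_before(part)
        # The original text node has been rebuilt piece by piece; drop it.
        node.extract()
    return str(soup)


sample = '<p>See the <a href="/x">link</a>, then read about HyperText here.</p>'
print(highlight_text(sample, "HyperText"))
```

To apply this to a saved page, read the file into a string with `open(..., encoding='utf-8')`, pass it to the function, and write the returned string to a new file with `f.write(...)`. If part of the phrase is wrapped in its own tag (bold text, for example), it spans several text nodes and this sketch will not match it.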