I am trying to modify a saved HTML webpage. More specifically, I would like to highlight specific sentences in the page and save the result as a new HTML page.
I thought the code below would work, but it does not:
import re
#download https://en.wikipedia.org/wiki/HTML to disk using chrome / save as complete html
with open(r"C:\Users\Downloads\webpage.html", mode='rt', encoding='utf-8') as f:
    mytext = f.read()
#highlight "The HyperText Markup Language, or HTML" in red
re.sub("The HyperText Markup Language, or HTML", mytext,
       '<span style="color: red">{}</span>'.format(r'/1'))
mytext.write(r"C:\Users\Downloads\webpage_modif.html")
File "<ipython-input-9-f7f9195da80f>", line 5, in <module>
mytext.write(r"C:\Users\Downloads\webpage_modif.html")
AttributeError: 'str' object has no attribute 'write'
Any ideas? Thanks!
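For reference, the attempt above has a few separate problems. re.sub takes its arguments as (pattern, repl, string), so mytext was being passed as the replacement. It returns a new string rather than changing mytext in place, so the result has to be assigned. '/1' is not a backreference (it is inserted literally), and since the pattern has no group, \1 would fail too; \g<0> refers to the whole match. Finally, a str has no write method, which is the AttributeError: output has to go through a file object opened for writing. Below is a minimal sketch of the same regex approach with those fixes; the names highlight and highlight_file are hypothetical, and it assumes the sentence appears verbatim in the HTML source.

```python
import re


def highlight(html: str, sentence: str, color: str = "red") -> str:
    """Wrap every literal occurrence of `sentence` in a colored <span>."""
    # re.escape: treat the sentence as literal text, not as a regex pattern
    pattern = re.escape(sentence)
    # \g<0> in the replacement stands for the whole match
    replacement = rf'<span style="color: {color}">\g<0></span>'
    # argument order is (pattern, repl, string); a NEW string is returned
    return re.sub(pattern, replacement, html)


def highlight_file(src_path: str, dst_path: str, sentence: str) -> None:
    """Read an HTML file, highlight `sentence`, and write the result to dst_path."""
    with open(src_path, mode="rt", encoding="utf-8") as f:
        mytext = f.read()
    # keep the return value: strings are immutable, re.sub cannot edit mytext
    mytext = highlight(mytext, sentence)
    # write through a file object opened for writing, not through the string
    with open(dst_path, mode="wt", encoding="utf-8") as f:
        f.write(mytext)
```

Call highlight_file with the input and output paths from the question. Because this is plain text substitution on the HTML source, it misses the sentence if it is broken up by tags or entities in the source, and it would also wrap a match that happens to sit inside an attribute value or a script.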
Answer
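Here is how you can load the HTML, edit it with bs4 (BeautifulSoup) and write the result to a new file. This example downloads the page with requests; to use your saved copy instead, read the file and pass its contents to BeautifulSoup. I assume you want to wrap the text in a span tag with a style attribute: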
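import requests
from bs4 import BeautifulSoup
from xml.sax.saxutils import unescape

url = 'https://en.wikipedia.org/wiki/HTML'
res = requests.get(url).content
soup = BeautifulSoup(res, 'html.parser')

text_to_be_highlighted = "The HyperText Markup Language, or HTML"
highlighted_text = f'<span style="color: red">{text_to_be_highlighted}</span>'

# grab all tags whose text contains the specified text
tags = soup.find_all(lambda tag: text_to_be_highlighted in tag.text)
# keep only the innermost matches: <html>, <body>, etc. also contain the text,
# and overwriting their .string would replace the whole page with plain text
tags = [tag for tag in tags
        if tag.find(lambda t: text_to_be_highlighted in t.text) is None]

for tag in tags:
    # note: .text drops any child markup (links, bold, ...) inside this tag
    new_text = tag.text.replace(text_to_be_highlighted, highlighted_text)
    tag.string = new_text

# the span is stored as escaped text (&lt;span ...&gt;), so unescape it on output;
# this also unescapes any other &amp;, &lt; and &gt; in the page
with open("new.html", "w", encoding='utf-8') as f:
    f.write(unescape(soup.prettify()))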
Explanation: find_all with a lambda function grabs every tag whose text contains the specified text. For each of those tags, take its whole text and replace the specified text with a span tag that highlights it. Finally, write the modified soup to the new file.
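The tag-based approach rewrites each matched tag's whole text, so any markup inside that tag is flattened, and the span is stored as escaped text that has to be unescaped on output. A sketch of an alternative, still using bs4, is to search the text nodes themselves with find_all(string=...) and splice a real span tag in around each occurrence, so no unescape step is needed. This is a different technique from the one in the answer; the function name highlight_text_nodes is hypothetical, and it assumes the sentence lies within a single text node (a sentence split by inline tags such as <b> or <a> is not found).

```python
import re

from bs4 import BeautifulSoup, NavigableString


def highlight_text_nodes(html: str, sentence: str, color: str = "red") -> str:
    """Wrap each occurrence of `sentence` in a <span> by editing text nodes only."""
    soup = BeautifulSoup(html, "html.parser")
    # literal match: re.escape keeps characters like '.' from acting as regex syntax
    pattern = re.compile(re.escape(sentence))

    # string=... searches text nodes, so ancestors such as <html> or <body> are untouched
    for node in soup.find_all(string=pattern):
        # skip comments, doctypes and other special strings, plus script/style contents
        if type(node) is not NavigableString or node.parent.name in ("script", "style"):
            continue
        # the text before, between and after each occurrence
        pieces = pattern.split(str(node))
        for i, piece in enumerate(pieces):
            if i > 0:
                # a real tag object, serialized as markup, so no unescaping is needed
                span = soup.new_tag("span", style=f"color: {color}")
                span.string = sentence
                node.insert_before(span)
            if piece:
                node.insert_before(NavigableString(piece))
        # remove the original text node now that its replacement pieces are in place
        node.extract()

    return str(soup)
```

To process the saved page, read the file into a string, pass it to highlight_text_nodes, and write the returned string to the new file with f.write. Surrounding markup such as the <b> in a paragraph is preserved because only the matching text node is replaced.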