I am trying to modify a saved HTML web page. More specifically, I would like to highlight specific sentences in the page and save the result as a new HTML page.
I thought the code below would work, but it does not:
import re

#download https://en.wikipedia.org/wiki/HTML to disk using chrome / save as complete html
with open(r"C:UsersDownloadswebpage.html", mode='rt', encoding='utf-8') as f:

    mytext = f.read()

#highlight "The HyperText Markup Language, or HTML" in red
re.sub("The HyperText Markup Language, or HTML", mytext,
       '<span style="color: red">{}</span>'.format(r'/1'))
mytext.write(r"C:UsersDownloadswebpage_modif.html")

  File "<ipython-input-9-f7f9195da80f>", line 5, in <module>
    mytext.write(r"C:UsersDownloadswebpage_modif.html")

AttributeError: 'str' object has no attribute 'write'
Any ideas? Thanks!
Answer
Here is how you can load the HTML, edit it using bs4, and write the result to a new file. I assume you are trying to wrap the text in a span tag with a style attribute:
import requests
from bs4 import BeautifulSoup
from xml.sax.saxutils import unescape

url = 'https://en.wikipedia.org/wiki/HTML'
res = requests.get(url).content

soup = BeautifulSoup(res, 'html.parser')
text_to_be_highlighted = "The HyperText Markup Language, or HTML"
highlighted_text = f'<span style="color: red">{text_to_be_highlighted}</span>'


def contains_text(tag):
    return text_to_be_highlighted in tag.text

# grab the innermost tags with the specified text; ancestors such as <html> and <body>
# also match, and rewriting them would discard the rest of the page
tags = [tag for tag in soup.find_all(contains_text) if tag.find(contains_text) is None]

for tag in tags:
    # replace the tag's contents with its text, with the span markup spliced in
    new_text = tag.text.replace(text_to_be_highlighted, highlighted_text)
    tag.string = new_text

# prettify() escapes the inserted markup; unescape() turns it back into a real tag
with open("new.html", "w", encoding='utf-8') as f:
    f.write(unescape(soup.prettify()))
Explanation: Grab all tags that contain the specified text with the help of the find_all method and a lambda function. Get the whole text of each tag and replace the specified text with a new span tag that highlights it. Finally, write the modified soup to the new file.
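
As a side note, the re.sub attempt from the question can be made to work too. The following is only a minimal sketch, assuming the sentence appears verbatim in the raw HTML with no tags inside it. The function name highlight_sentence and the sample string are hypothetical, chosen for illustration.

```python
import re


def highlight_sentence(html_text, sentence, color="red"):
    """Wrap every verbatim occurrence of `sentence` in a colored <span>."""
    # re.sub takes (pattern, replacement, string), in that order.
    # re.escape makes the sentence match as literal text, not as a regex.
    pattern = re.escape(sentence)
    # \g<0> refers to the whole match, so the original text ends up inside the span.
    replacement = rf'<span style="color: {color}">\g<0></span>'
    # Strings are immutable: re.sub returns a new string and leaves html_text unchanged.
    return re.sub(pattern, replacement, html_text)


# Hypothetical sample input standing in for the contents of the saved page.
sample = "<p>The HyperText Markup Language, or HTML, is a markup language.</p>"
result = highlight_sentence(sample, "The HyperText Markup Language, or HTML")
print(result)
```

To save the result, open the output file in write mode and call f.write(result). The AttributeError in the question arises because write is a method of file objects, not of strings.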