How to modify a saved HTML page?

I am trying to modify a saved HTML webpage. More specifically, I would like to highlight specific sentences in the page and save the result as a new HTML page.

I thought the code below would work, but it does not:

import re

#download https://en.wikipedia.org/wiki/HTML to disk using chrome / save as complete html
with open(r"C:UsersDownloadswebpage.html", mode='rt', encoding='utf-8') as f:

    mytext = f.read()

    #highlight "The HyperText Markup Language, or HTML" in red
    re.sub("The HyperText Markup Language, or HTML", mytext,
           '<span style="color: red">{}</span>'.format(r'/1'))
    mytext.write(r"C:UsersDownloadswebpage_modif.html")

  File "<ipython-input-9-f7f9195da80f>", line 5, in <module>
    mytext.write(r"C:UsersDownloadswebpage_modif.html")

AttributeError: 'str' object has no attribute 'write'

Any ideas? Thanks!

Answer

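Here is how you can load the page, edit it with bs4 (BeautifulSoup), and write the result to a new file. Note that this example fetches the page with requests rather than opening the saved copy. I assume you are trying to wrap the text in a span tag with a style attribute: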

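import requests
from bs4 import BeautifulSoup
from xml.sax.saxutils import unescape

url = 'https://en.wikipedia.org/wiki/HTML'
res = requests.get(url).content

soup = BeautifulSoup(res, 'html.parser')
text_to_be_highlighted = "The HyperText Markup Language, or HTML"
highlighted_text = f'<span style="color: red">{text_to_be_highlighted}</span>'


# grab the innermost tags that contain the specified text; ancestors such as
# <html> and <body> also match, and overwriting them would wipe out the page
tags = [
    tag
    for tag in soup.find_all(lambda tag: text_to_be_highlighted in tag.text)
    if not any(text_to_be_highlighted in child.text for child in tag.find_all(True))
]

for tag in tags:
    new_text = tag.text.replace(text_to_be_highlighted, highlighted_text)
    # note: assigning tag.string replaces all children of the tag, so inline
    # markup inside it (links, bold text) is flattened to plain text
    tag.string = new_text


# prettify() escapes the inserted "<span ...>" as "&lt;span ...&gt;"; unescape()
# turns it back into a real tag, but it also unescapes every other &lt;, &gt;
# and &amp; in the page
with open("new.html", "w", encoding='utf-8') as f:
    f.write(unescape(soup.prettify()))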

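Explanation: Grab all tags that contain the specified text with the help of the find_all method and a lambda function. Get each tag's whole text and replace the specified text with a new span tag that highlights it. Because the new tag is assigned as a plain string, prettify() escapes it, so unescape is applied to turn it back into real markup. Finally, write the modified soup to the new file.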
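
For the saved file in the question, the same replace-and-write idea can be sketched with the standard library alone. The AttributeError arises because mytext is a str, which has no write method; the text has to be written through a file object instead. Note also that re.sub takes its arguments in the order (pattern, replacement, string) and returns a new string rather than changing its input. The helper names highlight and highlight_file and the sample markup below are hypothetical. The sketch works on the raw markup, so it would also match the phrase inside attributes or scripts and would miss a phrase that is split across tags.

```python
import re
from pathlib import Path


def highlight(html: str, phrase: str, style: str = "color: red") -> str:
    """Return html with every literal occurrence of phrase wrapped in a styled span."""
    # re.escape makes the phrase match literally, even if it contains
    # characters such as "." or "(" that are special in regular expressions.
    pattern = re.compile(re.escape(phrase))
    # A function as the replacement avoids backslash processing in the
    # replacement text; m.group(0) is the text that was matched.
    return pattern.sub(
        lambda m: f'<span style="{style}">{m.group(0)}</span>', html
    )


def highlight_file(src: str, dst: str, phrase: str) -> None:
    """Read src, highlight phrase, and write the result to dst (hypothetical helper)."""
    text = Path(src).read_text(encoding="utf-8")
    # The result of the substitution is a new string; write it via the file API.
    Path(dst).write_text(highlight(text, phrase), encoding="utf-8")


if __name__ == "__main__":
    # Small in-memory sample instead of a real saved page (assumed markup).
    sample = "<p>The HyperText Markup Language, or HTML is a markup language.</p>"
    print(highlight(sample, "The HyperText Markup Language, or HTML"))
```

To apply it to a saved page, call highlight_file with the input path, the output path, and the phrase, for example highlight_file("webpage.html", "webpage_modif.html", "The HyperText Markup Language, or HTML"); the paths here are placeholders.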

User contributions licensed under: CC BY-SA