I have a collection of HTML files. I want to iterate over them one by one and edit the markup of elements with a particular class. The markup I want to edit has the following form, with these class names:
<td class='thisIsMyClass' colspan=4> <a id='123' class='thisIsMyOtherClass' href='123'>Put me Elsewhere</a>
This can occur multiple times in the same document, with different text instead of “Put me Elsewhere”, but always with the same classes.
I want to change it to the following form:
<font SIZE="3" COLOR="#333333" FACE="Verdana" STYLE="background-color:#ffffff;font-weight: bold;"> <h2>Put Me Elsewhere</h2> </font>
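import os
from bs4 import BeautifulSoup

def replace(filename):
    # Parse one file and collect every element with the target class
    with open(filename) as f:
        soup = BeautifulSoup(f, 'html.parser')
    tags = soup.find_all(class_='thisIsMyClass')

for filename in os.listdir('dirname'):
    replace(os.path.join('dirname', filename))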
What can I try after this, and how can I deal with the tags array?
Answer
A cleaner approach is to prepare a replacement HTML string with a placeholder, find all td tags with the thisIsMyClass class, and use .replace_with() to replace each one:
from bs4 import BeautifulSoup

data = """
<table>
  <tr>
    <td class='thisIsMyClass' colspan=4>
      <a id='123' class='thisIsMyOtherClass' href='123'>Put me Elsewhere</a>
    </td>
  </tr>
</table>
"""

# Replacement markup with a {text} placeholder for each link's text
replacement = """
<font SIZE="3" COLOR="#333333" FACE="Verdana" STYLE="background-color:#ffffff;font-weight: bold;">
  <h2>{text}</h2>
</font>
"""

soup = BeautifulSoup(data, 'html.parser')
# Swap every matching td for the filled-in replacement markup
for td in soup.select('td.thisIsMyClass'):
    td.replace_with(BeautifulSoup(replacement.format(text=td.a.text), 'html.parser'))

print(soup.prettify())
Prints:
<table>
 <tr>
  <font color="#333333" face="Verdana" size="3" style="background-color:#ffffff;font-weight: bold;">
   <h2>
    Put me Elsewhere
   </h2>
  </font>
 </tr>
</table>
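The code above works on a single string. To cover the original goal of iterating over a directory of files, the same replace_with() approach could be wrapped in a loop over os.listdir(), roughly as follows. This is a sketch under assumptions: the helper names rewrite_html and rewrite_directory are hypothetical, only files ending in .html are processed, files are assumed to be UTF-8, and each file is overwritten in place (keep a backup). It also escapes the link text with html.escape before formatting it into the template, and leaves cells without the expected link untouched.

```python
import os
from html import escape

from bs4 import BeautifulSoup

# Replacement markup from the answer; {text} is filled with each link's text.
REPLACEMENT = (
    '<font SIZE="3" COLOR="#333333" FACE="Verdana" '
    'STYLE="background-color:#ffffff;font-weight: bold;">'
    '<h2>{text}</h2></font>'
)


def rewrite_html(markup):
    """Hypothetical helper: replace every td.thisIsMyClass with the <font> block."""
    soup = BeautifulSoup(markup, 'html.parser')
    # select() returns a list, so replacing elements while looping is safe
    for td in soup.select('td.thisIsMyClass'):
        link = td.find('a', class_='thisIsMyOtherClass')
        if link is None:
            continue  # assumption: leave cells without the expected link as-is
        new_markup = REPLACEMENT.format(text=escape(link.get_text()))
        td.replace_with(BeautifulSoup(new_markup, 'html.parser'))
    return str(soup)


def rewrite_directory(dirname):
    """Hypothetical helper: rewrite every *.html file in dirname in place."""
    for filename in os.listdir(dirname):
        if not filename.endswith('.html'):  # assumption: only .html files
            continue
        path = os.path.join(dirname, filename)
        with open(path, encoding='utf-8') as f:
            markup = f.read()
        with open(path, 'w', encoding='utf-8') as f:
            f.write(rewrite_html(markup))
```

With the directory name from the question, this would be called as rewrite_directory('dirname'). Keeping the per-document logic in rewrite_html makes it easy to test on a string before touching any files.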