I have a collection of HTML files. I want to iterate over them one by one and edit the markup of elements with a particular class. The markup I want to edit has the following form and class names:
<td class='thisIsMyClass' colspan=4>
  <a id='123' class='thisIsMyOtherClass' href='123'>Put me Elsewhere</a>
This can occur multiple times in the same document, with different text instead of “Put me Elsewhere”, but always the same classes.
I want to change this to the form:
<font SIZE="3" COLOR="#333333" FACE="Verdana" STYLE="background-color:#ffffff;font-weight: bold;">
  <h2>Put Me Elsewhere</h2>
</font>
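This is what I have so far:
import os
from bs4 import BeautifulSoup

def replace(filename):
    # Parse the file, then collect every element with the target class
    with open(filename) as f:
        soup = BeautifulSoup(f, 'html.parser')
    tags = soup.find_all(class_="thisIsMyClass")

for filename in os.listdir('dirname'):
    replace(os.path.join('dirname', filename))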
What can I try after this, and how can I deal with the tags array?
Answer
A cleaner approach is to prepare a replacement HTML string with a placeholder, find all td tags with the thisIsMyClass class, and use .replace_with() to replace each one:
from bs4 import BeautifulSoup

data = """
<table>
    <tr>
        <td class='thisIsMyClass' colspan=4>
            <a id='123' class='thisIsMyOtherClass' href='123'>Put me Elsewhere</a>
        </td>
    </tr>
</table>
"""

# Replacement markup; {text} is filled in with each link's text
replacement = """
<font SIZE="3" COLOR="#333333" FACE="Verdana" STYLE="background-color:#ffffff;font-weight: bold;">
    <h2>{text}</h2>
</font>
"""

soup = BeautifulSoup(data, 'html.parser')
for td in soup.select('td.thisIsMyClass'):
    # Swap the whole <td> for the parsed replacement fragment
    td.replace_with(BeautifulSoup(replacement.format(text=td.a.text), 'html.parser'))

print(soup.prettify())
Prints (attribute order may vary with the Python version):
<table>
 <tr>
  <font color="#333333" face="Verdana" size="3" style="background-color:#ffffff;font-weight: bold;">
   <h2>
    Put me Elsewhere
   </h2>
  </font>
 </tr>
</table>
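The question also asks about running this over a directory of files, which the example string does not cover. The following is only a minimal sketch of how the same replace_with() approach might be applied per file. It assumes the files are UTF-8 encoded, have a .html extension, and can be overwritten in place. The names REPLACEMENT, rewrite_html, and rewrite_directory are hypothetical helpers introduced here, not part of BeautifulSoup. It also escapes the link text before formatting it into the template, so that characters such as & or < are not parsed as markup.

```python
import os
from html import escape

from bs4 import BeautifulSoup

# Same template as in the answer; {text} is filled in per cell.
REPLACEMENT = (
    '<font SIZE="3" COLOR="#333333" FACE="Verdana" '
    'STYLE="background-color:#ffffff;font-weight: bold;">'
    '<h2>{text}</h2></font>'
)


def rewrite_html(html):
    """Return html with every td.thisIsMyClass replaced by the template."""
    soup = BeautifulSoup(html, 'html.parser')
    # select() returns a list-like result, so a plain for loop handles each tag.
    for td in soup.select('td.thisIsMyClass'):
        if td.a is None:
            # Assumption: cells without a link are left untouched.
            continue
        # Escape the link text so it stays literal text inside the <h2>.
        snippet = REPLACEMENT.format(text=escape(td.a.get_text(strip=True)))
        td.replace_with(BeautifulSoup(snippet, 'html.parser'))
    return str(soup)


def rewrite_directory(dirname):
    """Rewrite every .html file in dirname in place (hypothetical helper)."""
    for name in os.listdir(dirname):
        if not name.endswith('.html'):
            continue
        path = os.path.join(dirname, name)
        with open(path, encoding='utf-8') as f:
            updated = rewrite_html(f.read())
        with open(path, 'w', encoding='utf-8') as f:
            f.write(updated)
```

Calling rewrite_directory('dirname') would then process each file in turn. Since the files are overwritten, it may be worth trying this on a copy of the directory first.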