
Replacing tags of one kind with tags of another in BeautifulSoup

I have a collection of HTML files. I wish to iterate over them one by one, editing the markup of a particular class. The code I wish to edit has the following form, using these class names:

<td class='thisIsMyClass' colspan=4>
  <a id='123' class='thisIsMyOtherClass' href='123'>Put me Elsewhere</a> 

This can occur multiple times in the same document, with different text instead of “Put me Elsewhere”, but always the same classes.

I want to change this to the following form:

<font SIZE="3"  COLOR="#333333"  FACE="Verdana"  STYLE="background-color:#ffffff;font-weight: bold;">
  <h2>Put me Elsewhere</h2>
</font>
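So far I have:

import os
from bs4 import BeautifulSoup

def replace(filename):
    # Parse the file and collect every tag that carries the class
    with open(filename) as f:
        soup = BeautifulSoup(f, 'html.parser')
    tags = soup.find_all(class_="thisIsMyClass")

for filename in os.listdir('dirname'):
    replace(os.path.join('dirname', filename))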

What can I try next, and how should I handle the list of tags that find_all returns?


Answer

A much better (and more beautiful) approach is to prepare a replacement HTML string with a placeholder, find all td tags with the thisIsMyClass class, and use .replace_with() to replace each one:

from bs4 import BeautifulSoup

data = """
<table>
    <tr>
        <td class='thisIsMyClass' colspan=4>
          <a id='123' class='thisIsMyOtherClass' href='123'>Put me Elsewhere</a>
        </td>
    </tr>
</table>
"""

replacement = """
<font SIZE="3"  COLOR="#333333"  FACE="Verdana"  STYLE="background-color:#ffffff;font-weight: bold;">
  <h2>{text}</h2>
</font>
"""

soup = BeautifulSoup(data, 'html.parser')
for td in soup.select('td.thisIsMyClass'):
    td.replace_with(BeautifulSoup(replacement.format(text=td.a.text), 'html.parser'))

print(soup.prettify())

Prints:

<table>
    <tr>
        <font color="#333333" face="Verdana" size="3" style="background-color:#ffffff;font-weight: bold;">
            <h2>
            Put me Elsewhere
            </h2>
        </font>
    </tr>
</table>
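Since the question is about rewriting a whole directory of files, here is a minimal sketch that applies the same idea to each file on disk. It assumes the files end in .html or .htm, are UTF-8 encoded, and may be overwritten in place; replace_cells, process_directory and FONT_ATTRS are hypothetical names. Unlike the string-formatting version above, it builds the replacement with soup.new_tag() and sets the heading text through .string, so link text containing characters such as < or & is escaped rather than parsed as markup.

```python
import os

from bs4 import BeautifulSoup

# Styling copied from the desired replacement markup in the question.
FONT_ATTRS = {
    "size": "3",
    "color": "#333333",
    "face": "Verdana",
    "style": "background-color:#ffffff;font-weight: bold;",
}


def replace_cells(soup):
    """Replace every td.thisIsMyClass with <font><h2>link text</h2></font>."""
    # find_all returns a list-like ResultSet, so swapping tags while looping is safe.
    for td in soup.find_all("td", class_="thisIsMyClass"):
        link = td.find("a", class_="thisIsMyOtherClass")
        if link is None:
            continue  # leave cells without the expected link untouched
        font = soup.new_tag("font", **FONT_ATTRS)
        heading = soup.new_tag("h2")
        heading.string = link.get_text(strip=True)  # stored as text, escaped on output
        font.append(heading)
        td.replace_with(font)
    return soup


def process_directory(dirname):
    """Rewrite every .html/.htm file in dirname in place (hypothetical helper)."""
    for name in os.listdir(dirname):
        if not name.lower().endswith((".html", ".htm")):
            continue
        path = os.path.join(dirname, name)
        with open(path, encoding="utf-8") as f:
            soup = BeautifulSoup(f, "html.parser")
        replace_cells(soup)
        with open(path, "w", encoding="utf-8") as f:
            f.write(str(soup))
```

With the placeholder directory name from the question, the call would be process_directory('dirname'). Writing to a separate output directory instead of overwriting is a safer choice if the originals should be kept.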
User contributions licensed under: CC BY-SA