Modify HTML with BeautifulSoup using data from Pandas table

My understanding is that BeautifulSoup is geared more toward extracting data than modifying it, though it can do both. I have a skeleton HTML tree called ‘tree’, and I want to fill it with data from a database query. The amount of data inserted varies. I’m aware of the BeautifulSoup.new_tag() method but am not sure how to use it with multiple data points.

tree

      <tbody>
        <tr>
        </tr>
      </tbody>

Modify to:

      <tbody>
        <tr>
          <td>
            <a href="www.example.com">A</a>
          </td>

          <td>
            X <time>(3)</time>,
            Y <time>(4)</time>,
            Z <time>(8)</time>,
          </td>
        </tr>
      </tbody>

The cells are filled in according to the table df:

|   Group  |       Data       | time | 
| ------   | -----------------| -----|
| A        | X                |   3  |  
| A        | Y                |   4  |
| A        | Z                |   8  | 
| B        | X                |   5  |
| B        | Y                |   9  | 
| B        | Z                |   10 |

From the HTML, there are two cells (<td> tags) to add to the <tr> tag for each group. In this case, let’s say I only want to add group A (though in general I’d want to add all groups). Using Pandas, I can use groupby to create a new table called ‘grouped’ with Group as the index.

grouped

|   Group  |       Data_time    | 
| ------   | -------------------|
| A        | {X: 3, Y: 4, Z: 8} |
| B        | {X: 5, Y: 9, Z: 10}|
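A minimal sketch of how df and ‘grouped’ might be built with pandas, assuming the column names shown in the tables above. Data_time is just the label used in the ‘grouped’ table, and collapsing each group with a dict comprehension over df.groupby() is one option among several:

```python
import pandas as pd

# The example table df from the question (column names as shown there).
df = pd.DataFrame({
    "Group": ["A", "A", "A", "B", "B", "B"],
    "Data": ["X", "Y", "Z", "X", "Y", "Z"],
    "time": [3, 4, 8, 5, 9, 10],
})

# Collapse each group into a {Data: time} dict; int() turns NumPy integers
# into plain Python ints so the dicts print cleanly.
data_time = {
    group: {data: int(t) for data, t in zip(rows["Data"], rows["time"])}
    for group, rows in df.groupby("Group")
}

# One row per group, indexed by Group, with a single "Data_time" column
# (the column name is an assumption taken from the table above).
grouped = pd.DataFrame({"Data_time": pd.Series(data_time)})
grouped.index.name = "Group"
print(grouped)
```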

So my pseudocode would be something like this:

soup = BeautifulSoup(tree, 'html.parser')
old_tag = soup.tbody.tr
for index, row in grouped.iterrows():
   add <td> tag to old_tag
   add <a href="www.example.com">$index$</a> (e.g. A) to above <td> tag
   add another <td> tag to old_tag
   for key, value in row['Data_time'].items():
      add '$key$ <time>($value$)</time>,' to above <td> tag

I understand the above logic, except the internals of how to add new tags with BeautifulSoup.
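For reference, the pseudocode above can be sketched with BeautifulSoup's new_tag() and Tag.append(). This sketch assumes ‘grouped’ has a Data_time column holding one {Data: time} dict per group (built inline here so the example runs on its own), and, like the pseudocode, it appends every group's cells to the single existing <tr>:

```python
import pandas as pd
from bs4 import BeautifulSoup

# Skeleton from the question.
tree = """
<tbody>
  <tr>
  </tr>
</tbody>
"""

# Stand-in for the question's `grouped` table (assumed layout): Group index,
# Data_time column holding a {Data: time} dict per group.
grouped = pd.DataFrame(
    {"Data_time": [{"X": 3, "Y": 4, "Z": 8}, {"X": 5, "Y": 9, "Z": 10}]},
    index=pd.Index(["A", "B"], name="Group"),
)

soup = BeautifulSoup(tree, "html.parser")
old_tag = soup.tbody.tr

for index, row in grouped.iterrows():
    # First cell: <td><a href="www.example.com">{group}</a></td>
    link_cell = soup.new_tag("td")
    link = soup.new_tag("a", href="www.example.com")
    link.string = index
    link_cell.append(link)
    old_tag.append(link_cell)

    # Second cell: "key <time>(value)</time>," for each entry of the dict.
    data_cell = soup.new_tag("td")
    for key, value in row["Data_time"].items():
        data_cell.append(f"{key} ")  # plain str is wrapped as a NavigableString
        time_tag = soup.new_tag("time")
        time_tag.string = f"({value})"
        data_cell.append(time_tag)
        data_cell.append(",")
    old_tag.append(data_cell)

print(soup.prettify())
```

If you want one row per group instead, create a fresh tag with soup.new_tag("tr") inside the loop, append the two cells to it, and append that row to soup.tbody.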

Answer

You can use pandas.groupby with GroupBy.apply to group by the Group column, then build the <td> markup for each group as a string by looping over that group's rows.

Say you have the dataframe:

  Group Data time
0     A    X    3
1     A    Y    4
2     A    Z    8
3     B    X    5
4     B    Y    9
5     B    Z   10

Applying the groupby function:

# x is each group's sub-DataFrame; x.name is its group label (e.g. 'A')
df.groupby('Group').apply(lambda x : f'''
<td>
    <a href="www.example.com">{x.name}</a>
</td>
''' + "<td>\n" + ''.join([f"{row.Data}<time>({row.time})</time>,\n" for name, row in x.iterrows()]) + "</td>\n")


Now, to concatenate the results for all groups, all you need to do is:

html_tag = df.groupby('Group').apply(lambda x : f'''
<td>
    <a href="www.example.com">{x.name}</a>
</td>
''' + "<td>\n" + ''.join([f"{row.Data}<time>({row.time})</time>,\n" for name, row in x.iterrows()]) + "</td>\n")
# html_tag is a Series of HTML strings indexed by Group; join them into one string
print(html_tag.str.cat(sep=''))

which gives the expected output:

<td>
    <a href="www.example.com">A</a>
</td>
<td>
X<time>(3)</time>,
Y<time>(4)</time>,
Z<time>(8)</time>,
</td>

<td>
    <a href="www.example.com">B</a>
</td>
<td>
X<time>(5)</time>,
Y<time>(9)</time>,
Z<time>(10)</time>,
</td>

Now that you have the string, you can insert it into your HTML skeleton using a Python formatted string or plain string concatenation:

<tbody>
    <tr>
    </tr>
</tbody>
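Both routes can be sketched as follows. The fragment is a shortened, hypothetical stand-in for the string returned by html_tag.str.cat(sep='') (group A only), and the second option parses that string with BeautifulSoup and moves the resulting <td> tags into the skeleton's <tr>:

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened stand-in for html_tag.str.cat(sep='') (group A only).
cells_html = (
    '<td><a href="www.example.com">A</a></td>'
    "<td>X<time>(3)</time>,Y<time>(4)</time>,Z<time>(8)</time>,</td>"
)

# Option 1: plain string formatting into the skeleton.
template = "<tbody>\n  <tr>\n{cells}\n  </tr>\n</tbody>"
html = template.format(cells=cells_html)
print(html)

# Option 2: parse both pieces and move the parsed <td> cells into the <tr>.
soup = BeautifulSoup("<tbody>\n  <tr>\n  </tr>\n</tbody>", "html.parser")
fragment = BeautifulSoup(cells_html, "html.parser")
for cell in fragment.find_all("td", recursive=False):
    soup.tbody.tr.append(cell.extract())
print(soup)
```

Note that str.format treats any { or } already present in the template (for example inline CSS) as placeholders, so those would need to be doubled; the BeautifulSoup route avoids that and leaves you with a tree you can keep modifying.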
User contributions licensed under: CC BY-SA