My understanding is that BeautifulSoup is meant more for extracting data than for modifying it, though it can do both. I have a skeleton HTML tree called ‘tree’, and want to insert data from a database query to modify the HTML. The amount of data inserted is variable. I’m aware of the method BeautifulSoup.new_tag() but am not sure how to use it with multiple data points.
tree
<tbody> <tr> </tr> </tbody>
Modify to:
<tbody> <tr> <td> <a href="www.example.com">A</a> </td> <td> X <time>(4)</time>, Y <time>(6)</time>, Z <time>(7)</time>, </td> </tr> </tbody>
They are added according to the table df:
| Group | Data | time |
| ----- | ---- | ---- |
| A     | X    | 3    |
| A     | Y    | 4    |
| A     | Z    | 8    |
| B     | X    | 5    |
| B     | Y    | 9    |
| B     | Z    | 10   |
From the HTML, there are two <td> cells to add to the <tr> tag. In this case, let’s say I only want to add group A (though I’d generally want to add all groups). Using Pandas, I can use groupby to create a new table called ‘grouped’ with Group as the index.
grouped
| Group | Data_time           |
| ----- | ------------------- |
| A     | {X: 3, Y: 4, Z: 8}  |
| B     | {X: 5, Y: 9, Z: 10} |
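For reference, one way a table like ‘grouped’ could be built is sketched below. The column name Data_time comes from the table above; the dict-per-group layout and the construction itself are assumptions about what the groupby step looks like, not the asker's actual code.

```python
import pandas as pd

# The example table df from the question
df = pd.DataFrame({
    "Group": ["A", "A", "A", "B", "B", "B"],
    "Data": ["X", "Y", "Z", "X", "Y", "Z"],
    "time": [3, 4, 8, 5, 9, 10],
})

# Collapse each group's rows into a {Data: time} dict
data_time = {
    group: dict(zip(rows["Data"], rows["time"].tolist()))
    for group, rows in df.groupby("Group")
}

# One row per group, with Group as the index
grouped = pd.DataFrame(
    {"Data_time": list(data_time.values())},
    index=pd.Index(list(data_time.keys()), name="Group"),
)
print(grouped)
```

Each cell of Data_time then holds an ordinary dict, so `grouped.loc["A", "Data_time"].items()` yields the (key, value) pairs the pseudocode loops over.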
So my pseudocode would be something like this:
    soup = BeautifulSoup(tree, 'html.parser')
    old_tag = soup.tbody.tr
    for index, row in grouped.iterrows():
        add <td> tag to old_tag
        add <a href="www.example.com">$index = A$</a> to above <td> tag
        add another <td> tag to old_tag
        for key, value in grouped.loc[index][data_time]:
            add '$key$ <time>($value$)</time>' to above <td> tag
I understand the above logic, except the internals of how to add new tags with BeautifulSoup.
Answer
You can use DataFrame.groupby and GroupBy.apply to group by the Group column and then create the td tags by looping over the rows of each group.
Say you have the dataframe:
      Group Data  time
    0     A    X     3
    1     A    Y     4
    2     A    Z     8
    3     B    X     5
    4     B    Y     9
    5     B    Z    10
On applying the groupby function we get
    # x.name is the group key; each row contributes one "Data<time>(time)</time>," entry
    df.groupby('Group').apply(lambda x: f'''
    <td>
    <a href="www.example.com">{x.name}</a>
    </td>
    ''' + "<td>\n"
        + ''.join([f"{row.Data}<time>({row.time})</time>,\n" for name, row in x.iterrows()])
        + "</td>\n")
The output is a Series indexed by Group, holding one HTML string per group.
Now, say you want to concatenate the result for all groups. All you need to do is:
    html_tag = df.groupby('Group').apply(lambda x: f'''
    <td>
    <a href="www.example.com">{x.name}</a>
    </td>
    ''' + "<td>\n"
        + ''.join([f"{row.Data}<time>({row.time})</time>,\n" for name, row in x.iterrows()])
        + "</td>\n")
    # join the per-group strings into a single HTML fragment
    print(html_tag.str.cat(sep=''))
which gives us the expected output :
    <td>
    <a href="www.example.com">A</a></td>
    <td>
    X<time>(3)</time>,
    Y<time>(4)</time>,
    Z<time>(8)</time>,
    </td>
    <td>
    <a href="www.example.com">B</a></td>
    <td>
    X<time>(5)</time>,
    Y<time>(9)</time>,
    Z<time>(10)</time>,
    </td>
Now that you have the string, you can insert it into your HTML using a Python formatted string or string concatenation:
<tbody> <tr> </tr> </tbody>
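If you would rather build the tags with BeautifulSoup itself, as the question asks, a minimal sketch using new_tag() might look like the following. This is an alternative to the string approach above, not part of it. The variable names (row_tag, link_cell, data_cell) are illustrative, and it assumes the skeleton and dataframe shown earlier.

```python
import pandas as pd
from bs4 import BeautifulSoup

tree = "<tbody> <tr> </tr> </tbody>"
df = pd.DataFrame({
    "Group": ["A", "A", "A", "B", "B", "B"],
    "Data": ["X", "Y", "Z", "X", "Y", "Z"],
    "time": [3, 4, 8, 5, 9, 10],
})

soup = BeautifulSoup(tree, "html.parser")
row_tag = soup.tbody.tr
row_tag.clear()  # drop the placeholder whitespace inside <tr>

for group, rows in df.groupby("Group"):
    # First cell: a link labelled with the group name
    link_cell = soup.new_tag("td")
    link = soup.new_tag("a", href="www.example.com")
    link.string = group
    link_cell.append(link)
    row_tag.append(link_cell)

    # Second cell: one 'Data <time>(time)</time>, ' entry per row
    data_cell = soup.new_tag("td")
    for data, t in zip(rows["Data"], rows["time"]):
        data_cell.append(f"{data} ")
        time_tag = soup.new_tag("time")
        time_tag.string = f"({t})"
        data_cell.append(time_tag)
        data_cell.append(", ")
    row_tag.append(data_cell)

html = str(soup)
print(html)
```

Appending a plain string to a tag adds it as text, so text and child tags can be interleaved in the order they should appear. To add only group A, filter first, for example with `df[df["Group"] == "A"]`.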