I am trying to convert an HTML table to JSON using BeautifulSoup in Python. The conversion works, but the data comes out in the wrong format.
from bs4 import BeautifulSoup
import json

reading_table = """
<table>
<tbody>
<tr>
<td><span class="customlabel">Energy Source</span></td>
<td><span class="custominput">EB</span></td>
<td><span class="customlabel">Grid Reading </span></td>
<td><span class="custominput">2666.2</span></td>
<td><span class="customlabel">DG Reading </span></td>
<td><span class="custominput">15.5</span></td>
</tr>
<tr>
<td><span class="customlabel">Power Factor</span></td>
<td><span class="custominput">0.844</span></td>
<td><span class="customlabel">Total Kw</span></td>
<td><span class="custominput">0.273</span></td>
<td><span class="customlabel">Total KVA</span></td>
<td><span class="custominput">0.34</span></td>
</tr>
<tr>
<td><span class="customlabel">Average Voltage</span></td>
<td><span class="custominput">241.7</span></td>
<td><span class="customlabel">Total Current</span></td>
<td><span class="custominput">1.54</span></td>
<td><span class="customlabel">Frequency Hz</span></td>
<td><span class="custominput">50</span></td>
</tr>
</tbody>
</table>
"""

reading_table_data = [
    [cell.text for cell in row("td")]
    for row in BeautifulSoup(reading_table, features="html.parser")("tr")
]

print(reading_table_data)
The code above prints the data as a list of lists, in the format below:
[['Energy Source', 'EB', 'Grid Reading ', '2666.2', 'DG Reading ', '15.5'], ['Power Factor', '0.844', 'Total Kw', '0.273', 'Total KVA', '0.34'], ['Average Voltage', '241.7', 'Total Current', '1.54', 'Frequency Hz', '50']]
I would like to get it in the format below:
[
    'Energy Source': 'EB',
    'Grid Reading ': '2666.2',
    'DG Reading ': '15.5',
    'Power Factor': '0.844',
    'Total Kw': '0.273',
    'Total KVA': '0.34',
    'Average Voltage': '241.7',
    'Total Current': '1.54',
    'Frequency Hz': '50'
]
Any help is appreciated.
Answer
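The output you want is not valid JSON: key/value pairs belong in an object delimited by braces, not in a list delimited by square brackets. The closest you can get is to build a dict, convert it to a string, and replace the braces with brackets before printing.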
Here is the working code:
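# Reuses reading_table and the imports (BeautifulSoup, json) from the question.
tds = BeautifulSoup(reading_table, features="html.parser").find_all("td")
data = {}
for td in tds:
    # A label cell starts a new key with an empty placeholder value.
    if "customlabel" in td.span.get("class"):
        attr_key = td.span.text
        data[attr_key] = ""
    # An input cell supplies the value for the most recent label.
    if "custominput" in td.span.get("class"):
        attr_value = td.span.text
        data[attr_key] = attr_value
# Swap the braces for brackets to mimic the requested (non-JSON) layout.
print(json.dumps(data).replace("{", "[").replace("}", "]"))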
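If a valid JSON object is acceptable instead of the bracketed string, a minimal sketch along the following lines may be enough. It assumes every customlabel span is followed by exactly one custominput span, so the two lists can be paired by position. The names sample_table and table_to_dict are hypothetical and not part of the original code, and the table is shortened to its first row so the example runs on its own.

```python
import json
from bs4 import BeautifulSoup

# Shortened copy of the question's table (first row only); illustrative data.
sample_table = """
<table><tbody><tr>
<td><span class="customlabel">Energy Source</span></td>
<td><span class="custominput">EB</span></td>
<td><span class="customlabel">Grid Reading </span></td>
<td><span class="custominput">2666.2</span></td>
<td><span class="customlabel">DG Reading </span></td>
<td><span class="custominput">15.5</span></td>
</tr></tbody></table>
"""


def table_to_dict(html):
    """Pair each customlabel span with the custominput span at the same position.

    Assumes labels and inputs strictly alternate, as in the question's table.
    """
    soup = BeautifulSoup(html, features="html.parser")
    labels = soup.find_all("span", class_="customlabel")
    values = soup.find_all("span", class_="custominput")
    # strip=True drops the trailing space in labels such as "Grid Reading ".
    return {
        label.get_text(strip=True): value.get_text(strip=True)
        for label, value in zip(labels, values)
    }


readings = table_to_dict(sample_table)
as_json = json.dumps(readings, indent=2)
print(as_json)
```

Unlike the bracket-replaced string, this output can be read back with json.loads. The values stay strings; converting them to numbers would need an extra step that depends on which fields are numeric.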