I am trying to convert an HTML table to JSON in Python using BeautifulSoup. The conversion runs, but the data comes out in the wrong format.
from bs4 import BeautifulSoup
import json

reading_table = """
<table>
  <tbody>
    <tr>
      <td><span class="customlabel">Energy Source</span></td>
      <td><span class="custominput">EB</span></td>
      <td><span class="customlabel">Grid Reading </span></td>
      <td><span class="custominput">2666.2</span></td>
      <td><span class="customlabel">DG Reading </span></td>
      <td><span class="custominput">15.5</span></td>
    </tr>
    <tr>
      <td><span class="customlabel">Power Factor</span></td>
      <td><span class="custominput">0.844</span></td>
      <td><span class="customlabel">Total Kw</span></td>
      <td><span class="custominput">0.273</span></td>
      <td><span class="customlabel">Total KVA</span></td>
      <td><span class="custominput">0.34</span></td>
    </tr>
    <tr>
      <td><span class="customlabel">Average Voltage</span></td>
      <td><span class="custominput">241.7</span></td>
      <td><span class="customlabel">Total Current</span></td>
      <td><span class="custominput">1.54</span></td>
      <td><span class="customlabel">Frequency Hz</span></td>
      <td><span class="custominput">50</span></td>
    </tr>
  </tbody>
</table>
"""

reading_table_data = [
    [cell.text for cell in row("td")]
    for row in BeautifulSoup(reading_table, features="html.parser")("tr")
]
print(reading_table_data)
The above code prints a list of lists, one per table row, in the following format:
[['Energy Source', 'EB', 'Grid Reading ', '2666.2', 'DG Reading ', '15.5'], ['Power Factor', '0.844', 'Total Kw', '0.273', 'Total KVA', '0.34'], ['Average Voltage', '241.7', 'Total Current', '1.54', 'Frequency Hz', '50']]
I would like to get it in the following format:
[ 'Energy Source': 'EB', 'Grid Reading ': '2666.2' 'DG Reading ', '15.5', 'Power Factor', '0.844', 'Total Kw', '0.273', 'Total KVA', '0.34', 'Average Voltage', '241.7', 'Total Current', '1.54', 'Frequency Hz', '50' ]
Any help is appreciated.
Answer
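The output you want is not valid JSON: square brackets denote a list, and a list cannot hold key: value pairs. You can still print it that way by building a dict, converting it to a JSON string, and replacing the braces with square brackets.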
Here is the working code:
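# reading_table is the HTML string defined in the question.
tds = BeautifulSoup(reading_table, features="html.parser").find_all("td")
data = {}
for td in tds:
    # A label cell starts a new key with an empty placeholder value.
    if "customlabel" in td.span.get("class"):
        attr_key = td.span.text
        data[attr_key] = ""
    # An input cell supplies the value for the most recent label.
    if "custominput" in td.span.get("class"):
        attr_value = td.span.text
        data[attr_key] = attr_value
# Swap the braces for brackets to match the requested (non-JSON) output.
print(json.dumps(data).replace("{", "[").replace("}", "]"))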
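If valid JSON is acceptable, a simpler route may be to keep the dict as a JSON object instead of swapping the braces. The sketch below is only an illustration: it starts from the list of rows the question's code already prints, and it assumes every row alternates label, value, label, value. The function name `rows_to_dict` is hypothetical, and stripping the trailing spaces from labels such as 'Grid Reading ' is an added choice, not something the original code does.

```python
import json

# Rows as printed by the question's list comprehension (copied from the post).
rows = [
    ["Energy Source", "EB", "Grid Reading ", "2666.2", "DG Reading ", "15.5"],
    ["Power Factor", "0.844", "Total Kw", "0.273", "Total KVA", "0.34"],
    ["Average Voltage", "241.7", "Total Current", "1.54", "Frequency Hz", "50"],
]


def rows_to_dict(rows):
    """Pair each label cell with the value cell that follows it.

    Hypothetical helper: assumes cells alternate label, value within a row.
    """
    result = {}
    for row in rows:
        # Even positions hold labels, odd positions hold values.
        for label, value in zip(row[0::2], row[1::2]):
            result[label.strip()] = value.strip()
    return result


readings = rows_to_dict(rows)
# json.dumps on a dict yields a JSON object, which any JSON parser accepts.
print(json.dumps(readings, indent=2))
```

Because the result is a plain dict, individual readings can be looked up directly, for example `readings["Total Kw"]`. All values stay strings here; converting them to numbers would be a separate step.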