I'm getting an error when trying to convert HTML to JSON using Python with Beautiful Soup

I'm trying to scrape data from the website https://quranromanurdu.com/chapter/1. I want only the text content of the element with id "contentpara", returned in JSON format. The code below gets the HTML content, but I want to convert it to JSON. I tried to convert it, but I'm getting an error. Could somebody please help me fix it?

Python code:

import requests
from bs4 import BeautifulSoup
import json
import codecs

URL = "https://quranromanurdu.com/chapter/1"
page = requests.get(URL)

soup = BeautifulSoup(page.content, "html.parser")
table = soup.findAll('div',attrs={"id":"contentpara"})

data0 = json.loads(table)
print(data0)


Error

line 24, in <module>
    data0 = json.loads(table)
  File "C:\Users\arbazalx\AppData\Local\Programs\Python\Python310\lib\json\__init__.py", line 339, in loads
    raise TypeError(f'the JSON object must be str, bytes or bytearray, '
TypeError: the JSON object must be str, bytes or bytearray, not ResultSet


Answer

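findAll returns a ResultSet (a list of matching tags), but json.loads expects a JSON string, which is why you get the TypeError. Instead, take the text of the first match and build a dictionary from it, like this: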

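# ... your code ...
table = soup.findAll('div', attrs={"id": "contentpara"})

# Take the first matching div and split its text into non-empty lines
values = list(filter(None, table[0].text.split('\n')))
# Skip the first entry and strip non-breaking spaces (\xa0) from the rest
values = list(filter(None, [value.replace("\xa0", "") for value in values[1:]]))

# Split each line at the first '.' into verse number and verse text
d = {}
for item in values:
    key, value = item.split('.', maxsplit=1)
    d[key] = value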


Output:

{'1': ' Allah ke naam se jo Rehman o Raheem hai.',
 '2': ' Tareef Allah hi ke liye hai jo tamaam qayinaat ka Rubb hai.',
 '3': ' Rehman aur Raheem hai.',
 '4': ' Roz e jaza ka maalik hai.',
 '5': ' Hum teri hi ibadat karte hain aur tujh hi se madad maangte hain.',
 '6': ' Humein seedha raasta dikha.',
 '7': ' Un logon ka raasta jinpar tu nay inam farmaya, jo maatoob nahin huey (na unka jinpar tera gazab hota raha) , jo bhatke huey nahin hain.'}
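
Note that the dictionary above is still a Python object, not JSON text. Here is a minimal sketch of the last step, assuming d holds the verse mapping built above (the two sample entries are copied from the output, and the names cleaned, json_text and round_trip are only illustrative). json.dumps serialises a dict to a JSON string, whereas json.loads, used in the question, goes the other way and parses a JSON string.

```python
import json

# Sample of the mapping built above (first two verses copied from the output).
d = {
    '1': ' Allah ke naam se jo Rehman o Raheem hai.',
    '2': ' Tareef Allah hi ke liye hai jo tamaam qayinaat ka Rubb hai.',
}

# Optional clean-up: drop the leading space left over from splitting on '.'.
cleaned = {key.strip(): value.strip() for key, value in d.items()}

# json.dumps turns the dict into a JSON string; ensure_ascii=False keeps any
# non-ASCII characters readable instead of escaping them.
json_text = json.dumps(cleaned, indent=2, ensure_ascii=False)
print(json_text)

# json.loads is the reverse direction: it parses a JSON string back into a dict.
round_trip = json.loads(json_text)
```

To write the result to a file instead of a string, json.dump(cleaned, f) with an open file handle f works the same way.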



Answer