I’m trying to scrape data from the website “https://quranromanurdu.com/chapter/1”. I want only the text content of the element with id contentpara, returned in JSON format. The code below gives me the HTML content, but I want to convert it to JSON. I tried to convert it, but I’m getting an error. Can somebody please help me fix it?
Python code:
import requests
from bs4 import BeautifulSoup
import json
import codecs
URL = "https://quranromanurdu.com/chapter/1"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")
table = soup.findAll('div',attrs={"id":"contentpara"})
data0 = json.loads(table)
print(data0)
Error
line 24, in <module>
    data0 = json.loads(table)
  File "C:\Users\arbazalx\AppData\Local\Programs\Python\Python310\lib\json\__init__.py", line 339, in loads
    raise TypeError(f'the JSON object must be str, bytes or bytearray, '
TypeError: the JSON object must be str, bytes or bytearray, not ResultSet
Answer
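json.loads parses a JSON string into Python objects, so it cannot take a BeautifulSoup ResultSet. Extract the text from the div and build a dictionary from it instead. You can do it like this: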
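# ... your code ...
table = soup.findAll('div',attrs={"id":"contentpara"})
# Split the div's text into lines and drop the empty ones
values = list(filter(None, table[0].text.split('\n')))
# Skip the first entry and strip non-breaking spaces from the rest
values = list(filter(None, [value.replace("\xa0", "") for value in values[1:]]))
d = {}
for item in values:
    # "1. text" -> key "1", value " text"
    key, value = item.split('.', maxsplit=1)
    d[key] = value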
Output:
{'1': ' Allah ke naam se jo Rehman o Raheem hai.',
 '2': ' Tareef Allah hi ke liye hai jo tamaam qayinaat ka Rubb hai.',
 '3': ' Rehman aur Raheem hai.',
 '4': ' Roz e jaza ka maalik hai.',
 '5': ' Hum teri hi ibadat karte hain aur tujh hi se madad maangte hain.',
 '6': ' Humein seedha raasta dikha.',
 '7': ' Un logon ka raasta jinpar tu nay inam farmaya, jo maatoob nahin huey (na unka jinpar tera gazab hota raha) , jo bhatke huey nahin hain.'}
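Since the question asks for JSON rather than a Python dict, here is a minimal sketch of the last step, using json.dumps to serialize the result. The helper name verses_to_json and the sample string are hypothetical. The sample only stands in for table[0].text, and the real page text may be laid out differently.

```python
import json


def verses_to_json(text: str) -> str:
    """Turn lines like '1. some verse' into a JSON object string."""
    verses = {}
    for line in text.split("\n"):
        # Normalize non-breaking spaces and trim surrounding whitespace
        line = line.replace("\xa0", " ").strip()
        number, sep, verse = line.partition(".")
        # Skip blank lines and lines that do not start with "<number>."
        if not sep or not number.strip().isdigit():
            continue
        verses[number.strip()] = verse.strip()
    # ensure_ascii=False keeps any non-ASCII characters readable
    return json.dumps(verses, ensure_ascii=False, indent=2)


# Hypothetical sample standing in for table[0].text
sample = (
    "Surah Al-Fatihah\n\n"
    "1.\xa0Allah ke naam se jo Rehman o Raheem hai.\n"
    "2. Tareef Allah hi ke liye hai jo tamaam qayinaat ka Rubb hai.\n"
)
result = verses_to_json(sample)
print(result)
```

With the scraper above, you would call verses_to_json(table[0].text). Checking that each key is a number, instead of skipping the first entry by position, is a design choice that should tolerate extra heading lines, assuming every verse line starts with its number followed by a period.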