I’m trying to scrape data from the website “https://quranromanurdu.com/chapter/1”. I want only the text content of the element with id contentpara, returned in JSON format. The code below gives me the HTML content, but I want to convert it to JSON. I tried to convert it but I’m getting an error. Could somebody please help me fix it?
Python code:
import requests
from bs4 import BeautifulSoup
import json
import codecs

URL = "https://quranromanurdu.com/chapter/1"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")
table = soup.findAll('div', attrs={"id": "contentpara"})
data0 = json.loads(table)
print(data0)
Error
line 24, in <module>
    data0 = json.loads(table)
File "C:\Users\arbazalx\AppData\Local\Programs\Python\Python310\lib\json\__init__.py", line 339, in loads
    raise TypeError(f'the JSON object must be str, bytes or bytearray, '
TypeError: the JSON object must be str, bytes or bytearray, not ResultSet
Answer
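json.loads() parses a JSON string, but soup.findAll() returns a ResultSet of tags, which is why you get the TypeError. Instead, take the text of the div, split it into lines, and build a dictionary. You can do it like this: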
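# ... your code ...
table = soup.findAll('div', attrs={"id": "contentpara"})

# Split the div's text into non-empty lines
values = list(filter(None, table[0].text.split('\n')))
# Drop the first line, then remove non-breaking spaces from the rest
values = list(filter(None, [value.replace("\xa0", "") for value in values[1:]]))

d = {}
for item in values:
    # Each line looks like "1. text"; split on the first dot only
    key, value = item.split('.', maxsplit=1)
    d[key] = value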
Output:
{'1': ' Allah ke naam se jo Rehman o Raheem hai.', '2': ' Tareef Allah hi ke liye hai jo tamaam qayinaat ka Rubb hai.', '3': ' Rehman aur Raheem hai.', '4': ' Roz e jaza ka maalik hai.', '5': ' Hum teri hi ibadat karte hain aur tujh hi se madad maangte hain.', '6': ' Humein seedha raasta dikha.', '7': ' Un logon ka raasta jinpar tu nay inam farmaya, jo maatoob nahin huey (na unka jinpar tera gazab hota raha) , jo bhatke huey nahin hain.'}
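The output above is a Python dict, not yet a JSON string. A minimal sketch of the last step, assuming the div's text has one numbered verse per line: `json.dumps()` serializes the dict. The function name `verses_to_json` and the `SAMPLE_TEXT` string are hypothetical stand-ins; in the real scraper you would pass `table[0].text` instead. Unlike the code above, this variant skips any line that does not start with a number and strips the surrounding whitespace from each value.

```python
import json

# Hypothetical stand-in for `table[0].text`, so the sketch runs without a network call.
SAMPLE_TEXT = "Heading\n1.\xa0First verse.\n\n2. Second verse. It has two sentences.\n"


def verses_to_json(text):
    """Turn lines like '1. some text' into a JSON object string keyed by verse number."""
    verses = {}
    for line in text.split("\n"):
        # Replace non-breaking spaces and trim surrounding whitespace
        line = line.replace("\xa0", " ").strip()
        # Split on the first dot only, so dots inside the verse are kept
        key, sep, value = line.partition(".")
        # Keep only lines that start with a number followed by a dot
        if sep and key.strip().isdigit():
            verses[key.strip()] = value.strip()
    # ensure_ascii=False keeps any non-ASCII characters readable in the output
    return json.dumps(verses, ensure_ascii=False, indent=2)


result = verses_to_json(SAMPLE_TEXT)
print(result)
```

If you need to write the result to a file rather than print it, `json.dump(d, f)` with an open file object `f` does the same serialization.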