I'm trying to scrape data from the website "https://quranromanurdu.com/chapter/1". I want only the text content of the element with id contentpara, returned in JSON format. The code below gives me the HTML content, but I want to convert it to JSON. I tried to convert it, but I'm getting an error. Can somebody please help me fix it?
Python code:

import requests
from bs4 import BeautifulSoup
import json
import codecs

URL = "https://quranromanurdu.com/chapter/1"
page = requests.get(URL)

soup = BeautifulSoup(page.content, "html.parser")
table = soup.findAll('div',attrs={"id":"contentpara"})

data0 = json.loads(table)
print(data0)
Error:

  line 24, in <module>
    data0 = json.loads(table)
  File "C:\Users\arbazalx\AppData\Local\Programs\Python\Python310\lib\json\__init__.py", line 339, in loads
    raise TypeError(f'the JSON object must be str, bytes or bytearray, '
TypeError: the JSON object must be str, bytes or bytearray, not ResultSet
Answer
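json.loads expects a JSON string, but findAll returns a ResultSet of tags, which is why it raises the TypeError. Instead, take the text of the first match and build a dictionary from it, like this: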

# your code
table = soup.findAll('div',attrs={"id":"contentpara"})

# Split the div's text into non-empty lines
values = list(filter(None, table[0].text.split('\n')))
# Skip the first line and strip non-breaking spaces
values = list(filter(None, [value.replace("\xa0", "") for value in values[1:]]))

# Map each verse number to its text
d = {}
for item in values:
    key, value = item.split('.', maxsplit=1)
    d[key] = value

Output:

{'1': ' Allah ke naam se jo Rehman o Raheem hai.',
 '2': ' Tareef Allah hi ke liye hai jo tamaam qayinaat ka Rubb hai.',
 '3': ' Rehman aur Raheem hai.',
 '4': ' Roz e jaza ka maalik hai.',
 '5': ' Hum teri hi ibadat karte hain aur tujh hi se madad maangte hain.',
 '6': ' Humein seedha raasta dikha.',
 '7': ' Un logon ka raasta jinpar tu nay inam farmaya, jo maatoob nahin huey (na unka jinpar tera gazab hota raha) , jo bhatke huey nahin hain.'}
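
Note that the result above is still a Python dictionary, not JSON text. To get an actual JSON string, you would pass it through json.dumps (json.loads goes the other way, from JSON text to Python objects). Here is a minimal sketch of the whole text-to-JSON step using only the standard library. The helper name verses_to_json and the sample string, including its heading line, are made up for illustration; the sketch assumes the text comes from something like soup.find(id="contentpara").get_text() and that each verse sits on its own line in the form "number. verse".

```python
import json


def verses_to_json(text: str) -> str:
    """Turn 'N. verse' lines into a JSON object string keyed by verse number.

    Hypothetical helper: assumes one verse per line, prefixed by its number
    and a dot, as in the output shown in the answer.
    """
    verses = {}
    for line in text.split("\n"):
        # Replace non-breaking spaces and trim surrounding whitespace.
        line = line.replace("\xa0", " ").strip()
        number, sep, verse = line.partition(".")
        # Skip blank lines and headings that do not start with "<number>.".
        if not sep or not number.strip().isdigit():
            continue
        verses[number.strip()] = verse.strip()
    # json.dumps serializes the dict to JSON text.
    return json.dumps(verses, ensure_ascii=False, indent=2)


# Made-up sample standing in for the text of the contentpara div.
sample = (
    "Surah Al-Fatihah\n"
    "1.\xa0Allah ke naam se jo Rehman o Raheem hai.\n"
    "\n"
    "3. Rehman aur Raheem hai.\n"
)
print(verses_to_json(sample))
```

Unlike the loop in the answer, this version skips any line that does not start with a number and a dot instead of assuming only the first line is a heading, so a stray line will not raise a ValueError when unpacking. To write the result to a file, you could use json.dump with an open file handle instead.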