I am trying to scrape a table from a URL. The table seems to have no name. I have scraped the links and their text to CSV using the code below.
from bs4 import BeautifulSoup
import requests
import re

url = 'https://www.sbp.org.pk/smefd/circulars/2020/index.htm'

r = requests.get(url)

soup = BeautifulSoup(r.content, 'html.parser')

anchor = soup.find_all('a', href=re.compile('2020'))

file_name = "Circulars.csv"
f = open(file_name, 'w')
header = 'Cir_Name,Links\n'
f.write(header)

for link in anchor:
    href = link.get('href')
    text = link.getText()
    text1 = text.replace("\n", "")
    f.write(text1.replace(',', '|') + "," + href.replace(",", "|") + "\n")

print("Done")
What I need is to scrape the table as it is. Currently I am only getting the links; I need the other columns too.
I have tried the following code but failed. I am able to get to the table, but that's not enough.
from bs4 import BeautifulSoup
import requests
import re

url = 'https://www.sbp.org.pk/smefd/circulars/2020/index.htm'

r = requests.get(url)

soup = BeautifulSoup(r.content, 'html.parser')

table = soup.find_all('table', attrs={"width": "95%", 'border': '0', 'cellpadding': "1"})

# this next line is the failed attempt -- it is not valid Python (SyntaxError):
tablee = soup.find_all('tr', attrs={'table', attrs={"width": "95%", 'border': '0', 'cellpadding': "1"}})

print(tablee)
Answer
To save data to CSV, you can use this example:
import requests
import pandas as pd
from bs4 import BeautifulSoup


url = 'https://www.sbp.org.pk/smefd/circulars/2020/index.htm'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

table = soup.select_one('table[width="95%"]:not(:has(table))')

all_data = []
for row in table.select('tr:not(:has(td[colspan]))'):
    tds = [td.get_text(strip=True).replace('\n', ' ').replace('\t', ' ') for td in row.select('td') if td.get_text(strip=True)]
    tds += [row.find_previous('td', {'colspan': '4'}).get_text(strip=True).replace('\n', ' '), row.a['href']]
    all_data.append(tds)

df = pd.DataFrame(all_data)
print(df)
df.to_csv('data.csv', index=False, header=False)
Saves data.csv (screenshot from LibreOffice not reproduced here).
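A note on how the selectors work: `table[width="95%"]:not(:has(table))` picks the innermost table (the one that does not itself contain another `<table>`), and `tr:not(:has(td[colspan]))` skips the section-heading rows that span all columns. Here is a minimal sketch on a small hypothetical HTML snippet (made up here to mimic the page's nested-table layout, not taken from the actual site):

```python
from bs4 import BeautifulSoup

# Hypothetical HTML imitating the nested-table structure of the page.
html = """
<table width="95%"><tr><td>
  <table width="95%">
    <tr><td colspan="4">Section heading</td></tr>
    <tr><td>1</td><td><a href="cl1.htm">CL 1</a></td><td>Title one</td><td>Jan 01, 2020</td></tr>
  </table>
</td></tr></table>
"""
soup = BeautifulSoup(html, 'html.parser')

# The outer table contains another <table>, so :has(table) excludes it;
# only the inner table matches.
inner = soup.select_one('table[width="95%"]:not(:has(table))')

# Rows whose cells have no colspan are the data rows.
rows = inner.select('tr:not(:has(td[colspan]))')
print(len(rows))          # prints 1
print(rows[0].a['href'])  # prints cl1.htm
```

The `:has()` and `:not()` pseudo-classes are supported by the soupsieve package that Beautiful Soup uses for `select()`/`select_one()`.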