A div with class="tableBody" has many divs as children. I want to get all of its div children and extract the string I have highlighted in this picture.
```python
import bs4 as bs
import urllib.request

source = urllib.request.urlopen("https://www.ungm.org/Public/Notice").read()
soup = bs.BeautifulSoup(source, 'lxml')

t_body = soup.find("div", class_="tableBody")
t_divs = t_body.find_all("div")
```
8
The above code returns an empty list. I am trying to learn BS4 and would appreciate any help with the code.
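The empty list can be reproduced without touching the network: if the static HTML delivers the `tableBody` container but JavaScript fills in its rows later, `find` locates the container while `find_all` on it finds nothing. A minimal sketch (the snippet below is a hypothetical stand-in for the static page source):

```python
import bs4 as bs

# Hypothetical static markup: the container exists,
# but its rows are added later by JavaScript.
static_html = '<div class="tableBody"></div>'

soup = bs.BeautifulSoup(static_html, 'html.parser')
t_body = soup.find("div", class_="tableBody")  # the container is found
t_divs = t_body.find_all("div")                # but it has no child divs
print(t_divs)  # prints: []
```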
Answer
The data you see on the page is loaded dynamically via JavaScript. You can use the requests module to simulate the POST request the page makes. For example:
```python
import requests
from bs4 import BeautifulSoup


url = 'https://www.ungm.org/Public/Notice/Search'

payload = {
    "PageIndex": 0,
    "PageSize": 15,
    "Title": "",
    "Description": "",
    "Reference": "",
    "PublishedFrom": "",
    "PublishedTo": "12-Jul-2020",
    "DeadlineFrom": "12-Jul-2020",
    "DeadlineTo": "",
    "Countries": [],
    "Agencies": [],
    "UNSPSCs": [],
    "NoticeTypes": [],
    "SortField": "DatePublished",
    "SortAscending": False,
    "isPicker": False,
    "NoticeTASStatus": [],
    "IsSustainable": False,
    "NoticeDisplayType": None,
    "NoticeSearchTotalLabelId": "noticeSearchTotal",
    "TypeOfCompetitions": []
}

soup = BeautifulSoup(requests.post(url, json=payload).content, 'html.parser')

for row in soup.select('.tableRow'):
    cells = [cell.get_text(strip=True) for cell in row.select('.tableCell')]
    print(cells[1])
    print('{:<30}{:<15}{:<15}{:<25}{:<45}{:<15}'.format(*cells[2:]))
    print('-'*80)
```
Prints:
```
Supply and delivery of 78 smartphones
13-Jul-2020 11:00 (GMT 2.00) 11-Jul-2020 FAO Request for quotation 2020/FRMLW/FRMLW/106096 Malawi
--------------------------------------------------------------------------------
Supply of LEGUMES SEEDS for rainfed season
23-Jul-2020 14:00 (GMT 2.00) 11-Jul-2020 FAO Invitation to bid 2020/FRMLW/FRMLW/106051 Malawi
--------------------------------------------------------------------------------
Supply of MAIZE SEEDS for rainfed season
22-Jul-2020 14:00 (GMT 2.00) 11-Jul-2020 FAO Invitation to bid 2020/FRMLW/FRMLW/106050 Malawi
--------------------------------------------------------------------------------
Procurement of Supply and Installation of Outdoor Metal Furniture for Rooftop Terrace at FAO Headquarters in Rome, Italy
10-Aug-2020 12:00 (GMT 2.00) 11-Jul-2020 FAO Invitation to bid 2020/CSAPC/CSDID/105286 Italy
--------------------------------------------------------------------------------
Procurement of Silo for Emergency Project
13-Jul-2020 13:00 (GMT 5.00) 11-Jul-2020 FAO Invitation to bid 2020/FABGD/FABGD/106145 Bangladesh
--------------------------------------------------------------------------------
Procurement of Concentrate Ruminant Feed
13-Jul-2020 13:00 (GMT 5.00) 11-Jul-2020 FAO Invitation to bid 2020/FABGD/FABGD/106064 Bangladesh
--------------------------------------------------------------------------------
Purchase of Waste Collection Vehicles - (Two Tractors)
22-Jul-2020 06:30 (GMT 0.00) 11-Jul-2020 UNOPS Request for quotation RFQ/2020/15298 Sri Lanka
--------------------------------------------------------------------------------
Procurement of Laboratory Equipment and Material
24-Jul-2020 22:23 (GMT -1.00) 11-Jul-2020 FAO Invitation to bid 2020/FRGAM/FRGAM/106143 Gambia
--------------------------------------------------------------------------------
Compra de chalecos para promotores comunitarios para la Oficina de Unicef Bolivar - LRFQ-2020-9159352
16-Jul-2020 23:59 (GMT -3.00) 11-Jul-2020 UNICEF Request for proposal LRFQ-2020-9159352 Venezuela
--------------------------------------------------------------------------------
Call for Proposals Quality Based Fixed Budget (CFPFB):
26-Jul-2020 17:00 (GMT 3.00) 11-Jul-2020 UNDP Request for proposal UNDP-SYR-RPA-051-20 Syrian Arab Republic
--------------------------------------------------------------------------------
Innovation and Design Specialist
27-Jul-2020 00:00 (GMT -5.00) 11-Jul-2020 UNDP Not set Innovation and Design Specialist Turkey
--------------------------------------------------------------------------------
(RFI) from national and/or international CSOs/NGOs for potential partnership with UNDP and its pooled funding mechanism, the Darfur Community Peace and Stability Fund (DCPSF),
26-Jul-2020 08:00 (GMT -7.00) 11-Jul-2020 UNDP Request for information RFI-SDN-20-002 Sudan
--------------------------------------------------------------------------------
IRAQ-LRPS-017-2020-9159660 Rehabilitation of 3 water projects at Avrek, Grey Basi and Sarsenk in Duhok
26-Jul-2020 12:00 (GMT 3.00) 11-Jul-2020 UNICEF Request for proposal 9159660 Iraq
--------------------------------------------------------------------------------
106142 INVITACIÓN A COTIZAR PARA LA ADQUISICIÓN DE FERTILIZANTES, HERRAMIENTAS Y MATERIALES PARA ECA DE CACAO
21-Jul-2020 22:00 (GMT -5.00) 10-Jul-2020 FAO Request for quotation 2020/FLCOL/FLCOL/106142 Colombia
--------------------------------------------------------------------------------
Achat de tablettes, de GPS et batteries rechargeable (206 tablettes, 68 GPS, et 181 pack chargeurs et batteries rechargeables) à livrer sur Dakar
28-Jul-2020 12:00 (GMT 0.00) 10-Jul-2020 FAO Invitation to bid 2020/FRSEN/FRSEN/106093 United Kingdom
--------------------------------------------------------------------------------
```
EDIT: To fetch all pages, keep only the rows for the country 'Afghanistan', and save the result to CSV, you can use this example:
```python
import csv
import requests
from bs4 import BeautifulSoup


url = 'https://www.ungm.org/Public/Notice/Search'

payload = {
    "PageIndex": 0,
    "PageSize": 15,
    "Title": "",
    "Description": "",
    "Reference": "",
    "PublishedFrom": "",
    "PublishedTo": "12-Jul-2020",
    "DeadlineFrom": "12-Jul-2020",
    "DeadlineTo": "",
    "Countries": [],
    "Agencies": [],
    "UNSPSCs": [],
    "NoticeTypes": [],
    "SortField": "DatePublished",
    "SortAscending": False,
    "isPicker": False,
    "NoticeTASStatus": [],
    "IsSustainable": False,
    "NoticeDisplayType": None,
    "NoticeSearchTotalLabelId": "noticeSearchTotal",
    "TypeOfCompetitions": []
}

page, all_data = 0, []
while True:
    print('Page {}...'.format(page))

    payload['PageIndex'] = page
    soup = BeautifulSoup(requests.post(url, json=payload).content, 'html.parser')
    rows = soup.select('.tableRow')
    if not rows:
        break

    for row in rows:
        cells = [cell.get_text(strip=True) for cell in row.select('.tableCell')]
        print(cells[1])
        print('{:<30}{:<15}{:<15}{:<25}{:<45}{:<15}'.format(*cells[2:]))
        print('-'*80)

        # we are only interested in Afghanistan:
        if 'afghanistan' in cells[7].lower():
            all_data.append([row['data-noticeid'], *cells[1:]])

    page += 1

# write to csv file:
with open('data.csv', 'w', newline='') as csvfile:
    csv_writer = csv.writer(csvfile, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
    for row in all_data:
        csv_writer.writerow(row)
```
Saved data.csv (screenshot from LibreOffice):
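Since `csv.writer` above emits only the data rows, you may want to prepend a header row before the data. The column names below are assumptions chosen to match the shape of each `all_data` entry (the notice id followed by `cells[1:]`); adjust them to the real table headers:

```python
import csv

# Assumed column names - hypothetical, matching the order of the saved fields.
header = ['notice_id', 'title', 'deadline', 'published', 'agency',
          'notice_type', 'reference', 'country']

# Example row in the same shape as the all_data entries (made-up values).
all_data = [['106142', 'Example notice', '26-Jul-2020 08:00 (GMT -7.00)',
             '11-Jul-2020', 'UNDP', 'Request for information',
             'RFI-SDN-20-002', 'Afghanistan']]

with open('data.csv', 'w', newline='') as csvfile:
    csv_writer = csv.writer(csvfile, delimiter=',', quotechar='"',
                            quoting=csv.QUOTE_MINIMAL)
    csv_writer.writerow(header)      # header first
    csv_writer.writerows(all_data)   # then the data rows

with open('data.csv', newline='') as csvfile:
    print(next(csv.reader(csvfile)))  # prints the header row as a list
```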