A div of class="tableBody" has many divs as children. I want to get all of its child divs and extract the string which I have highlighted in this picture.
import bs4 as bs
import urllib.request

source = urllib.request.urlopen("https://www.ungm.org/Public/Notice").read()
soup = bs.BeautifulSoup(source, 'lxml')
t_body = soup.find("div", class_="tableBody")
t_divs = t_body.find_all("div")
The above code returns an empty list.
I am trying to learn BS4 and would appreciate any help with the code.
Answer
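The data you see on the page is loaded dynamically via JavaScript, so the HTML returned for https://www.ungm.org/Public/Notice does not contain the rows yet. You can use the requests module to send the same search request that the page makes and parse the response.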
For example:
import requests
from bs4 import BeautifulSoup

url = 'https://www.ungm.org/Public/Notice/Search'

payload = {
    "PageIndex": 0,
    "PageSize": 15,
    "Title": "",
    "Description": "",
    "Reference": "",
    "PublishedFrom": "",
    "PublishedTo": "12-Jul-2020",
    "DeadlineFrom": "12-Jul-2020",
    "DeadlineTo": "",
    "Countries": [],
    "Agencies": [],
    "UNSPSCs": [],
    "NoticeTypes": [],
    "SortField": "DatePublished",
    "SortAscending": False,
    "isPicker": False,
    "NoticeTASStatus": [],
    "IsSustainable": False,
    "NoticeDisplayType": None,
    "NoticeSearchTotalLabelId": "noticeSearchTotal",
    "TypeOfCompetitions": []
}

soup = BeautifulSoup(
    requests.post(url, json=payload).content,
    'html.parser'
)

for row in soup.select('.tableRow'):
    cells = [cell.get_text(strip=True) for cell in row.select('.tableCell')]
    print(cells[1])
    print('{:<30}{:<15}{:<15}{:<25}{:<45}{:<15}'.format(*cells[2:]))
    print('-'*80)
Prints:
Supply and delivery of 78 smartphones
13-Jul-2020 11:00 (GMT 2.00)  11-Jul-2020    FAO            Request for quotation    2020/FRMLW/FRMLW/106096                      Malawi
--------------------------------------------------------------------------------
Supply of LEGUMES SEEDS for rainfed season
23-Jul-2020 14:00 (GMT 2.00)  11-Jul-2020    FAO            Invitation to bid        2020/FRMLW/FRMLW/106051                      Malawi
--------------------------------------------------------------------------------
Supply of MAIZE SEEDS for rainfed season
22-Jul-2020 14:00 (GMT 2.00)  11-Jul-2020    FAO            Invitation to bid        2020/FRMLW/FRMLW/106050                      Malawi
--------------------------------------------------------------------------------
Procurement of Supply and Installation of Outdoor Metal Furniture for Rooftop Terrace at FAO Headquarters in Rome, Italy
10-Aug-2020 12:00 (GMT 2.00)  11-Jul-2020    FAO            Invitation to bid        2020/CSAPC/CSDID/105286                      Italy
--------------------------------------------------------------------------------
Procurement of Silo for Emergency Project
13-Jul-2020 13:00 (GMT 5.00)  11-Jul-2020    FAO            Invitation to bid        2020/FABGD/FABGD/106145                      Bangladesh
--------------------------------------------------------------------------------
Procurement of Concentrate Ruminant Feed
13-Jul-2020 13:00 (GMT 5.00)  11-Jul-2020    FAO            Invitation to bid        2020/FABGD/FABGD/106064                      Bangladesh
--------------------------------------------------------------------------------
Purchase of Waste Collection Vehicles - (Two Tractors)
22-Jul-2020 06:30 (GMT 0.00)  11-Jul-2020    UNOPS          Request for quotation    RFQ/2020/15298                               Sri Lanka
--------------------------------------------------------------------------------
Procurement of Laboratory Equipment and Material
24-Jul-2020 22:23 (GMT -1.00) 11-Jul-2020    FAO            Invitation to bid        2020/FRGAM/FRGAM/106143                      Gambia
--------------------------------------------------------------------------------
Compra de chalecos para promotores comunitarios para la Oficina de Unicef Bolivar - LRFQ-2020-9159352
16-Jul-2020 23:59 (GMT -3.00) 11-Jul-2020    UNICEF         Request for proposal     LRFQ-2020-9159352                            Venezuela
--------------------------------------------------------------------------------
Call for Proposals Quality Based Fixed Budget (CFPFB):
26-Jul-2020 17:00 (GMT 3.00)  11-Jul-2020    UNDP           Request for proposal     UNDP-SYR-RPA-051-20                          Syrian Arab Republic
--------------------------------------------------------------------------------
Innovation and Design Specialist
27-Jul-2020 00:00 (GMT -5.00) 11-Jul-2020    UNDP           Not set                  Innovation and Design Specialist             Turkey
--------------------------------------------------------------------------------
(RFI) from national and/or international CSOs/NGOs for potential partnership with UNDP and its pooled funding mechanism, the Darfur Community Peace and Stability Fund (DCPSF),
26-Jul-2020 08:00 (GMT -7.00) 11-Jul-2020    UNDP           Request for information  RFI-SDN-20-002                               Sudan
--------------------------------------------------------------------------------
IRAQ-LRPS-017-2020-9159660 Rehabilitation of 3 water projects at Avrek, Grey Basi and Sarsenk in Duhok
26-Jul-2020 12:00 (GMT 3.00)  11-Jul-2020    UNICEF         Request for proposal     9159660                                      Iraq
--------------------------------------------------------------------------------
106142 INVITACIÓN A COTIZAR PARA LA ADQUISICIÓN DE FERTILIZANTES, HERRAMIENTAS Y MATERIALES PARA ECA DE CACAO
21-Jul-2020 22:00 (GMT -5.00) 10-Jul-2020    FAO            Request for quotation    2020/FLCOL/FLCOL/106142                      Colombia
--------------------------------------------------------------------------------
Achat de tablettes, de GPS et batteries rechargeable (206 tablettes, 68 GPS, et 181 pack chargeurs et batteries rechargeables) à livrer sur Dakar
28-Jul-2020 12:00 (GMT 0.00)  10-Jul-2020    FAO            Invitation to bid        2020/FRSEN/FRSEN/106093                      United Kingdom
--------------------------------------------------------------------------------
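The payload above hard-codes "12-Jul-2020" for "PublishedTo" and "DeadlineFrom", which matches the day the answer was written. A minimal sketch of computing those two fields for any day follows. The helper names ungm_date and date_filters are made up for illustration, and the DD-Mon-YYYY format is only inferred from the example payload, so treat it as an assumption about what the endpoint accepts.

```python
from datetime import date

# English month abbreviations, spelled out so the result does not depend on the locale.
MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']


def ungm_date(d):
    """Format a date like the example payload does, e.g. 12-Jul-2020 (assumed format)."""
    return '{:02d}-{}-{}'.format(d.day, MONTHS[d.month - 1], d.year)


def date_filters(today=None):
    """Return the two date fields of the payload, set to the given day (default: today)."""
    stamp = ungm_date(today or date.today())
    return {"PublishedTo": stamp, "DeadlineFrom": stamp}


print(date_filters(date(2020, 7, 12)))
# {'PublishedTo': '12-Jul-2020', 'DeadlineFrom': '12-Jul-2020'}
```

Calling payload.update(date_filters()) before the request would then replace the two hard-coded values.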
EDIT: To fetch all pages, keep only the rows for the country 'Afghanistan' and save them to CSV, you can use this example:
import csv
import requests
from bs4 import BeautifulSoup

url = 'https://www.ungm.org/Public/Notice/Search'

payload = {
    "PageIndex": 0,
    "PageSize": 15,
    "Title": "",
    "Description": "",
    "Reference": "",
    "PublishedFrom": "",
    "PublishedTo": "12-Jul-2020",
    "DeadlineFrom": "12-Jul-2020",
    "DeadlineTo": "",
    "Countries": [],
    "Agencies": [],
    "UNSPSCs": [],
    "NoticeTypes": [],
    "SortField": "DatePublished",
    "SortAscending": False,
    "isPicker": False,
    "NoticeTASStatus": [],
    "IsSustainable": False,
    "NoticeDisplayType": None,
    "NoticeSearchTotalLabelId": "noticeSearchTotal",
    "TypeOfCompetitions": []
}

page, all_data = 0, []
while True:
    print('Page {}...'.format(page))
    payload['PageIndex'] = page

    soup = BeautifulSoup(
        requests.post(url, json=payload).content,
        'html.parser'
    )

    rows = soup.select('.tableRow')
    if not rows:
        break

    for row in rows:
        cells = [cell.get_text(strip=True) for cell in row.select('.tableCell')]
        print(cells[1])
        print('{:<30}{:<15}{:<15}{:<25}{:<45}{:<15}'.format(*cells[2:]))
        print('-'*80)

        # we are only interested in Afghanistan:
        if 'afghanistan' in cells[7].lower():
            all_data.append([row['data-noticeid'], *cells[1:]])

    page += 1

# write to csv file:
with open('data.csv', 'w', newline='') as csvfile:
    csv_writer = csv.writer(csvfile, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
    for row in all_data:
        csv_writer.writerow(row)
Saved data.csv
(screenshot from LibreOffice):
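To check the file without a spreadsheet program, the rows can be read back with the same csv settings. This is only a sketch: read_notices is a hypothetical helper, and the sample row is made up so that the round trip runs in memory without touching the website. It also shows that QUOTE_MINIMAL quotes a title containing a comma, so the columns stay intact.

```python
import csv
import io


def read_notices(csvfile):
    """Return every row of an open notices CSV as a list of strings."""
    return list(csv.reader(csvfile, delimiter=',', quotechar='"'))


# Made-up row for illustration: notice id, title (with commas), deadline, country.
sample_row = ['12345', 'Supply of seeds, tools and materials', '21-Jul-2020', 'Afghanistan']

# Write the row to an in-memory file with the same writer settings as above.
buffer = io.StringIO(newline='')
writer = csv.writer(buffer, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
writer.writerow(sample_row)

# Read it back: the quoted title comes out as a single field again.
buffer.seek(0)
rows = read_notices(buffer)
print(rows)
```

For the real file, open it with open('data.csv', newline='') and pass the file object to read_notices.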