
Iterating over table of divs using BeautifulSoup

A div with class="tableBody" has many child divs. I want to iterate over all of those child divs and extract the string highlighted in the screenshot.

import bs4 as bs
import urllib.request
source = urllib.request.urlopen("https://www.ungm.org/Public/Notice").read()
soup = bs.BeautifulSoup(source,'lxml')

t_body = soup.find("div", class_="tableBody")
t_divs = t_body.find_all("div")

The code above returns an empty list.

I am trying to learn BS4 and would appreciate help with the code.


Answer

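The data you see on the page is loaded dynamically via JavaScript, so it is not present in the HTML that urllib downloads. You can use the requests module to simulate the request the page makes and parse the response instead.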
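
To see why the code in the question gets an empty list, here is a minimal offline sketch. The `static_html` string is a made-up stand-in for what the server might send before any script runs, not the real page source; it only assumes that the `tableBody` container exists but has not been filled in yet.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the HTML delivered before JavaScript runs:
# the container is there, but no rows have been inserted into it.
static_html = '<div id="notices"><div class="tableBody"></div></div>'

soup = BeautifulSoup(static_html, "html.parser")
t_body = soup.find("div", class_="tableBody")

# find() succeeds, yet there are no descendant divs to return.
t_divs = t_body.find_all("div")
print(t_body is not None, t_divs)
```

BeautifulSoup only parses the text it is given and never executes scripts, so the rows have to be fetched from wherever the page's script gets them.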

For example:

import requests
from bs4 import BeautifulSoup


url = 'https://www.ungm.org/Public/Notice/Search'

payload = {
  "PageIndex": 0,
  "PageSize": 15,
  "Title": "",
  "Description": "",
  "Reference": "",
  "PublishedFrom": "",
  "PublishedTo": "12-Jul-2020",
  "DeadlineFrom": "12-Jul-2020",
  "DeadlineTo": "",
  "Countries": [],
  "Agencies": [],
  "UNSPSCs": [],
  "NoticeTypes": [],
  "SortField": "DatePublished",
  "SortAscending": False,
  "isPicker": False,
  "NoticeTASStatus": [],
  "IsSustainable": False,
  "NoticeDisplayType": None,
  "NoticeSearchTotalLabelId": "noticeSearchTotal",
  "TypeOfCompetitions": []
}

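# POST the search payload as JSON; the response is HTML containing the table rows
soup = BeautifulSoup(requests.post(url, json=payload).content, 'html.parser')

for row in soup.select('.tableRow'):
    # collect the text of every cell in this row
    cells = [cell.get_text(strip=True) for cell in row.select('.tableCell')]
    print(cells[1])  # notice title
    # the remaining six cells, left-aligned in fixed-width columns
    print('{:<30}{:<15}{:<15}{:<25}{:<45}{:<15}'.format(*cells[2:]))
    print('-'*80)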

Prints:

Supply and delivery of 78 smartphones
13-Jul-2020 11:00 (GMT 2.00)  11-Jul-2020    FAO            Request for quotation    2020/FRMLW/FRMLW/106096                      Malawi         
--------------------------------------------------------------------------------
Supply of LEGUMES SEEDS for rainfed season
23-Jul-2020 14:00 (GMT 2.00)  11-Jul-2020    FAO            Invitation to bid        2020/FRMLW/FRMLW/106051                      Malawi         
--------------------------------------------------------------------------------
Supply of MAIZE SEEDS for rainfed season
22-Jul-2020 14:00 (GMT 2.00)  11-Jul-2020    FAO            Invitation to bid        2020/FRMLW/FRMLW/106050                      Malawi         
--------------------------------------------------------------------------------
Procurement of Supply and Installation of Outdoor Metal Furniture for Rooftop Terrace at FAO Headquarters in Rome, Italy
10-Aug-2020 12:00 (GMT 2.00)  11-Jul-2020    FAO            Invitation to bid        2020/CSAPC/CSDID/105286                      Italy          
--------------------------------------------------------------------------------
Procurement of Silo for Emergency Project
13-Jul-2020 13:00 (GMT 5.00)  11-Jul-2020    FAO            Invitation to bid        2020/FABGD/FABGD/106145                      Bangladesh     
--------------------------------------------------------------------------------
Procurement of Concentrate Ruminant Feed
13-Jul-2020 13:00 (GMT 5.00)  11-Jul-2020    FAO            Invitation to bid        2020/FABGD/FABGD/106064                      Bangladesh     
--------------------------------------------------------------------------------
Purchase of Waste Collection Vehicles - (Two Tractors)
22-Jul-2020 06:30 (GMT 0.00)  11-Jul-2020    UNOPS          Request for quotation    RFQ/2020/15298                               Sri Lanka      
--------------------------------------------------------------------------------
Procurement of Laboratory Equipment and Material
24-Jul-2020 22:23 (GMT -1.00) 11-Jul-2020    FAO            Invitation to bid        2020/FRGAM/FRGAM/106143                      Gambia         
--------------------------------------------------------------------------------
Compra de chalecos para promotores comunitarios para la Oficina de Unicef Bolivar - LRFQ-2020-9159352
16-Jul-2020 23:59 (GMT -3.00) 11-Jul-2020    UNICEF         Request for proposal     LRFQ-2020-9159352                            Venezuela      
--------------------------------------------------------------------------------
Call for Proposals Quality Based Fixed Budget (CFPFB):
26-Jul-2020 17:00 (GMT 3.00)  11-Jul-2020    UNDP           Request for proposal     UNDP-SYR-RPA-051-20                          Syrian Arab Republic
--------------------------------------------------------------------------------
Innovation and Design Specialist
27-Jul-2020 00:00 (GMT -5.00) 11-Jul-2020    UNDP           Not set                  Innovation and Design Specialist             Turkey         
--------------------------------------------------------------------------------
(RFI) from national and/or international CSOs/NGOs for potential partnership with UNDP and its pooled funding mechanism, the Darfur Community Peace and Stability Fund (DCPSF),
26-Jul-2020 08:00 (GMT -7.00) 11-Jul-2020    UNDP           Request for information  RFI-SDN-20-002                               Sudan          
--------------------------------------------------------------------------------
IRAQ-LRPS-017-2020-9159660 Rehabilitation of 3 water projects at Avrek, Grey Basi and Sarsenk in Duhok
26-Jul-2020 12:00 (GMT 3.00)  11-Jul-2020    UNICEF         Request for proposal     9159660                                      Iraq           
--------------------------------------------------------------------------------
106142 INVITACIÓN A COTIZAR PARA LA ADQUISICIÓN DE FERTILIZANTES, HERRAMIENTAS Y MATERIALES PARA ECA DE CACAO
21-Jul-2020 22:00 (GMT -5.00) 10-Jul-2020    FAO            Request for quotation    2020/FLCOL/FLCOL/106142                      Colombia       
--------------------------------------------------------------------------------
Achat de tablettes, de GPS et batteries rechargeable (206 tablettes, 68 GPS, et 181 pack chargeurs et batteries rechargeables) à livrer sur  Dakar
28-Jul-2020 12:00 (GMT 0.00)  10-Jul-2020    FAO            Invitation to bid        2020/FRSEN/FRSEN/106093                      United Kingdom 
--------------------------------------------------------------------------------
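
The row parsing can be tried without touching the network. The sketch below runs the same `.tableRow` / `.tableCell` selectors on a made-up fragment; the markup, the `parse_rows` helper and its `min_cells` parameter are assumptions for illustration. It also skips rows with too few cells, since indexing `cells[1]` would raise an `IndexError` on such a row (whether the real response ever contains one is not known).

```python
from bs4 import BeautifulSoup

# Made-up fragment imitating the structure the selectors rely on.
sample = """
<div class="tableRow" data-noticeid="1">
  <div class="tableCell"></div>
  <div class="tableCell">Supply of seeds</div>
  <div class="tableCell">23-Jul-2020</div>
</div>
<div class="tableRow" data-noticeid="2">
  <div class="tableCell"></div>
</div>
"""


def parse_rows(html, min_cells=2):
    """Return one list of cell texts per row, skipping rows with too few cells."""
    soup = BeautifulSoup(html, "html.parser")
    parsed = []
    for row in soup.select(".tableRow"):
        cells = [cell.get_text(strip=True) for cell in row.select(".tableCell")]
        if len(cells) >= min_cells:
            parsed.append(cells)
    return parsed


rows = parse_rows(sample)
print(rows)
```

Keeping the parsing in a function that takes an HTML string makes it easy to check against saved responses before running it on live data.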


EDIT: To fetch all pages, keep only the notices whose country is ‘Afghanistan’, and save them to CSV, you can use this example:

import csv
import requests
from bs4 import BeautifulSoup


url = 'https://www.ungm.org/Public/Notice/Search'

payload = {
  "PageIndex": 0,
  "PageSize": 15,
  "Title": "",
  "Description": "",
  "Reference": "",
  "PublishedFrom": "",
  "PublishedTo": "12-Jul-2020",
  "DeadlineFrom": "12-Jul-2020",
  "DeadlineTo": "",
  "Countries": [],
  "Agencies": [],
  "UNSPSCs": [],
  "NoticeTypes": [],
  "SortField": "DatePublished",
  "SortAscending": False,
  "isPicker": False,
  "NoticeTASStatus": [],
  "IsSustainable": False,
  "NoticeDisplayType": None,
  "NoticeSearchTotalLabelId": "noticeSearchTotal",
  "TypeOfCompetitions": []
}

page, all_data = 0, []
while True:
    print('Page {}...'.format(page))

    payload['PageIndex'] = page
    soup = BeautifulSoup( requests.post(url, json=payload).content, 'html.parser' )
    rows = soup.select('.tableRow')
    if not rows:
        break

    for row in rows:
        cells = [cell.get_text(strip=True) for cell in row.select('.tableCell')]
        print(cells[1])
        print('{:<30}{:<15}{:<15}{:<25}{:<45}{:<15}'.format(*cells[2:]))
        print('-'*80)

        # we are only interested in Afghanistan:
        if 'afghanistan' in cells[7].lower():
            all_data.append([row['data-noticeid'], *cells[1:]])

    page += 1

# write to csv file (UTF-8, because some titles contain non-ASCII characters):
with open('data.csv', 'w', newline='', encoding='utf-8') as csvfile:
    csv_writer = csv.writer(csvfile, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
    csv_writer.writerows(all_data)

Saved data.csv (LibreOffice screenshot not shown).
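
The "request pages until one comes back empty" loop can also be separated from the HTTP call, which makes it testable offline. This is only a sketch: `collect_pages`, `fetch_page` and `max_pages` are hypothetical names, and the page cap is an added safety measure that the example above does not have.

```python
def collect_pages(fetch_page, max_pages=1000):
    """Call fetch_page(0), fetch_page(1), ... and stop at the first empty page."""
    collected = []
    for page in range(max_pages):
        rows = fetch_page(page)
        if not rows:
            break
        collected.extend(rows)
    return collected


# Fake fetcher standing in for the POST request: two pages of data, then nothing.
fake_pages = {0: [["a", "Afghanistan"]], 1: [["b", "Malawi"]]}
result = collect_pages(lambda page: fake_pages.get(page, []))
print(result)
```

In real use, `fetch_page` would set `payload['PageIndex']`, send the request and return the parsed rows; the cap keeps the loop from running indefinitely if the server never returns an empty page.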

User contributions licensed under: CC BY-SA