I am trying to scrape this website, and this is my code thus far:
import click
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from bs4 import BeautifulSoup

table_rows = []

url = 'https://www.iagco.agco.ca/prod/pub/en/Default.aspx?PossePresentation=PublicNoticeSearch'
driver = webdriver.Chrome('/Applications/Python 3.9/chromedriver')
driver.get(url)

driver.find_element_by_xpath("/html/body/div[1]/form/div[3]/div[2]/div/div/div/div[5]/div/table/tbody/tr[5]/td[3]/span/select/option[2]").click()
driver.implicitly_wait(1)
driver.find_element_by_xpath("/html/body/div[1]/form/div[3]/div[2]/div/div/div/div[6]/div/table/tbody/tr/td/div/a").click()
driver.implicitly_wait(1)

soup = BeautifulSoup(driver.page_source, 'lxml')
tables = soup.find_all('table')
driver.implicitly_wait(2)

for table in driver.find_elements_by_xpath('/html/body/div[1]/form/div[3]/div[2]/div/div/div/div[5]/div/table/tbody/tr/td'):
    data = [item.text for item in table.find_elements_by_xpath('/html/body/div[1]/form/div[3]/div[2]/div/div/div/div[5]/div/table/tbody/tr/td')]
    print(data)
What prints out is a long list, and I am trying to figure out how to get it into a table format.
['City Premises Deadline for Objections / Submissions File Number Application Type Areas\nBRAMPTON Tweed\n10010 MCLAUGHLIN RD N\nBRAMPTON, ON L7A2X6 2021-09-24 1248893 New Application\nCannabis Retail Store Authorization Indoor Area File Objection / Submission\nBRAMPTON Tweed\n1990 STEELES AVE W\nBRAMPTON, ON L6Y0R4 2021-09-24 1250690 New Application\nCannabis Retail Store Authorization Indoor Area File Objection / Submission\nCAMPBELLVILLE Welcome Cannabis Campbellville\n6 MAIN ST N\nCAMPBELLVILLE, ON L0P1B0 2021-09-29 1272273 New Application\nCannabis Retail Store Authorization Indoor Area File Objection / Submission\nCANNINGTON Bud Runners Cannabis\n17 CAMERON ST W.\nCANNINGTON, ON L0E 1E0 2021-09-23 1271708 New Application\nCannabis Retail Store Authorization Indoor Area File Objection / Submission\nETOBICOKE Cannabis 151\n188 THE QUEENSWAY\nETOBICOKE, ON M8Y1J3 2021-09-20 1157846 New Application\nCannabis Retail Store Authorization Indoor Area File Objection / Submission\nETOBICOKE Fire & Flower Cannabis Co.\n764 THE QUEENSWAY\nETOBICOKE, ON M8Z0E8 2021-09-30 1211417 New Application\nCannabis Retail Store Authorization Indoor Area File Objection / Submission\nGUELPH Tweed\n138 COLLEGE AVE W UNIT A\nGUELPH, ON N1G1S4 2021-09-28 1267278 New Application\nCannabis Retail Store Authorization Indoor Area File Objection / Submission\nHAMILTON HARVEST CANNABIS CO\n318 QUEENSTON RD UNIT G\nHAMILTON, ON L8K1H5 2021-09-23 1283383 New Application\nCannabis Retail Store Authorization Indoor Area File Objection / Submission\nHAMILTON Lady Leaf\n372 KING ST E SUITE 101\nHAMILTON, ON L8N1C3 2021-09-25 1188839 New Application\nCannabis Retail Store Authorization Indoor Area File Objection / Submission\nHAVELOCK Sunfish Cannabis Trainstation\n30 ONTARIO STREET, EAST UNIT\nHAVELOCK, ON K0L1Z0 2021-10-02 1285465 New Application\nCannabis Retail Store Authorization Indoor Area File Objection / Submission\nKITCHENER THE CANNABIST SHOP – KING E\n325 KING ST E\nKITCHENER, ON N2G2L2 2021-10-04 1297162 New Application\nCannabis Retail Store Authorization Indoor Area File Objection / Submission\nMORRISBURG Morrisburg cannabis\n137 MAIN STREET\nMORRISBURG, ON K0C1X0 2021-10-01 1196780 New Application\nCannabis Retail Store Authorization Indoor Area File Objection / Submission\nMORRISBURG, SOUTH DUNDAS The Oz Store\n147 MAIN ST., UNIT #2\nMORRISBURG, SOUTH DUNDAS, ON K0C 1X0 2021-09-30 1190679 New Application\nCannabis Retail Store Authorization Indoor Area File Objection / Submission\nMORRISTON Welcome Cannabis\n3 BADENOCH ST\nMORRISTON, ON N0B 2C0 2021-10-01 1256018 New Application\nCannabis Retail Store Authorization Indoor Area File Objection / Submission\nNORTH YORK Taste Buds Cannabis\n1193 LAWRENCE AVE W\nNORTH YORK, ON M6A1E2 2021-09-25 1266788 New Application\nCannabis Retail Store Authorization Indoor Area File Objection / Submission\nOTTAWA BlueBird Cannabis Co\n27 YORK ST\nOTTAWA, ON K1N5S7 2021-10-03 1221838 New Application\nCannabis Retail Store Authorization Indoor Area File Objection / Submission\nOTTAWA Planet Earth Cannabis\n1666 BANK ST SUITE 600\nOTTAWA, ON K1V7Y6 2021-09-27 1215970 New Application\nCannabis Retail Store Authorization Indoor Area File Objection / Submission\nSTITTSVILLE SHINYBUD CANNABIS CO. STITTSVILLE\n1261 MAIN ST UNIT 2\nSTITTSVILLE, ON K2S2E4 2021-10-01 1167849 New Application\nCannabis Retail Store Authorization Indoor Area File Objection / Submission\nTORONTO BLACKSTAR CANNABIS SHUTER\n985 DOVERCOURT RD\nTORONTO, ON M6H2X6 2021-09-30 1277446 New Application\nCannabis Retail Store Authorization Indoor Area File Objection / Submission\nTORONTO Canna North Cannabis Store\n117 YONGE ST\nTORONTO, ON M5C1W4 2021-10-02 1001902 New Application\nCannabis Retail Store Authorization Indoor Area File Objection / Submission\nTORONTO Olive Jar\n554 ANNETTE ST\nTORONTO, ON M6S2C2 2021-10-02 1196618 New Application\nCannabis Retail Store Authorization Indoor Area File Objection / Submission\nWATERLOO The Cannabist Shop - Bridgeport W\n10 BRIDGEPORT RD W\nWATERLOO, ON N2L2Y1 2021-10-04 1294285 New Application\nCannabis Retail Store Authorization Indoor Area File Objection / Submission\nWELLINGTON The Community Store\n186 MAIN STREET UNIT 3\nWELLINGTON, ON K0K 3L0 2021-09-23 1252887 New Application\nCannabis Retail Store Authorization Indoor Area File Objection / Submission\nWINCHESTER BlueBird Cannabis Co - Winchester\n507 ST LAWRENCE ST LOWER LEVEL\nWINCHESTER, ON K0C2K0 2021-10-01 1299098 New Application\nCannabis Retail Store Authorization Indoor Area File Objection / Submission']
Answer
I was able to get the output into a table format with the code below.
Code:
import click
import time
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from bs4 import BeautifulSoup
import pandas as pd

data = []

driver = webdriver.Chrome('chromedriver.exe')
driver.maximize_window()

url = 'https://www.iagco.agco.ca/prod/pub/en/Default.aspx?PossePresentation=PublicNoticeSearch'
driver.get(url)
time.sleep(8)

driver.find_element_by_xpath(
    "/html/body/div[1]/form/div[3]/div[2]/div/div/div/div[5]/div/table/tbody/tr[5]/td[3]/span/select/option[2]").click()
driver.implicitly_wait(1)
driver.find_element_by_xpath(
    "/html/body/div[1]/form/div[3]/div[2]/div/div/div/div[6]/div/table/tbody/tr/td/div/a").click()
driver.implicitly_wait(1)

soup = BeautifulSoup(driver.page_source, 'lxml')
t = soup.find('table', class_='possegrid')

# Get all the rows from the table
trs = t.select('tr')
for tr in trs:
    data.append(tr.stripped_strings)

df = pd.DataFrame(data)
print(df)
Output:
0 City Premises ... Indoor Area File Objection / Submission
1 City Premises ... None None
2 BRAMPTON Tweed ... None None
3 BRAMPTON Tweed ... None None
4 CAMPBELLVILLE Welcome Cannabis Campbellville ... None None
5 CANNINGTON Bud Runners Cannabis ... None None
6 ETOBICOKE Cannabis 151 ... None None
7 ETOBICOKE Fire & Flower Cannabis Co. ... None None
8 GUELPH Tweed ... None None
9 HAMILTON HARVEST CANNABIS CO ... None None
10 HAMILTON Lady Leaf ... None None
11 HAVELOCK Sunfish Cannabis Trainstation ... None None
12 KITCHENER THE CANNABIST SHOP – KING E ... None None
13 MORRISBURG Morrisburg cannabis ... None None
14 MORRISBURG, SOUTH DUNDAS The Oz Store ... None None
15 MORRISTON Welcome Cannabis ... None None
16 NORTH YORK Taste Buds Cannabis ... None None
17 OTTAWA BlueBird Cannabis Co ... None None
18 OTTAWA Planet Earth Cannabis ... None None
19 STITTSVILLE SHINYBUD CANNABIS CO. STITTSVILLE ... None None
20 TORONTO BLACKSTAR CANNABIS SHUTER ... None None
21 TORONTO Canna North Cannabis Store ... None None
22 TORONTO Olive Jar ... None None
23 WATERLOO The Cannabist Shop - Bridgeport W ... None None
24 WELLINGTON The Community Store ... None None
25 WINCHESTER BlueBird Cannabis Co - Winchester ... None None

[26 rows x 246 columns]
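
The 246 columns above suggest that at least one `tr` yields far more strings than a single record has fields, possibly because it wraps the other rows, and pandas pads every shorter row with None. One possible refinement, sketched here under assumptions, is to read each row's direct `th`/`td` cells and join the text inside each cell, so every record gets one value per column. The `table_to_rows` helper and the `SAMPLE_HTML` markup are hypothetical stand-ins and not the real page source; only the `possegrid` class name and the sample values come from the post above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for driver.page_source; the real page's
# structure is an assumption and may differ (for example, nested tables).
SAMPLE_HTML = """
<table class="possegrid">
  <tr><th>City</th><th>Premises</th><th>Deadline for Objections / Submissions</th><th>File Number</th></tr>
  <tr><td>BRAMPTON</td><td>Tweed<br>10010 MCLAUGHLIN RD N<br>BRAMPTON, ON L7A2X6</td><td>2021-09-24</td><td>1248893</td></tr>
  <tr><td>GUELPH</td><td>Tweed<br>138 COLLEGE AVE W UNIT A<br>GUELPH, ON N1G1S4</td><td>2021-09-28</td><td>1267278</td></tr>
</table>
"""


def table_to_rows(html, table_class="possegrid"):
    """Return (header, rows) for the first table with the given class.

    Each row is a list with one string per direct th/td cell; text fragments
    inside a cell (such as the lines of an address) are joined with a space.
    """
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_=table_class)
    if table is None:
        return [], []

    rows = []
    for tr in table.find_all("tr"):
        # recursive=False keeps only this row's own cells, not cells that
        # belong to rows nested deeper inside it.
        cells = tr.find_all(["th", "td"], recursive=False)
        values = [cell.get_text(" ", strip=True) for cell in cells]
        if values:
            rows.append(values)

    if not rows:
        return [], []
    # Assumption: the first non-empty row holds the column names.
    return rows[0], rows[1:]


header, rows = table_to_rows(SAMPLE_HTML)
print(header)
for row in rows:
    print(row)
```

With the real page, `driver.page_source` would be passed in place of `SAMPLE_HTML`, and if every row has as many cells as the header, `pd.DataFrame(rows, columns=header)` should give one column per field. Rows with a different number of cells would need to be filtered or padded first.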