Web scraping returns empty values: possibly a protected site

Question

I'm scraping www.albumoftheyear.org, but my code only produces an empty df. I don't know whether the site is protected by something like Cloudflare and that is the cause, or whether I'm making a mistake with the selected tags. The basic idea is to iterate through the pages and collect t…

Accepted Answer

A working solution using selenium. Note that you need the webdriver for your browser installed on your system. I am using Chrome, and the chromedriver can be downloaded from here. Yes, you need both the browser and the driver.

import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import NoSuchElementException

# Point executable_path at your local chromedriver binary.
driver = webdriver.Chrome(executable_path=r'C:**YOUR PATH**chromedriver.exe')
driver.get(r"https://www.albumoftheyear.org/list/1500-rolling-stones-500-greatest-albums-of-all-time-2020/{}")

title_list = []
date_list  = []
genre_list = []

try:
    # Wait up to 10 seconds for the main content container to appear.
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "centerContent"))
    )
    # Selenium 3 style locators; Selenium 4 uses find_elements(By.CLASS_NAME, ...).
    albumlistrow = element.find_elements_by_class_name('albumListRow')
    for a in albumlistrow:
        title = a.find_element_by_class_name('albumListTitle')
        date = a.find_element_by_class_name('albumListDate')
        try:
            genre = a.find_element_by_class_name('albumListGenre')
        except NoSuchElementException:
            # Row has no genre: `genre` still holds the previous row's element,
            # so that row's genre text is appended again below.
            pass
        title_list.append(title.text)
        date_list.append(date.text)
        genre_list.append(genre.text)
finally:
    driver.close()

df = pd.DataFrame(list(zip(title_list,date_list,genre_list)), columns=['title', 'data','genre'])
df.head()

output

    title                                               data                genre
0   500. Arcade Fire - Funeral                          September 14, 2004  Indie Rock
1   499. Rufus & Chaka Khan - Ask Rufus                 January 19, 1977    Soul
2   498. Suicide - Suicide                              December 28, 1977   Synth Punk
3   497. Various Artists - The Indestructible Beat...   January 1, 1985     Synth Punk
4   496. Shakira - Dónde Están los Ladrones?            September 29, 1998  Pop Rock

If you do not want the albumListRank, change this line from

title_list.append(title.text)

to

title_list.append(title.text[4:])
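A note on the title.text[4:] slice: it removes exactly four characters, so with a three-digit rank such as "500. " the space after the dot is left in place, and with a single-digit rank it cuts into the title itself. A minimal sketch of a rank-stripping helper that does not depend on the number of digits, assuming the titles always have the form "<rank>. <artist> - <album>" as in the output above (strip_rank is a hypothetical name, not part of the original answer):

```python
import re

# Matches an optional run of whitespace, the rank digits, a dot,
# and any whitespace that follows, anchored at the start of the string.
_RANK_PREFIX = re.compile(r"^\s*\d+\.\s*")


def strip_rank(title: str) -> str:
    """Remove a leading list rank such as '500. ' from a scraped title.

    Titles without a leading rank are returned unchanged.
    """
    return _RANK_PREFIX.sub("", title, count=1)


# Example with titles shaped like the scraped output (assumed format).
samples = ["500. Arcade Fire - Funeral", "9. Some Artist - Some Album"]
cleaned = [strip_rank(s) for s in samples]
# cleaned == ["Arcade Fire - Funeral", "Some Artist - Some Album"]
```

With this helper, the append line would become title_list.append(strip_rank(title.text)).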

Answer