How to click a button on a webpage and iterate through the contents after clicking, using Python Selenium

I am using Python Selenium to scrape https://finance.yahoo.com/quote/AAPL/balance-sheet?p=AAPL, but I want to scrape the Quarterly data instead of the Annual data after clicking the “Quarterly” button at the top right. This is my code so far:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from webdriver_manager.chrome import ChromeDriverManager
from bs4 import BeautifulSoup

def readQuarterlyBSData(ticker):
    url = f'https://finance.yahoo.com/quote/{ticker}/balance-sheet?p={ticker}'
    options = Options()
    options.add_argument('--headless')
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
    driver.get(url)
    # Click the "Quarterly" toggle once it becomes clickable
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="Col1-1-Financials-Proxy"]/section/div[1]/div[2]/button'))).click()
    soup = BeautifulSoup(driver.page_source, 'lxml')
    ls = []
    # Trying to iterate through each div after clicking on the Quarterly button, but the content is still the Annual data
    for element in soup.find_all('div'):
        ls.append(element.string)  # add each element one by one to the list

I am able to click the button, but when I iterate through the divs I am still getting content from the Annual data and not the Quarterly data. Can someone show me how to iterate through the Quarterly data?

Answer

soup = BeautifulSoup(driver.page_source, 'lxml')

You don’t need to pass driver.page_source to BS4; use Selenium itself to extract the data with its driver.find_element / driver.find_elements methods.

Here is the doc on that: https://selenium-python.readthedocs.io/locating-elements.html
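
For example, the div loop from the question could be done with Selenium directly after the click (a minimal sketch, assuming driver is the same live session):

from selenium.webdriver.common.by import By

ls = []
for element in driver.find_elements(By.TAG_NAME, 'div'):  # queries the live DOM, not a snapshot
    ls.append(element.text)  # .text returns the rendered text of each div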

Also, you are not waiting for the page source to be updated, so add a time delay after the click. Your wait only runs until the button is clickable; nothing waits for what happens after that. You then immediately read a page source that has not yet been re-rendered after the click. So wait:

import time

WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="Col1-1-Financials-Proxy"]/section/div[1]/div[2]/button'))).click()
time.sleep(10)  # give the page time to re-render with the Quarterly data
soup = BeautifulSoup(driver.page_source, 'lxml')
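
If you'd rather not rely on a fixed sleep, an explicit wait can watch for the page to actually change. A minimal sketch, assuming the toggle button's label flips to "Annual" once the Quarterly view is active (that label behavior is an assumption about this page, not a Selenium guarantee):

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup

button_xpath = '//*[@id="Col1-1-Financials-Proxy"]/section/div[1]/div[2]/button'
# Assumption: the button text reads "Annual" after switching to the Quarterly view
WebDriverWait(driver, 20).until(
    EC.text_to_be_present_in_element((By.XPATH, button_xpath), 'Annual')
)
soup = BeautifulSoup(driver.page_source, 'lxml')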

Hope it helps :)
