I am using Python Selenium to scrape https://finance.yahoo.com/quote/AAPL/balance-sheet?p=AAPL, but I want to scrape the Quarterly data instead of the Annual data, after clicking on the “Quarterly” button on the top right. This is my code so far:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from webdriver_manager.chrome import ChromeDriverManager
from bs4 import BeautifulSoup

def readQuarterlyBSData(ticker):
    url = f'https://finance.yahoo.com/quote/{ticker}/balance-sheet?p={ticker}'
    options = Options()
    options.add_argument('--headless')
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
    driver.get(url)
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="Col1-1-Financials-Proxy"]/section/div[1]/div[2]/button'))).click()
    soup = BeautifulSoup(driver.page_source, 'lxml')
    ls = []
    # Trying to iterate through each div after clicking on the Quarterly button,
    # but the content is still the Annual data
    for element in soup.find_all('div'):
        ls.append(element.string)  # add each element one by one to the list
I am able to click the button, but when I iterate through the divs I am still getting content from the Annual data, not the Quarterly data. Can someone show me how I can iterate through the Quarterly data?
Answer
soup = BeautifulSoup(driver.page_source, 'lxml')
You don’t need to pass driver.page_source to BeautifulSoup at all; you can use Selenium itself to extract the data with the driver.find_element / driver.find_elements functions.
Here is the doc on that: https://selenium-python.readthedocs.io/locating-elements.html
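For example, once the Quarterly view has rendered, you could collect the row elements with Selenium (e.g. driver.find_elements(By.CSS_SELECTOR, 'div[data-test="fin-row"]') — that selector is an assumption about Yahoo's current markup) and split each row's text into a label and its numbers with plain Python. A minimal sketch of that parsing step, assuming each row's text is the label followed by one number per period:

```python
def parse_row(row_text):
    """Hypothetical helper: turn the text of one balance-sheet row
    (as returned by Selenium's element.text) into (label, values).
    Assumes numbers are rendered like '352,755' or '-3,068'."""
    label_parts, values = [], []
    for part in row_text.split():
        # A token is treated as a number if, after stripping commas,
        # a minus sign, and a decimal point, only digits remain
        if part.replace(',', '').replace('-', '').replace('.', '').isdigit():
            values.append(float(part.replace(',', '')))
        else:
            label_parts.append(part)
    return ' '.join(label_parts), values
```

Usage: parse_row("Total Assets 352,755 346,747") returns ('Total Assets', [352755.0, 346747.0]). Rows where Yahoo shows "-" for a missing value would need extra handling.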
Also, you are not waiting for the page source to be updated after the click. Your wait only covers the button becoming clickable; once it has been clicked, you immediately read driver.page_source before the table has re-rendered with the Quarterly data. So add a delay (or, better, an explicit wait on the updated content) after the click:
import time

WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="Col1-1-Financials-Proxy"]/section/div[1]/div[2]/button'))).click()
time.sleep(10)  # wait for the Quarterly data to render
soup = BeautifulSoup(driver.page_source, 'lxml')
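A fixed sleep works but is fragile: too short and the data hasn't loaded, too long and you waste time. The more robust idea is to poll until a condition on the page actually holds, which is what WebDriverWait does internally. A minimal sketch of that pattern in pure Python (no browser needed):

```python
import time

def wait_until(condition, timeout=10, poll=0.5):
    """Poll `condition` (a zero-argument callable) until it returns a
    truthy value or `timeout` seconds elapse. Returns the truthy value,
    or raises TimeoutError if the deadline passes first."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = condition()
        if result:
            return result
        time.sleep(poll)
    raise TimeoutError(f'condition not met within {timeout} seconds')
```

In the Selenium code you would use it like wait_until(lambda: 'Quarterly' in driver.page_source) — though the exact marker to wait on is an assumption about the rendered page, and in practice a WebDriverWait with an expected_conditions predicate on the table content is the idiomatic equivalent.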
Hope it helps :)