I am trying to scrape the h2 tag shown below from an Apple TV page, using the Python 3.10.6 code further below. I can see the h2 tag on the page, but my script, run from PyCharm 2022.1.4, is unable to scrape it. episode-shelf-header is a unique class in the HTML of this page.
I did search for a solution to this but was unable to find one.
Can anyone help?
<div class="episode-shelf-header" id="{{@model.id}}-{{@shelf.id}}">
  <h2 class="typ-headline-emph"> Season 1 </h2>
</div>
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from bs4 import BeautifulSoup

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get('https://tv.apple.com/us/show/life-by-ella/umc.cmc.1suiyueh1ntwjtsstcwldofno?ctx_brand=tvs.sbd.4000')

pageSource = driver.page_source
soup = BeautifulSoup(pageSource, 'html.parser')
div = soup.find('div', attrs={'class': 'episode-shelf-header'})
h2 = div.find('h2', attrs={'class': 'typ-headline-emph'})
Answer
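- The value can be extracted directly with Selenium; BeautifulSoup is not needed for this.
- You must wait for the page to finish loading before looking up the element; otherwise the h2 may not be in the page source yet.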
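To illustrate the second point: "waiting" here means retrying the lookup until the element shows up, rather than reading the page once right after `driver.get`. The sketch below is a simplified, pure-Python illustration of that polling idea and is not Selenium's actual implementation. `wait_until` and `FakePage` are hypothetical names invented for this example, and `LookupError` stands in for the "element not found" error a real driver would raise.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Hypothetical helper: a "not found yet" error (LookupError here) is
    swallowed and the lookup is retried until the timeout elapses.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except LookupError:
            pass  # element not there yet, try again after a short pause
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


class FakePage:
    """Hypothetical stand-in for a page whose heading is rendered late."""

    def __init__(self, ready_after):
        self.calls = 0
        self.ready_after = ready_after

    def find_heading(self):
        self.calls += 1
        if self.calls < self.ready_after:
            raise LookupError("heading not rendered yet")
        return "Season 1"


page = FakePage(ready_after=3)
heading = wait_until(page.find_heading, timeout=2.0, poll_interval=0.01)
print(heading)  # the first two lookups fail, the third one succeeds
```

A single lookup against `FakePage` would fail, just like a single read of the page source in the question; the retry loop is what makes the difference.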
Here is sample code that extracts the value:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get('https://tv.apple.com/us/show/life-by-ella/umc.cmc.1suiyueh1ntwjtsstcwldofno?ctx_brand=tvs.sbd.4000')

# XPath to the h2 inside the div whose id matches the one shown in the question
x_path = '//*[@id="{{@model.id}}-{{@shelf.id}}"]/h2'

# Retry the lookup for up to 10 seconds until the element is found
element = WebDriverWait(driver, 10).until(lambda x: x.find_element(By.XPATH, x_path))
print(element.text)
Note: this was written against selenium 4.3.0.
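If you would rather keep BeautifulSoup for the parsing step, as in the question, one option is to hand `driver.page_source` to it only after the wait has succeeded. The sketch below assumes that approach and runs on a static copy of the HTML from the question so it does not need a browser. `extract_season_heading` and `sample_html` are hypothetical names used for illustration. It also guards against the case where the element is missing, where `soup.find` returns `None` and the follow-up `div.find(...)` in the question's code would raise an `AttributeError`.

```python
from bs4 import BeautifulSoup


def extract_season_heading(html):
    """Return the text of the h2 inside div.episode-shelf-header, or None.

    Hypothetical helper: `html` would typically be driver.page_source,
    read after the WebDriverWait call has returned.
    """
    soup = BeautifulSoup(html, "html.parser")
    h2 = soup.select_one("div.episode-shelf-header h2.typ-headline-emph")
    if h2 is None:
        return None  # element not in this HTML (for example, not rendered yet)
    return h2.get_text(strip=True)


# Static copy of the markup shown in the question
sample_html = """
<div class="episode-shelf-header" id="{{@model.id}}-{{@shelf.id}}">
  <h2 class="typ-headline-emph"> Season 1 </h2>
</div>
"""

print(extract_season_heading(sample_html))
print(extract_season_heading("<html><body></body></html>"))
```

Returning `None` instead of raising makes it easy to tell "the page was not ready" apart from a parsing bug.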