I am trying to scrape the h2 tag shown below from an Apple TV page, using the Python 3.10.6 code further below. I can see the h2 tag on the page, but my script, which I run from PyCharm 2022.1.4, is unable to scrape it. episode-shelf-header is a unique class in the HTML of this page.
I did search for a solution to this but was unable to find one.
Can anyone help?
<div class="episode-shelf-header" id="{{@model.id}}-{{@shelf.id}}">
  <h2 class="typ-headline-emph">
    Season 1
  </h2>
</div>
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from bs4 import BeautifulSoup

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get('https://tv.apple.com/us/show/life-by-ella/umc.cmc.1suiyueh1ntwjtsstcwldofno?ctx_brand=tvs.sbd.4000')

pageSource = driver.page_source
soup = BeautifulSoup(pageSource, 'html.parser')
div = soup.find('div', attrs={'class': 'episode-shelf-header'})
h2 = div.find('h2', attrs={'class': 'typ-headline-emph'})
Answer
- The value can be extracted directly with Selenium.
- You must wait until the page has fully loaded before looking up the element.
Here is sample code that extracts the final value:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get('https://tv.apple.com/us/show/life-by-ella/umc.cmc.1suiyueh1ntwjtsstcwldofno?ctx_brand=tvs.sbd.4000')
x_path = '//*[@id="{{@model.id}}-{{@shelf.id}}"]/h2'
element = WebDriverWait(driver, 10).until(lambda x: x.find_element(By.XPATH, x_path))

print(element.text)
Note: the Selenium version used here is 4.3.0.
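To make the "wait for the page to load" point more concrete, here is a rough sketch of the polling idea behind an explicit wait such as WebDriverWait(driver, 10).until(...). It uses only the standard library and is not Selenium's actual implementation. The names wait_until, ElementNotFound and FakePage are hypothetical, made up for this illustration. FakePage stands in for a page whose h2 is only rendered after a couple of lookups.

```python
import time


class ElementNotFound(Exception):
    """Hypothetical stand-in for Selenium's NoSuchElementException."""


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Lookup failures (ElementNotFound) are swallowed and retried, and a
    TimeoutError is raised once `timeout` seconds have elapsed.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except ElementNotFound:
            pass  # element not rendered yet; try again after a short pause
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


class FakePage:
    """Hypothetical page whose <h2> only appears on the third lookup."""

    def __init__(self):
        self.lookups = 0

    def find_h2_text(self):
        self.lookups += 1
        if self.lookups < 3:
            raise ElementNotFound("h2 not in the DOM yet")
        return "Season 1"


page = FakePage()
text = wait_until(page.find_h2_text, timeout=2.0, poll=0.01)
print(text)
```

Reading driver.page_source right after driver.get(...) is like calling find_h2_text() only once: if the element has not been rendered yet, the lookup fails. Retrying until a deadline gives the page time to finish rendering.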