I am trying to grab the src attribute of the video tag on this webpage. I can see the video tag when I inspect the page. The XPath for the tag in Safari is //*[@id="player"]/div[2]/div[4]/video
This is my code:
import os

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

os.environ["SELENIUM_SERVER_JAR"] = "selenium-server-standalone-2.41.0.jar"

browser = webdriver.Safari()
browser.get("https://mplayer.me/default.php?id=MTc3ODc3")
print(WebDriverWait(browser, 20).until(EC.visibility_of_element_located((By.TAG_NAME, "video"))).get_attribute("src"))
browser.quit()
Using .text instead of .get_attribute also returns an empty string. I have to use Safari rather than Chrome to get the src link because Chrome uses blob storage, so scraping via Chrome shows "blob:https://mplayer.me/d420cb30-ed6e-4772-b169-ed33a5d3ee9f" instead of "https://wwwx18.gogocdn.stream/videos/hls/6CjH7KUeu18L4Y7ls0ohCw/1668685924/177877/81aa0af3891f4ef11da3f67f0d43ade6/ep.1.1657688313.m3u8", which is the link I want to get.
Answer
You can get the link to the m3u8 file in Chrome from the performance logs, which are enabled through Desired Capabilities.
Here is one possible solution:
import json

from selenium import webdriver
from selenium.webdriver import DesiredCapabilities
from selenium.webdriver.chrome.service import Service

options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_experimental_option("excludeSwitches", ["enable-automation", "enable-logging"])

# Enable Chrome's performance log so that network requests are recorded.
capabilities = DesiredCapabilities.CHROME.copy()
capabilities["goog:loggingPrefs"] = {"performance": "ALL"}

service = Service(executable_path="path/to/your/chromedriver.exe")
# Note: desired_capabilities works on Selenium 4.5.0 but was removed in later Selenium 4 releases.
driver = webdriver.Chrome(service=service, options=options, desired_capabilities=capabilities)

driver.get('https://mplayer.me/default.php?id=MTc3ODc3')
logs = driver.get_log('performance')

# Each log entry wraps a DevTools event as a JSON string; print the URL of any .m3u8 request.
for log in logs:
    data = json.loads(log['message'])['message']['params'].get('request')
    if data and data['url'].endswith('.m3u8'):
        print(data['url'])

driver.quit()
Output:
https://wwwx18.gogocdn.stream/videos/hls/myv1spZ0483oSfvbo4bcbQ/1668706324/177877/81aa0af3891f4ef11da3f67f0d43ade6/ep.1.1657688313.m3u8
Tested on Win 10, Python 3.9.10, Selenium 4.5.0