Download “401 Unauthorized” video with selenium

Question

I'm trying to create a bot that downloads videos from a site named "Sdarot" using Selenium and Python 3. Each video (or episode) on the site has its own page and URL. When you load an episode, you have to wait 30 seconds for it to "load", and only then does the <video> tag appear in the HTML source.

Accepted Answer

You are getting the unauthorized error because the site uses cookies to store information about your session, specifically a cookie named Sdarot. I used the requests library to download and save the video.

The main point is that opening the URL with Selenium works because Selenium uses the same HTTP client (the browser), which already has the cookie. When you call urllib, it is a different HTTP client, so the server sees a brand-new request. To overcome this, you have to behave like the browser by providing enough session information, which in this case is maintained by cookies.

Check how I extract the value of the Sdarot cookie and pass it to requests.get. You can do this with urllib as well.

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    import requests


    def load(driver, url):
        driver.get(url)  # open the page in the browser
        try:
            # wait for the episode to "load"
            # if something is wrong and the episode doesn't load after 45 seconds,
            # the function will call itself again and try to load again.
            continue_btn = WebDriverWait(driver, 45).until(
                EC.element_to_be_clickable((By.ID, "proceed"))
            )
            continue_btn.click()
        except Exception:
            load(driver, url)  # corrected parameter error


    def save_video(driver, filename):
        # get the video element
        video_element = driver.find_element(By.TAG_NAME, "video")
        video_url = video_element.get_property('src')  # get the video url
        # iterate over all the cookies and extract the one named Sdarot
        for entry in driver.get_cookies():
            if entry["name"] == 'Sdarot':
                cookies = {entry["name"]: entry["value"]}
                # send the request with the session cookie
                r = requests.get(video_url, cookies=cookies, stream=True)
                # start download
                with open(filename, 'wb') as f:
                    for chunk in r.iter_content(chunk_size=1024 * 1024):
                        if chunk:
                            f.write(chunk)


    def main():
        URL = r'https://www.sdarot.dev/watch/339-%D7%94%D7%A4%D7%99%D7%92-%D7%9E%D7%95%D7%AA-ha-pijamot/season/1/episode/23'
        DRIVER = webdriver.Chrome()
        load(DRIVER, URL)
        save_video(DRIVER, "video.mp4")


    if __name__ == "__main__":
        main()
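
The answer mentions that the same thing can be done with urllib. Here is a minimal sketch of that variant, using only the standard library. It assumes the cookie list has the shape returned by Selenium's driver.get_cookies() (a list of dicts with "name" and "value" keys). The helper names cookie_header, build_request and download are hypothetical, made up for this illustration, and whether the site needs any headers besides the cookie is not something the answer confirms.

```python
import shutil
import urllib.request


def cookie_header(selenium_cookies, names=("Sdarot",)):
    """Build a Cookie header value from driver.get_cookies() output.

    Pass names=None to forward every cookie instead of only the listed ones.
    """
    wanted = [
        c for c in selenium_cookies
        if names is None or c["name"] in names
    ]
    return "; ".join(f'{c["name"]}={c["value"]}' for c in wanted)


def build_request(video_url, selenium_cookies, names=("Sdarot",)):
    """Create a urllib request that carries the browser's session cookie."""
    headers = {"Cookie": cookie_header(selenium_cookies, names)}
    return urllib.request.Request(video_url, headers=headers)


def download(video_url, selenium_cookies, filename):
    """Stream the response body to disk without loading it all into memory."""
    request = build_request(video_url, selenium_cookies)
    with urllib.request.urlopen(request) as response, open(filename, "wb") as f:
        shutil.copyfileobj(response, f)
```

With the answer's code, this would be called as download(video_url, driver.get_cookies(), "video.mp4") in place of the requests.get loop.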

Answer