
“401 Unauthorized” when downloading a video found with Selenium

I’m trying to create a bot that downloads videos from a site named “Sdarot” using Selenium and Python 3.

Each video (episode) on the site has its own page and URL. When you load an episode, you have to wait 30 seconds for it to “load”, and only then does the <video> tag appear in the page’s HTML.

The problem is that the request for the video seems to be protected in some way (I don’t really understand how it works). When I wait for the video tag to appear and then download the video with the urllib library (see the code below), I get the following error: urllib.error.HTTPError: HTTP Error 401: Unauthorized

I should note that when I open the video’s link in the Selenium-controlled browser, it loads fine and I can download it manually.

How can I download the videos automatically? Thanks in advance!

Code:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

import urllib.request


def load(driver, url):

    driver.get(url)  # open the page in the browser

    try:
        # wait for the episode to "load"
        # if something is wrong and the episode doesn't load after 45 seconds,
        # the function will call itself again and try to load again.
        continue_btn = WebDriverWait(driver, 45).until(
            EC.element_to_be_clickable((By.ID, "proceed"))
        )
    except TimeoutException:
        load(driver, url)  # retry with the same driver and URL


def save_video(driver, filename):

    video_element = driver.find_element(By.TAG_NAME, "video")  # get the video element
    video_url = video_element.get_property('src')  # get the video url
    # trying to download the video
    urllib.request.urlretrieve(video_url, filename)
    # ERROR: "urllib.error.HTTPError: HTTP Error 401: Unauthorized"


def main():

    URL = r'https://www.sdarot.dev/watch/339-%D7%94%D7%A4%D7%99%D7%92-%D7%9E%D7%95%D7%AA-ha-pijamot/season/1/episode/23'

    DRIVER = webdriver.Chrome()
    load(DRIVER, URL)
    save_video(DRIVER, "video.mp4")  # save_video returns nothing, so there is no result to keep


if __name__ == "__main__":
    main()


Answer

You are getting the 401 Unauthorized error because the site uses cookies to store information about your session, specifically a cookie named Sdarot. I used the requests library to download and save the video.

The main point is that opening the URL in Selenium works because the request comes from the same HTTP client (the browser), which already holds the cookies. When you call it with urllib, you are using a different HTTP client that has none of those cookies, so to the server it is a brand-new, unauthenticated request. To get around this you have to behave like the browser by sending enough session information, which in this case is kept in cookies.
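The same idea can be taken one step further: instead of picking out a single cookie, copy every cookie the browser holds into a requests.Session, optionally together with the browser’s User-Agent, so that all later requests look like they come from the same client. This is a minimal sketch, not a tested solution for this site: the helper name session_from_selenium_cookies is hypothetical, and whether the server also checks headers such as User-Agent is an assumption. It takes the list of dicts returned by driver.get_cookies(), so it can be tried without a browser.

```python
import requests


def session_from_selenium_cookies(selenium_cookies, user_agent=None):
    """Build a requests.Session carrying the cookies of a Selenium browser.

    `selenium_cookies` is the list of dicts returned by driver.get_cookies();
    each dict has at least "name" and "value", and usually "domain" and "path".
    (Hypothetical helper, shown for illustration.)
    """
    session = requests.Session()
    for c in selenium_cookies:
        # copy each browser cookie into the session's cookie jar, keeping its
        # domain and path so it is only sent to hosts that match them
        session.cookies.set(
            c["name"],
            c["value"],
            domain=c.get("domain", ""),
            path=c.get("path", "/"),
        )
    if user_agent:
        # assumption: some servers also compare the User-Agent with the session's
        session.headers["User-Agent"] = user_agent
    return session


# Usage with a live driver (not run here):
#   ua = driver.execute_script("return navigator.userAgent")
#   session = session_from_selenium_cookies(driver.get_cookies(), ua)
#   with session.get(video_url, stream=True) as r:
#       r.raise_for_status()
#       ...write r.iter_content(...) chunks to a file...
```

Keeping each cookie’s domain and path means the session only sends it to matching hosts, as the browser would. If the video file is served from a different host than the episode page, that matching can drop the cookie; passing a plain dict through the cookies= argument of requests.get, which attaches the cookies to that one request regardless of host, may then be what you need.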

Note how the code below extracts the value of the Sdarot cookie and passes it to requests.get. You can do the same with urllib.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
import requests


def load(driver, url):

    driver.get(url)  # open the page in the browser

    try:
        # wait for the episode to "load"
        # if something is wrong and the episode doesn't load after 45 seconds,
        # the function will call itself again and try to load again.
        continue_btn = WebDriverWait(driver, 45).until(
            EC.element_to_be_clickable((By.ID, "proceed"))
        )
        continue_btn.click()
    except TimeoutException:
        load(driver, url)  # retry with the same driver and URL


def save_video(driver, filename):

    # wait up to 30 seconds for the <video> element to be added to the page
    video_element = WebDriverWait(driver, 30).until(
        EC.presence_of_element_located((By.TAG_NAME, "video"))
    )
    video_url = video_element.get_property('src')  # get the video url

    # keep only the session cookie named Sdarot from the browser's cookies
    cookies = {
        entry["name"]: entry["value"]
        for entry in driver.get_cookies()
        if entry["name"] == "Sdarot"
    }
    if not cookies:
        raise RuntimeError("Sdarot cookie not found in the browser session")

    # send the request with the browser's session cookie and stream the response
    with requests.get(video_url, cookies=cookies, stream=True) as r:
        r.raise_for_status()  # stop on 401 etc. instead of saving an error page
        # start download, writing 1 MiB at a time
        with open(filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=1024 * 1024):
                if chunk:
                    f.write(chunk)


def main():

    URL = r'https://www.sdarot.dev/watch/339-%D7%94%D7%A4%D7%99%D7%92-%D7%9E%D7%95%D7%AA-ha-pijamot/season/1/episode/23'

    DRIVER = webdriver.Chrome()
    try:
        load(DRIVER, URL)
        save_video(DRIVER, "video.mp4")
    finally:
        DRIVER.quit()  # close the browser even if the download fails


if __name__ == "__main__":
    main()
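The same fix also works with urllib. Below is a minimal standard-library sketch, assuming (as the requests version does) that only the Sdarot cookie matters: it builds a Cookie header from the list returned by driver.get_cookies(), attaches it to a urllib.request.Request, and streams the response to a file. The helper names build_video_request and download are hypothetical.

```python
import shutil
import urllib.request


def build_video_request(video_url, selenium_cookies, names=("Sdarot",)):
    """Return a urllib Request that carries the browser's session cookie(s).

    `selenium_cookies` is the list of dicts from driver.get_cookies();
    only cookies whose name is in `names` are sent. (Hypothetical helper.)
    """
    pairs = [
        f'{c["name"]}={c["value"]}'
        for c in selenium_cookies
        if c["name"] in names
    ]
    if not pairs:
        raise ValueError(f"none of the cookies {names} were found")
    # a Cookie header is a "; "-separated list of name=value pairs
    return urllib.request.Request(video_url, headers={"Cookie": "; ".join(pairs)})


def download(request, filename):
    # urlopen raises urllib.error.HTTPError on 4xx/5xx, e.g. 401 if the cookie is rejected
    with urllib.request.urlopen(request) as response, open(filename, "wb") as f:
        shutil.copyfileobj(response, f, 1024 * 1024)  # copy in 1 MiB chunks


# Usage with a live driver (not run here):
#   req = build_video_request(video_url, driver.get_cookies())
#   download(req, "video.mp4")
```

Splitting request construction from the network call keeps the cookie handling easy to check without touching the site.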
User contributions licensed under: CC BY-SA