Selenium bug: Message: invalid argument: ‘url’ must be a string

I have some simple Selenium scraping code that collects all the search results, but when I run the for loop, it raises an error: Message: invalid argument: ‘url’ must be a string

(Session info: chrome=93.0.4577.82)

from selenium import webdriver 
from selenium.webdriver.common.by import By 
from selenium.webdriver.support.ui import WebDriverWait 
from selenium.webdriver.support import expected_conditions as EC

chrome_path = r'C:\Windows\chromedriver.exe'
driver = webdriver.Chrome(chrome_path)
driver.get("https://www.youtube.com/results?search_query=python+course")
user_data = driver.find_elements_by_xpath('//*[@id="video-title"]')
links = []
for i in user_data:
    links.append(i.get_attribute('href'))
print(links)

wait = WebDriverWait(driver, 10)
for x in links:
    driver.get(x)
    v_id = x.strip('https://www.youtube.com/watch?v=')
    #//*[@id="video-title"]/yt-formatted-string
    v_title = wait.until(EC.presence_of_element_located(
                           (By.CSS_SELECTOR,"h1.title yt-formatted-string"))).text

I would like to ask for some help. How can I avoid this error? Thanks.

Answer

You are collecting “user_data” with

user_data = driver.find_elements_by_xpath('//*[@id="video-title"]')

immediately after opening the YouTube URL with

driver.get("https://www.youtube.com/results?search_query=python+course")

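At that point the results may not have finished loading, so “user_data” can be empty or can contain elements whose href attribute is not set yet. For such an element, get_attribute('href') returns None.
This is why, when you iterate over “links” with

for x in links:

driver.get(x) can receive an “x” that is a “NoneType” object, not a string, which produces the error.
To fix this you should add a wait / delay between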

driver.get("https://www.youtube.com/results?search_query=python+course")

and

user_data = driver.find_elements_by_xpath('//*[@id="video-title"]')

The simplest way to do that is to add a fixed delay there (this needs import time at the top of the script), like this:

driver.get("https://www.youtube.com/results?search_query=python+course")
time.sleep(8)
user_data = driver.find_elements_by_xpath('//*[@id="video-title"]')

However, the recommended approach is to use an explicit wait based on expected conditions, like this:

import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

chrome_path = r'C:\Windows\chromedriver.exe'
driver = webdriver.Chrome(chrome_path)
wait = WebDriverWait(driver, 20)
driver.get("https://www.youtube.com/results?search_query=python+course")
# wait until at least one result title is visible
wait.until(EC.visibility_of_element_located((By.XPATH, '//*[@id="video-title"]')))
# short extra pause so the remaining results can finish loading
time.sleep(0.5)
user_data = driver.find_elements_by_xpath('//*[@id="video-title"]')
links = []
for i in user_data:
    links.append(i.get_attribute('href'))
print(links)

for x in links:
    driver.get(x)
    # str.strip() removes a set of characters, not a prefix, so split on the parameter instead
    v_id = x.split('watch?v=')[-1]
    #//*[@id="video-title"]/yt-formatted-string
    v_title = wait.until(EC.visibility_of_element_located(
                           (By.CSS_SELECTOR,"h1.title yt-formatted-string"))).text
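
As an extra safeguard, independent of the wait: get_attribute('href') returns None when an element has no href, so it may be worth dropping such entries before calling driver.get. The following is only a minimal sketch. The helper name keep_string_urls and the sample URLs are made up for illustration and are not part of Selenium.

```python
def keep_string_urls(hrefs):
    """Return only the entries that are non-empty strings, preserving order."""
    return [h for h in hrefs if isinstance(h, str) and h]


# Hypothetical sample shaped like the values collected from get_attribute('href').
sample = [
    "https://www.youtube.com/watch?v=first_id",
    None,   # element without an href
    "",     # empty attribute value
    "https://www.youtube.com/watch?v=second_id",
]

links = keep_string_urls(sample)
print(links)
```

In the script above this would mean writing links = keep_string_urls(links) before the second loop, so driver.get only ever receives strings.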

Also, you should use visibility_of_element_located instead of presence_of_element_located. presence_of_element_located only waits until the element exists in the DOM, and at that point its content, such as its text, may not be ready yet.
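
A separate note on the v_id line in the question: str.strip('https://www.youtube.com/watch?v=') does not remove that prefix. It removes any of those characters from both ends of the string, so an ID that begins or ends with one of them gets cut short. One possible alternative, sketched here with the standard library, is to read the v query parameter. The function name video_id_from_url and the example ID are hypothetical.

```python
from urllib.parse import urlparse, parse_qs


def video_id_from_url(url):
    """Return the value of the 'v' query parameter, or None if it is missing."""
    values = parse_qs(urlparse(url).query).get("v")
    return values[0] if values else None


# Made-up URL for illustration; the ID starts with 'e', which is one of the
# characters in the strip argument, so strip() removes it as well.
url = "https://www.youtube.com/watch?v=example_id"

stripped = url.strip("https://www.youtube.com/watch?v=")
print(stripped)                 # xample_id
print(video_id_from_url(url))   # example_id
```

This also ignores any extra query parameters that may follow the ID in the URL.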

User contributions licensed under: CC BY-SA