I have some simple selenium scraping code that returns all the search results, but when I run the for loop, it displays an error: Message: invalid argument: ‘url’ must be a string
(Session info: chrome=93.0.4577.82)
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

chrome_path = r'C:\Windows\chromedriver.exe'
driver = webdriver.Chrome(chrome_path)
driver.get("https://www.youtube.com/results?search_query=python+course")
user_data = driver.find_elements_by_xpath('//*[@id="video-title"]')
links = []
for i in user_data:
    links.append(i.get_attribute('href'))
print(links)
wait = WebDriverWait(driver, 10)
for x in links:
    driver.get(x)
    v_id = x.strip('https://www.youtube.com/watch?v=')
    #//*[@id="video-title"]/yt-formatted-string
    v_title = wait.until(EC.presence_of_element_located(
        (By.CSS_SELECTOR, "h1.title yt-formatted-string"))).text
How can I avoid this error? Thanks.
Answer
You are trying to get "user_data" with
user_data = driver.find_elements_by_xpath('//*[@id="video-title"]')
immediately after opening the YouTube URL with
driver.get("https://www.youtube.com/results?search_query=python+course")
At that moment the matching elements are only partially loaded, so get_attribute('href') returns None and "links" ends up as a list of None values.
This is why, when you iterate over "links" with
for x in links:
each "x" is None (a "NoneType" object), not a string, so driver.get(x) raises "invalid argument: 'url' must be a string".
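As an extra safeguard (an addition, not part of the original answer), you can also filter out None values before navigating, so driver.get() only ever receives strings. A minimal sketch with hypothetical href values:

```python
# Hypothetical values, as get_attribute('href') might return them
# while the results page is still loading (assumption for illustration).
raw_hrefs = ["https://www.youtube.com/watch?v=abc123", None, None]

# Keep only real string URLs so driver.get() never receives None.
links = [h for h in raw_hrefs if isinstance(h, str)]
print(links)  # only the one real URL survives
```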
To fix this, you should add a wait/delay between
driver.get("https://www.youtube.com/results?search_query=python+course")
and
user_data = driver.find_elements_by_xpath('//*[@id="video-title"]')
The simplest way to do that is to add a hard-coded delay, like this:
import time

driver.get("https://www.youtube.com/results?search_query=python+course")
time.sleep(8)
user_data = driver.find_elements_by_xpath('//*[@id="video-title"]')
However, the recommended approach is to use an explicit wait implemented with expected conditions, like this:
import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

chrome_path = r'C:\Windows\chromedriver.exe'
driver = webdriver.Chrome(chrome_path)
wait = WebDriverWait(driver, 20)
driver.get("https://www.youtube.com/results?search_query=python+course")
wait.until(EC.visibility_of_element_located((By.XPATH, '//*[@id="video-title"]')))
# add a short extra pause so all the videos finish loading
time.sleep(0.5)
user_data = driver.find_elements_by_xpath('//*[@id="video-title"]')
links = []
for i in user_data:
    links.append(i.get_attribute('href'))
print(links)
for x in links:
    driver.get(x)
    # str.strip() removes a character set, not a prefix, so split off the id instead
    v_id = x.split("watch?v=")[-1]
    #//*[@id="video-title"]/yt-formatted-string
    v_title = wait.until(EC.visibility_of_element_located(
        (By.CSS_SELECTOR, "h1.title yt-formatted-string"))).text
Also, you should use visibility_of_element_located
instead of presence_of_element_located,
since presence_of_element_located
waits only for the element's initial presence in the DOM; its state and content (such as its text) may still not be ready.
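One more caveat about extracting the video id: str.strip() treats its argument as a set of characters to remove from both ends, not as a literal prefix, so x.strip('https://www.youtube.com/watch?v=') can eat into the id itself. A minimal sketch with an example URL:

```python
url = "https://www.youtube.com/watch?v=rfscVS0vtbw"  # example URL

# strip() removes any of the listed characters from both ends,
# so it also chews up trailing id characters like 'v', 't', 'b', 'w':
bad = url.strip('https://www.youtube.com/watch?v=')
print(bad)  # 'rfscVS0' -- the id has been truncated

# Splitting on the query parameter keeps the id intact:
v_id = url.split("watch?v=")[-1]
print(v_id)  # 'rfscVS0vtbw'
```

On Python 3.9+ you can also use url.removeprefix("https://www.youtube.com/watch?v="), which removes the literal prefix only.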