I am trying to scrape the videos from a YouTube channel with the code below. However, the elements in `element_titles` don't seem to have an `href` attribute. This worked about a year ago, and I am unsure why it doesn't work now. Did YouTube change how we can get the `href`?
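```python
from selenium import webdriver
from selenium.webdriver.common.by import By

# Scrape for videos
# WARNING: Takes very long
HOME = "https://www.youtube.com/user/theneedledrop/videos"
driver = webdriver.Chrome(r"C:\webdriver\chromedriver.exe")
driver.get(HOME)
scroll()  # my own scrolling helper (not shown)
element_titles = driver.find_elements(By.ID, "video-title")
```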
The following attributes are what I find in the `WebElement` objects:
```
>>> element_titles[0].get_property('attributes')[0]
{'ATTRIBUTE_NODE': 2, 'CDATA_SECTION_NODE': 4, 'COMMENT_NODE': 8, 'DOCUMENT_FRAGMENT_NODE': 11, 'DOCUMENT_NODE': 9, 'DOCUMENT_POSITION_CONTAINED_BY': 16, 'DOCUMENT_POSITION_CONTAINS': 8, 'DOCUMENT_POSITION_DISCONNECTED': 1, 'DOCUMENT_POSITION_FOLLOWING': 4, 'DOCUMENT_POSITION_IMPLEMENTATION_SPECIFIC': 32, 'DOCUMENT_POSITION_PRECEDING': 2, 'DOCUMENT_TYPE_NODE': 10, 'ELEMENT_NODE': 1, 'ENTITY_NODE': 6, 'ENTITY_REFERENCE_NODE': 5, 'NOTATION_NODE': 12, 'PROCESSING_INSTRUCTION_NODE': 7, 'TEXT_NODE': 3, '__shady_addEventListener': {}, '__shady_appendChild': {}, '__shady_childNodes': [], '__shady_cloneNode': {}, '__shady_contains': {}, '__shady_dispatchEvent': {}, '__shady_firstChild': None, '__shady_getRootNode': {}, '__shady_insertBefore': {}, '__shady_isConnected': False, '__shady_lastChild': None, '__shady_native_addEventListener': {}, '__shady_native_appendChild': {}, '__shady_native_childNodes': [], '__shady_native_cloneNode': {}, '__shady_native_contains': {}, '__shady_native_dispatchEvent': {}, '__shady_native_firstChild': None, '__shady_native_insertBefore': {}, '__shady_native_lastChild': None, '__shady_native_nextSibling': None, '__shady_native_parentElement': None, '__shady_native_parentNode': None, '__shady_native_previousSibling': None, '__shady_native_removeChild': {}, '__shady_native_removeEventListener': {}, '__shady_native_replaceChild': {}, '__shady_native_textContent': 'video-title', '__shady_nextSibling': None, '__shady_parentElement': None, '__shady_parentNode': None, '__shady_previousSibling': None, '__shady_removeChild': {}, '__shady_removeEventListener': {}, '__shady_replaceChild': {}, '__shady_textContent': 'video-title', 'addEventListener': {}, 'appendChild': {}, 'baseURI': 'https://www.youtube.com/user/theneedledrop/videos', 'childNodes': [], 'cloneNode': {}, 'compareDocumentPosition': {}, 'contains': {}, 'dispatchEvent': {}, 'firstChild': None, 'getRootNode': {}, 'hasChildNodes': {}, 'insertBefore': {}, 'isConnected': False, 'isDefaultNamespace': {}, 'isEqualNode': {}, 'isSameNode': {}, 'lastChild': None, 'localName': 'id', 'lookupNamespaceURI': {}, 'lookupPrefix': {}, 'name': 'id', 'namespaceURI': None, 'nextSibling': None, 'nodeName': 'id', 'nodeType': 2, 'nodeValue': 'video-title', 'normalize': {}, 'ownerDocument': <selenium.webdriver.remote.webelement.WebElement (session="906f0b2a91a96de78811a8b48c702ce9", element="4105d26d-55b3-49a1-b657-10bbbbf43c84")>, 'ownerElement': <selenium.webdriver.remote.webelement.WebElement (session="906f0b2a91a96de78811a8b48c702ce9", element="c0d38452-435c-489a-8cb8-858adc4828b9")>, 'parentElement': None, 'parentNode': None, 'prefix': None, 'previousSibling': None, 'removeChild': {}, 'removeEventListener': {}, 'replaceChild': {}, 'specified': True, 'textContent': 'video-title', 'value': 'video-title'}
```
I have tried inspecting the YouTube video pages for the `href`, but I am unable to find it.
Answer
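The full working code below pulls all the video links from the channel's Videos page. Note that it reads the `href` from the `a#video-title-link` anchor, not from the `#video-title` element selected in the question.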
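Why selecting `By.ID, "video-title"` yields no `href`: the dump in the question shows only that element's `id` attribute, while the link sits on the anchor that wraps the title. The sketch below illustrates this with the standard library's `html.parser` on a simplified, hypothetical fragment modelled on the answer's selectors (`div#details`, `a#video-title-link`). The `yt-formatted-string` tag name and the overall markup are assumptions for illustration, not YouTube's exact HTML.

```python
from html.parser import HTMLParser

# Hypothetical, simplified markup modelled on the answer's selectors
# (div#details > a#video-title-link > #video-title); not YouTube's real HTML.
SAMPLE = """
<div id="details">
  <a id="video-title-link" href="/watch?v=UZcSkasvj5c">
    <yt-formatted-string id="video-title">Video title</yt-formatted-string>
  </a>
</div>
"""


class AttrCollector(HTMLParser):
    """Record the tag name and attributes of every start tag that has an id."""

    def __init__(self):
        super().__init__()
        self.by_id = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        if "id" in attrs:
            self.by_id[attrs["id"]] = (tag, attrs)


parser = AttrCollector()
parser.feed(SAMPLE)

title_tag, title_attrs = parser.by_id["video-title"]
link_tag, link_attrs = parser.by_id["video-title-link"]
print(title_tag, title_attrs.get("href"))  # yt-formatted-string None
print(link_tag, link_attrs.get("href"))    # a /watch?v=UZcSkasvj5c
```

In Selenium terms, this is why the answer selects `a#video-title-link` and calls `get_attribute('href')` on that anchor.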
Example:
```python
import time

import pandas as pd
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait

options = webdriver.ChromeOptions()
# All of these are optional
# options.add_experimental_option("detach", True)
options.add_argument("--disable-extensions")
options.add_argument("--disable-notifications")
options.add_argument("--disable-popup-blocking")
options.add_argument("start-maximized")

s = Service('./chromedriver')
driver = webdriver.Chrome(service=s, options=options)
driver.get('https://www.youtube.com/user/theneedledrop/videos')
time.sleep(3)

SCROLL_PAUSE_TIME = 1
item_count = 100  # stop scrolling once at least this many video links are loaded
last_height = driver.execute_script("return document.documentElement.scrollHeight")

# Keep scrolling until enough links are loaded or the page stops growing
while len(driver.find_elements(By.CSS_SELECTOR, 'a#video-title-link')) < item_count:
    driver.execute_script("window.scrollTo(0, document.documentElement.scrollHeight);")
    time.sleep(SCROLL_PAUSE_TIME)
    new_height = driver.execute_script("return document.documentElement.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height

data = []
# The href is on the a#video-title-link anchor inside each div#details block
for e in WebDriverWait(driver, 20).until(
        EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'div#details'))):
    try:
        vurl = e.find_element(By.CSS_SELECTOR, 'a#video-title-link').get_attribute('href')
    except NoSuchElementException:
        continue  # skip blocks without a title link instead of aborting the loop
    data.append({'video_url': vurl})

driver.quit()

df = pd.DataFrame(data).drop_duplicates()
print(df.to_markdown())  # to_markdown() requires the optional 'tabulate' package
```
Output:
|     | video_url |
|----:|:--------------------------------------------|
| 0 | https://www.youtube.com/watch?v=UZcSkasvj5c |
| 1 | https://www.youtube.com/watch?v=9c8AXKAnp_E |
| 2 | https://www.youtube.com/watch?v=KaLUHF7nQic |
| 3 | https://www.youtube.com/watch?v=rxb2L0Bgp3U |
| 4 | https://www.youtube.com/watch?v=z3L1wXvMN0Q |
| 5 | https://www.youtube.com/watch?v=q7vqR74WVYc |
| 6 | https://www.youtube.com/watch?v=Kb31OTOYYG8 |
| 7 | https://www.youtube.com/watch?v=F-CaQbxwMZ0 |
| 8 | https://www.youtube.com/watch?v=AWDWTyC0jls |
| 9 | https://www.youtube.com/watch?v=LXWbnTgxeT4 |
| 10 | https://www.youtube.com/watch?v=5KlHjDnefYQ |
| 11 | https://www.youtube.com/watch?v=yfq8rdBcAMg |
| 12 | https://www.youtube.com/watch?v=lATG1JBzVIU |
| 13 | https://www.youtube.com/watch?v=SNmZfHDOHQw |
| 14 | https://www.youtube.com/watch?v=IsQBbO_4EQI |
| 15 | https://www.youtube.com/watch?v=wcSyXUOM63g |
| 16 | https://www.youtube.com/watch?v=5hIaJZ9M8ZI |
| 17 | https://www.youtube.com/watch?v=ikryWQEHsCE |
| 18 | https://www.youtube.com/watch?v=5ARVgrao6E0 |
| 19 | https://www.youtube.com/watch?v=_1q6-POT8sY |
| 20 | https://www.youtube.com/watch?v=ycyxm3rgQG0 |
| 21 | https://www.youtube.com/watch?v=InirkRGnC2w |
| 22 | https://www.youtube.com/watch?v=nrvq5lY9oy0 |
| 23 | https://www.youtube.com/watch?v=M1yGh3D_KI8 |
| 24 | https://www.youtube.com/watch?v=Yn_4mtMYyXU |
| 25 | https://www.youtube.com/watch?v=8vmm8x_Cq4s |
| 26 | https://www.youtube.com/watch?v=Zfyojbr-cEQ |
| 27 | https://www.youtube.com/watch?v=NqrVX-WOrc0 |
| 28 | https://www.youtube.com/watch?v=Hx6k20LsAJ4 |
| 29 | https://www.youtube.com/watch?v=OB6ZI5Bicww |
| 30 | https://www.youtube.com/watch?v=uNMnIRKx0GE |
| 31 | https://www.youtube.com/watch?v=U7w_MKl5_hE |
| 32 | https://www.youtube.com/watch?v=KGi4Cpbh_Y0 |
| 33 | https://www.youtube.com/watch?v=mQqRtaoyAdw |
| 34 | https://www.youtube.com/watch?v=s3VzTy9oXXM |
| 35 | https://www.youtube.com/watch?v=eCaojgO-ZWs |
| 36 | https://www.youtube.com/watch?v=SeOLXwvu87E |
| 37 | https://www.youtube.com/watch?v=IlZ6Y21rxTU |
| 38 | https://www.youtube.com/watch?v=HxoRbEQFx3U |
| 39 | https://www.youtube.com/watch?v=NDCAImW1o6o |
| 40 | https://www.youtube.com/watch?v=gE778rR6-EM |
| 41 | https://www.youtube.com/watch?v=cQ0eY9NJACQ |
| 42 | https://www.youtube.com/watch?v=-x5Bx-leRWI |
| 43 | https://www.youtube.com/watch?v=XQ0C_Dmf0hI |
| 44 | https://www.youtube.com/watch?v=0eJ4JRNi4J8 |
| 45 | https://www.youtube.com/watch?v=YczkDCv3GiM |
| 46 | https://www.youtube.com/watch?v=GQmUsdUI20A |
| 47 | https://www.youtube.com/watch?v=4CFnoywFia4 |
| 48 | https://www.youtube.com/watch?v=A0Bzv8weX4s |
| 49 | https://www.youtube.com/watch?v=YbxcaHn_d_o |
| 50 | https://www.youtube.com/watch?v=GwUNT2k26mQ |
| 51 | https://www.youtube.com/watch?v=zktcHftIhDs |
| 52 | https://www.youtube.com/watch?v=_rY7Hvxe4x4 |
| 53 | https://www.youtube.com/watch?v=rqB9gd4fbfE |
| 54 | https://www.youtube.com/watch?v=oNPAhe7G3yg |
| 55 | https://www.youtube.com/watch?v=37_aCQW98sU |
| 56 | https://www.youtube.com/watch?v=GjA4fWIUv-A |
| 57 | https://www.youtube.com/watch?v=8THBFF024ho |
| 58 | https://www.youtube.com/watch?v=HLErXgsV3Nk |
| 59 | https://www.youtube.com/watch?v=GsvdLIxY6Fg |
| 60 | https://www.youtube.com/watch?v=iUU48DuTpl8 |
| 61 | https://www.youtube.com/watch?v=5UluxcFJVx0 |
| 62 | https://www.youtube.com/watch?v=5lOvAHg12uw |
| 63 | https://www.youtube.com/watch?v=2UADjU66-4M |
| 64 | https://www.youtube.com/watch?v=Qvr2labD_Es |
| 65 | https://www.youtube.com/watch?v=qUWRnIn5oB0 |
| 66 | https://www.youtube.com/watch?v=Qk7MPEyGhQ4 |
| 67 | https://www.youtube.com/watch?v=bN7SDJFanS4 |
| 68 | https://www.youtube.com/watch?v=6YoUjUGvHUk |
| 69 | https://www.youtube.com/watch?v=NjiLz3HoWkM |
| 70 | https://www.youtube.com/watch?v=rRdU7VhoWdI |
| 71 | https://www.youtube.com/watch?v=zOm5n0OJLfc |
| 72 | https://www.youtube.com/watch?v=z9jMFiSUe5Q |
| 73 | https://www.youtube.com/watch?v=M6VLYjFnXMU |
| 74 | https://www.youtube.com/watch?v=4iFEpKDQx-o |
| 75 | https://www.youtube.com/watch?v=Zc1SE66DEYo |
| 76 | https://www.youtube.com/watch?v=645qisC4slI |
| 77 | https://www.youtube.com/watch?v=QeIRfgsVX5k |
| 78 | https://www.youtube.com/watch?v=0jUr57dIMq4 |
| 79 | https://www.youtube.com/watch?v=EjaTJGmoT_w |
| 80 | https://www.youtube.com/watch?v=roXy5LA17fU |
| 81 | https://www.youtube.com/watch?v=UeSwqepnAX0 |
| 82 | https://www.youtube.com/watch?v=BDYSYypzhxE |
| 83 | https://www.youtube.com/watch?v=iyBNxEnP7rk |
| 84 | https://www.youtube.com/watch?v=YCUmI9f77qs |
| 85 | https://www.youtube.com/watch?v=h21LYpHEfNU |
| 86 | https://www.youtube.com/watch?v=LBQDuTn6T0c |
| 87 | https://www.youtube.com/watch?v=le_0jyqCXFU |
| 88 | https://www.youtube.com/watch?v=tGClvgTCrIY |
| 89 | https://www.youtube.com/watch?v=969qt4RUx74 |
| 90 | https://www.youtube.com/watch?v=XL8li__PnaA |
| 91 | https://www.youtube.com/watch?v=RKf3ppfFUkg |
| 92 | https://www.youtube.com/watch?v=xY5RyjaQJCE |
| 93 | https://www.youtube.com/watch?v=6bjliN6hJTs |
| 94 | https://www.youtube.com/watch?v=KcYBolH-j9c |
| 95 | https://www.youtube.com/watch?v=nlsnpbRyvtU |
| 96 | https://www.youtube.com/watch?v=AOWmL1eydWI |
| 97 | https://www.youtube.com/watch?v=I8RPsF-hdXo |
| 98 | https://www.youtube.com/watch?v=9NSOGd2p530 |
| 99 | https://www.youtube.com/watch?v=8EdqpZu9lkM |
| 100 | https://www.youtube.com/watch?v=a23wQEA4EAA |
| 101 | https://www.youtube.com/watch?v=7g6TXGY-T6k |
| 102 | https://www.youtube.com/watch?v=iXZNlGwOuWY |
| 103 | https://www.youtube.com/watch?v=miR30bsSH4E |
| 104 | https://www.youtube.com/watch?v=zb8-aHiTKL4 |
| 105 | https://www.youtube.com/watch?v=rTEZmXq9K3k |
| 106 | https://www.youtube.com/watch?v=OBeOJiolMug |
| 107 | https://www.youtube.com/watch?v=fA0nxixnS-A |
| 108 | https://www.youtube.com/watch?v=dMhpDlUTT_U |
| 109 | https://www.youtube.com/watch?v=SgjDaPWjzuU |
| 110 | https://www.youtube.com/watch?v=2lokqffmF2A |
| 111 | https://www.youtube.com/watch?v=jmHZvGMe8pQ |
| 112 | https://www.youtube.com/watch?v=KPYvMIMON9g |
… and so on
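The scrolling loop in the answer stops for one of two reasons: enough video links have been loaded, or the page height did not grow after a scroll (the end of the list). A minimal sketch of that stopping rule as a standalone helper follows, so it can be checked without a browser. The helper name `scroll_until`, its callable parameters, and the `FakePage` stand-in (which assumes each scroll loads a fixed batch of 30 items) are all hypothetical and exist only for illustration.

```python
import time
from typing import Callable


def scroll_until(
    scroll: Callable[[], None],
    page_height: Callable[[], int],
    loaded_count: Callable[[], int],
    target: int,
    pause: float = 1.0,
) -> int:
    """Scroll until `target` items are loaded or the page stops growing.

    Hypothetical helper; returns the number of items loaded when it stopped.
    """
    last_height = page_height()
    while loaded_count() < target:
        scroll()
        time.sleep(pause)  # give the page time to load the next batch
        new_height = page_height()
        if new_height == last_height:
            break  # nothing new appeared: reached the end of the list
        last_height = new_height
    return loaded_count()


class FakePage:
    """Hypothetical browser stand-in: each scroll loads one batch, up to `total`."""

    def __init__(self, total: int, batch: int = 30):
        self.total, self.batch = total, batch
        self.loaded = min(total, batch)

    def scroll(self) -> None:
        self.loaded = min(self.total, self.loaded + self.batch)

    def height(self) -> int:
        return self.loaded * 100  # height grows with the number of loaded items


big = FakePage(total=500)
print(scroll_until(big.scroll, big.height, lambda: big.loaded, target=100, pause=0))  # 120

small = FakePage(total=50)
print(scroll_until(small.scroll, small.height, lambda: small.loaded, target=100, pause=0))  # 50
```

With Selenium, the three callables would wrap the same `driver.execute_script(...)` and `driver.find_elements(By.CSS_SELECTOR, 'a#video-title-link')` calls used in the answer's loop. Because items arrive in batches, the count can overshoot the target (120 rather than 100 here), which is why the real output lists more than 100 links.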