My goal is to get each link.
My code prints the href, but it also prints the rest of the tag markup, which I do not want.
I only want the href value.
from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd
import time
import requests

driver = webdriver.Chrome()
productlink = []
for x in range(1, 3):
    driver.get(f'https://meetinglibrary.asco.org/browse-meetings/2021%20Gastrointestinal%20Cancers%20Symposium?page={x}')
    time.sleep(3)
    page_source = driver.page_source
    soup = BeautifulSoup(page_source, 'html.parser')
    productlist = soup.find_all('div', class_='session')
    for item in productlist:
        for link in item.find_all('a', class_='session__button ng-star-inserted', href=True):
            print(link)
Answer
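href=True only means "match tags that have an href attribute". What find_all returns are still Tag objects, so printing one prints the whole tag. To get the href value itself, you also need to call .get("href") on the tag. Since there is only one button in each session tag, you could use find instead of find_all, and don’t forget to prepend the baseURL, because the href values are relative paths. Try the code below: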
from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd
import time
import requests

driver = webdriver.Chrome()
productlink = []
baseURL = 'https://meetinglibrary.asco.org'
for x in range(1, 3):
    driver.get(f'https://meetinglibrary.asco.org/browse-meetings/2021%20Gastrointestinal%20Cancers%20Symposium?page={x}')
    time.sleep(3)  # wait for the page to finish rendering
    page_source = driver.page_source
    soup = BeautifulSoup(page_source, 'html.parser')
    productlist = soup.find_all('div', class_='session')
    for item in productlist:
        # find returns a single Tag; .get("href") extracts only the attribute value
        print(baseURL + item.find('a', class_='session__button ng-star-inserted', href=True).get("href"))
Print:
https://meetinglibrary.asco.org/session/13455
https://meetinglibrary.asco.org/session/13458
https://meetinglibrary.asco.org/session/13445
https://meetinglibrary.asco.org/session/13450
https://meetinglibrary.asco.org/session/13460
https://meetinglibrary.asco.org/session/13462
https://meetinglibrary.asco.org/session/13464
https://meetinglibrary.asco.org/session/13459
https://meetinglibrary.asco.org/session/13446
https://meetinglibrary.asco.org/session/13451
https://meetinglibrary.asco.org/session/13461
https://meetinglibrary.asco.org/session/13463
https://meetinglibrary.asco.org/session/13465
https://meetinglibrary.asco.org/session/13399
https://meetinglibrary.asco.org/session/13443
https://meetinglibrary.asco.org/session/13444
https://meetinglibrary.asco.org/session/13352
https://meetinglibrary.asco.org/session/13381
https://meetinglibrary.asco.org/session/13383
https://meetinglibrary.asco.org/session/13372
https://meetinglibrary.asco.org/session/13382
https://meetinglibrary.asco.org/session/13447
https://meetinglibrary.asco.org/session/13849
https://meetinglibrary.asco.org/session/13384
https://meetinglibrary.asco.org/session/13389
https://meetinglibrary.asco.org/session/13453
https://meetinglibrary.asco.org/session/13859
https://meetinglibrary.asco.org/session/13391
https://meetinglibrary.asco.org/session/13392
https://meetinglibrary.asco.org/session/13394
....
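The answer joins the base URL and the href by plain string concatenation, which works here because every href starts with a slash. As an optional alternative, the standard library's urljoin can do the joining and also copes with hrefs that are already absolute. This is only a sketch: the helper name to_absolute is made up for illustration and is not part of the original answer.

```python
from urllib.parse import urljoin

BASE_URL = "https://meetinglibrary.asco.org"


def to_absolute(href, base=BASE_URL):
    """Resolve an href against the site root (hypothetical helper)."""
    # urljoin leaves an already-absolute URL unchanged and
    # resolves a root-relative path against the base.
    return urljoin(base, href)


print(to_absolute("/session/13455"))
print(to_absolute("https://meetinglibrary.asco.org/session/13458"))
```

In the loop above, this would replace the baseURL + ... expression with to_absolute(tag.get("href")), where tag is the result of item.find(...).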