My goal is to get each link.
My code prints the href, but it also prints the rest of the tag markup, which I do not want.
I only want the href value.
    from selenium import webdriver
    from bs4 import BeautifulSoup
    import pandas as pd
    import time
    import requests

    driver = webdriver.Chrome()
    productlink=[]
    for x in range (1,3):
        driver.get(f'https://meetinglibrary.asco.org/browse-meetings/2021%20Gastrointestinal%20Cancers%20Symposium?page={x}')
        time.sleep(3)
        page_source = driver.page_source
        soup = BeautifulSoup(page_source,'html.parser')
        productlist=soup.find_all('div',class_='session')
        for item in productlist:
            for link in item.find_all('a',class_='session__button ng-star-inserted',href=True):
                print(link)
Answer
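Passing href=True only tells BeautifulSoup to match tags that have an href attribute. What you get back is still a Tag object, so printing it shows the whole element. To get the href value itself, call .get("href") on the tag. Since there is only one button in each session div, you can use find instead of find_all. Also, don't forget to prepend the baseURL, because the href values are relative paths. Try the code below: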
    from selenium import webdriver
    from bs4 import BeautifulSoup
    import pandas as pd
    import time
    import requests

    driver = webdriver.Chrome()
    productlink=[]
    baseURL = 'https://meetinglibrary.asco.org'
    for x in range (1,3):
        driver.get(f'https://meetinglibrary.asco.org/browse-meetings/2021%20Gastrointestinal%20Cancers%20Symposium?page={x}')
        time.sleep(3)
        page_source = driver.page_source
        soup = BeautifulSoup(page_source,'html.parser')
        productlist=soup.find_all('div',class_='session')
        for item in productlist:
            # find() returns the first matching Tag; .get("href") reads its href attribute
            print(baseURL + item.find('a',class_='session__button ng-star-inserted',href=True).get("href"))
Output:

    https://meetinglibrary.asco.org/session/13455
    https://meetinglibrary.asco.org/session/13458
    https://meetinglibrary.asco.org/session/13445
    https://meetinglibrary.asco.org/session/13450
    https://meetinglibrary.asco.org/session/13460
    https://meetinglibrary.asco.org/session/13462
    https://meetinglibrary.asco.org/session/13464
    https://meetinglibrary.asco.org/session/13459
    https://meetinglibrary.asco.org/session/13446
    https://meetinglibrary.asco.org/session/13451
    https://meetinglibrary.asco.org/session/13461
    https://meetinglibrary.asco.org/session/13463
    https://meetinglibrary.asco.org/session/13465
    https://meetinglibrary.asco.org/session/13399
    https://meetinglibrary.asco.org/session/13443
    https://meetinglibrary.asco.org/session/13444
    https://meetinglibrary.asco.org/session/13352
    https://meetinglibrary.asco.org/session/13381
    https://meetinglibrary.asco.org/session/13383
    https://meetinglibrary.asco.org/session/13372
    https://meetinglibrary.asco.org/session/13382
    https://meetinglibrary.asco.org/session/13447
    https://meetinglibrary.asco.org/session/13849
    https://meetinglibrary.asco.org/session/13384
    https://meetinglibrary.asco.org/session/13389
    https://meetinglibrary.asco.org/session/13453
    https://meetinglibrary.asco.org/session/13859
    https://meetinglibrary.asco.org/session/13391
    https://meetinglibrary.asco.org/session/13392
    https://meetinglibrary.asco.org/session/13394
    .
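If you would rather collect the links than print them, one option is to join each href with the base URL using urljoin from the standard library instead of string concatenation. The sketch below is only an illustration: to_absolute is a hypothetical helper name, and sample_hrefs stands in for the values that .get("href") would return (assumed here to look like "/session/13455", with None representing a session that has no button).

```python
from urllib.parse import urljoin

BASE_URL = "https://meetinglibrary.asco.org"


def to_absolute(hrefs, base_url=BASE_URL):
    """Join each href with base_url, skipping missing (None or empty) values."""
    return [urljoin(base_url, href) for href in hrefs if href]


# Sample values standing in for item.find(...).get("href") results (assumed shapes).
sample_hrefs = [
    "/session/13455",                                  # relative path
    None,                                              # no matching button found
    "https://meetinglibrary.asco.org/session/13458",   # already absolute
]

productlink = to_absolute(sample_hrefs)
for url in productlink:
    print(url)
```

urljoin leaves hrefs that are already absolute unchanged, so the same helper works whether the page emits relative or full URLs.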