I am trying to parse a specific href link from the following website: https://www.murray-intl.co.uk/en/literature-library.
The element I want to parse:
<a class="btn btn--naked btn--icon-left btn--block focus-within" href="https://www.aberdeenstandard.com/docs?editionId=9123afa2-5318-4715-9783-e07d08e2e7cc&_ga=2.12911351.1364356977.1629796255-1577053129.1629192717" target="blank">Portfolio Holding Summary<i class="material-icons btn__icon">library_books</i></a>
However, using BeautifulSoup I am unable to obtain the desired element, perhaps because of the cookie-acceptance prompt.
from bs4 import BeautifulSoup
import urllib.request
import requests

page = requests.get('https://www.murray-intl.co.uk/en/literature-library')
soup = BeautifulSoup(page.content, 'html.parser')
link = soup.find_all('a', class_='btn btn--naked btn--icon-left btn--block focus-within')
url = link[0].get('href')
url
I am still new to BS4 and hope someone can point me in the right direction.
Thank you in advance!
Answer
To get the correct tags, remove the "focus-within" class from the search (it is added later by JavaScript, so it is not in the HTML that requests downloads):
import requests
from bs4 import BeautifulSoup

url = "https://www.murray-intl.co.uk/en/literature-library"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

# Search with the class string as served, without "focus-within"
links = soup.find_all("a", class_="btn btn--naked btn--icon-left btn--block")
for u in links:
    print(u.get_text(strip=True), u.get("href", ""))
Prints:
Portfolio Holding Summarylibrary_books https://www.aberdeenstandard.com/docs?editionId=9123afa2-5318-4715-9783-e07d08e2e7cc
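As a side note on why the extra class breaks the search: when class_ is given a string containing several classes, BeautifulSoup compares it against the exact value of the class attribute, so an additional class or a different order yields no match. The sketch below illustrates this on a small inline HTML snippet. The snippet and the example.com URLs are made up for illustration and are not taken from the real page. A CSS selector with chained classes is shown as one possible, more tolerant alternative.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: same classes on both links, but in a different order
html = """
<a class="btn btn--naked btn--icon-left btn--block" href="https://example.com/a">Doc A</a>
<a class="btn btn--block btn--naked btn--icon-left" href="https://example.com/b">Doc B</a>
"""
soup = BeautifulSoup(html, "html.parser")

# Multi-class string: matches only when the class attribute is exactly this string
exact = soup.find_all("a", class_="btn btn--naked btn--icon-left btn--block")

# Same string plus a class that is not in the markup: nothing matches
with_focus = soup.find_all(
    "a", class_="btn btn--naked btn--icon-left btn--block focus-within"
)

# Chained CSS classes: matches any <a> carrying all four classes, in any order
by_css = soup.select("a.btn.btn--naked.btn--icon-left.btn--block")

exact_hrefs = [a["href"] for a in exact]
focus_hrefs = [a["href"] for a in with_focus]
css_hrefs = [a["href"] for a in by_css]

print(exact_hrefs)
print(focus_hrefs)
print(css_hrefs)
```

If the order or number of classes on the real page might vary, the select() form is probably the safer choice.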
EDIT: To get only the specified link, you can use, for example, a CSS selector:
link = soup.select_one('a:-soup-contains("Portfolio Holding Summary")')
print(link["href"])
Prints:
https://www.aberdeenstandard.com/docs?editionId=9123afa2-5318-4715-9783-e07d08e2e7cc
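One thing to keep in mind with select_one: it returns None when nothing matches, so link["href"] would raise a TypeError if the link text ever changes. Below is a minimal sketch of a guarded lookup, run against made-up inline HTML so it works without network access. The helper name href_for_selector and the example.com URLs are hypothetical. The :-soup-contains pseudo-class requires a reasonably recent soupsieve (the selector library bundled with BeautifulSoup).

```python
from bs4 import BeautifulSoup

# Hypothetical markup loosely modelled on the link structure in the question
html = """
<a class="btn" href="https://example.com/factsheet">Factsheet<i class="material-icons">library_books</i></a>
<a class="btn" href="https://example.com/holdings">Portfolio Holding Summary<i class="material-icons">library_books</i></a>
"""
soup = BeautifulSoup(html, "html.parser")


def href_for_selector(soup, selector):
    """Return the href of the first element matching selector, or None."""
    tag = soup.select_one(selector)
    if tag is None:
        return None
    return tag.get("href")


found = href_for_selector(soup, 'a:-soup-contains("Portfolio Holding Summary")')
missing = href_for_selector(soup, 'a:-soup-contains("No Such Document")')

print(found)
print(missing)
```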