See this page with ECB press releases. These go back to 1997, so it would be nice to automate getting all the links going back in time.
I found the element that holds the links (XPath '//*[@id="lazyload-container"]'), but it only returns the most recent links.
How can I get the rest?
    from bs4 import BeautifulSoup
    from selenium import webdriver
    driver = webdriver.Firefox(executable_path=r'/usr/local/bin/geckodriver')
    driver.get(url)
    element = driver.find_element_by_xpath('//*[@id="lazyload-container"]')
    element = element.get_attribute('innerHTML')
Answer
The data is loaded via JavaScript from another URL, with one include file per year. This example shows how to load the releases for each year:
    import requests
    from bs4 import BeautifulSoup

    url = "https://www.ecb.europa.eu/press/pr/date/{}/html/index_include.en.html"

    for year in range(1997, 2023):
        soup = BeautifulSoup(requests.get(url.format(year)).content, "html.parser")
        for a in soup.select(".title a")[::-1]:
            print(a.find_previous(class_="date").text, a.text)
Prints:
    25 April 1997 "EUR" - the new currency code for the euro
    1 July 1997 Change of presidency of the European Monetary Institute
    2 July 1997 The security features of the euro banknotes
    2 July 1997 The EMI's mandate with respect to banknotes
    ...
    17 February 2022 Financial statements of the ECB for 2021
    21 February 2022 Survey on credit terms and conditions in euro-denominated securities financing and over-the-counter derivatives markets (SESFOD) - December 2021
    21 February 2022 Results of the December 2021 survey on credit terms and conditions in euro-denominated securities financing and over-the-counter derivatives markets (SESFOD)
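The loop above hard-codes the last year and does not check whether a request succeeded. A minimal sketch of one way to handle both, assuming the same URL pattern keeps working for the current year; the names `year_urls` and `fetch_pages` are hypothetical helpers, not part of the original answer:

```python
import datetime

import requests

URL = "https://www.ecb.europa.eu/press/pr/date/{}/html/index_include.en.html"


def year_urls(first=1997, last=None):
    """Yield (year, url) pairs from `first` through `last` inclusive."""
    if last is None:
        # Assumption: an include file exists for the current year.
        last = datetime.date.today().year
    for year in range(first, last + 1):
        yield year, URL.format(year)


def fetch_pages(first=1997, last=None, timeout=30):
    """Yield (year, html_bytes), skipping years whose request fails."""
    # One session reuses the connection across all yearly requests.
    with requests.Session() as session:
        for year, url in year_urls(first, last):
            response = session.get(url, timeout=timeout)
            if response.ok:
                yield year, response.content
            else:
                print(f"skipping {year}: HTTP {response.status_code}")
```

Each `html_bytes` value can be passed to `BeautifulSoup(..., "html.parser")` exactly as in the loop above.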
EDIT: To print links:
    import requests
    from bs4 import BeautifulSoup

    url = "https://www.ecb.europa.eu/press/pr/date/{}/html/index_include.en.html"

    for year in range(1997, 2023):
        soup = BeautifulSoup(requests.get(url.format(year)).content, "html.parser")
        for a in soup.select(".title a")[::-1]:
            print(
                a.find_previous(class_="date").text,
                a.text,
                "https://www.ecb.europa.eu" + a["href"],
            )
Prints:
    ...
    15 December 1999 Monetary policy decisions https://www.ecb.europa.eu/press/pr/date/1999/html/pr991215.en.html
    20 December 1999 Visit by the Finnish Prime Minister https://www.ecb.europa.eu/press/pr/date/1999/html/pr991220.en.html
    ...
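If you want to keep the links rather than print them, the parsing step can be pulled into a function that returns rows. This is only a sketch: `parse_releases` is a hypothetical helper, `SAMPLE` is made-up markup that mimics the `.date` / `.title a` structure the selectors above rely on, and `urljoin` is swapped in for the string concatenation so that an already absolute `href` would also be handled.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE = "https://www.ecb.europa.eu"


def parse_releases(html, base=BASE):
    """Return (date, title, link) tuples, oldest first, from one year's HTML."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # Reverse the matches, as the answer does, so the oldest release comes first.
    for a in soup.select(".title a")[::-1]:
        date = a.find_previous(class_="date")
        rows.append(
            (
                date.get_text(strip=True) if date else "",
                a.get_text(strip=True),
                urljoin(base, a["href"]),
            )
        )
    return rows


# Hypothetical markup, used only to exercise the function without network access.
SAMPLE = """
<dl>
  <dt><div class="date">20 December 1999</div></dt>
  <dd><div class="title"><a href="/press/pr/date/1999/html/pr991220.en.html">Visit by the Finnish Prime Minister</a></div></dd>
  <dt><div class="date">15 December 1999</div></dt>
  <dd><div class="title"><a href="/press/pr/date/1999/html/pr991215.en.html">Monetary policy decisions</a></div></dd>
</dl>
"""

if __name__ == "__main__":
    for row in parse_releases(SAMPLE):
        print(*row)
```

In a real run you would call `parse_releases` on the content of each yearly request and extend one list with the results, which can then be written out with the `csv` module.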