Why do I run into trouble webscraping this website in Python?

Question

I am new to Python and I am trying to scrape this website. What I want is just the dates and the articles' titles. I followed a procedure I found on SO: I got the selectors .title a and .date using SelectorGadget on the URL I shared. However, print(movies) is empty. What

Accepted Answer

The content is not part of index.en.html. It is loaded in by JavaScript from https://www.ecb.europa.eu/press/inter/date/2021/html/index_include.en.html, so request that URL instead.

As far as I know, you can't select title/date pairs with a single selector, so you need to select titles and dates separately and then zip them:

    titles = soup.select(".title a")
    dates = soup.select(".date")
    pairs = list(zip(titles, dates))

Then you can print them out like this:

    movies_titles = [pair[0].text for pair in pairs]
    print(movies_titles)
    movies_links = ["http://www.ecb.europa.eu" + pair[0]["href"] for pair in pairs]
    print(movies_links)

Result:

    ['Christine Lagarde:\xa0Interview with CNBC',
     'Fabio Panetta:\xa0Interview with El País ',
     'Isabel Schnabel:\xa0Interview with Der Spiegel',
     'Philip R. Lane:\xa0Interview with CNBC',
     'Frank Elderson:\xa0Q&A on Twitter',
     'Isabel Schnabel:\xa0Interview with Les Echos ',
     'Philip R. Lane:\xa0Interview with the Financial Times',
     'Luis de Guindos:\xa0Interview with Público',
     'Philip R. Lane:\xa0Interview with Expansión',
     'Isabel Schnabel:\xa0Interview with LETA',
     'Fabio Panetta:\xa0Interview with Der Spiegel',
     'Christine Lagarde:\xa0Interview with Le Journal du Dimanche ',
     'Philip R. Lane:\xa0Interview with Süddeutsche Zeitung',
     'Isabel Schnabel:\xa0Interview with Deutschlandfunk',
     'Philip R. Lane:\xa0Interview with SKAI TV',
     'Isabel Schnabel:\xa0Interview with Der Standard']
    ['http://www.ecb.europa.eu/press/inter/date/2021/html/ecb.in210412~ccd1b7c9bf.en.html',
     'http://www.ecb.europa.eu/press/inter/date/2021/html/ecb.in210411~44ade9c3b5.en.html',
     'http://www.ecb.europa.eu/press/inter/date/2021/html/ecb.in210409~c8c348a12c.en.html',
     'http://www.ecb.europa.eu/press/inter/date/2021/html/ecb.in210323~e4026c61d1.en.html',
     'http://www.ecb.europa.eu/press/inter/date/2021/html/ecb.in210317_1~1d81212506.en.html',
     'http://www.ecb.europa.eu/press/inter/date/2021/html/ecb.in210317~458636d643.en.html',
     'http://www.ecb.europa.eu/press/inter/date/2021/html/ecb.in210316~930d09ce3c.en.html',
     'http://www.ecb.europa.eu/press/inter/date/2021/html/ecb.in210302~c793ad7b68.en.html',
     'http://www.ecb.europa.eu/press/inter/date/2021/html/ecb.in210226~79eba6f9fb.en.html',
     'http://www.ecb.europa.eu/press/inter/date/2021/html/ecb.in210225~5f1be75a9f.en.html',
     'http://www.ecb.europa.eu/press/inter/date/2021/html/ecb.in210209~af9c628e30.en.html',
     'http://www.ecb.europa.eu/press/inter/date/2021/html/ecb.in210207~f6e34f3b90.en.html',
     'http://www.ecb.europa.eu/press/inter/date/2021/html/ecb.in210131_1~650f5ce5f7.en.html',
     'http://www.ecb.europa.eu/press/inter/date/2021/html/ecb.in210131~13d84cb9b2.en.html',
     'http://www.ecb.europa.eu/press/inter/date/2021/html/ecb.in210127~9ad88eb038.en.html',
     'http://www.ecb.europa.eu/press/inter/date/2021/html/ecb.in210112~1c3f989acd.en.html']

Full code:

    from bs4 import BeautifulSoup
    import requests

    # The include file that the page's JavaScript loads the list from
    url = "https://www.ecb.europa.eu/press/inter/date/2021/html/index_include.en.html"
    res = requests.get(url)
    # Name the parser explicitly to avoid bs4's "no parser specified" warning
    soup = BeautifulSoup(res.text, "html.parser")

    # Select titles and dates separately, then pair them up by position
    titles = soup.select(".title a")
    dates = soup.select(".date")
    pairs = list(zip(titles, dates))

    movies_titles = [pair[0].text for pair in pairs]
    print(movies_titles)
    movies_links = ["http://www.ecb.europa.eu" + pair[0]["href"] for pair in pairs]
    print(movies_links)
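
The scraped titles contain non-breaking spaces (\xa0) and some trailing spaces, and the dates are zipped in but never printed. As a rough sketch of one way to tidy that up, the helper below takes the raw strings you would get from the tags (for example pair[0].text, pair[0]["href"] and pair[1].text) and returns cleaned rows. The names clean_text, build_rows and BASE_URL are made up for this illustration, the https base URL is an assumption, and the date string in the demo is a placeholder rather than text copied from the site.

```python
from urllib.parse import urljoin

# Assumption: the relative hrefs on the page resolve against this host.
BASE_URL = "https://www.ecb.europa.eu"


def clean_text(text):
    """Replace non-breaking spaces, collapse whitespace runs, and trim the ends."""
    return " ".join(text.replace("\xa0", " ").split())


def build_rows(raw_titles, raw_hrefs, raw_dates):
    """Pair up titles, links and dates by position and clean each field.

    All three arguments are lists of plain strings, e.g. taken from
    tag.text and tag["href"] after selecting with BeautifulSoup.
    """
    rows = []
    for title, href, date in zip(raw_titles, raw_hrefs, raw_dates):
        rows.append(
            {
                "date": clean_text(date),
                "title": clean_text(title),
                # urljoin copes with both relative and already-absolute hrefs
                "link": urljoin(BASE_URL, href),
            }
        )
    return rows


if __name__ == "__main__":
    # Title and href are taken from the result above; the date is a placeholder.
    demo = build_rows(
        ["Christine Lagarde:\xa0Interview with CNBC"],
        ["/press/inter/date/2021/html/ecb.in210412~ccd1b7c9bf.en.html"],
        ["12 April 2021"],
    )
    for row in demo:
        print(row["date"], "|", row["title"], "|", row["link"])
```

With the answer's variables, the inputs could be built as [t.text for t in titles], [t["href"] for t in titles] and [d.text for d in dates]. Note that zip pairs items purely by position, so this only lines up correctly if every title has exactly one matching date element.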

Answer