How to extract elements from a webpage with special class names?

Question

I have a txt file filled with multiple URLs. Each URL is an article with text and its corresponding SDGs (example of one article [1]). The text parts of an article are in the tags 'div.text.-normal.content' and then in 'p', and the SDGs are in 'div.tax-section.text.-normal.small'…

Accepted Answer

To select the text you can go with:

    soup.select_one('div.text.-normal.content').get_text(strip=True)

I think something is wrong with the class names in your selector. A class attribute such as "text -normal content" is three separate classes, so chain them with a "." for every space between them.

Or:

    soup.select_one('div.c-single-content').get_text(strip=True)

To get the topics as mentioned, you can go with:

    '^^'.join([topic.get_text(strip=True) for topic in soup.select_one('div.tax-section.text.-normal.small').select('a')])
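A fuller sketch of the whole task, assuming the input file holds one URL per line and the pages can be fetched without authentication. It uses the standard library's urllib.request rather than a third-party HTTP client, reads the text from the individual 'p' tags as the question describes, and passes a " " separator to get_text so inline markup such as <b> does not glue words together (get_text(strip=True) alone joins the stripped strings with nothing between them). The names classes_to_selector, extract_article and process_url_file are hypothetical, and the '^^' separator is carried over from the answer above.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup


def classes_to_selector(tag, class_attr):
    """Build a CSS selector from a tag name and a space-separated class
    attribute by prefixing every class with a dot, e.g.
    ('div', 'text -normal content') -> 'div.text.-normal.content'."""
    return tag + "".join("." + name for name in class_attr.split())


# Class attributes as described in the question (assumed page markup).
TEXT_SELECTOR = classes_to_selector("div", "text -normal content")
SDG_SELECTOR = classes_to_selector("div", "tax-section text -normal small")


def extract_article(html):
    """Return (text, sdgs) for one article page (str or bytes).

    text: the <p> paragraphs inside the content div, one per line.
    sdgs: the link texts inside the SDG div, joined with '^^'.
    A missing container yields an empty string instead of raising
    AttributeError on None.
    """
    soup = BeautifulSoup(html, "html.parser")

    content = soup.select_one(TEXT_SELECTOR)
    paragraphs = content.select("p") if content else []
    # " " keeps words apart when a paragraph contains inline tags.
    text = "\n".join(p.get_text(" ", strip=True) for p in paragraphs)

    sdg_box = soup.select_one(SDG_SELECTOR)
    links = sdg_box.select("a") if sdg_box else []
    sdgs = "^^".join(a.get_text(" ", strip=True) for a in links)

    return text, sdgs


def process_url_file(path):
    """Read one URL per line (blank lines skipped), fetch each page and
    yield (url, text, sdgs) tuples."""
    with open(path, encoding="utf-8") as handle:
        urls = [line.strip() for line in handle if line.strip()]
    for url in urls:
        with urlopen(url) as response:
            # Raw bytes: BeautifulSoup detects the encoding itself.
            html = response.read()
        text, sdgs = extract_article(html)
        yield url, text, sdgs
```

Usage, with "urls.txt" as a placeholder filename:

    for url, text, sdgs in process_url_file("urls.txt"):
        print(url, sdgs)

Splitting the parsing into extract_article keeps it testable on saved HTML without any network access.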


Answer