Python Beautiful Soup: get only body content without header or footer data [closed]

In my code I need to get only the main text of a page, not the header or footer data. I would also like to filter out any HTML/CSS/JS code that comes back with the response. How would I do this? I have tried making a request with requests, parsing the data with Beautiful Soup, and then printing the body content. The problem is that this also picks up the header and footer contents. Thanks in advance for any responses!

Answer

Use the browser developer tools (usually opened with F12) to find out which element contains the content you are looking for. Content other than headers and footers is usually in <section> or <article> elements.

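You can then use something like soup.article.get_text() to retrieve the text from the containing element. Note that soup.article refers to the first <article> in the document and is None if the page has no such element.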
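
As a rough sketch of that approach, the snippet below picks the first <article> or <section> it finds and returns only its text. The sample HTML, the function name extract_main_text, and the candidate tag list are assumptions made for illustration; on a real page the right container is whatever the developer tools show, and the HTML would come from something like requests.get(url).text.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page standing in for the HTML of a real response.
SAMPLE_HTML = """
<html>
  <head><style>body { color: black; }</style></head>
  <body>
    <header>Site name and navigation</header>
    <article>
      <h1>Post title</h1>
      <p>First paragraph of the main text.</p>
      <script>console.log("tracking");</script>
    </article>
    <footer>Copyright notice</footer>
  </body>
</html>
"""


def extract_main_text(html, candidates=("article", "section")):
    """Return the text of the first matching container element.

    `candidates` is an assumed list of tag names to try in order;
    adjust it to match the page you are scraping.
    """
    soup = BeautifulSoup(html, "html.parser")

    # Remove <script> and <style> elements so their contents cannot
    # end up in the extracted text.
    for tag in soup.find_all(["script", "style"]):
        tag.decompose()

    # Look for the first candidate container that exists on the page.
    container = None
    for name in candidates:
        container = soup.find(name)
        if container is not None:
            break

    # Fall back to <body>, or the whole document, if nothing matched.
    # In that case header and footer text will still be included.
    if container is None:
        container = soup.body if soup.body is not None else soup

    # get_text() drops the markup; strip=True trims each text fragment
    # and skips fragments that are only whitespace.
    return container.get_text(separator="\n", strip=True)


if __name__ == "__main__":
    print(extract_main_text(SAMPLE_HTML))
```

Because only the chosen container is read, the <header> and <footer> text never reaches the output. If the page wraps its content in a <div> with a particular id or class instead, soup.find("div", id=...) or soup.select_one(...) with the selector seen in the developer tools would take the place of the tag-name lookup.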

User contributions licensed under: CC BY-SA