Beautiful Soup Nested Tag Search

I am trying to write a Python program that counts the words on a web page. I use Beautiful Soup 4 to scrape the page, but I have difficulty accessing nested HTML tags (for example, <p class="hello"> inside a <div>).

Every time I try to find such a tag with page.findAll() (page is the Beautiful Soup object containing the whole page), it finds nothing, even though the tags are there. Is there a simple way to do this?

Answer

I'm guessing that what you are trying to do is first look in a specific div tag, then search for all p tags inside it and count them or do whatever you want with them. For example:

import bs4

# content is assumed to hold the page's HTML as a string
soup = bs4.BeautifulSoup(content, 'html.parser')

# This will get the first div with class "some_class"
div_container = soup.find('div', class_='some_class')

# Then search inside div_container for all p tags with class "hello"
for ptag in div_container.find_all('p', class_='hello'):
    # Print the text content of the p tag
    print(ptag.text)
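
Since the goal in the question is counting words, here is a minimal self-contained sketch of the same two-step search applied to a small sample page. The sample HTML, the helper name count_words_in_nested, and the class names are made up for illustration; the word count simply splits the tag text on whitespace, which may not match every definition of a "word".

```python
import bs4

# Hypothetical sample page; in practice this would be the downloaded HTML.
SAMPLE_HTML = """
<html><body>
  <div class="some_class">
    <p class="hello">Hello nested world</p>
    <section><p class="hello">Deeper still</p></section>
    <p class="other">Not counted</p>
  </div>
  <p class="hello">Outside the div</p>
</body></html>
"""


def count_words_in_nested(html, div_class, p_class):
    """Count whitespace-separated words in matching <p> tags inside the div."""
    soup = bs4.BeautifulSoup(html, 'html.parser')

    # Step 1: locate the container div (find returns None if there is no match).
    div_container = soup.find('div', class_=div_class)
    if div_container is None:
        return 0

    # Step 2: find_all searches all descendants, so deeper nesting is covered.
    total = 0
    for ptag in div_container.find_all('p', class_=p_class):
        total += len(ptag.get_text().split())
    return total


word_count = count_words_in_nested(SAMPLE_HTML, 'some_class', 'hello')
print(word_count)  # 3 words + 2 words from the two nested p tags -> 5
```

The None check matters because soup.find returns None when no div matches, and calling find_all on None raises an AttributeError. The p tag outside the div is skipped because the second search starts from div_container rather than from the whole page.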

Hope that helps.

User contributions licensed under: CC BY-SA