I am trying to write a Python program that counts the words on a web page. I use Beautiful Soup 4 to scrape the page, but I have difficulty accessing nested HTML tags (for example, <p class="hello"> inside a <div>).
Every time I try to find such a tag with the page.findAll() method (page is the Beautiful Soup object containing the whole page), it finds nothing, even though matching tags exist. Is there a simple method or another way to do this?
Answer
I’m guessing that you want to first find a specific div tag, then search for all p tags inside it and count them or process them however you like. For example:
import bs4

# content is assumed to hold the HTML of the page
soup = bs4.BeautifulSoup(content, 'html.parser')

# This will get the div
div_container = soup.find('div', class_='some_class')

# Then search in that div_container for all p tags with class "hello"
for ptag in div_container.find_all('p', class_='hello'):
    # prints the p tag content
    print(ptag.text)
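Since the goal is to count words, here is a minimal sketch of one possible way to do it. It uses a CSS selector via select() instead of chaining find() and find_all(). The sample HTML, the class names, and the count_words helper are hypothetical and only there for illustration. Words are assumed to be whitespace-separated.

```python
import bs4

# Hypothetical sample page, used only for illustration.
html = """
<div class="some_class">
  <p class="hello">Hello nested world</p>
  <p class="other">not counted</p>
  <p class="hello">Two words</p>
</div>
<p class="hello">outside the div</p>
"""

soup = bs4.BeautifulSoup(html, "html.parser")


def count_words(soup, selector):
    """Count whitespace-separated words in all elements matching a CSS selector.

    count_words is a hypothetical helper, not part of Beautiful Soup.
    """
    return sum(len(tag.get_text().split()) for tag in soup.select(selector))


# "div.some_class p.hello" matches p tags with class "hello" that are
# nested anywhere inside a div with class "some_class".
nested_count = count_words(soup, "div.some_class p.hello")
print(nested_count)
```

One advantage of select() here is that it returns an empty list when nothing matches, whereas soup.find() returns None if the div is missing, and calling find_all() on None raises an AttributeError.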
Hope that helps