
Python – Extract string from website with BeautifulSoup

I would like to extract a string from an HTML source using only BeautifulSoup. I am trying to extract “1 van de maximaal 3 actieve reacties” from the following HTML:

<span class="titel ng-scope" translate="ReactiesTitel-Titel-actieve" translate-values="getTranslationValues()">1 van de maximaal 3 actieve reacties</span>

My current code returns the entire span element, but I cannot work out how to extract only the string without using .split or some other string manipulation.

Current code:

html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')
x = soup.find('span', {'class':'titel ng-scope'})
print(x)


Answer

If you have:

from bs4 import BeautifulSoup

html = '<span class="titel ng-scope" translate="ReactiesTitel-Titel-actieve" translate-values="getTranslationValues()">1 van de maximaal 3 actieve reacties</span>'
soup = BeautifulSoup(html, 'html.parser')

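You can get the string 1 van de maximaal 3 actieve reacties with:

soup.text

This works here because the parsed document contains only that one span. On a full page, soup.text returns the text of the whole document, so call .text on the tag returned by soup.find(...) instead (x.text in the question's code).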

I got the idea from a similar thread: How to get text from span tag in BeautifulSoup.
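For a page with more than one element, a minimal sketch along the same lines might look like the following. The wrapping div and the second span are made up for illustration and are not part of the question's page. The sketch assumes it is enough to match on the single class titel rather than the full class string, and it uses select_one and get_text from BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Sample markup: the span from the question plus a hypothetical sibling,
# added here only to show that the lookup targets the right element.
html = (
    '<div>'
    '<span class="titel ng-scope" '
    'translate="ReactiesTitel-Titel-actieve" '
    'translate-values="getTranslationValues()">'
    '1 van de maximaal 3 actieve reacties</span>'
    '<span class="other">unrelated text</span>'
    '</div>'
)

soup = BeautifulSoup(html, 'html.parser')

# select_one takes a CSS selector and returns the first match or None.
# "span.titel" matches any span whose class list contains "titel".
tag = soup.select_one('span.titel')

# get_text(strip=True) returns the tag's text with surrounding whitespace removed.
text = tag.get_text(strip=True) if tag is not None else None
print(text)
```

With the question's code, the equivalent would be to keep x = soup.find(...) as it is and print x.get_text(strip=True) instead of x, after checking that x is not None.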
