I am working on a scraping project and I am looking to scrape the following:
<div class="spec-subcat attributes-religion">
  <span class="h5">Faith:</span>
  <span>Christian</span>
  <span>Islam</span>
</div>
I want to extract only "Christian, Islam" as the output (without the "Faith:" label).
This is my attempt:
faithdiv = soup.find('div', class_='spec-subcat attributes-religion')
faith = faithdiv.find('span').text.strip()  # returns only "Faith:"
How can I get this done?
Answer
Your code returns "Faith:" because find() stops at the first matching <span>. There are several ways to fix this; I would suggest selecting all <span> elements inside the <div> that do not have class="h5":
soup.select('div.spec-subcat.attributes-religion span:not(.h5)')
Example
from bs4 import BeautifulSoup

html_text = '''
<div class="spec-subcat attributes-religion">
  <span class="h5">Faith:</span>
  <span>Christian</span>
  <span>Islam</span>
</div>
'''

soup = BeautifulSoup(html_text, 'lxml')

# Keep every <span> in the div except the "Faith:" label (class="h5")
print(', '.join(x.get_text() for x in soup.select('div.spec-subcat.attributes-religion span:not(.h5)')))
Output
Christian, Islam
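If you prefer to stay closer to the code in the question, a minimal sketch using find_all() instead of find(), and skipping the label by its class, could look like this. It assumes the built-in 'html.parser' so it does not depend on lxml; the variable name faiths is just for illustration.

```python
from bs4 import BeautifulSoup

html_text = '''
<div class="spec-subcat attributes-religion">
  <span class="h5">Faith:</span>
  <span>Christian</span>
  <span>Islam</span>
</div>
'''

soup = BeautifulSoup(html_text, 'html.parser')

# Same lookup as in the question: locate the wrapper <div> first
faithdiv = soup.find('div', class_='spec-subcat attributes-religion')

# find() returns only the first <span> (the "Faith:" label), so use
# find_all() and skip any <span> whose class list contains "h5"
faiths = [
    span.get_text(strip=True)
    for span in faithdiv.find_all('span')
    if 'h5' not in span.get('class', [])
]

print(', '.join(faiths))  # Christian, Islam
```

This avoids CSS selectors entirely; the trade-off is a few more lines compared with the one-line select() call.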