How do I find the text I am looking for in the following HTML (line breaks are marked with \n)?
...
<tr>
  <td class="pos">\n
    "Some text:"\n
    <br>\n
    <strong>some value</strong>\n
  </td>
</tr>
<tr>
  <td class="pos">\n
    "Fixed text:"\n
    <br>\n
    <strong>text I am looking for</strong>\n
  </td>
</tr>
<tr>
  <td class="pos">\n
    "Some other text:"\n
    <br>\n
    <strong>some other value</strong>\n
  </td>
</tr>
...
The code below returns the first value it finds, so I need to filter by "Fixed text:" somehow:
result = soup.find('td', {'class' :'pos'}).find('strong').text
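As a sketch of why this happens and one way to filter, the snippet below rebuilds the markup as plain HTML (assuming the quotes shown in the question are inspector display, not part of the text) and uses BeautifulSoup 4 (`bs4`) with the stdlib `html.parser`. `find()` stops at the first matching `<td>`, so filtering has to look at every cell. The helper `value_for_label` is a hypothetical name for illustration; it simply checks each `td.pos` cell's text for the label.

```python
from bs4 import BeautifulSoup

# Sample markup modeled on the question (assumption: the quotes shown by
# the browser inspector are not part of the real text).
HTML = """
<table>
<tr><td class="pos">
  Some text:
  <br>
  <strong>some value</strong>
</td></tr>
<tr><td class="pos">
  Fixed text:
  <br>
  <strong>text I am looking for</strong>
</td></tr>
<tr><td class="pos">
  Some other text:
  <br>
  <strong>some other value</strong>
</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# find() returns only the first matching <td>, so this is always "some value".
first = soup.find("td", class_="pos").find("strong").get_text()


def value_for_label(soup, label):
    """Hypothetical helper: return the <strong> text of the first td.pos
    cell whose text contains `label`, or None if no cell matches."""
    for td in soup.find_all("td", class_="pos"):
        if label in td.get_text():
            strong = td.find("strong")
            if strong is not None:
                return strong.get_text()
    return None


print(first)                                 # some value
print(value_for_label(soup, "Fixed text:"))  # text I am looking for
```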
UPDATE: If I use the following code:
title = soup.find('td', text = re.compile(ur'Fixed text:(.*)', re.DOTALL), attrs = {'class': 'pos'})
self.response.out.write(str(title.string).decode('utf8'))
then it returns just "Fixed text:", not the <strong>-highlighted text in that same element.
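A text search matches the text node itself (a `NavigableString`), not the `<td>` around it, which would explain why only the label comes back. A minimal sketch, assuming BeautifulSoup 4, where the parameter is called `string` (the newer name for `text`): find the label node, then step forward to the next `<strong>` element with `find_next`. The sample HTML is the same reconstruction as above, not the asker's real page.

```python
import re

from bs4 import BeautifulSoup

HTML = """
<table>
<tr><td class="pos">Some text:<br><strong>some value</strong></td></tr>
<tr><td class="pos">
  Fixed text:
  <br>
  <strong>text I am looking for</strong>
</td></tr>
<tr><td class="pos">Some other text:<br><strong>some other value</strong></td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Searching by string returns the matching text node, i.e. the label.
label = soup.find(string=re.compile(r"Fixed text:"))

# From the label, move to the next <strong> element in document order.
value = label.find_next("strong").get_text() if label is not None else None

print(repr(label.strip()))  # 'Fixed text:'
print(value)                # text I am looking for
```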
Answer
You can pass a regular expression to the text parameter of findAll, like so:
import BeautifulSoup
import re

columns = soup.findAll('td', text = re.compile('your regex here'), attrs = {'class' : 'pos'})
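The same regex idea can be carried through to the `<strong>` values. This is a sketch, assuming BeautifulSoup 4 and a hypothetical helper `values_matching`: because a string search yields text nodes, the class filter is applied explicitly by climbing from each matched node to its enclosing `td.pos` with `find_parent`, and the `<strong>` text is then read from that cell.

```python
import re

from bs4 import BeautifulSoup

HTML = """
<table>
<tr><td class="pos">Some text:<br><strong>some value</strong></td></tr>
<tr><td class="pos">Fixed text:<br><strong>text I am looking for</strong></td></tr>
<tr><td class="pos">Some other text:<br><strong>some other value</strong></td></tr>
</table>
"""


def values_matching(soup, pattern):
    """Hypothetical helper: collect the <strong> values of td.pos cells
    whose text nodes match the regular expression `pattern`."""
    results = []
    for node in soup.find_all(string=re.compile(pattern)):
        # The match is a text node; climb to the enclosing td.pos, if any.
        td = node.find_parent("td", class_="pos")
        if td is None:
            continue
        strong = td.find("strong")
        if strong is not None:
            results.append(strong.get_text())
    return results


soup = BeautifulSoup(HTML, "html.parser")
print(values_matching(soup, r"Fixed text:"))  # ['text I am looking for']
print(values_matching(soup, r"text:"))        # ['some value', 'text I am looking for', 'some other value']
```

Climbing to the parent keeps the regex focused on the label while still restricting results to cells with `class="pos"`.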