
How to find tag with particular text with Beautiful Soup?

How can I find the text I am looking for in the following HTML (line breaks are marked with \n)?

...
<tr>
  <td class="pos">\n
      "Some text:"\n
      <br>\n
      <strong>some value</strong>\n
  </td>
</tr>
<tr>
  <td class="pos">\n
      "Fixed text:"\n
      <br>\n
      <strong>text I am looking for</strong>\n
  </td>
</tr>
<tr>
  <td class="pos">\n
      "Some other text:"\n
      <br>\n
      <strong>some other value</strong>\n
  </td>
</tr>
...

The code below returns the first value found, so I need to filter by "Fixed text:" somehow.

result = soup.find('td', {'class' :'pos'}).find('strong').text

UPDATE: If I use the following code:

title = soup.find('td', text = re.compile(ur'Fixed text:(.*)', re.DOTALL), attrs = {'class': 'pos'})
self.response.out.write(str(title.string).decode('utf8'))

then it returns just Fixed text:, not the <strong>-highlighted text in that same element.


Answer

You can pass a regular expression to the text parameter of findAll, like so:

import BeautifulSoup
import re

columns = soup.findAll('td', text = re.compile('your regex here'), attrs = {'class' : 'pos'})
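
To get the <strong> value that belongs to a given label, one option is to locate the matching text node first and then look inside its enclosing cell. The sketch below is a minimal illustration, not the answer author's code: it assumes Beautiful Soup 4 (the bs4 package) rather than the BeautifulSoup 3 import used above, and the helper name strong_after_label and the sample html string are hypothetical.

```python
import re
from bs4 import BeautifulSoup

# Sample markup modelled on the question; the surrounding <table> is an assumption.
html = """
<table>
<tr>
  <td class="pos">
      "Some text:"
      <br>
      <strong>some value</strong>
  </td>
</tr>
<tr>
  <td class="pos">
      "Fixed text:"
      <br>
      <strong>text I am looking for</strong>
  </td>
</tr>
<tr>
  <td class="pos">
      "Some other text:"
      <br>
      <strong>some other value</strong>
  </td>
</tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")


def strong_after_label(soup, label):
    """Return the <strong> text in the td.pos cell whose text contains label."""
    # Match the text node itself; a <td> with several children has no
    # single .string, so filtering the tag by string would not match here.
    node = soup.find(string=re.compile(re.escape(label)))
    if node is None:
        return None
    # Walk up to the enclosing cell, then down to its <strong> element.
    cell = node.find_parent("td", class_="pos")
    if cell is None:
        return None
    strong = cell.find("strong")
    return strong.get_text(strip=True) if strong else None


result = strong_after_label(soup, "Fixed text:")
print(result)  # text I am looking for
```

Searching for the text node and then navigating to its parent keeps the label filter and the value lookup as separate steps, so the helper returns None instead of raising when either the label or the <strong> element is missing.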
User contributions licensed under: CC BY-SA