How to find "text I am looking for" in the following HTML (line breaks marked with \n)?
<tr>
  <td class="pos">\n
    "Some text:"\n
    <br>\n
    <strong>some value</strong>\n
  </td>
</tr>
<tr>
  <td class="pos">\n
    "Fixed text:"\n
    <br>\n
    <strong>text I am looking for</strong>\n
  </td>
</tr>
<tr>
  <td class="pos">\n
    "Some other text:"\n
    <br>\n
    <strong>some other value</strong>\n
  </td>
</tr>
The code below returns the first value found, so I need to filter by "Fixed text:" somehow.
result = soup.find('td', {'class': 'pos'}).find('strong').text
UPDATE: If I use the following code:
title = soup.find('td', text = re.compile(ur'Fixed text:(.*)', re.DOTALL), attrs = {'class': 'pos'})
self.response.out.write(str(title.string).decode('utf8'))
then it returns just "Fixed text:", not the <strong>-highlighted text in that same element.
Answer
You can pass a regular expression to the text parameter of findAll, like so:
import BeautifulSoup
import re

columns = soup.findAll('td', text = re.compile('your regex here'), attrs = {'class' : 'pos'})
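As the update in the question notes, a text match hands back the matching text, not the <strong> element next to it. One possible way to get from the matched text to the value is to climb to the enclosing cell and search inside it. The sketch below is only an illustration under a few assumptions: it uses the newer bs4 package on Python 3 (where findAll is spelled find_all and the text argument is called string) rather than the BeautifulSoup 3 import shown above, the sample HTML is retyped from the question, and strong_after_label is a hypothetical helper name, not part of the library.

```python
import re

from bs4 import BeautifulSoup  # assumption: bs4 (Beautiful Soup 4), not BeautifulSoup 3

# Sample markup retyped from the question, wrapped in a <table> for completeness.
HTML = """
<table>
<tr>
  <td class="pos">
    "Some text:"
    <br>
    <strong>some value</strong>
  </td>
</tr>
<tr>
  <td class="pos">
    "Fixed text:"
    <br>
    <strong>text I am looking for</strong>
  </td>
</tr>
<tr>
  <td class="pos">
    "Some other text:"
    <br>
    <strong>some other value</strong>
  </td>
</tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")


def strong_after_label(soup, label):
    """Hypothetical helper: return the <strong> text in the td.pos cell
    whose text contains `label`, or None if there is no such cell."""
    pattern = re.compile(re.escape(label))
    # find_all(string=...) yields the matching text nodes, not the <td> tags.
    for text_node in soup.find_all(string=pattern):
        # Climb from the text node to its enclosing <td class="pos">.
        cell = text_node.find_parent("td", class_="pos")
        if cell is None:
            continue
        strong = cell.find("strong")
        if strong is not None:
            return strong.get_text(strip=True)
    return None


result = strong_after_label(soup, "Fixed text:")
print(result)  # prints: text I am looking for
```

Going through the parent cell keeps the lookup inside the same <td>, so a label that happens to appear elsewhere in the page without a <strong> beside it is skipped instead of picking up a value from a later row.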