I am using BeautifulSoup to extract HTML data. Given a search string, I need to extract every tag whose data matches it, along with that data, where the tag itself can be anything.
As a sample, consider the following HTML code:
<h1>Hello</h1>
<h1>Python Program</h1>

<span class = true>Geeks</span>
<span class = false>Geeks New</span>

<li class = 1 >Python Program</li>
<li class = 2 >Python Code</li>
<li class = 3 >Hello</li>

<table>
<tr>Website</tr>
</table>
If the tag is known, the following code returns the entire tag along with its data:
pattern = 'Hello'
text1 = soup.find_all('li', text = pattern)
print(text1)
This will give the following output:
[<li class = 3 >Hello</li>]
But if I give ‘Hello’ as the search item, I need to get all the tags that contain ‘Hello’, like:
[<h1>Hello</h1>, <li class = 3 >Hello</li>]
Answer
You could use a CSS selector that checks whether an element contains a string:
soup.select(':-soup-contains("Hello")')
Example
from bs4 import BeautifulSoup

html = '''
<h1>Hello</h1>
<h1>Python Program</h1>

<span class = true>Geeks</span>
<span class = false>Geeks New</span>

<li class = 1 >Python Program</li>
<li class = 2 >Python Code</li>
<li class = 3 >Hello</li>

<table>
<tr>Website</tr>
</table>
'''

pattern = 'Hello'
soup = BeautifulSoup(html, 'html.parser')

# Quote the pattern so search strings containing spaces also work
print(soup.select(f':-soup-contains("{pattern}")'))
Output
[<h1>Hello</h1>, <li class="3">Hello</li>]
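Note that `:-soup-contains` matches substrings, so a tag such as `<h1>Hello World</h1>` would also be returned. If only tags whose text is exactly the search item are wanted, one possible alternative is `find_all` with `True` as the tag name (which matches any tag) plus the `string` argument. The following is a minimal sketch, assuming a reasonably recent Beautiful Soup 4; the sample HTML and the variable names `contains` and `exact` are made up for illustration:

```python
from bs4 import BeautifulSoup

# Illustrative HTML (not from the question): one tag only *contains* the pattern
html = """
<h1>Hello</h1>
<h1>Hello World</h1>
<li class="3">Hello</li>
"""

soup = BeautifulSoup(html, "html.parser")
pattern = "Hello"

# Substring match: any element whose text contains the pattern
contains = soup.select(f':-soup-contains("{pattern}")')

# Exact match: name=True accepts any tag; string= is compared with tag.string
exact = soup.find_all(True, string=pattern)

print(contains)
print(exact)
```

Here `contains` should include all three elements, while `exact` should leave out `<h1>Hello World</h1>`. The `string` argument also accepts a compiled regular expression if a looser match is needed.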