Extract the HTML tag along with its data when the data is given as the search term

I am using BeautifulSoup to extract HTML data. Given a piece of text as the search term, I need to extract the enclosing HTML tags along with that text, where the tag can be anything.

As a sample, consider the following HTML:

    <h1>Hello</h1>
    <h1>Python Program</h1>
 
   <span class = true>Geeks</span>
   <span class = false>Geeks New</span>
 
   <li class = 1 >Python Program</li>
   <li class = 2 >Python Code</li>
   <li class = 3 >Hello</li>
 
   <table>
       <tr>Website</tr>
   </table>

If the tag is known, the following code returns the entire tag along with its data:

pattern = 'Hello'
text1 = soup.find_all('li', text = pattern)
print(text1)

This gives:

[<li class = 3 >Hello</li>]

But if I give 'Hello' as the search term, I need to get all the tags that contain 'Hello', like:

[<h1>Hello</h1>, <li class = 3 >Hello</li>]

Answer

You can use a CSS selector that checks whether an element's text contains a given string:

soup.select(':-soup-contains("Hello")')
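
One thing worth knowing before relying on this selector: `:-soup-contains()` does a substring match on an element's full text, descendants included, so ancestors of a matching element are returned too. The sketch below uses a small made-up HTML fragment (not the one from the question) to compare it with `:-soup-contains-own()`, which only looks at an element's own text nodes. It assumes a soupsieve version that supports these names (2.1 or later, as far as I know).

```python
from bs4 import BeautifulSoup

# Hypothetical nested markup, chosen only to show the difference
html = '<div><p>Hello <b>there</b></p><span>Bye</span></div>'
soup = BeautifulSoup(html, 'html.parser')

# Matches every element whose text, including descendants, contains "Hello"
any_depth = soup.select(':-soup-contains("Hello")')

# Matches only elements whose own direct text contains "Hello"
own_only = soup.select(':-soup-contains-own("Hello")')

print([tag.name for tag in any_depth])  # ['div', 'p']
print([tag.name for tag in own_only])   # ['p']
```

For flat markup like the sample in the question the two selectors return the same tags; the difference only shows up once elements are nested.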

Example

from bs4 import BeautifulSoup
html ='''
<h1>Hello</h1>
    <h1>Python Program</h1>
 
   <span class = true>Geeks</span>
   <span class = false>Geeks New</span>
 
   <li class = 1 >Python Program</li>
   <li class = 2 >Python Code</li>
   <li class = 3 >Hello</li>
 
   <table>
       <tr>Website</tr>
   </table>
'''

pattern = 'Hello'
soup = BeautifulSoup(html, 'html.parser')

# Quote the pattern so search terms containing spaces stay valid CSS
print(soup.select(f':-soup-contains("{pattern}")'))

Output

[<h1>Hello</h1>, <li class="3">Hello</li>]
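
If an exact match is needed rather than a substring match, one possible alternative is to search the text nodes directly and take each node's parent tag. This is a minimal sketch, not part of the original answer: the helper name `tags_with_exact_text` and the extra `<p>Hello World</p>` line are assumptions added for illustration.

```python
from bs4 import BeautifulSoup

html = '''
<h1>Hello</h1>
<h1>Python Program</h1>
<li class="3">Hello</li>
<p>Hello World</p>
'''

soup = BeautifulSoup(html, 'html.parser')


def tags_with_exact_text(soup, text):
    """Return the tags whose direct text node equals `text` exactly."""
    # find_all(string=...) returns the matching text nodes;
    # .parent is the tag that directly encloses each one.
    return [node.parent for node in soup.find_all(string=text)]


exact_matches = tags_with_exact_text(soup, 'Hello')
print(exact_matches)  # [<h1>Hello</h1>, <li class="3">Hello</li>]
```

Here `<p>Hello World</p>` is skipped because its text is not exactly 'Hello', whereas the `:-soup-contains()` selector would include it.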
