
BeautifulSoup: how to return only the tags that have a class

I have an HTML document that looks similar to this:

<div class='product'>
    <table>
        <tr>
            random stuff here
        </tr>
        <tr class='line1'>
            <td class='row'>
                <span>TEXT I NEED</span>
            </td>
        </tr>
        <tr class='line2'>
            <td class='row'>
                <span>MORE TEXT I NEED</span>
            </td>
        </tr>
        <tr class='line3'>
            <td class='row'>
                <span>EVEN MORE TEXT I NEED</span>
            </td>
        </tr>
    </table>
</div>

So I have used this code, but I am also getting the text from the first tr, which has no class, and I need to ignore it:

for tr in soup.findAll('tr'):
    print(tr.text)

Also, when I try to filter on the class alone, this doesn't seem to work:

soup.findAll('tr', {'class'})

I would like some help extracting the text from only the rows that have a class.
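Both attempts above can also be adjusted without CSS selectors. A minimal sketch, assuming BeautifulSoup 4 and the markup shown in the question (the `rows` and `texts` names are illustrative only): `find_all` returns a list-like ResultSet, so `.text` has to be read from each tag, and passing `attrs={'class': True}` asks for every `tr` that has a `class` attribute, whatever its value. A bare set such as `{'class'}` does not express "has any class".

```python
from bs4 import BeautifulSoup

# The markup from the question, with each <td> collapsed onto one line.
html = """
<div class='product'>
    <table>
        <tr>
            random stuff here
        </tr>
        <tr class='line1'>
            <td class='row'><span>TEXT I NEED</span></td>
        </tr>
        <tr class='line2'>
            <td class='row'><span>MORE TEXT I NEED</span></td>
        </tr>
        <tr class='line3'>
            <td class='row'><span>EVEN MORE TEXT I NEED</span></td>
        </tr>
    </table>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

# attrs={'class': True} keeps only <tr> tags that carry a class attribute,
# so the first, unclassed row is skipped.
rows = soup.find_all('tr', attrs={'class': True})

# find_all returns a ResultSet (a list of tags), so read the text per tag.
# get_text(strip=True) drops the surrounding whitespace-only strings.
texts = [row.get_text(strip=True) for row in rows]

for text in texts:
    print(text)
```

The keyword form `soup.find_all('tr', class_=True)` should behave the same way; the dict form is used here because it mirrors the second argument the question already tried.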

Answer

To get the desired output, use a CSS selector that excludes the first <tr> tag and selects the rest:

from bs4 import BeautifulSoup


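# `html` holds the markup from the question as a string
soup = BeautifulSoup(html, 'html.parser')

# Select every <tr> under .product except the first <tr> among its siblings
for tag in soup.select('.product tr:not(.product tr:nth-of-type(1))'):
    print(tag.text.strip())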

Output:

TEXT I NEED
MORE TEXT I NEED
EVEN MORE TEXT I NEED
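If the rows you want are exactly the ones that carry a class, an attribute selector can replace the positional `:not(... :nth-of-type(1))` filter. A sketch under that assumption, using the same markup as the question (the `texts` name is illustrative only):

```python
from bs4 import BeautifulSoup

# The markup from the question, with each <td> collapsed onto one line.
html = """
<div class='product'>
    <table>
        <tr>
            random stuff here
        </tr>
        <tr class='line1'>
            <td class='row'><span>TEXT I NEED</span></td>
        </tr>
        <tr class='line2'>
            <td class='row'><span>MORE TEXT I NEED</span></td>
        </tr>
        <tr class='line3'>
            <td class='row'><span>EVEN MORE TEXT I NEED</span></td>
        </tr>
    </table>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

# tr[class] matches <tr> elements that have a class attribute at all,
# so the unclassed first row is skipped without relying on its position.
texts = [tr.get_text(strip=True) for tr in soup.select('.product tr[class]')]

for text in texts:
    print(text)
```

The trade-off: the positional selector breaks if the unwanted row moves, while `tr[class]` breaks if the unwanted row ever gains a class attribute.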
User contributions licensed under: CC BY-SA