
BeautifulSoup: how to return only elements that have a class

I have an HTML document that looks similar to this:

<div class='product'>
    <table>
        <tr>
            random stuff here
        </tr>
        <tr class='line1'>
            <td class='row'>
                <span>TEXT I NEED</span>
            </td>
        </tr>
        <tr class='line2'>
            <td class='row'>
                <span>MORE TEXT I NEED</span>
            </td>
        </tr>
        <tr class='line3'>
            <td class='row'>
                <span>EVEN MORE TEXT I NEED</span>
            </td>
        </tr>
    </table>
</div>

I used this code, but it also picks up the text from the first tr, which has no class, and I need to ignore that one:

soup.findAll('tr').text

Also, when I try to match only on the presence of a class, this doesn't seem to be valid Python:

soup.findAll('tr', {'class'})

I would like some help extracting the text.

Answer

To get the desired output, use a CSS selector that excludes the first <tr> tag inside .product and selects the rest:

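from bs4 import BeautifulSoup

# `html` is assumed to hold the markup shown in the question
soup = BeautifulSoup(html, 'html.parser')

# Select every <tr> under .product except the first one
for tag in soup.select('.product tr:not(.product tr:nth-of-type(1))'):
    print(tag.text.strip())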

Output:

TEXT I NEED
MORE TEXT I NEED
EVEN MORE TEXT I NEED
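
Since the question asks for rows that have a class rather than "everything but the first row", matching on the attribute itself may be closer to the intent. The sketch below is one possible way to do that, not the only one. It inlines the sample markup from the question as `html`, and the names `texts` and `texts_css` are only illustrative. It relies on `find_all` accepting `class_=True` to mean "has a class attribute, whatever its value", and on the `tr[class]` attribute selector for the CSS equivalent.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question.
html = """
<div class='product'>
    <table>
        <tr>
            random stuff here
        </tr>
        <tr class='line1'>
            <td class='row'>
                <span>TEXT I NEED</span>
            </td>
        </tr>
        <tr class='line2'>
            <td class='row'>
                <span>MORE TEXT I NEED</span>
            </td>
        </tr>
        <tr class='line3'>
            <td class='row'>
                <span>EVEN MORE TEXT I NEED</span>
            </td>
        </tr>
    </table>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

# class_=True keeps only the <tr> tags that carry a class attribute.
# find_all returns a list-like result, so the text is read per element.
texts = [tr.get_text(strip=True) for tr in soup.find_all('tr', class_=True)]

# The same filter written as a CSS attribute selector.
texts_css = [tr.get_text(strip=True) for tr in soup.select('tr[class]')]

for text in texts:
    print(text)
```

This version does not depend on the unclassed row being first, so it should keep working if more rows without a class show up elsewhere in the table.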
