I have an HTML document that looks similar to this:
<div class='product'>
  <table>
    <tr> random stuff here </tr>
    <tr class='line1'>
      <td class='row'> <span>TEXT I NEED</span> </td>
    </tr>
    <tr class='line2'>
      <td class='row'> <span>MORE TEXT I NEED</span> </td>
    </tr>
    <tr class='line3'>
      <td class='row'> <span>EVEN MORE TEXT I NEED</span> </td>
    </tr>
  </table>
</div>
I used the code below, but it also returns the text of the first tr, which has no class, and I need to ignore that row:
soup.findAll('tr').text
Also, when I try to match on just the class attribute, this doesn’t seem to be valid Python:
soup.findAll('tr', {'class'})
I would like some help extracting the text.
Answer
To get the desired output, use a CSS selector to exclude the first <tr> tag and select the rest:
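from bs4 import BeautifulSoup

# html is a string holding the document from the question
soup = BeautifulSoup(html, 'html.parser')

# Select every <tr> inside .product except the first one
for tag in soup.select('.product tr:not(.product tr:nth-of-type(1))'):
    print(tag.text.strip())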
Output:
TEXT I NEED
MORE TEXT I NEED
EVEN MORE TEXT I NEED
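
Since the question also asks about matching rows "by just a class", here is a minimal sketch of an alternative that keeps only the <tr> tags carrying a class attribute, rather than skipping the first row by position. It assumes the markup has the same shape as the sample in the question; the names html, texts and texts_css are illustrative only. Passing class_=True to find_all matches any tag that has a class attribute, whatever its value, and the attribute selector tr[class] should select the same rows through select().

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question (assumed to mirror the real page).
html = """
<div class='product'>
  <table>
    <tr> random stuff here </tr>
    <tr class='line1'> <td class='row'> <span>TEXT I NEED</span> </td> </tr>
    <tr class='line2'> <td class='row'> <span>MORE TEXT I NEED</span> </td> </tr>
    <tr class='line3'> <td class='row'> <span>EVEN MORE TEXT I NEED</span> </td> </tr>
  </table>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

# class_=True keeps only <tr> tags that have a class attribute at all,
# so the first, class-less row is skipped.
texts = [row.get_text(strip=True) for row in soup.find_all('tr', class_=True)]

# The same filter written as a CSS attribute selector.
texts_css = [row.get_text(strip=True) for row in soup.select('.product tr[class]')]

for text in texts:
    print(text)
```

This variant may be more robust if the unwanted row is not always first, but it would also skip a wanted row that happens to lack a class, so which filter fits depends on the real page.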