
How to find a tag within the same parent that has the child I want?

I'm scraping a website that lists multiple products on the same page, and I don't need the prices for some of them. So I want to check the product category first and then get the listed price.

The website code looks like this:

<section class="products_results">
   <span something I don't want>...</span>
   <section class="category">
      <span>Clothes</span>
   <div something I don't want>...</div>
   <section class="search_result_price">
      <section>
         <span something I don't want>...</span>
         <span class="price">149.99</span>
      </section>
</section>

I already know how to get to the category part with my own code, but I’m completely stuck on the other part.

for products in soup.find_all(class_='category'):
   category = (products.text)
   if category == 'Clothes':
      price = (theoretical piece of code)

How can I get to the specific price tag within this parent <section> tag?


Answer

You are close to your goal, but be aware that products.text returns the text of the whole section, including everything nested inside it. Use products.span.text to get only the category text.
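
A minimal sketch of that difference, assuming the category section is left unclosed as in the question's markup (the placeholder attributes are dropped here for brevity, and the variable names are only illustrative):

```python
from bs4 import BeautifulSoup

# Simplified copy of the question's markup; the category <section> is never closed,
# so the parser nests the price section inside it.
html = '''
<section class="products_results">
   <section class="category">
      <span>Clothes</span>
   <section class="search_result_price">
      <section>
         <span class="price">149.99</span>
      </section>
</section>'''

soup = BeautifulSoup(html, 'html.parser')
category = soup.find('section', class_='category')

# The section's text joins every descendant string, so the price leaks in.
full_text = category.get_text(strip=True)
# .span.text reads only the first <span> inside the section.
name_only = category.span.text

print(full_text)
print(name_only)
```

Comparing full_text against 'Clothes' would fail here, while name_only matches.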

To get the price info, simply find the span with class="price" and check if it is available or not to avoid errors:

price = products.find('span', class_='price').text if products.find('span', class_='price') else None

Example
from bs4 import BeautifulSoup

html='''
<section class="products_results">
   <span something I don't want>...</span>
   <section class="category">
      <span>Clothes</span>
   <div something I don't want>...</div>
   <section class="search_result_price">
      <section>
         <span something I don't want>...</span>
         <span class="price">149.99</span>
      </section>
</section>'''

soup = BeautifulSoup(html, 'html.parser')

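for products in soup.find_all('section', class_='category'):
    # Read only the first <span>, not the text of the whole section
    category = products.span.text
    if category == 'Clothes':
        # find() returns None when no matching tag exists, so guard before reading .text
        price = products.find('span', class_='price').text if products.find('span', class_='price') else None
        print(price)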

Output
149.99
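
The example above works because the category section in the sample markup is never closed, so the parser treats the price as a descendant of it. If the real page closes every section properly, products.find(...) would return None. A hedged sketch for that case, assuming a well-formed layout in which each category is followed by its own price in document order, uses find_next instead:

```python
from bs4 import BeautifulSoup

# Assumed well-formed variant of the markup: every <section> is closed,
# so the price sits in a following sibling, not inside the category.
html = '''
<section class="products_results">
   <section class="category">
      <span>Clothes</span>
   </section>
   <section class="search_result_price">
      <section>
         <span class="price">149.99</span>
      </section>
   </section>
</section>'''

soup = BeautifulSoup(html, 'html.parser')

prices = []
for category in soup.find_all('section', class_='category'):
    if category.span.text == 'Clothes':
        # find_next searches forward in document order, past the closing tag.
        price_tag = category.find_next('span', class_='price')
        prices.append(price_tag.text if price_tag else None)

print(prices)
```

Note that find_next keeps searching to the end of the document, so if a product has no price it would pick up the price of the next product. Whether that can happen depends on the real page.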


As an alternative, here is a leaner approach that produces structured output that is easy to process and filters by a list of permitted categories:

from bs4 import BeautifulSoup

html='''
<section class="products_results">
   <span something I don't want>...</span>
   <section class="category">
      <span>Clothes</span>
   <div something I don't want>...</div>
   <section class="search_result_price">
      <section>
         <span something I don't want>...</span>
         <span class="price">149.99</span>
      </section>
   <span something I don't want>...</span>
   <section class="category">
      <span>Shoes</span>
   <div something I don't want>...</div>
   <section class="search_result_price">
      <section>
         <span something I don't want>...</span>
         <span class="price">90.99</span>
      </section>
</section>'''

soup = BeautifulSoup(html, 'html.parser')

data = []

# Categories whose prices should be collected
c_list = ['Clothes','Shoes']

# Select only the category sections whose text contains one of the permitted names
for products in soup.select(f"section.category:-soup-contains({','.join(c_list)})"):
    data.append({
        'category' : products.span.text,
        'price' : products.find('span', class_='price').text if products.find('span', class_='price') else None
    })

data

Output
[{'category': 'Clothes', 'price': '149.99'},
 {'category': 'Shoes', 'price': '90.99'}]
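
One thing to keep in mind: :-soup-contains matches substrings and also looks at the text of nested elements, so a permitted name such as Shoes would also match a category called Running Shoes. The sketch below prints the selector string that the f-string builds and shows a possible exact-match alternative. The simplified markup, the extra Toys category, and the name allowed are assumptions made for illustration:

```python
from bs4 import BeautifulSoup

# Simplified markup in the same style as above: category sections are left
# unclosed, so each one still contains its own price as a descendant.
html = '''
<section class="products_results">
   <section class="category">
      <span>Clothes</span>
   <section class="search_result_price">
      <span class="price">149.99</span>
   </section>
   <section class="category">
      <span>Toys</span>
   <section class="search_result_price">
      <span class="price">19.99</span>
   </section>
   <section class="category">
      <span>Shoes</span>
   <section class="search_result_price">
      <span class="price">90.99</span>
   </section>
</section>'''

soup = BeautifulSoup(html, 'html.parser')

c_list = ['Clothes', 'Shoes']

# The selector string that the f-string in the answer builds from c_list.
selector = f"section.category:-soup-contains({','.join(c_list)})"
print(selector)

# Hypothetical alternative: exact matching against a set of permitted names.
allowed = set(c_list)
data = []
for products in soup.find_all('section', class_='category'):
    name = products.span.text.strip()
    if name in allowed:
        price_tag = products.find('span', class_='price')
        data.append({'category': name, 'price': price_tag.text if price_tag else None})

print(data)
```

The exact-match loop trades the compact selector for stricter filtering. Which one fits better depends on how the category names appear on the real page.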

User contributions licensed under: CC BY-SA