
Web Scraping find not moving on to next item

from bs4 import BeautifulSoup
import requests


def kijiji():
    source = requests.get('https://www.kijiji.ca/b-mens-shoes/markham-york-region/c15117001l1700274').text
    soup = BeautifulSoup(source,'lxml')
    b = soup.find('div', class_='price')
    for link in soup.find_all('a',class_ = 'title'):
        a = link.get('href')
        fulllink = 'http://kijiji.ca'+a
        print(fulllink)
        b = soup.find('div', class_='price')
        print(b.prettify())
kijiji()

The goal is to list all the different items sold on Kijiji and pair each one with its price. But I can't find a way to make Beautiful Soup move on to the next element with the class price, so I'm stuck with the first price on every iteration. find_all doesn't work either, as it just prints out the whole blob of prices instead of grouping each price with its item.

Answer

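If you have Beautiful Soup 4.7.1 or above, you can use the CSS selector methods select() and select_one(), which may also be faster. The key is to loop over each listing's container and look up the price inside that container, rather than calling soup.find() on the whole page, which always returns the first match.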

code:

import requests
from bs4 import BeautifulSoup

res = requests.get("https://www.kijiji.ca/b-mens-shoes/markham-york-region/c15117001l1700274").text
soup = BeautifulSoup(res, 'html.parser')
# Each .info-container holds one listing
for item in soup.select('.info-container'):
    fulllink = 'http://kijiji.ca' + item.find_next('a', class_='title')['href']
    print(fulllink)
    # Look up the price inside this listing only
    price = item.select_one('.price').text.strip()
    print(price)
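
To see why searching inside each container fixes the original problem, here is a small offline sketch. The HTML in SAMPLE_HTML is made up for illustration (the class names are borrowed from the code above, but the real page's markup may differ), so treat it as a demonstration of the idea rather than a description of the site.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for this example; the real page may differ.
SAMPLE_HTML = """
<div class="info-container">
  <div class="price">$40.00</div>
  <a class="title" href="/v-shoes/first">First pair</a>
</div>
<div class="info-container">
  <div class="price">$55.00</div>
  <a class="title" href="/v-shoes/second">Second pair</a>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Document-wide search: soup.find() restarts from the top of the page on
# every call, so it returns the first price each time.
global_prices = [
    soup.find("div", class_="price").text
    for _ in soup.find_all("a", class_="title")
]

# Scoped search: calling find() on each container looks only inside it.
scoped_prices = [
    item.find("div", class_="price").text
    for item in soup.find_all("div", class_="info-container")
]

print(global_prices)
print(scoped_prices)
```

The first list repeats the same price, which is the behaviour described in the question; the second list has one price per listing.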

Or, to use find_all(), use the code block below:

import requests
from bs4 import BeautifulSoup

res = requests.get("https://www.kijiji.ca/b-mens-shoes/markham-york-region/c15117001l1700274").text
soup = BeautifulSoup(res, 'html.parser')
# Each div.info-container holds one listing
for item in soup.find_all('div', class_='info-container'):
    fulllink = 'http://kijiji.ca' + item.find_next('a', class_='title')['href']
    print(fulllink)
    # find() on the item searches only inside this listing
    price = item.find(class_='price').text.strip()
    print(price)
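
If you want the pairs as data rather than printed output, something like the following might work. It is only a sketch: extract_listings, SAMPLE_HTML and the base URL default are names I made up for this example, the sample markup is hypothetical, and it assumes the title link sits inside each .info-container. It also skips listings without a link and records None when a listing has no price, so one incomplete listing does not raise an error.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical markup for illustration; the second listing has no price.
SAMPLE_HTML = """
<div class="info-container">
  <div class="price"> $40.00 </div>
  <a class="title" href="/v-shoes/first">First pair</a>
</div>
<div class="info-container">
  <a class="title" href="/v-shoes/second">Second pair</a>
</div>
"""


def extract_listings(html, base_url="https://www.kijiji.ca"):
    """Return a list of (full_link, price_text_or_None) tuples."""
    soup = BeautifulSoup(html, "html.parser")
    listings = []
    for item in soup.select(".info-container"):
        link = item.select_one("a.title")
        if link is None or not link.get("href"):
            continue  # no usable link in this container, skip it
        price = item.select_one(".price")
        price_text = price.get_text(strip=True) if price else None
        # urljoin builds the absolute URL from the relative href
        listings.append((urljoin(base_url, link["href"]), price_text))
    return listings


for full_link, price in extract_listings(SAMPLE_HTML):
    print(full_link, price)
```

For the live page you would pass requests.get(url).text as the html argument instead of SAMPLE_HTML.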
User contributions licensed under: CC BY-SA