I’m trying to learn BeautifulSoup (and Python as a whole; I’m still pretty much a beginner) and playing around with how to use it properly. I noticed that when I scrape the search results of the website I’m testing, each result is listed 3 times.
Specifically, I’m trying to output the title, link, and price of each real estate property from the website. The price doesn’t seem to be duplicated, while the title and link are. I can’t figure out whether it’s caused by my code or by something on the website itself.
    import requests
    from bs4 import BeautifulSoup

    userSearch = input('Input search: ')
    link = 'https://www.lamudi.com.ph/buy/?q={}'.format(userSearch)
    page = requests.get(link)
    soup = BeautifulSoup(page.content, 'html.parser')

    titleList = soup.find_all("a", title=True)
    priceList = soup.find_all("span", class_="PriceSection-FirstPrice", text=True)

    for (i,j) in zip(titleList, priceList):
        print(i['title'])
        print(i['href'])
        print(j.get_text())
        print("===============")
Output would be something like this where the price doesn’t match the listing because of the duplicated info:
    Input Property search: manila
    Suntrust Solana in Ermita Manila 3 Bedroom Unit for Sale
    https://www.lamudi.com.ph/suntrust-solana-in-ermita-manila-3-bedroom-unit-for-sale.html
    ₱ 6,672,197
    ===============
    Suntrust Solana in Ermita Manila 3 Bedroom Unit for Sale
    https://www.lamudi.com.ph/suntrust-solana-in-ermita-manila-3-bedroom-unit-for-sale.html
    ₱ 6,888,800
    ===============
    Suntrust Solana in Ermita Manila 3 Bedroom Unit for Sale
    https://www.lamudi.com.ph/suntrust-solana-in-ermita-manila-3-bedroom-unit-for-sale.html
    ₱ 168,000,000
    ===============
    3 Bedroom Unit For Sale in Suntrust Solana Manila- SOLA-01-21-H
    https://www.lamudi.com.ph/3-bedroom-unit-for-sale-in-suntrust-solana-manila-sola-01-21-h.html
    ₱ 53,000,000
    ===============
    3 Bedroom Unit For Sale in Suntrust Solana Manila- SOLA-01-21-H
    https://www.lamudi.com.ph/3-bedroom-unit-for-sale-in-suntrust-solana-manila-sola-01-21-h.html
    ₱ 53,000,000
    ===============
    3 Bedroom Unit For Sale in Suntrust Solana Manila- SOLA-01-21-H
    https://www.lamudi.com.ph/3-bedroom-unit-for-sale-in-suntrust-solana-manila-sola-01-21-h.html
    ₱ 46,500,000
Answer
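You should iterate row by row; it’s safer than using zip(). zip() pairs the two lists purely by position, so if the page contains several titled links per listing but only one price (as the duplicated output suggests), the titles and prices drift out of alignment.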
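To make the difference concrete, here is a minimal offline sketch. The HTML snippet is hypothetical (it only imitates a listing that repeats its titled link three times, reusing the class names that appear in the code on this page), so treat it as an illustration of the idea rather than a copy of the real page:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: each listing repeats its titled link three times
# but carries a single price, similar to what the question's output suggests.
HTML = """
<div class="ListingCell-row">
  <a title="Listing A" href="/a.html">img</a>
  <h2><a title="Listing A" href="/a.html">Listing A</a></h2>
  <a title="Listing A" href="/a.html">more</a>
  <span class="PriceSection-FirstPrice">100</span>
</div>
<div class="ListingCell-row">
  <a title="Listing B" href="/b.html">img</a>
  <h2><a title="Listing B" href="/b.html">Listing B</a></h2>
  <a title="Listing B" href="/b.html">more</a>
  <span class="PriceSection-FirstPrice">200</span>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Page-wide lists: six titled links, but only two prices.
titles = [a["title"] for a in soup.find_all("a", title=True)]
prices = [
    span.get_text(strip=True)
    for span in soup.find_all("span", class_="PriceSection-FirstPrice")
]

# zip() pairs items by position, so the second price lands on a
# duplicate of the first title.
zipped = list(zip(titles, prices))

# Row-by-row: every lookup is scoped to one listing container,
# so a title can only be paired with its own price.
by_row = []
for row in soup.select(".ListingCell-row"):
    title = row.h2.get_text(strip=True)
    price = row.select_one(".PriceSection-FirstPrice").get_text(strip=True)
    by_row.append((title, price))

print(zipped)
print(by_row)
```

With this sample, the zip() version attaches both prices to "Listing A", while the row-by-row version keeps each price with its own listing.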
To get all titles, links, and prices, you can use the following example:
    import requests
    from bs4 import BeautifulSoup

    userSearch = "manila"
    link = "https://www.lamudi.com.ph/buy/?q={}".format(userSearch)
    soup = BeautifulSoup(requests.get(link).content, "html.parser")

    # Each search result sits in its own .ListingCell-row container
    for row in soup.select(".ListingCell-row"):
        title = row.h2.get_text(strip=True)
        link = row.a["href"]
        # A listing shows either a price or a "no price" placeholder
        price = row.select_one(
            ".PriceSection-FirstPrice, .PriceSection-NoPrice"
        ).get_text(strip=True)

        print(title)
        print(link)
        print(price)
        print("=" * 80)
Prints:
    Suntrust Solana in Ermita Manila 3 Bedroom Unit for Sale
    https://www.lamudi.com.ph/suntrust-solana-in-ermita-manila-3-bedroom-unit-for-sale.html
    ₱ 6,672,197
    ================================================================================
    3 Bedroom Unit For Sale in Suntrust Solana Manila- SOLA-01-21-H
    https://www.lamudi.com.ph/3-bedroom-unit-for-sale-in-suntrust-solana-manila-sola-01-21-h.html
    ₱ 6,888,800
    ================================================================================
    Rare Sale!!Prime Commercial Property strategically in Paco, Manila
    https://www.lamudi.com.ph/rare-sale-prime-commercial-property-strategically-in-paco-manila.html
    ₱ 168,000,000
    ================================================================================
    A Luxurious Unit E Townhouse For Sale in A Peaceful Neighborhood in Paco Manila
    https://www.lamudi.com.ph/a-luxurious-unit-e-townhouse-for-sale-in-a-peaceful-neighborhood-in-paco-manila.html
    ₱ 53,000,000
    ================================================================================
    For Sale Luxurious 4-Bedroom Townhouse in a Peaceful Neighborhood in Paco Manila
    https://www.lamudi.com.ph/for-sale-luxurious-4-bedroom-townhouse-in-a-peaceful-neighborhood-in-paco-manila.html
    ₱ 53,000,000
    ================================================================================
    ...
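One caveat with row-by-row scraping: select_one() and find() return None when nothing matches, so a row that lacks a heading, link, or price element would raise an error on .get_text(). The sketch below is one possible way to tolerate that. The helper names text_or_none and extract_rows and the sample markup are hypothetical, introduced only for illustration; the class names are the ones used in the example above.

```python
from bs4 import BeautifulSoup


def text_or_none(node):
    """Return the stripped text of a tag, or None if the tag was not found."""
    return node.get_text(strip=True) if node is not None else None


def extract_rows(html):
    """Collect title, link, and price per listing row, using None for gaps."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for row in soup.select(".ListingCell-row"):
        anchor = row.find("a", href=True)
        price_node = row.select_one(
            ".PriceSection-FirstPrice, .PriceSection-NoPrice"
        )
        results.append(
            {
                "title": text_or_none(row.find("h2")),
                "link": anchor["href"] if anchor is not None else None,
                "price": text_or_none(price_node),
            }
        )
    return results


# Hypothetical sample: the second row has no price element at all.
SAMPLE = """
<div class="ListingCell-row">
  <h2><a href="/a.html">Listing A</a></h2>
  <span class="PriceSection-FirstPrice">100</span>
</div>
<div class="ListingCell-row">
  <h2><a href="/b.html">Listing B</a></h2>
</div>
"""

rows = extract_rows(SAMPLE)
for item in rows:
    print(item)
```

Returning a list of dictionaries instead of printing directly also makes it easier to filter the results or write them to a file later.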