I’m trying to scrape the href below from a site. The site has several hrefs that I want to scrape, so I’m looping over the page to store them all in one list. Below is an example of one of the hrefs.
<div class="col-md-4 h-gutter">
  <div class="product box" data-productid="2111214">
    <a href="/products/examples/product1/">
      <h3>Product 1</h3>
      <div class="product-small-text">
Here is the section of my code in question. Commented out is my attempt to gather just the hrefs. As this is not working, for now I’m attempting to scrape the entirety of “col-md-4 h-gutter”.
for product in soup.select('div.product.box'):
    link.append(product)
    #link.append(product.a['href'])
print(link)
Below is what is being printed to terminal. As you can see the hrefs are hidden behind a placeholder.
</div>, <div class="product placeholder-container box">
<h3><span class="placeholder-text--long"></span></h3>
<div class="product-small-text">
<span class="placeholder-text--short"></span>
</div>
How do I print out the value of href?
Answer
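The product boxes in the HTML you get back appear to be placeholders that the page fills in afterwards, so it’s much easier to use the JSON response instead. If you need it in table form, just feed it into pandas: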
import requests
import pandas as pd

url = 'https://www.masterofmalt.com/api/v2/lightningdeals/?isVatableCountry=1&deliveryCountryId=464&filter=nodrams&_=1617024330709&format=json'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.190 Safari/537.36'}

jsonData = requests.get(url, headers=headers).json()
df = pd.DataFrame(jsonData['lightningDeals'])
Output: first 5 rows of 43 rows
print(df.head(5).to_string())

   productUrl  productImageUrl  productRating  productReviewCount  productVolume  productAbv  categories  endDateUtc  productId  productName  dealPrice  previousPrice  timeRemaining  saving  percentageClaimed  isActive  dailyDeal
0  /whiskies/tobermory/tobermory-12-year-old-whisky/  /whiskies/p-IMAGEPRESET/tobermory/tobermory-12-year-old-whisky.jpg  5.0  17  70  46.3  [Whiskies, Single Malt]  2021-04-04T22:57:00.0000000  87989  Tobermory 12 Year Old  34.85  39.85  550379  5.0  0.669725  True  False
1  /whiskies/elements-of-islay/peat-pure-islay-elements-of-islay-whisky/  /whiskies/p-IMAGEPRESET/elements-of-islay/peat-pure-islay-elements-of-islay-whisky.jpg  0.0  0  50  45.0  [Whiskies, Blended Malt]  2021-04-04T22:59:00.0000000  58061  Peat Pure Islay  23.94  28.94  550499  5.0  0.625000  True  False
2  /mezcal/ilegal/ilegal-reposado-mezcal/  /mezcal/p-IMAGEPRESET/ilegal/ilegal-reposado-mezcal.jpg  5.0  3  70  40.0  [Mezcal, Reposado]  2021-04-04T22:59:00.0000000  9277  Ilegal Reposado  53.40  59.40  550499  6.0  0.500000  True  False
3  /whiskies/nikka/nikka-coffey-grain-whisky-70cl/  /whiskies/p-IMAGEPRESET/nikka/nikka-coffey-grain-whisky-70cl.jpg  4.5  40  70  45.0  [Whiskies, Grain]  2021-04-04T22:57:00.0000000  32316  Nikka Coffey Grain 70cl  49.83  54.83  550379  5.0  0.410256  True  False
4  /rum/satchmo/satchmo-mojito-spirited-rum/  /rum/p-IMAGEPRESET/satchmo/satchmo-mojito-spirited-rum.jpg  5.0  14  70  37.5  [Rum, Spiced]  2021-04-04T22:58:00.0000000  106576  Satchmo Rum  34.95  39.95  550439  5.0  0.338710  True  False
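Since the original question asked for a list of hrefs, here is a minimal sketch of pulling just the `productUrl` values out of the deal records and turning them into absolute links. The helper name `product_links` and the `BASE_URL` constant are my own and not part of the site's API. The base is assumed to be the site root from the API URL above, and the sample records are trimmed copies of rows from the output above so the snippet runs without a network call.

```python
from urllib.parse import urljoin

# Assumption: product paths are relative to the site root used in the API URL.
BASE_URL = "https://www.masterofmalt.com"


def product_links(deals, base_url=BASE_URL):
    """Return absolute product URLs from a list of deal records (dicts)."""
    links = []
    for deal in deals:
        path = deal.get("productUrl")
        if path:  # skip any record that has no productUrl
            links.append(urljoin(base_url, path))
    return links


# Two records copied from the output above, trimmed to the relevant key.
sample_deals = [
    {"productUrl": "/whiskies/tobermory/tobermory-12-year-old-whisky/"},
    {"productUrl": "/mezcal/ilegal/ilegal-reposado-mezcal/"},
]

links = product_links(sample_deals)
print(links)
```

With the live data you would presumably pass `jsonData['lightningDeals']` in place of `sample_deals`, assuming every record carries a `productUrl` key as the rows above suggest.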