Skip to content
Advertisement

Extracting product URLs from a search query on a website

If I were for example looking to track the price changes of MIDI keyboards on https://www.gear4music.com/Studio-MIDI-Controllers. I would need to extract all the URLs of the products pictured from the search and then loop through the URLs of the products and extract price info for each product. I can obtain the price data of an individual product by hard coding the URL but I cannot find a way to automate getting the URLs of multiple products.

So far I have tried this,

from bs4 import BeautifulSoup
import requests

url = "https://www.gear4music.com/Studio-MIDI- Controllers"

response = requests.get(url)

data = response.text

soup = BeautifulSoup(data, 'lxml')

tags = soup.find_all('a')

for tag in tags:
    print(tag.get('href'))

This does produce a list of URLs but I cannot make out which ones relate specifically to the MIDI keyboards in that search query that I want to obtain the price product info of. Is there a better more specific way to obtain the URLs of the products only and not everything within the HTML file?

Advertisement

Answer

There are many ways how to obtain product links. One way could be select all <a> tags which have data-g4m-inv= attribute:

import requests
from bs4 import BeautifulSoup

url = "https://www.gear4music.com/Studio-MIDI-Controllers"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

for a in soup.select("a[data-g4m-inv]"):
    print("https://www.gear4music.com" + a["href"])

Prints:

https://www.gear4music.com/Recording-and-Computers/SubZero-MiniPad-MIDI-Controller/P6E
https://www.gear4music.com/Recording-and-Computers/SubZero-MiniControl-MIDI-Controller/P6D
https://www.gear4music.com/Keyboards-and-Pianos/SubZero-MiniKey-25-Key-MIDI-Controller/JMR
https://www.gear4music.com/Keyboards-and-Pianos/Nektar-SE25/2XWA
https://www.gear4music.com/Keyboards-and-Pianos/Korg-nanoKONTROL2-USB-MIDI-Controller-Black/G8L
https://www.gear4music.com/Recording-and-Computers/SubZero-ControlKey25-MIDI-Keyboard/221Y
https://www.gear4music.com/Keyboards-and-Pianos/SubZero-CommandKey25-Universal-MIDI-Controller/221X

...
User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement