Web scrape a product website like Thingiverse

Question

I am very new to web scraping and I am trying a small project that scrapes a website like Thingiverse, or a similar site where different CAD (or similar) files are shown. For a particular search keyword, I want to obtain a list of all the results. When I inspect the website the different products are h…

Accepted Answer

For the original question:

The class is passed as a dictionary item. Therefore, change the code to:

    soup.find_all('div', {'class': 'SearchResult__searchResultItem--c4VZk'})

This demo shows BeautifulSoup scraping the HTML:

    from bs4 import BeautifulSoup

    html = '''
    <div class="SearchResult__searchResultItem--c4VZk">Test</div>
    '''

    soup = BeautifulSoup(html, 'html.parser')
    Result_list = soup.find_all('div', {'class': 'SearchResult__searchResultItem--c4VZk'})
    print(Result_list)

Output:

    [<div class="SearchResult__searchResultItem--c4VZk">Test</div>]

For your edited question:

BeautifulSoup(page, "lxml") passes in your response object and not your HTML. The response object contains the HTTP status, headers and all sorts of other information. To get the HTML, try html = page.read().

The website loads its HTML tags via JavaScript, so urllib.request and BeautifulSoup on their own will not be able to extract the data. You can test this by printing out the HTML with print(soup.prettify()). To get around this issue, you can use a web automation tool such as Selenium.

Had the website returned the HTML as expected, the scraping code would look something like:

    from urllib.request import urlopen
    from bs4 import BeautifulSoup

    with urlopen("https://www.thingiverse.com/search?q=vader&type=things&sort=relevant") as response:
        # Read the HTML body from the response object.
        html = response.read()

    soup = BeautifulSoup(html, "lxml")
    print(soup.prettify())
    # The result tags do not appear because they are generated by JavaScript.
    product_list = soup.find_all('div', {'class': 'SearchResult__searchResultItem--c4VZk'})
    print(product_list)
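The answer suggests a web automation tool such as Selenium for the JavaScript-rendered page. What follows is only a minimal sketch of how that might look, not a tested scraper for Thingiverse. It assumes Selenium 4 and a local Chrome installation, and that the class name quoted above still appears on the rendered page (generated class names like this one tend to change). The function names fetch_rendered_html, extract_results and scrape_search_results are hypothetical helpers introduced for illustration.

```python
from bs4 import BeautifulSoup

# Class name taken from the answer above; treat it as an assumption,
# since generated class names can change between site releases.
RESULT_CLASS = "SearchResult__searchResultItem--c4VZk"


def fetch_rendered_html(url, wait_seconds=10):
    """Load the page in a real browser so JavaScript runs, then return the HTML."""
    # Imported here so the parsing helper below works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumes Chrome is available locally
    try:
        driver.get(url)
        # Wait until at least one result element has been added to the DOM.
        WebDriverWait(driver, wait_seconds).until(
            EC.presence_of_element_located((By.CLASS_NAME, RESULT_CLASS))
        )
        return driver.page_source
    finally:
        driver.quit()


def extract_results(html, class_name=RESULT_CLASS):
    """Return every <div> carrying the given class from an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find_all("div", {"class": class_name})


def scrape_search_results(url):
    """Render the page with Selenium, then parse it with BeautifulSoup."""
    return extract_results(fetch_rendered_html(url))
```

Calling scrape_search_results with the search URL from the answer would open a browser window, wait for the results to render, and return the matching div elements. Keeping extract_results separate means the parsing step can be checked against a saved HTML string without starting a browser.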


Answer