I am trying to gather the product names from the first two pages of Amazon results for a given seller name. When I request the page, it has all the elements I need; however, when I parse it with BeautifulSoup, they are not listed. Here is my code: The links of products are not listed. If the Amazon API gives this information, I …
Tag: web-scraping
Scraping Data from a website which uses Power BI – retrieving data from Power BI on a website
I want to scrape data from this page (and pages similar to it): https://cereals.ahdb.org.uk/market-data-centre/historical-data/feed-ingredients.aspx This page uses Power BI. Unfortunately, finding a way to scrape Power BI is hard, because everyone wants to scrape using/into Power BI, not from it. The closest …
How can I get the text clean with scrapy shell
I’m trying the following command in the Scrapy shell, which returns this result: The thing is, I want to extract only the word “Ajax” that is between the <strong> tags. Answer: You need to add the <strong> tag to your selector.
Web Scraping – URL extraction from Lazada ecommerce platform
I am currently trying to scrape the product URLs from the Lazada ecommerce platform; however, I am getting random links from the website rather than the product links. https://www.lazada.com.my/oldtown-white-coffee/?langFlag=en&q=All-Products&from=wangpu&pageTypeId=2 My code below: The result I am ge…
Selenium unable to locate “app-id-title” element when trying to load Google Play page
I am trying to run this code to scrape reviews from the Google Play Store, but I keep getting the following error: I suspect it has something to do with the id-app-title in Could someone point out where I would find that ID for the app I am interested in, or help me identify where I am going wrong?
Python, extract urls from xml sitemap that contain a certain word
I’m trying to extract all URLs from a sitemap that contain the word foo in the URL. I’ve managed to extract all the URLs but can’t figure out how to get only the ones I want. So in the example below I only want the URLs for apples and pears returned. Answer: I modified the XML to a valid format (…
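The filtering step in this question can be sketched with the standard library alone. This is a minimal sketch, not the answer's code: the `urls_containing` helper, the `SAMPLE` sitemap, and the example.com URLs are hypothetical, and it assumes the sitemap is well-formed XML that uses the standard sitemaps.org namespace.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap files (assumed here; adjust if yours differs).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_containing(sitemap_xml: str, word: str) -> list[str]:
    """Return every <loc> URL in the sitemap whose text contains `word`."""
    root = ET.fromstring(sitemap_xml)
    urls = []
    for loc in root.findall(".//sm:loc", SITEMAP_NS):
        # Skip empty <loc/> elements, which have no text.
        if loc.text and word in loc.text:
            urls.append(loc.text.strip())
    return urls


# Hypothetical sample sitemap, made up for illustration.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/foo/apples</loc></url>
  <url><loc>https://www.example.com/bar/oranges</loc></url>
  <url><loc>https://www.example.com/foo/pears</loc></url>
</urlset>"""

if __name__ == "__main__":
    for url in urls_containing(SAMPLE, "foo"):
        print(url)
```

A plain substring test also matches words such as "food"; if that matters, a stricter check on the path segments from `urllib.parse.urlparse` may be a better fit.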
How to get all external links found on a page using BeautifulSoup?
I’m reading the book Web Scraping with Python, which has the following function to retrieve external links found on a page: The problem is that it does not work the way it should. When I run it using the URL: http://www.oreilly.com, it returns this: Output: Question: Why are the first 16-17 entries cons…
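One way to decide whether a link is external is to resolve it against the page URL and compare hosts. The sketch below is not the book's function and swaps BeautifulSoup for the standard library's `html.parser` so that it runs without extra packages. The `external_links` helper and the sample HTML are hypothetical, and treating any different host (including subdomains) as external is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class HrefCollector(HTMLParser):
    """Collects the href attribute of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def external_links(html, page_url):
    """Return absolute http(s) links whose host differs from page_url's host."""
    collector = HrefCollector()
    collector.feed(html)
    page_host = urlparse(page_url).netloc.lower()
    found = []
    for href in collector.hrefs:
        # Resolve relative and protocol-relative links before comparing hosts.
        absolute = urljoin(page_url, href)
        parts = urlparse(absolute)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar schemes
        if parts.netloc.lower() != page_host and absolute not in found:
            found.append(absolute)
    return found


# Hypothetical sample page, made up for illustration.
SAMPLE_HTML = """
<a href="/about">About</a>
<a href="http://www.oreilly.com/books">Books</a>
<a href="https://example.org/page">Elsewhere</a>
<a href="mailto:someone@example.org">Mail</a>
<a href="//cdn.example.net/lib.js">CDN</a>
"""

if __name__ == "__main__":
    print(external_links(SAMPLE_HTML, "http://www.oreilly.com"))
```

Comparing parsed hosts rather than matching on the raw `href` string keeps relative links and same-site absolute links out of the result.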
Web scraping the data from multiple TOC using python or R
I am new to web scraping. I would like to collect the data from: https://www.sec.gov/Archives/edgar/data/814453/000119312518067603/d494599d10k.htm#tx494599_11 I can see there are a lot of TOCs. I would like to scrape the phrase “Income before income taxes” along with the amount. Please share ideas and thro…
Python 404’ing on urllib.request
The basics of the code are below. I know for a fact that the way I’m retrieving these pages works for other URLs, as I just wrote a script scraping a different page in the same way. However, with this specific URL it keeps throwing “urllib.error.HTTPError: HTTP Error 404: Not Found” in my face. I repl…
Python, extract XHR response data from website
I am trying to extract some data from https://www.barchart.com/stocks/signals/top-bottom/top?viewName=main. I am able to extract data from normal HTML using the XPath method; however, I noticed that this website loads its data through a network request. I have found the location of where the data I want is (the table from…