I am trying to gather the product names from the first two pages of results on Amazon based on the seller name. When I request the page, it contains all the elements I need; however, when I parse it with BeautifulSoup, they are not listed. Here is my code: The product links are not listed. If the Amazon API provides this information, I am open to using it.
Tag: web-scraping
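A minimal sketch of one way to approach this, assuming the seller's listings are reachable through Amazon's search page; the URL pattern and CSS selectors below are assumptions, and since Amazon renders much of its listing markup with JavaScript, Selenium or the official Product Advertising API may be needed instead:

```python
import requests
from bs4 import BeautifulSoup

HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}

def product_names(seller_id, pages=2):
    names = []
    for page in range(1, pages + 1):
        # assumed URL pattern for "products sold by this seller"
        url = f"https://www.amazon.com/s?me={seller_id}&page={page}"
        resp = requests.get(url, headers=HEADERS, timeout=30)
        soup = BeautifulSoup(resp.text, "html.parser")
        # assumed selector for result cards on the search results page
        for card in soup.select('div[data-component-type="s-search-result"]'):
            title = card.select_one("h2 span")
            if title:
                names.append(title.get_text(strip=True))
    return names

print(product_names("A1EXAMPLESELLERID"))  # hypothetical seller id
```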
Scraping Data from a website which uses Power BI – retrieving data from Power BI on a website
I want to scrape data from this page (and pages similar to it): https://cereals.ahdb.org.uk/market-data-centre/historical-data/feed-ingredients.aspx This page uses Power BI. Unfortunately, finding a way to scrape Power BI is hard, because everyone wants to scrape using/into Power BI, not from it. The closest answer was this question, yet it is unrelated. At first I used Apache Tika, and soon realized that the table data…
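Because Power BI visuals are rendered client-side, a static fetch (or Tika) never sees the table. Below is a hedged Selenium sketch that reads the rendered grid cells; the selectors and the assumption that the report exposes role="gridcell" cells are not confirmed for this page, and replaying the report's data XHR from the browser's Network tab is an alternative:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

URL = "https://cereals.ahdb.org.uk/market-data-centre/historical-data/feed-ingredients.aspx"

driver = webdriver.Chrome()
try:
    driver.get(URL)
    # If the report lives inside an iframe, switch into it first, e.g.
    # driver.switch_to.frame(driver.find_element(By.TAG_NAME, "iframe"))
    WebDriverWait(driver, 30).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, 'div[role="gridcell"]'))
    )
    cells = [c.text for c in driver.find_elements(By.CSS_SELECTOR, 'div[role="gridcell"]')]
    print(cells)
finally:
    driver.quit()
```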
How can I get clean text with scrapy shell
I’m trying the following command in the scrapy shell, which returns this result: The thing is, I want to extract only the word “Ajax”, which is between <strong> tags. Answer: You need to add the <strong> tag to your selector.
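A minimal illustration of that answer in the scrapy shell (the surrounding markup is assumed):

```python
# Select the <strong> element's text node directly:
response.css("strong::text").get()        # -> 'Ajax'
# or the XPath equivalent:
response.xpath("//strong/text()").get()
```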
Web Scraping – URL extraction from Lazada ecommerce platform
I am currently trying to scrape the product URLs from the Lazada e-commerce platform; however, I am getting random links from the website rather than the product links. https://www.lazada.com.my/oldtown-white-coffee/?langFlag=en&q=All-Products&from=wangpu&pageTypeId=2 My code is below: The result I am getting from this code (which is not what I want): This is the section of the links that I need; I wanted to list…
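One hedged workaround, assuming the product grid is built by JavaScript (so plain requests/BeautifulSoup only sees navigation links) and that product links contain "/products/" in their path, is to drive a real browser and collect the matching hrefs:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

URL = ("https://www.lazada.com.my/oldtown-white-coffee/"
       "?langFlag=en&q=All-Products&from=wangpu&pageTypeId=2")

driver = webdriver.Chrome()
try:
    driver.get(URL)
    # wait for at least one product link to render (selector is an assumption)
    WebDriverWait(driver, 30).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, "a[href*='/products/']"))
    )
    links = {a.get_attribute("href")
             for a in driver.find_elements(By.CSS_SELECTOR, "a[href*='/products/']")}
    for link in sorted(links):
        print(link)
finally:
    driver.quit()
```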
Selenium unable to locate “app-id-title” element when trying to load Google Play page
I am trying to run this code to scrape reviews from the Google Play Store, but I keep getting the following error: I suspect it has something to do with the id-app-title. Could someone point out where I would find that ID for the app I am interested in, or help me identify where I am going wrong?
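A hedged sketch of the usual fix, assuming the error is a timing/selector problem: wait explicitly for the page to render and locate the title by tag rather than by the "app-id-title" id. The <h1> selector and the app id below are assumptions; inspect the live page with DevTools to confirm what the element is actually called:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

APP_URL = "https://play.google.com/store/apps/details?id=com.example.app"  # hypothetical app id

driver = webdriver.Chrome()
try:
    driver.get(APP_URL)
    # explicit wait instead of an immediate find_element call
    title = WebDriverWait(driver, 20).until(
        EC.presence_of_element_located((By.TAG_NAME, "h1"))
    )
    print(title.text)
finally:
    driver.quit()
```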
Python, extract URLs from an XML sitemap that contain a certain word
I’m trying to extract all URLs from a sitemap that contain the word foo in the URL. I’ve managed to extract all the URLs but can’t figure out how to get only the ones I want. So in the example below I only want the URLs for apples and pears returned. Answer: I modified the XML to a valid format (add…
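Once the <loc> values are parsed, the filtering itself is a substring check. A minimal sketch, assuming a standard sitemap namespace and a hypothetical sitemap URL:

```python
import urllib.request
import xml.etree.ElementTree as ET

SITEMAP_URL = "https://example.com/sitemap.xml"  # hypothetical sitemap
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

with urllib.request.urlopen(SITEMAP_URL) as resp:
    root = ET.fromstring(resp.read())

# keep only the <loc> entries whose URL contains the target word
matching = [loc.text for loc in root.findall(".//sm:loc", NS) if "foo" in loc.text]
print(matching)
```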
How to get all external links found on a page using BeautifulSoup?
I’m reading the book Web Scraping with Python, which has the following function to retrieve external links found on a page: The problem is that it does not work the way it should. When I run it using the URL http://www.oreilly.com, it returns this: Output: Question: Why are the first 16-17 entries considered “external links”? They belong to the same domain.
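The usual cause is that the check treats protocol-relative or subdomain URLs as external. Below is a hedged rewrite of the external-link logic (not the book's exact code) that compares network locations with urlparse:

```python
from urllib.parse import urlparse
from urllib.request import urlopen
from bs4 import BeautifulSoup

def get_external_links(page_url):
    base_netloc = urlparse(page_url).netloc.replace("www.", "")
    soup = BeautifulSoup(urlopen(page_url).read(), "html.parser")
    external = set()
    for a in soup.find_all("a", href=True):
        href = urlparse(a["href"], scheme="http")
        # relative links have an empty netloc and are internal by definition;
        # protocol-relative links ("//cdn.example.com/...") keep their netloc
        if href.netloc and base_netloc not in href.netloc:
            external.add(href.geturl())
    return external

print(get_external_links("http://www.oreilly.com"))
```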
Web scraping data from multiple TOCs using Python or R
I am new to web scraping. I would like to collect the data from: https://www.sec.gov/Archives/edgar/data/814453/000119312518067603/d494599d10k.htm#tx494599_11 I can see that a lot of TOCs are there. I would like to scrape the phrase “Income before income taxes” along with the amount. Please share ideas and shed some light on this. Answer: This will give you everything from the table; you can…
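One hedged way to do this in Python is to let pandas.read_html pull every table from the filing and keep only the rows containing the phrase; the contact address in the User-Agent is a placeholder, and the matching table's column layout is an assumption to verify:

```python
from io import StringIO

import pandas as pd
import requests

URL = ("https://www.sec.gov/Archives/edgar/data/814453/"
       "000119312518067603/d494599d10k.htm")
# SEC asks for a descriptive User-Agent; the contact below is a placeholder
HEADERS = {"User-Agent": "research-script contact@example.com"}

html = requests.get(URL, headers=HEADERS, timeout=60).text
tables = pd.read_html(StringIO(html))

for table in tables:
    # keep rows where any cell mentions the target line item
    mask = table.apply(
        lambda row: row.astype(str).str.contains("Income before income taxes", case=False).any(),
        axis=1,
    )
    if mask.any():
        print(table[mask])
```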
Python 404’ing on urllib.request
The basics of the code are below. I know for a fact that the way I’m retrieving these pages works for other URLs, as I just wrote a script scraping a different page in the same way. However, with this specific URL it keeps throwing “urllib.error.HTTPError: HTTP Error 404: Not Found” in my face. I replaced the URL with a different one…
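A common cause is the server rejecting urllib's default "Python-urllib" User-Agent. A minimal sketch that sends a browser-like header instead (the URL is a placeholder for the one in the question):

```python
import urllib.request

URL = "https://example.com/some-page"  # placeholder for the failing URL

# send a browser-like User-Agent instead of the default "Python-urllib/x.y"
req = urllib.request.Request(URL, headers={"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"})
with urllib.request.urlopen(req) as resp:
    html = resp.read().decode("utf-8", errors="replace")
print(html[:500])
```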
Python, extract XHR response data from website
I am trying to extract some data from https://www.barchart.com/stocks/signals/top-bottom/top?viewName=main. I am able to extract data from the normal HTML using the XPath method; however, I noticed that this website loads its data over the network. I have found the location of the data I want (the table from the Barchart website), which is shown in the picture below.
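Rather than scraping the rendered page, the XHR seen in the Network tab can be replayed with a requests.Session. In the sketch below the endpoint path, query parameters and XSRF header are assumptions; copy the exact request (URL, params, headers) from DevTools for the table you want:

```python
from urllib.parse import unquote

import requests

session = requests.Session()
session.headers["User-Agent"] = "Mozilla/5.0"

# visiting the page first sets the cookies the API expects
session.get("https://www.barchart.com/stocks/signals/top-bottom/top?viewName=main")

api_url = "https://www.barchart.com/proxies/core-api/v1/quotes/get"   # assumed endpoint
params = {"lists": "stocks.signals.topbottom.top",
          "fields": "symbol,lastPrice"}                               # assumed parameters
headers = {"x-xsrf-token": unquote(session.cookies.get("XSRF-TOKEN", ""))}  # assumed header

data = session.get(api_url, params=params, headers=headers).json()
print(data)
```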