
Tag: web-scraping

Scraping Amazon product names

I am trying to gather the product names from the first two pages on Amazon based on seller name. When I request the page, it has all the elements I need; however, when I use BeautifulSoup, they are not listed. Here is my code: The links of products are not listed. If the Amazon API gives this information, I am open

Scraping Data from a website which uses Power BI – retrieving data from Power BI on a website

I want to scrape data from this page (and pages similar to it): https://cereals.ahdb.org.uk/market-data-centre/historical-data/feed-ingredients.aspx This page uses Power BI. Unfortunately, finding a way to scrape Power BI is hard, because everyone wants to scrape using/into Power BI, not from it. The closest answer was this question, yet it is unrelated. First, I used Apache Tika, and soon I realized the table data

Web Scraping – URL extraction from Lazada ecommerce platform

I am currently trying to scrape the product URLs from the Lazada ecommerce platform; however, I am getting random links from the website rather than the product links. https://www.lazada.com.my/oldtown-white-coffee/?langFlag=en&q=All-Products&from=wangpu&pageTypeId=2 My code below: The result I am getting out of this code (which is not what I want): This is the section of the links that I need, I wanted to list

Web scraping the data from multiple TOCs using Python or R

I am new to web scraping. I would like to collect the data from: https://www.sec.gov/Archives/edgar/data/814453/000119312518067603/d494599d10k.htm#tx494599_11 I can see there are a lot of TOCs. I would like to scrape the phrase “Income before income taxes” together with the amount. Please share ideas and shed some light on this. Answer This will give you everything from the table, you can
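The answer excerpt above is cut off, so its actual code is not known. As a minimal sketch of the general idea (collect every table row, then keep the rows whose first cell contains the label), the following uses only Python's standard-library `html.parser`. The class `RowCollector`, the function `find_rows`, and the `SAMPLE_HTML` string are hypothetical and made up for illustration; the real filing's markup is likely more complex and may need extra handling.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collects the text of every table row as a list of non-empty cells."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace and drop empty spacer cells.
            text = " ".join("".join(self._cell).split())
            if text and self._row is not None:
                self._row.append(text)
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def find_rows(html, label):
    """Return rows whose first cell contains `label` (case-insensitive)."""
    parser = RowCollector()
    parser.feed(html)
    wanted = label.lower()
    return [row for row in parser.rows if row and wanted in row[0].lower()]


# Made-up sample markup, not taken from the SEC filing.
SAMPLE_HTML = """
<table>
  <tr><td>Revenue</td><td>$</td><td>1,000</td></tr>
  <tr><td>Income before income taxes</td><td>$</td><td>250</td></tr>
</table>
"""

matches = find_rows(SAMPLE_HTML, "Income before income taxes")
```

To try this on the real page, the downloaded HTML text would be passed to `find_rows` in place of `SAMPLE_HTML`.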

Python 404’ing on urllib.request

The basics of the code are below. I know for a fact that the way I’m retrieving these pages works for other URLs, as I just wrote a script scraping a different page in the same way. However, with this specific URL it keeps throwing “urllib.error.HTTPError: HTTP Error 404: Not Found” in my face. I replaced the URL with a different one
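The excerpt does not include the code or say what caused the error. One thing that is often worth checking, purely as an assumption here, is whether the server responds differently to the default `urllib` User-Agent. The sketch below sends an explicit User-Agent header and returns the status code instead of raising, so the response can be inspected. The helper names `build_request` and `fetch` and the User-Agent string are hypothetical.

```python
import urllib.error
import urllib.request


def build_request(url, user_agent="Mozilla/5.0 (compatible; example-script)"):
    """Build a request with an explicit User-Agent (hypothetical value)."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url):
    """Return (status_code, body_bytes), including for HTTP error responses."""
    request = build_request(url)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        # HTTPError is also a response object, so its body can be read.
        return err.code, err.read()
```

If the status is still 404 with a browser-like header, the URL itself (encoding, trailing slash, redirects) is the next thing to compare against what the browser requests.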

Python, extract XHR response data from website

I am trying to extract some data from https://www.barchart.com/stocks/signals/top-bottom/top?viewName=main. I am able to extract data from normal HTML using the XPath method; however, I noticed that this website gets its data from a network request. I have found where the data I want is located (the table from the barchart website), which is shown in the picture below. Picture
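When a table is filled in by an XHR call, the response is frequently JSON, which can be parsed directly instead of using XPath on the page HTML. The sketch below shows only the parsing step. The payload shape (a top-level `"data"` list of objects) and the field names are assumptions made up for illustration, not the site's actual response format, and `rows_from_payload` is a hypothetical helper.

```python
import json


def rows_from_payload(payload_text, fields):
    """Extract the chosen fields from each record in a JSON payload.

    Assumes (hypothetically) that the payload looks like
    {"data": [{...}, {...}]}; adjust the key to the real response.
    """
    payload = json.loads(payload_text)
    records = payload.get("data", [])
    return [{field: record.get(field) for field in fields} for record in records]


# Made-up sample response body, standing in for the XHR response text.
SAMPLE_PAYLOAD = json.dumps(
    {
        "count": 2,
        "data": [
            {"symbol": "AAA", "name": "Example A", "price": 10.5},
            {"symbol": "BBB", "name": "Example B", "price": 20.0},
        ],
    }
)

rows = rows_from_payload(SAMPLE_PAYLOAD, ["symbol", "price"])
```

The response text itself would come from replaying the request seen in the browser's network panel, with whatever headers or cookies that request needs.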
