Tag: web-scraping

Replacing characters in Scrapy item

I’m trying to scrape a commerce website using Scrapy. For the price tag, I want to remove the “$”, but my current code does not work. What is the appropriate way to remove characters when using Scrapy? Answer extract() returns a list; you can use extract_first() to get a single value: Or, you can use the .re()
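As a hedged sketch of the cleaning step, the helpers below work on the plain strings that Scrapy’s extract_first() and .re() hand back, so they can be tried without running a crawl. The helper names, the price regex, and the CSS selector in the comment are hypothetical, not taken from the original answer.

```python
import re
from typing import List, Optional

# Hypothetical pattern: a "$" followed by digits, commas and dots.
PRICE_RE = re.compile(r"\$\s*([\d.,]+)")


def clean_price(raw: Optional[str]) -> Optional[str]:
    """Remove "$" and surrounding whitespace from one extracted value.

    extract_first() returns None when the selector matched nothing,
    so None is passed through unchanged.
    """
    if raw is None:
        return None
    return raw.replace("$", "").strip()


def prices_from_strings(raw_values: List[str]) -> List[str]:
    """Keep only the captured numeric part of each value, similar in spirit
    to calling .re() on a selector with a capturing group."""
    found: List[str] = []
    for value in raw_values:
        found.extend(PRICE_RE.findall(value))
    return found


# Inside a Scrapy callback (the selector "span.price" is hypothetical):
#   price = clean_price(response.css("span.price::text").extract_first())
#   prices = response.css("span.price::text").re(r"\$\s*([\d.,]+)")
```

For example, clean_price(" $19.99 ") gives "19.99". Using .re() directly on the selector avoids the separate cleaning step, because only the captured group is returned.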

Getting ‘wrong’ page source when calling url from python

html python url web-scraping

When I try to retrieve the page source from a website, I get completely different (and shorter) text than when I view the same page source in a web browser. https://stackoverflow.com/questions/24563601/python-getting-a-wrong-source-code-of-the-web-page-asp-net This user has a related issue, but they got the home page source instead of the requested one; I am getting something completely alien. The code is: This is the page
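Since the original code and URL are not shown, the following is only an assumption about one common cause: some servers return different content to clients that do not identify as a browser. A minimal standard-library sketch that sends a browser-like User-Agent header (the header value and function names are hypothetical):

```python
import urllib.request

# Hypothetical browser-like User-Agent string; the default is "Python-urllib/3.x".
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko)"


def build_request(url: str) -> urllib.request.Request:
    """Build a request that identifies itself like a browser instead of Python-urllib."""
    return urllib.request.Request(url, headers={"User-Agent": BROWSER_UA})


def fetch_source(url: str, timeout: float = 10.0) -> str:
    """Fetch the raw HTML and decode it with the charset the server declares."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

If the page is assembled by JavaScript in the browser, no header will make urllib see the same markup; in that case, compare against the browser’s “view source” rather than the live DOM shown in developer tools.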

Beautiful Soup if Class “Contains” or Regex?

beautifulsoup python regex web-scraping

If my class names are constantly different, for example: Normally I could do: There are way too many class names to work with here, so a bunch of these are out. I know Python doesn’t have the “.contains” I would normally use, but it does have “in”. However, I haven’t been able to work out a way to
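A sketch of three ways Beautiful Soup can match partial class names, assuming class names that share a common prefix (the markup and the “product-item” prefix are hypothetical, since the question’s real class names were not included):

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup standing in for the question's elements.
HTML = """
<div class="product-item-123">A</div>
<div class="product-item-456">B</div>
<div class="sidebar">C</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# 1. A compiled regex is tested against each class value of the tag.
by_regex = soup.find_all("div", class_=re.compile(r"^product-item-"))

# 2. A function is tested against each class value; Python's "in" plays the
#    role of ".contains". It may receive None for tags without a class, hence the guard.
by_func = soup.find_all("div", class_=lambda c: c is not None and "product-item" in c)

# 3. A CSS substring selector matches against the whole class attribute string.
by_css = soup.select('div[class*="product-item"]')
```

All three return the two “product-item” divs and skip the sidebar. The regex form is handy for prefixes; the function form allows arbitrary Python logic.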

Web scraping a text() in python

html lxml.html python web-scraping xpath

I am having trouble with a web scraping function. The XPaths for the two things I am trying to get are: The HTML is: I am trying to write a function that loops through each li in tr[5]. The problem I am having is getting the text(). I have tried a number of different variations of this function: This specific
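A minimal lxml.html sketch of looping over the li elements of one table row, assuming a small hypothetical page (the question’s real HTML was not included). It shows the difference between the XPath text() step, which returns only an element’s own text nodes, and text_content(), which includes text inside child tags:

```python
from lxml import html

# Hypothetical page; row 2 here plays the role of tr[5] in the question.
PAGE = """
<html><body><table>
  <tr><td>header</td></tr>
  <tr><td><ul><li>First <b>item</b></li><li>Second</li></ul></td></tr>
</table></body></html>
"""

tree = html.fromstring(PAGE)


def li_direct_text(tree, row: int) -> list:
    """text() selects only the li's own text nodes, not text inside child tags."""
    return ["".join(li.xpath("text()")).strip()
            for li in tree.xpath(f"//table//tr[{row}]//li")]


def li_full_text(tree, row: int) -> list:
    """text_content() concatenates the li's text with that of all descendants."""
    return [li.text_content().strip()
            for li in tree.xpath(f"//table//tr[{row}]//li")]
```

XPath indices are 1-based, so tr[5] is the fifth row. An XPath copied from a browser may also contain a tbody that the browser inserted but that may be absent from the raw HTML lxml parses, which is why the sketch uses // between table and tr.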

Walmart Price Scraping with Python 3

beautifulsoup python web-scraping

I am very new to this concept, but I am trying to learn how to use Python to manipulate HTML data. I wrote a Python (version 3.4.1) script that fetches the URL and returns some information, which I parse using BeautifulSoup (version 4). In this example, I am attempting to obtain the price of the Xbox One. I chose this

Extracting url from style: background-url: with beautifulsoup and without regex?

beautifulsoup python string web-scraping

I have: I want to get the URL, but I don’t know how to do that without using regex. Is it even possible? So far, my solution with regex is: Answer You could try using the cssutils package. Something like this should work: Although you are ultimately going to need to parse out the actual URL, this method
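The answer suggests cssutils. As a different, swapped-in approach that the answer does not describe, plain string methods can also pull the value out of url(...) without regex. A sketch, assuming the style value has already been read from the tag (for example tag["style"] in BeautifulSoup) and that the URL itself contains no “)” character; the function name is hypothetical:

```python
from typing import Optional


def extract_css_url(style: str) -> Optional[str]:
    """Return the first url(...) value in a CSS declaration string, or None.

    Uses only str.find and slicing; surrounding quotes and spaces are removed.
    Assumes the URL contains no ")" character.
    """
    start = style.find("url(")
    if start == -1:
        return None
    start += len("url(")
    end = style.find(")", start)
    if end == -1:
        return None
    return style[start:end].strip().strip("'\"")
```

For "background-image: url('/img/a.png'); color: red" this returns "/img/a.png". A real CSS parser such as cssutils is the safer choice when the styles can contain escaped or unusual URLs.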

Python web-scraping error – TypeError: can’t use a string pattern on a bytes-like object

findall python web-scraping

I want to build a web scraper. I’m currently learning Python, so this is the very basics! Python code: Error: Answer You have to decode your data, using the encoding that the website in question says to use; utf-8 won’t work in this case.
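A small sketch of the decode-first fix, assuming the response body arrives as bytes. The site’s actual encoding is not stated in the answer, so Windows-1252 (cp1252) is used here purely as a hypothetical non-UTF-8 example, and the function names are illustrative:

```python
import re

# Hypothetical page bytes in Windows-1252, chosen only to show a non-UTF-8 case.
RAW = "<title>Café prices</title>".encode("cp1252")


def titles_from_bytes(raw: bytes, encoding: str) -> list:
    """Decode the response body first, then run a str pattern over the text."""
    text = raw.decode(encoding)
    return re.findall(r"<title>(.*?)</title>", text)


def findall_on_bytes_fails(raw: bytes) -> bool:
    """A str pattern applied to bytes raises the TypeError from the question."""
    try:
        re.findall(r"<title>(.*?)</title>", raw)
    except TypeError:
        return True
    return False
```

Decoding these bytes as UTF-8 would fail on the “é”, which is why the declared encoding matters. With urllib, the declared charset is available from resp.headers.get_content_charset(); alternatively, a bytes pattern such as rb"<title>(.*?)</title>" can be run on the raw bytes directly.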

Beautiful Soup 4: Remove comment tag and its content

beautifulsoup html html-parsing python web-scraping

The page that I’m scraping contains this HTML: How do I remove the comment tag <!-- --> along with its content with bs4? Answer You can use extract() (this solution is based on this answer): PageElement.extract() removes a tag or string from the tree and returns the tag or string that was extracted. As a result, you get your div
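A minimal sketch of that extract() approach, assuming the comments should simply be dropped from the parsed tree (the markup and the function name are hypothetical):

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical markup standing in for the page in the question.
HTML = "<div><p>Keep me</p><!-- <p>hidden</p> --></div>"


def strip_comments(markup: str) -> str:
    soup = BeautifulSoup(markup, "html.parser")
    # Comments are NavigableString subclasses, so they are matched via string=.
    # find_all returns a list, so extracting while looping over it is safe.
    for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
        comment.extract()  # removes the comment, including its contents, from the tree
    return str(soup)
```

Because the commented-out markup is stored as a single Comment string, extracting it removes the whole <!-- ... --> block at once.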

‘Show more results’ while scraping mobile details from flipkart

beautifulsoup python web-scraping

My question is the same as “Scraping all mobiles of Flipkart.com”. I tried the solution given there, but changing the start variable does not work, and I can only scrape information for the first twenty mobiles. The initial value of start was 21, so I increased it to 50, but I still get the same result. Answer There

How to run python script inside rails application in heroku?

heroku python ruby-on-rails scrapy web-scraping

I have a Rails application hosted on Heroku. I also wrote a web scraper using Scrapy in Python. I need to run the Python script from the Rails application on Heroku. I will explain with an example: the user enters the URL to scrape in my Rails app. Then the Rails app passes control to the Python script to scrape data, which