Replacing characters in Scrapy item

I’m trying to scrape an e-commerce website using Scrapy. I want to remove the “$” from the price, but my current code does not work.

  def parse(self, response):
    for sel in response.xpath('//section[@class="items-box"]'):
      item = ShopItem()
      item['name'] = sel.xpath('a/div/h3/text()').extract()
      item['price'] = sel.xpath('a/div/div/div[1]/text()').extract().replace("$", "")
      yield item

AttributeError: 'list' object has no attribute 'replace'

What is the appropriate method to remove characters when using Scrapy?

Answer

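extract() returns a list of strings, and a list has no replace() method. Use extract_first() to get a single string instead (it returns None when nothing matches, so make sure the XPath actually finds the price):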

item['price'] = sel.xpath('a/div/div/div[1]/text()').extract_first().replace("$", "")
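
If you would rather keep working with the list that extract() returns, a minimal sketch along these lines may help. It uses plain Python only; the helper names strip_dollar and first_price and the sample value are hypothetical and not part of Scrapy.

```python
def strip_dollar(values):
    """Remove "$" and surrounding whitespace from each string in a list,
    such as the list that extract() returns."""
    return [value.replace("$", "").strip() for value in values]


def first_price(values, default=None):
    """Return the first cleaned price, or `default` when the list is empty."""
    cleaned = strip_dollar(values)
    return cleaned[0] if cleaned else default


# Hypothetical sample of what extract() might return for the price node.
extracted = ["$19.99"]
price = first_price(extracted)

# An empty list (no node matched) falls back to the default instead of raising.
missing = first_price([])
```

In the spider this would be used as item['price'] = first_price(sel.xpath('a/div/div/div[1]/text()').extract()), which avoids the AttributeError when the selector matches nothing.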

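Or you can use a regular expression. Note that .re() also returns a list, so use .re_first() to get a single string, and escape the "$", which is otherwise treated as the end-of-string anchor:

item['price'] = sel.xpath('a/div/div/div[1]/text()').re_first(r"\$(.*)")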
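
To see why the "$" needs escaping, here is a small sketch using only the standard re module rather than Scrapy itself. The pattern, the price_from_text helper, and the sample strings are assumptions for illustration; adjust the character class to the price format on your site.

```python
import re

# "\$" matches a literal dollar sign; the group captures digits, commas and dots.
PRICE_PATTERN = re.compile(r"\$\s*([\d.,]+)")


def price_from_text(text):
    """Return the numeric part of a price such as "$19.99", or None if absent."""
    match = PRICE_PATTERN.search(text)
    return match.group(1) if match else None


# Unescaped, "$" anchors at the end of the string and the lazy group
# captures nothing, so the only result is an empty string.
unescaped = re.findall(r"$(.*?)", "$19.99")

# Escaped, the pattern captures the amount after the dollar sign.
escaped = price_from_text("$19.99")
```

The same escaped pattern can be passed to .re_first() on the selector, since Scrapy selectors accept regular expressions in the standard Python syntax.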
