Scrapy CSS selector returns None, then finds the value

I am adding the code below to my spider and I have no idea what is going on. This is the link I am using: https://www.digikey.com/products/en?keywords=ID82C55
All of this happens in the same process:
- My CSS selectors return None.
- Then they find a few of the HTML elements and return some of the values.
- Then they find the last element.

This causes my program to mix up the data and yield it incorrectly to my CSV file. Can anyone tell me what the problem is? Thanks.

Code

def parse(self, response):

            
            for b in response.css('div#pdp_content.product-details > div'):

                if b.css('div.product-details-headline h1::text').get():
                    part = b.css('div.product-details-headline h1::text').get()
                    part = part.strip()
                    parts1 = part
                    print(b.css('div.product-details-headline h1::text').get())
                    print(parts1)

                else:
                    print(b.css('div.product-details-headline h1::text').get())

                if b.css('table.product-dollars > tr:nth-last-child(1) td:nth-last-child(3)::text').get():
                    cleaned_quantity = b.css('table.product-dollars > tr:nth-last-child(1) td:nth-last-child(3)::text').get()
                    print(cleaned_quantity)
                else:
                    print(b.css('table.product-dollars > tr:nth-last-child(1) td:nth-last-child(3)::text').get())
                if b.css('table.product-dollars > tr:nth-last-child(1) td:nth-last-child(2)::text').get():
                    cleaned_price = b.css('table.product-dollars > tr:nth-last-child(1) td:nth-last-child(2)::text').get()
                    print(cleaned_price)

                else:
                    print(b.css('table.product-dollars > tr:nth-last-child(1) td:nth-last-child(2)::text').get())
                if b.css('div.quantity-message span#dkQty::text').get():
                    cleaned_stock = b.css('div.quantity-message span#dkQty::text').get()
                    print(cleaned_stock)

                else:
                    print(b.css('div.quantity-message span#dkQty::text').get())

                if b.css('table#product-attribute-table > tr:nth-child(7) td::text').get():
                    status = b.css('table#product-attribute-table > tr:nth-child(7) td::text').get()
                    status = status.strip()
                    cleaned_status = status
                    print(cleaned_status)

                else:
                    print(b.css('table#product-attribute-table > tr:nth-child(7) td::text').get())

                # yield {
                #     'Part': parts1,
                #     'Quantity': cleaned_quantity,
                #     'Price': cleaned_price,
                #     'Stock': cleaned_stock,
                #     'Status': cleaned_status,
                # }

Output

None
None
None
None
None
None
2,500
29.10828
29
None

                                ID82C55A
                            
ID82C55A
None
None
None
Active

Answer

I highly recommend switching to XPath expressions that locate each value by its label:

part_number = b.xpath('.//th[.="Manufacturer Part Number"]/following-sibling::td[1]/text()').get()
stock = b.xpath('.//span[.="In Stock"]/preceding-sibling::span[1]/text()').get()
and so on for the remaining fields.
User contributions licensed under: CC BY-SA