Scrapy: get only the text, ignoring commented-out content

Question

I researched but couldn't find an answer to my question: I want to get the main content and ignore the commented-out content. How should I do this?

My Scrapy spider looks like:

But this code only gives me some \n\t characters. Please help, thank you.
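A common reason for getting only whitespace is selecting just the cell's direct text children (td/text()). If the visible content sits inside child elements, the only direct text nodes may be the whitespace around a comment. XPath's text() node test never matches comment nodes, so collecting all descendant text usually returns the content without the comment. You can do this by joining row.xpath('td[2]//text()').getall() or by using XPath's string() function. The sketch below shows the same distinction with the standard library's xml.etree.ElementTree instead of Scrapy selectors. It assumes the cell is well-formed markup, and the sample cell is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical table cell: the visible content sits next to a commented-out block.
td_html = """<td>
\t<!-- <span>old price</span> -->
\t<span>Main content</span>
\t</td>"""

# ElementTree's default parser drops comments, so they never reach the tree.
td = ET.fromstring(td_html)

# Direct text of the cell only (roughly what td/text() selects):
# just the whitespace around the dropped comment.
direct_text = td.text

# All descendant text joined together, similar to joining .//text() or using string().
all_text = "".join(td.itertext()).strip()

print(repr(direct_text))
print(all_text)
```

Real pages are often not well-formed XML, so in a spider you would apply the same idea through Scrapy's own selectors rather than ElementTree.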

Accepted Answer

When /text() in XPath or ::text in CSS fails to produce the desired result, I use another library, html2text. To install it:

    pip3 install html2text

Then:

    from html2text import HTML2Text

    h = HTML2Text()
    h.ignore_links = True     # keep link text, drop the URLs
    h.ignore_images = True    # drop images
    h.ignore_emphasis = True  # don't emit Markdown emphasis markers

    # Inside the Scrapy spider: convert the cell's raw HTML to text
    name = h.handle(row.xpath('td[2]').get()).strip()
    yield ...
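If you'd rather not add a dependency, you can sketch comment-skipping text extraction with the standard library's html.parser.HTMLParser. Its handle_comment method does nothing by default, so only the character data passed to handle_data is collected. This is a simpler swapped-in alternative, not a reimplementation of html2text. The names VisibleTextParser and visible_text and the sample cell are hypothetical. It returns plain text with collapsed whitespace, not Markdown.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects character data from an HTML fragment.

    Comments are skipped because handle_comment is left as
    HTMLParser's default, which does nothing.
    """

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called for text between tags, never for comment contents.
        self.parts.append(data)


def visible_text(html):
    """Return the text of an HTML fragment without comments,
    with runs of whitespace collapsed to single spaces."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


# Hypothetical cell with a commented-out block before the real content.
cell = "<td>\n\t<!-- <b>old</b> -->\n\t<span>Main</span> content\n</td>"
print(visible_text(cell))
```

In a spider you would call it on the cell's HTML, for example visible_text(row.xpath('td[2]').get()), after checking that the selector actually matched.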


Answer