Why does LinkExtractor skip a link?

Question

I am scraping some pages and am trying to use the LinkExtractor to get the URLs from the response. In general that is going quite OK, but the LinkExtractor is not able to extract the relative link to a PDF file that is found at line 111 of the HTML. I have tried a lot, but haven't been able to

Accepted Answer

set() takes an iterable (such as a sequence) as an argument and makes a set out of its items. A string is a sequence of individual characters, so set("pdf") makes a set of the characters "p", "d", and "f".

If you want the whole string "pdf" in the set, then you need to enclose it in a list: set(["pdf"])

Or it might be simpler to use {} notation instead of calling set(): {"pdf"}
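The difference can be checked directly in plain Python. This is only a minimal sketch: the variable names are made up for illustration, and it does not reproduce the asker's spider or any Scrapy call.

```python
# set() iterates over its argument, so a string is split into characters.
chars = set("pdf")

# Wrapping the string in a list makes the whole string a single item.
whole = set(["pdf"])

# A set literal gives the same result without calling set().
literal = {"pdf"}

print(chars == {"p", "d", "f"})  # the three single characters
print(whole == literal)          # both hold the one string "pdf"
print("pdf" in chars)            # membership test against the character set
print("pdf" in literal)          # membership test against the string set
```

A membership test for "pdf" only succeeds against the last two sets, so any code that compares whole extensions against the set would not match when it was built with set("pdf").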

Answer