
Scraping with requests_html

I am trying to scrape this website: https://www.kayak-polo.info/kphistorique.php?Group=CE&lang=en

Below is my code. I am trying to get the text inside the caption element (as shown in the screenshot). However, I suspect I cannot find the tag because it has no closing tag, and I think that is why the text is not being returned.

For clarity: I already have the tournament name, but I would also like the category, which is “men” in the screenshot below.

screenshot of element

JavaScript

I tried several things, such as getting the HTML of that element and then calling .split on it. However, when I use .html I seem to get the entire page’s HTML, which doesn’t help in my case.

I also tried .attrs in the hope of finding the right tag, but it returns nothing.
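
A missing closing tag is not necessarily the obstacle here: HTML permits omitting `</caption>` in some cases, and lenient parsers typically treat the next table section as the end of the caption. The following is a minimal sketch of that idea using only the standard library's `html.parser` instead of requests_html. The markup in `SAMPLE_HTML` and the names `CaptionParser` and `caption_parts` are assumptions made for illustration; the real page's structure may differ.

```python
from html.parser import HTMLParser

# Hypothetical markup, NOT copied from the site. Note the missing </caption>.
SAMPLE_HTML = """
<table>
  <caption>Tournament name<br>Men
  <thead><tr><th>Rank</th><th>Team</th></tr></thead>
  <tbody><tr><td>1</td><td>Team A</td></tr></tbody>
</table>
"""


class CaptionParser(HTMLParser):
    """Collect the text of every <caption>, even when </caption> is missing."""

    # Start tags that implicitly end an open caption.
    CLOSERS = {"caption", "col", "colgroup", "thead", "tbody",
               "tfoot", "tr", "td", "th"}

    def __init__(self):
        super().__init__()
        self.captions = []   # one list of text fragments per caption
        self._open = False   # True while we are inside a caption

    def handle_starttag(self, tag, attrs):
        if tag in self.CLOSERS:
            self._open = False
        if tag == "caption":
            self.captions.append([])
            self._open = True

    def handle_endtag(self, tag):
        if tag in ("caption", "table"):
            self._open = False

    def handle_data(self, data):
        text = data.strip()
        if self._open and text:
            self.captions[-1].append(text)


def caption_parts(html):
    """Return a list of text fragments for each caption found in html."""
    parser = CaptionParser()
    parser.feed(html)
    parser.close()
    return parser.captions


print(caption_parts(SAMPLE_HTML))
```

With markup shaped like the sample, the tournament name and the category come back as separate fragments of the same caption, so no manual `.split` on the raw HTML is needed.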


Answer

Here is one possible solution:

JavaScript

This solution takes 55 seconds to process all 4960 elements.

Output:

JavaScript

Solution based on ThreadPoolExecutor:

JavaScript

This solution takes about 35 seconds to process all 4960 elements.

Of course, because this solution works with threads, the results are not returned in their original order.
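
If the mixed order matters, one option is to let the executor restore it. The sketch below is not the code from this answer; it only illustrates, under stated assumptions, the difference between consuming results as they complete and consuming them through `Executor.map`, which yields results in input order. `fake_fetch` is a hypothetical stand-in for the real page request, and the sleep times are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_fetch(item_id):
    """Hypothetical stand-in for a network request.

    Lower ids sleep longer, so they tend to finish later.
    """
    time.sleep(0.05 * (5 - item_id))
    return item_id, f"row-{item_id}"


ids = [1, 2, 3, 4]

with ThreadPoolExecutor(max_workers=4) as pool:
    # as_completed yields futures as they finish, so input order is lost.
    futures = [pool.submit(fake_fetch, i) for i in ids]
    completion_order = [f.result()[0] for f in as_completed(futures)]

with ThreadPoolExecutor(max_workers=4) as pool:
    # Executor.map yields results in the order of the inputs,
    # regardless of which task finished first.
    input_order = [item_id for item_id, _ in pool.map(fake_fetch, ids)]

print("completion order:", completion_order)
print("input order:     ", input_order)
```

Another common approach is to return the input key together with each result, as `fake_fetch` does, and sort the collected rows afterwards.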

Output:

JavaScript
User contributions licensed under: CC BY-SA