I am a first-time user of concurrent.futures and am following the official guides.
Problem: inside the as_completed() loop, how do I access the k, v pair that was used to build future_to_url?
The k variable is vital.
Using something like:

    for (future, k, v) in zip(concurrent.futures.as_completed(future_to_url), urls.items()):

I stumbled on this post, however I cannot decipher the syntax to reproduce it.
Original Attempt

    def start():
        with concurrent.futures.ThreadPoolExecutor(max_workers=50) as executor:
            future_to_url = {executor.submit(visit_url, v): v for k, v in urls.items()}
            for future in concurrent.futures.as_completed(future_to_url):
                data = future.result()
                json = data.json()
                print(f"k: {future[k]}")

Second Attempt – Using zip, which breaks

    def start():
        with concurrent.futures.ThreadPoolExecutor(max_workers=50) as executor:
            future_to_url = {executor.submit(visit_url, v): v for k, v in urls.items()}
            for (future, k, v) in zip(concurrent.futures.as_completed(future_to_url), urls.items()):
                data = future.result()
                json = data.json()
                print(f"k: {k}")

Third Broken Attempt – Using map

    for future, (k, v) in map(concurrent.futures.as_completed(future_to_url), scraping_robot_urls.items()):

This raises:

    TypeError: 'generator' object is not callable

Fourth Broken Attempt – Storing the k,v pairs before the as_completed() loop and pairing them with an enumerate index

    with concurrent.futures.ThreadPoolExecutor(max_workers=50) as executor:
        future_to_url = {executor.submit(get_response, v): v for k, v in scraping_robot_urls.items()}
        info = {k: v for k, v in scraping_robot_urls.items()}
        for i, future in enumerate(concurrent.futures.as_completed(future_to_url)):
            url = future_to_url[future]
            data = future.result()
            print(f"data: {data}")
            print(f"key: {list(info)[i]} / url: {url}")

This does not work: the URL does not match the key. They appear to be mismatched, and I cannot rely on this behaviour.
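A minimal, network-free sketch of why the enumerate index (and the zip attempt) drifts: as_completed() yields each future as soon as it finishes, not in the order it was submitted. The slow_task helper and the delays are hypothetical stand-ins for visit_url and real network latency.

```python
import concurrent.futures
import time


# Hypothetical stand-in for visit_url: sleeps instead of making an HTTP
# request, so the example runs offline. A larger delay finishes later.
def slow_task(key, delay):
    time.sleep(delay)
    return key


delays = {'id123': 0.6, 'id456': 0.3, 'id789': 0.0}

with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(slow_task, k, d) for k, d in delays.items()]
    # as_completed() yields futures in completion order, not submission order.
    completion_order = [f.result() for f in concurrent.futures.as_completed(futures)]

submission_order = list(delays)
print("submitted:", submission_order)
print("completed:", completion_order)
```

Because the two orders differ, pairing the i-th completed future with the i-th dict key cannot be relied on.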
For completeness, here are the dependencies

    import requests

    def visit_url(url):
        return requests.get(url)

    urls = {
        'id123': 'http://www.google.com',
        'id456': 'http://www.bing.com',
        'id789': 'http://www.yahoo.com',
    }

Answer
This has nothing to do with futures; it is about how the dict comprehension works.

    future_to_url = {executor.submit(visit_url, v): v for k, v in urls.items()}

This loops over every item in the urls dict, unpacking each key and value (k, v), submits visit_url(v) to the executor, and stores the resulting future as a dict key with v as its value. k is never stored, and neither k nor v is available once the comprehension finishes, because in Python 3 comprehension variables are scoped to the comprehension itself.
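Since only v ends up stored in future_to_url, one minimal fix is to store the key (here a (k, v) tuple) as the dict value instead, then look it up by future inside the as_completed() loop. This is a sketch: fake_visit is a hypothetical stand-in for visit_url so it runs without network access.

```python
import concurrent.futures

urls = {
    'id123': 'http://www.google.com',
    'id456': 'http://www.bing.com',
    'id789': 'http://www.yahoo.com',
}


# Hypothetical stand-in for visit_url (no network): returns a fake "response".
def fake_visit(url):
    return f"response from {url}"


results = {}
with concurrent.futures.ThreadPoolExecutor(max_workers=50) as executor:
    # Map each future to its (key, url) pair rather than just the url.
    future_to_item = {executor.submit(fake_visit, v): (k, v) for k, v in urls.items()}
    for future in concurrent.futures.as_completed(future_to_item):
        k, v = future_to_item[future]  # recover the context for this future
        results[k] = future.result()
        print(f"k: {k} / url: {v}")
```

The futures still arrive in completion order, but each one carries its own key, so nothing depends on ordering.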
If you want both the result of the call and the key it belongs to, you can have visit_url return the key alongside the response as a tuple:

    import concurrent.futures

    import requests


    def start():
        with concurrent.futures.ThreadPoolExecutor(max_workers=50) as executor:
            future_to_url = {executor.submit(visit_url, k, v): v for k, v in urls.items()}
            for future in concurrent.futures.as_completed(future_to_url):
                # visit_url returns (key, response), so the key travels with the result
                key, data = future.result()
                json = data.json()
                print(f"id: {key}")
                print(f"data: {json}")


    def visit_url(key, url):
        return key, requests.get(url)


    urls = {
        'id123': 'http://www.google.com',
        'id456': 'http://www.bing.com',
        'id789': 'http://www.yahoo.com',
    }

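If you do not need to process results in completion order, Executor.map is another option: it returns results in the same order as its input iterable, so zipping them with the dict keys keeps each key paired with its own result. This is a sketch with a hypothetical fake_visit in place of visit_url; note that iterating the map results blocks until each result, in input order, is ready, and re-raises any exception the call raised.

```python
import concurrent.futures

urls = {
    'id123': 'http://www.google.com',
    'id456': 'http://www.bing.com',
    'id789': 'http://www.yahoo.com',
}


# Hypothetical stand-in for visit_url (no network).
def fake_visit(url):
    return f"response from {url}"


paired = {}
with concurrent.futures.ThreadPoolExecutor(max_workers=50) as executor:
    # executor.map yields results in input order, and an unmodified dict's
    # keys() and values() iterate in the same order, so the pairs line up.
    for k, response in zip(urls.keys(), executor.map(fake_visit, urls.values())):
        paired[k] = response
        print(f"id: {k} / data: {response}")
```

This is essentially what the zip attempt in the question was reaching for; it only works because map, unlike as_completed(), preserves input order.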
After comments from the OP (mainly that using visit_url's return value to pass context/keys back after execution feels dirty), I can propose a more object-oriented way of doing this:

    import concurrent.futures

    import requests


    class URL:
        def __init__(self, id, url):
            self.id = id
            self.url = url
            self.response = None

        def visit(self):
            # Fetch the URL and keep the response on the object itself
            self.response = requests.get(self.url)
            return self


    def start():
        with concurrent.futures.ThreadPoolExecutor(max_workers=50) as executor:
            future_to_url = {executor.submit(c.visit): c for c in urls}
            for future in concurrent.futures.as_completed(future_to_url):
                data = future.result()
                print(f"response: {data.response}")
                print(f"id: {data.id}")


    urls = [
        URL('id123', 'http://www.google.com'),
        URL('id456', 'http://www.bing.com'),
        URL('id789', 'http://www.yahoo.com'),
    ]

    start()

This keeps the response, ID and URL together on one object, which some may find cleaner. The comprehension that submits work to the executor is simpler as well.
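The same one-object-per-request idea can be written more compactly with a dataclass. This is a variant sketch, not the answer's code: UrlJob is a hypothetical name, and the fetch function is passed in so the example runs offline with a hypothetical fake_fetch; in real use you would pass requests.get.

```python
import concurrent.futures
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class UrlJob:
    id: str
    url: str
    response: Optional[Any] = None

    def visit(self, fetch: Callable[[str], Any]) -> "UrlJob":
        # Store the result on the object and return self, so the future's
        # result carries id, url and response together.
        self.response = fetch(self.url)
        return self


def fake_fetch(url: str) -> str:
    # Hypothetical stand-in for requests.get.
    return f"response from {url}"


jobs = [
    UrlJob('id123', 'http://www.google.com'),
    UrlJob('id456', 'http://www.bing.com'),
    UrlJob('id789', 'http://www.yahoo.com'),
]

done = []
with concurrent.futures.ThreadPoolExecutor(max_workers=50) as executor:
    futures = [executor.submit(job.visit, fake_fetch) for job in jobs]
    for future in concurrent.futures.as_completed(futures):
        job = future.result()
        done.append(job.id)
        print(f"id: {job.id} / response: {job.response}")
```

Passing the fetch function in also makes the class easy to test without network access.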