
Combining concurrent.futures.as_completed() with a dictionary using zip()

I am a first-time user of concurrent.futures and am following the official guides.

Problem: Inside the as_completed() block, how do I access the k and v that were used to build future_to_url?

The k variable is vital.

Using something like:

for (future, k,v) in zip(concurrent.futures.as_completed(future_to_url), urls.items()):

I stumbled on this post, but I cannot decipher its syntax well enough to reproduce it.

Original

def start():
    with concurrent.futures.ThreadPoolExecutor(max_workers=50) as executor:
        future_to_url = {executor.submit(visit_url, v): v for k, v in urls.items()}
        for future in concurrent.futures.as_completed(future_to_url):
            data = future.result()
            json = data.json()
            print(f"k: {future[k]}")

Second Attempt – Using zip which breaks

def start():
    with concurrent.futures.ThreadPoolExecutor(max_workers=50) as executor:
        future_to_url = {executor.submit(visit_url, v): v for k, v in urls.items()}
        for (future, k, v) in zip(concurrent.futures.as_completed(future_to_url), urls.items()):
            data = future.result()
            json = data.json()
            print(f"k: {k}")

Third Broken Attempt – Using map

for future, (k, v) in map(concurrent.futures.as_completed(future_to_url), scraping_robot_urls.items()):

TypeError: 'generator' object is not callable

Fourth Broken Attempt – Storing the k,v pairs before the as_completed() loop and pairing them with an enumerate index

    with concurrent.futures.ThreadPoolExecutor(max_workers=50) as executor:
        future_to_url = {executor.submit(get_response, v): v for k, v in scraping_robot_urls.items()}
        info = {k: v for k, v in scraping_robot_urls.items()}
        for i, future in enumerate(concurrent.futures.as_completed(future_to_url)):
            url = future_to_url[future]
            data = future.result()
            print(f"data: {data}")
            print(f"key: {list(info)[i]} / url: {url}")

This does not work: the URL does not match the key, so the pairs come out mismatched, and I cannot rely on this behaviour.
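The mismatch can be reproduced without any network calls, because as_completed() yields futures in the order they finish, not the order they were submitted. The following is a minimal sketch under stated assumptions: fake_visit, the gate event and the example URLs are hypothetical stand-ins, and the first-submitted task is deliberately held back so that it finishes last.

```python
import concurrent.futures
import threading

# Hypothetical IDs and URLs; nothing is actually fetched.
urls = {"id123": "http://slow.example", "id456": "http://fast.example"}

gate = threading.Event()


def fake_visit(url):
    # Stand-in for visit_url: the "slow" URL blocks until the gate opens.
    if "slow" in url:
        gate.wait(timeout=5)
    return url


completed = []
with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
    future_to_url = {executor.submit(fake_visit, v): v for v in urls.values()}
    for future in concurrent.futures.as_completed(future_to_url):
        completed.append(future.result())
        gate.set()  # release the slow task only after the first result arrives

submitted = list(urls.values())
print("submitted:", submitted)
print("completed:", completed)
```

Here the completion order is the reverse of the submission order, so pairing the i-th completed future with the i-th key (via zip or enumerate) attaches the wrong key to each result.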

For completeness, here are the dependencies

def visit_url(url):
    return requests.get(url)

urls = {
  'id123': 'www.google.com', 
  'id456': 'www.bing.com', 
  'id789': 'www.yahoo.com'
}



Answer

This has less to do with futures and more to do with how the dict comprehension works.

    future_to_url = {executor.submit(visit_url, v): v for k, v in urls.items()}

This line loops over every item in the urls dict, unpacks each key and value as (k, v), and submits visit_url to the executor with v as its argument. k and v are not available outside the comprehension, because their scope is limited to the comprehension itself.
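Since k and v are gone once the comprehension finishes, one option is to store them as the dictionary value and look them up by future, extending the future_to_url[future] lookup already used in the fourth attempt. This is only a sketch, not the approach proposed in the rest of this answer; fake_visit, collect and future_to_item are hypothetical names, and fake_visit stands in for visit_url so that the example runs without network access.

```python
import concurrent.futures


def fake_visit(url):
    # Hypothetical stand-in for visit_url; returns a string instead of a response.
    return f"response from {url}"


urls = {
    "id123": "http://www.google.com",
    "id456": "http://www.bing.com",
    "id789": "http://www.yahoo.com",
}


def collect(urls):
    results = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
        # Keep the whole (key, url) pair as the dict value so it can be
        # recovered from the future after the comprehension has finished.
        future_to_item = {
            executor.submit(fake_visit, v): (k, v) for k, v in urls.items()
        }
        for future in concurrent.futures.as_completed(future_to_item):
            k, v = future_to_item[future]
            results[k] = (v, future.result())
    return results


results = collect(urls)
for k, (v, data) in results.items():
    print(f"k: {k} / url: {v} / data: {data}")
```

Because each future is looked up directly, the key stays matched to its result regardless of the order in which the futures complete.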

If you want both the result of the call and the ID it belongs to, you can pass the ID into the function and return it alongside the response as a tuple:

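import concurrent.futures

import requests


def start():
    with concurrent.futures.ThreadPoolExecutor(max_workers=50) as executor:
        # Pass the key k to the worker as well as the URL v.
        future_to_url = {executor.submit(visit_url, k, v): v for k, v in urls.items()}
        for future in concurrent.futures.as_completed(future_to_url):
            # The worker returns the ID together with the response.
            id, data = future.result()
            json = data.json()  # assumes the endpoint returns JSON
            print(f"id: {id}")
            print(f"data: {json}")

def visit_url(id, url):
    # Return the ID alongside the response so the caller can match them up.
    return id, requests.get(url)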

# requests needs a scheme (http:// or https://) on each URL.
urls = {
  'id123': 'http://www.google.com',
  'id456': 'http://www.bing.com',
  'id789': 'http://www.yahoo.com'
}

Following comments from the OP (mainly that it feels dirty to use the visit_url function to pass context or keys back after execution), I can propose a more object-oriented way of doing this:

import requests
import concurrent.futures

class URL:
    def __init__(self, id, url):
        self.id = id
        self.url = url
        self.response = None

    def visit(self):
        # Store the response on the object so ID, URL and response stay together.
        self.response = requests.get(self.url)
        return self

def start():
    with concurrent.futures.ThreadPoolExecutor(max_workers=50) as executor:
        future_to_url = {executor.submit(c.visit): c for c in urls}
        for future in concurrent.futures.as_completed(future_to_url):
            data = future.result()
            print(f"response: {data.response}")
            print(f"id: {data.id}")

urls = [
  URL('id123', 'http://www.google.com'),
  URL('id456', 'http://www.bing.com'),
  URL('id789', 'http://www.yahoo.com')
]

start()

This keeps the response, ID and URL together in one object, which some may find cleaner. The comprehension that submits work to the executor is simpler as well.

User contributions licensed under: CC BY-SA