Using threading/multiprocessing in Python to download images concurrently

I have a list of search queries to build a dataset:

classes = [...]. There are 100 search queries in this list.

Basically, I divide the list into 4 chunks of 25 queries.
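A minimal sketch of such a split, assuming `classes` is a plain Python list of 100 query strings. The placeholder queries and the `chunk_size` name are illustrative, not taken from the original post:

```python
# Placeholder data standing in for the real search queries (assumption).
classes = [f"query {i}" for i in range(100)]

chunk_size = 25  # 100 queries / 4 chunks

# Slice the list into consecutive, non-overlapping chunks of 25 queries.
chunks = [classes[i:i + chunk_size] for i in range(0, len(classes), chunk_size)]
```

With this layout, `chunks[n]` is the list of queries that belongs to chunk number `n`.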

And below, I’ve created a function that downloads queries from each chunk iteratively:

However, I want to run all 4 chunks concurrently. In other words, I want to run 4 separate iterative operations at the same time. I tried both the threading and the multiprocessing approach, but neither of them works:

Answer

You’re calling download_chunk yourself, outside of the thread/process, so it runs immediately in the main thread. You need to pass the function and its arguments separately so that execution is delayed until the thread/process starts.

For example:
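A minimal sketch of the pattern with `threading.Thread`, assuming `download_chunk` takes the chunk number as its only argument. The `classes`, `chunks`, and `finished` names and the function body are placeholders, since the asker's code is not shown here. `multiprocessing.Process` accepts the same `target` and `args` keywords:

```python
import threading

# Hypothetical stand-ins for the question's data: 100 queries in 4 chunks of 25.
classes = [f"query {i}" for i in range(100)]
chunks = [classes[i:i + 25] for i in range(0, len(classes), 25)]
finished = []  # records which chunk numbers have completed


def download_chunk(n):
    """Placeholder for the asker's function; handles chunk number n."""
    for query in chunks[n]:
        pass  # the real image download for `query` would go here
    finished.append(n)


def run_threads():
    # Wrong: Thread(target=download_chunk(i)) calls download_chunk right away
    # in the current thread and passes its return value (None) as the target.
    # Right: pass the function object and its arguments separately.
    threads = [
        threading.Thread(target=download_chunk, args=(i,)) for i in range(4)
    ]
    for t in threads:
        t.start()  # each call to download_chunk begins here, in its own thread
    for t in threads:
        t.join()   # wait for all four chunks to finish


if __name__ == "__main__":
    run_threads()
    print(sorted(finished))
```

Note the trailing comma in `args=(i,)`: `args` must be a tuple, even for a single argument.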

Refer to the multiprocessing docs for more information about using the multiprocessing.Process class.

For this use-case, I would suggest using multiprocessing.Pool:

It handles the work of creating, starting, and later joining the 4 processes. Each process calls download_chunk with each of the arguments provided in the iterable, which is range(4) in this case.
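A sketch of what this might look like, again assuming `download_chunk` takes a chunk number. The data, the function body, and the `main` wrapper are placeholders rather than the asker's code:

```python
from multiprocessing import Pool

# Hypothetical stand-ins for the question's data: 100 queries in 4 chunks of 25.
classes = [f"query {i}" for i in range(100)]
chunks = [classes[i:i + 25] for i in range(0, len(classes), 25)]


def download_chunk(n):
    """Placeholder for the asker's function; handles chunk number n."""
    for query in chunks[n]:
        pass  # the real image download for `query` would go here
    return n, len(chunks[n])


def main():
    # Four worker processes. map() calls download_chunk(0) ... download_chunk(3),
    # blocks until all calls have finished, and returns results in input order.
    with Pool(4) as pool:
        return pool.map(download_chunk, range(4))


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this block when the
    # module is imported under the "spawn" start method (e.g. on Windows).
    print(main())
```

The function passed to `map` has to be defined at module level so that it can be pickled and sent to the worker processes.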

More info about multiprocessing.Pool can be found in the docs.

User contributions licensed under: CC BY-SA