I have a single-producer/multiple-consumer scenario. Each job is independent, and the consumers do not communicate with each other.
Would it be a good idea to create a separate queue for each consumer? The producer would then add jobs to the queues in round-robin fashion, and the consumers would not have to wait for access to a single shared queue.
Or is it better to minimize the number of queues as much as possible?
In the case of a single queue and lots of consumers (20 or more), is the delay caused by synchronizing access to the queue significant?
I’m using Python 3.7 and multithreading/multiprocessing to create several consumers. Each consumer needs to run an executable and perform some I/O operations (write, move or copy files). I’ve currently implemented it with multiprocessing and a single queue, but I’m thinking of switching to multithreading and multiple queues.
Single Queue
                          Consumer
                        /
                       /  ..
Producer --> [ Queue ] -- Consumer
                       \  ..
                        \
                          Consumer
Multiple Queue
              -> [ Queue ] -- Consumer
            /
           /   ..
Producer ----- -> [ Queue ] -- Consumer
           \   ..
            \
              -> [ Queue ] -- Consumer
Answer
Specifically in the case of one producer and many consumers, the benefit of a single queue is that the producer only has to connect to one queue, and you can spin up as many consumers as you want to process ‘the next item’. Because Python has a complicated relationship with threading (in CPython, the GIL lets only one thread execute Python bytecode at a time), I would recommend using asyncio with asyncio.Queue. It is intuitive and easy to use.
I recently brushed up on this topic and I found this gist very helpful in understanding how it works.
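As a rough illustration of the single-queue approach with asyncio.Queue, here is a minimal sketch. The function names, the number of consumers, and the `asyncio.sleep` call standing in for the real job are all assumptions made for the example; in your case the job would more likely launch the executable with something like `asyncio.create_subprocess_exec`.

```python
import asyncio


async def producer(queue, jobs, n_consumers):
    # Put every job on the single shared queue.
    for job in jobs:
        await queue.put(job)
    # One sentinel per consumer so each of them knows when to stop.
    for _ in range(n_consumers):
        await queue.put(None)


async def consumer(name, queue, results):
    while True:
        job = await queue.get()
        if job is None:
            break
        # Placeholder for the real work (running an executable, file I/O).
        await asyncio.sleep(0.01)
        results.append((name, job))


async def main(n_jobs=10, n_consumers=3):
    queue = asyncio.Queue()
    results = []
    # Start the consumers first; they wait on the queue until jobs arrive.
    consumers = [
        asyncio.ensure_future(consumer(f"consumer-{i}", queue, results))
        for i in range(n_consumers)
    ]
    await producer(queue, range(n_jobs), n_consumers)
    await asyncio.gather(*consumers)
    return results


results = asyncio.run(main())
print(f"{len(results)} jobs processed")
```

Whichever consumer is free simply takes the next item, so no round-robin bookkeeping is needed on the producer side.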
In any case, having more queues will probably not speed up your processing. That could only happen if the time to process a message were shorter than the time to get a message from the queue, which is not the case here, since your jobs run an executable and perform file I/O.
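If you want to check this on your own machine, a quick timing sketch like the one below gives an order of magnitude for the cost of one put/get pair. It uses the thread-safe `queue.Queue` without contention, which is an assumption made to keep the example short; a `multiprocessing.Queue` adds pickling and pipe overhead, so treat the number as a lower bound and compare it with how long one of your jobs takes.

```python
import queue
import time


def queue_overhead_per_item(n_items=20_000):
    """Average time in seconds for one put() followed by one get()."""
    q = queue.Queue()
    start = time.perf_counter()
    for i in range(n_items):
        q.put(i)
        q.get()
    return (time.perf_counter() - start) / n_items


overhead = queue_overhead_per_item()
print(f"queue overhead per item: {overhead * 1e6:.2f} microseconds")
```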