
Are there any deliberately unfair (LIFO) synchronisation constructs within Python async?

I’m looking for a way to rate limit a recursive algorithm. To demonstrate the problem I’ll use the example of exploring a directory tree:

import asyncio
from pathlib import Path

async def explore_tree(path: Path) -> ExploreResult:
    tasks = []
    for child in path.iterdir():
        if child.is_dir():
            tasks.append(explore_tree(child))
        elif child.is_file():
            tasks.append(parse_file(child))
    results = await asyncio.gather(*tasks)
    return combine_results(results)

The problem with this code is that the number of active tasks explodes exponentially. Many of these tasks hold OS resources, so even though tasks themselves are theoretically cheap, running millions simultaneously is going to cause problems. Yet we don’t want to run one task at a time (without gather), because running these tasks in parallel gives a significant performance boost.
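For illustration, the standard mitigation looks something like this: a stdlib asyncio.Semaphore capping how many files are open at once. The limit of 8 and the simulated parse_file are placeholders, not real values from my code:

```python
import asyncio

async def main() -> int:
    sem = asyncio.Semaphore(8)      # arbitrary cap, purely for illustration
    peak = 0
    active = 0

    async def parse_file(i: int) -> None:
        nonlocal peak, active
        async with sem:             # at most 8 "files" open at once
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0)  # stand-in for real file I/O
            active -= 1

    # 100 tasks exist, but only 8 ever hold the "file" resource together
    await asyncio.gather(*(parse_file(i) for i in range(100)))
    return peak

peak = asyncio.run(main())
print("peak concurrency:", peak)
```

This caps the resource usage, but all 100 tasks still exist and still sit in the semaphore's queue.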

Python’s async Lock is first-in-first-out, and its Semaphore is mostly first-in-first-out except for a race condition. This makes the exploration approximately breadth-first, and the number of tasks explodes into the millions even if most of them are waiting. As described, that still causes problems and does not fix the root cause.


One example error from this is IOError: [Errno 24] Too many open files. Obviously it’s possible to put limits around opening files via semaphores, but then this shifts the problem onto other resources and we end up playing “whack-a-mole” with individual resource limits.


What I’m hunting for is something like a semaphore that is deliberately unfair and subject to starvation. I want a semaphore that imposes last-in-first-out instead of first-in-first-out. The aim is to use the resource constraints to squeeze the exploration into a more depth-first pattern instead of breadth-first.


Answer

To my surprise, Python’s semaphore is not fair. There’s a race condition which gives priority to new calls to acquire() over tasks which have been released but not yet resumed by the event loop. That is:

sem.release()
await sem.acquire()

Irrespective of the number of waiting tasks, the above code will never block and will even re-order the queue of waiting tasks as a result. So sadly this object is useless for enforcing a strict order.
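You can observe the queue-jump in isolation with the snippet below. Note the outcome depends on your CPython version, since the semaphore's fairness behaviour has changed across releases, so this just reports the acquisition order it sees:

```python
import asyncio

async def main() -> list:
    sem = asyncio.Semaphore(1)
    order = []
    await sem.acquire()                 # we hold the only permit

    async def waiter() -> None:
        await sem.acquire()             # queued behind us
        order.append("waiter")
        sem.release()

    w = asyncio.create_task(waiter())
    await asyncio.sleep(0)              # let the waiter join the queue

    sem.release()
    # On affected versions this acquire() succeeds immediately, jumping
    # ahead of the queued waiter; on fair versions the waiter goes first.
    await sem.acquire()
    order.append("holder")
    sem.release()
    await w
    return order

order = asyncio.run(main())
print(order)
```

On an unfair semaphore this prints `['holder', 'waiter']`: the release/acquire pair jumped the queue.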

I wrote my own:

import asyncio
import collections
from typing import Deque


class FairSemaphore:
    """
    Semaphore with strictly controlled order.

    By default this will be first-in-first-out but can be configured to be last-in-first-out
    """

    _queue: Deque[asyncio.Future]
    _value: int
    _fifo: bool

    def __init__(self, value: int, fifo: bool = True):
        """
        Create a semaphore with the given initial value.

        :param value: Initial value for the semaphore
        :param fifo: If True (default) the first task to call acquire() will be the first to be released.
            If False the last task to call acquire() at the moment release() is called will be the first to be released.
        """
        self._value = value
        self._queue = collections.deque()
        self._fifo = fifo

    def locked(self) -> bool:
        """
        Returns True if a call to acquire() would block.
        """
        return not self._value

    async def acquire(self):
        if self._value:
            self._value -= 1
        else:
            loop: asyncio.AbstractEventLoop = asyncio.get_running_loop()
            future = loop.create_future()
            self._queue.append(future)

            try:
                await future
            except asyncio.CancelledError:
                # This happens when the future's result was set but the task was
                # cancelled before it could resume.  In other words another task
                # completed and released this one, but the release was never used,
                # so it must be passed on to another waiter.
                if not future.cancelled():
                    self.release()
                # If the future itself was cancelled we were never released, so
                # we have no right to release another.
                raise

    def release(self):
        # Tasks can get cancelled while in the queue.  Naively you would expect
        # their acquire() code to remove them from the queue, but that doesn't
        # always work because the event loop might not have given them a chance
        # to execute the CancelledError except clause yet.  It's unavoidable that
        # there could be cancelled tasks waiting in this queue; their futures
        # report done() as True, so we discard them here and keep looking for a
        # live waiter to hand the release to.
        while self._queue:
            future = self._queue.popleft() if self._fifo else self._queue.pop()
            if not future.done():
                future.set_result(None)
                break
        else:
            self._value += 1

    async def __aenter__(self):
        await self.acquire()

    async def __aexit__(self, exc_type, exc, tb):
        self.release()
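As a quick sanity check of the LIFO mode, the standalone snippet below (repeating a condensed copy of the class, docstrings omitted) queues three workers behind a held permit and shows they are released last-in-first-out:

```python
import asyncio
import collections

class FairSemaphore:
    # Condensed copy of the class above.
    def __init__(self, value: int, fifo: bool = True):
        self._value = value
        self._queue = collections.deque()
        self._fifo = fifo

    async def acquire(self):
        if self._value:
            self._value -= 1
        else:
            future = asyncio.get_running_loop().create_future()
            self._queue.append(future)
            try:
                await future
            except asyncio.CancelledError:
                if not future.cancelled():
                    self.release()
                raise

    def release(self):
        while self._queue:
            future = self._queue.popleft() if self._fifo else self._queue.pop()
            if not future.done():
                future.set_result(None)
                break
        else:
            self._value += 1

    async def __aenter__(self):
        await self.acquire()

    async def __aexit__(self, exc_type, exc, tb):
        self.release()

async def main() -> list:
    sem = FairSemaphore(1, fifo=False)
    order = []

    async def worker(name: str) -> None:
        async with sem:
            order.append(name)

    await sem.acquire()                    # hold the permit so workers queue up
    tasks = [asyncio.create_task(worker(n)) for n in "ABC"]
    await asyncio.sleep(0)                 # let A, B, C queue in that order
    sem.release()                          # start releasing: last in, first out
    await asyncio.gather(*tasks)
    return order

order = asyncio.run(main())
print(order)   # C queued last, so it runs first
```

With fifo=False the release order is C, B, A; with the default fifo=True it would be A, B, C.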
User contributions licensed under: CC BY-SA