Iterable objects are those that implement the __iter__ method, which returns an iterator object, i.e. an object providing the methods __iter__ and __next__ and behaving correctly. Usually the size of an iterable is not known beforehand, and the iterable is not expected to know how long the iteration will last; however, there are cases in which knowing the length of the iterable is valuable, for example when building an array. list(x for x in range(1000000)), for instance, creates an initial array of small size, copies it once it is full, and repeats this many times, as explained here. Of course, it does not matter much in this example, but it illustrates the point.
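For reference, a minimal iterator following this protocol might look like the following (just a sketch; the class name is illustrative):

```python
class CountDown:
    """A minimal object that is both iterable and its own iterator."""
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        # Iterators return themselves from __iter__.
        return self

    def __next__(self):
        if self.n <= 0:
            raise StopIteration
        self.n -= 1
        return self.n

print(list(CountDown(3)))  # [2, 1, 0]; list() cannot know the length in advance
```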
Is there an established protocol for iterable objects that know their length beforehand? That is, is there a protocol extending Sized and Iterable but not Collection or Reversible? It seems there is no such protocol among the language's built-in features; is there one in well-known third-party libraries? And how does this discussion relate to generators?
Answer
It sounds like you’re asking about something like __length_hint__. Excerpts from PEP 424 – A method for exposing a length hint:
CPython currently defines a __length_hint__ method on several types, such as various iterators. This method is then used by various other functions (such as list) to presize lists based on the estimate returned by __length_hint__. Types which are not sized, and thus should not define __len__, can then define __length_hint__, to allow estimating or computing a size (such as many iterators).
Being able to pre-allocate lists based on the expected size, as estimated by __length_hint__, can be a significant optimization. CPython has been observed to run some code faster than PyPy, purely because of this optimization being present.
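The same PEP also specifies operator.length_hint(obj, default=0), a convenience function that first tries len() and then falls back to __length_hint__ (or the default), so you rarely need to call the dunder yourself. A small sketch:

```python
from operator import length_hint

it = iter(range(1000))
print(length_hint(it))         # 1000, taken from it.__length_hint__()
next(it)
print(length_hint(it))         # 999

print(length_hint([1, 2, 3]))  # 3: sized objects fall back to len()

gen = (x * x for x in range(1000))
print(length_hint(gen))        # 0: generators give no hint, so the default (0) is returned
```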
For example, range iterators support this (Try it online!):
```python
it = iter(range(1000))
print(it.__length_hint__())  # prints 1000
next(it)
print(it.__length_hint__())  # prints 999
```
And list iterators even take list length changes into account (Try it online!):
```python
a = [None] * 10
it = iter(a)
print(it.__length_hint__())  # prints 10
next(it)
print(it.__length_hint__())  # prints 9
a.pop()
print(it.__length_hint__())  # prints 8
a.append(None)
print(it.__length_hint__())  # prints 9
```
Generator iterators don’t support it, but you can support it in other iterators you write. Here’s a demo iterator that…
- Produces 10,000 elements.
- Hints at having 5,000 elements.
- After every 1,000 elements it shows the memory size of the list being built.
```python
import gc

beacon = object()  # unique sentinel so we can recognise the list being built

class MyIterator:
    def __init__(self):
        self.n = 10_000

    def __iter__(self):
        return self

    def __length_hint__(self):
        print('__length_hint__ called')
        return 5_000

    def __next__(self):
        if self.n == 0:
            raise StopIteration
        self.n -= 1
        if self.n % 1_000 == 0:
            # Find the list that list() is currently building and report its allocated size.
            for obj in gc.get_objects():
                if isinstance(obj, list) and obj and obj[0] is beacon:
                    print(obj.__sizeof__())
        return beacon

list(MyIterator())
```
Output (Try it online!):
```
__length_hint__ called
45088
45088
45088
45088
45088
50776
57168
64360
72456
81560
```
We see that list asks for a length hint and from the start pre-allocates enough memory for 5,000 references of 8 bytes each, plus 12.5% overallocation (5,000 × 8 × 1.125 = 45,000 bytes, which roughly matches the observed 45,088; the remainder is the list object's fixed header plus a few spare slots). After the first 5,000 elements, it doesn’t ask for a length hint anymore, and keeps increasing its size bit by bit.
If my __length_hint__ instead accurately returns 10,000, then list instead pre-allocates 90088 bytes, and that remains until the end.
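As for generators: a generator object itself never provides __length_hint__, but if you know (or can estimate) how many items a generator will yield, you can wrap it in a small iterator class that forwards __next__ and supplies the hint. This is only a sketch under that assumption, not a standard API:

```python
class HintedIterator:
    """Wrap any iterator and advertise an estimated length via __length_hint__."""
    def __init__(self, iterable, hint):
        self._it = iter(iterable)
        self._hint = hint

    def __iter__(self):
        return self

    def __next__(self):
        value = next(self._it)  # StopIteration propagates naturally
        self._hint = max(self._hint - 1, 0)
        return value

    def __length_hint__(self):
        return self._hint

squares = HintedIterator((x * x for x in range(10_000)), hint=10_000)
result = list(squares)  # list() can pre-size its buffer using the hint
print(len(result))      # 10000
```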