Say I have a list of items, and I want to iterate over the first few of them:
items = list(range(10))  # I mean this to represent any kind of iterable.
limit = 5
Naive implementation
The Python naïf coming from other languages would probably write this perfectly serviceable and performant (if unidiomatic) code:
index = 0
for item in items:  # Python's `for` loop is a for-each.
    print(item)  # or whatever function of that item.
    index += 1
    if index == limit:
        break
More idiomatic implementation
But Python has enumerate, which subsumes about half of that code nicely:
for index, item in enumerate(items):
    if index == limit:  # There's gotta be a better way.
        break
    print(item)
So we’ve about cut the extra code in half. But there’s gotta be a better way.
Can we approximate the below pseudocode behavior?
If enumerate took another optional stop argument (for example, it takes a start argument like this: enumerate(items, start=1)) that would, I think, be ideal, but the below doesn’t exist (see the documentation on enumerate):
# hypothetical code, not implemented:
for _, item in enumerate(items, start=0, stop=limit):  # `stop` not implemented
    print(item)
Note that there would be no need to name the index because there is no need to reference it.
Is there an idiomatic way to write the above? How?
A secondary question: why isn’t this built into enumerate?
Answer
How can I limit iterations of a loop in Python?
for index, item in enumerate(items):
    if index == limit:
        break
    print(item)
Is there a shorter, idiomatic way to write the above? How?
Including the index
zip stops on the shortest iterable of its arguments. (In contrast with the behavior of zip_longest, which uses the longest iterable.)
range can provide a limited iterable that we can pass to zip along with our primary iterable.
So we can pass a range object (with its stop argument) to zip and use it like a limited enumerate.
zip(range(limit), items)
Using Python 3, zip and range return iterables, which pipeline the data instead of materializing the data in lists for intermediate steps.
for index, item in zip(range(limit), items):
    print(index, item)
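One detail about argument order is worth a quick sketch. zip pulls from its arguments left to right, so with range(limit) first it stops as soon as the range is exhausted and does not take an extra element from items. This only matters when items is a one-shot iterator that you want to keep reading afterwards. The names `source`, `taken`, and the `remaining_*` variables below are illustrative only:

```python
limit = 3

# range first: zip stops when the range runs out, before touching `source` again.
source = iter("abcdef")
taken = [item for _, item in zip(range(limit), source)]
remaining_range_first = list(source)

# iterator first: zip pulls one more element from `source`, then discards it
# when the range turns out to be exhausted.
source = iter("abcdef")
taken_swapped = [item for item, _ in zip(source, range(limit))]
remaining_iter_first = list(source)

print(taken, remaining_range_first)         # ['a', 'b', 'c'] ['d', 'e', 'f']
print(taken_swapped, remaining_iter_first)  # ['a', 'b', 'c'] ['e', 'f']
```

Both orders yield the same first three items; the difference is only in what is left in the iterator afterwards.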
To get the same behavior in Python 2, just substitute xrange for range and itertools.izip for zip.
from itertools import izip

for index, item in izip(xrange(limit), items):
    print(item)
If not requiring the index, itertools.islice
You can use itertools.islice:
import itertools

for item in itertools.islice(items, 0, limit):
    print(item)
which doesn’t require assigning to the index.
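Because islice only calls next on its input, it also works on generators and unbounded iterators, which cannot be subscripted. A small sketch, using itertools.count as a stand-in for an endless source; the helper `first_n` is hypothetical and not part of itertools:

```python
from itertools import count, islice


def first_n(iterable, n):
    """Return the first n items of any iterable as a list (hypothetical helper)."""
    return list(islice(iterable, n))


squares = (x * x for x in count())  # an infinite generator
print(first_n(squares, 5))          # [0, 1, 4, 9, 16]
print(first_n("abc", 5))            # input shorter than n: ['a', 'b', 'c']
```

If the input has fewer than n items, islice simply stops early rather than raising.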
Composing enumerate(islice(items, stop)) to get the index
As Pablo Ruiz Ruiz points out, we can also compose islice with enumerate.
from itertools import islice

for index, item in enumerate(islice(items, limit)):
    print(index, item)
Why isn’t this built into enumerate?
Here’s enumerate implemented in pure Python (with possible modifications to get the desired behavior in comments):
def enumerate(collection, start=0):  # could add stop=None
    i = start
    it = iter(collection)
    while 1:  # could modify to `while i != stop:`
        try:
            item = next(it)
        except StopIteration:
            return  # end the generator cleanly (required since PEP 479)
        yield (i, item)
        i += 1
With that modification, enumerate would be slightly slower for everyone already using it, because it would have to check on every iteration whether it is time to stop. Instead, we can check once up front and fall back to the old enumerate if we don’t get a stop argument:
_enumerate = enumerate

def enumerate(collection, start=0, stop=None):
    if stop is not None:
        return zip(range(start, stop), collection)
    return _enumerate(collection, start)
This extra check runs once per call rather than once per item, so its performance impact would be negligible.
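To try the idea without shadowing the built-in, here is a minimal sketch under a different name. `bounded_enumerate` is a hypothetical helper, not a standard-library function. Note that in this design stop bounds the counter, not the number of items, so a non-zero start shortens the output:

```python
def bounded_enumerate(iterable, start=0, stop=None):
    """Like enumerate, but end before the counter reaches `stop` (hypothetical helper)."""
    if stop is None:
        return enumerate(iterable, start)
    return zip(range(start, stop), iterable)


print(list(bounded_enumerate("abcdef", stop=3)))
# [(0, 'a'), (1, 'b'), (2, 'c')]
print(list(bounded_enumerate("abcdef", start=1, stop=3)))
# [(1, 'a'), (2, 'b')]
```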
As to why enumerate does not have a stop argument, this was originally proposed (see PEP 279):
This function was originally proposed with optional start and stop arguments. GvR [Guido van Rossum] pointed out that the function call
enumerate(seqn, 4, 6)
had an alternate, plausible interpretation as a slice that would return the fourth and fifth elements of the sequence. To avoid the ambiguity, the optional arguments were dropped even though it meant losing flexibility as a loop counter. That flexibility was most important for the common case of counting from one, as in:
for linenum, line in enumerate(source,1):
    print linenum, line
So apparently start was kept because it was very valuable, and stop was dropped because it had fewer use-cases and contributed to confusion on the usage of the new function.
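The ambiguity described in the PEP can be made concrete. The sketch below spells out the two plausible readings of a hypothetical enumerate(seqn, 4, 6) using tools that do exist; the sample value of `seqn` and the variable names are illustrative assumptions:

```python
from itertools import islice

seqn = "abcdefgh"

# Reading 1: a loop counter running from 4 up to (not including) 6,
# paired with the elements from the beginning of the sequence.
as_counter = list(zip(range(4, 6), seqn))

# Reading 2: a slice selecting the elements at indices 4 and 5,
# numbered by their position.
as_slice = list(enumerate(islice(seqn, 4, 6), start=4))

print(as_counter)  # [(4, 'a'), (5, 'b')]
print(as_slice)    # [(4, 'e'), (5, 'f')]
```

The same call could reasonably mean either result, which is the confusion the PEP authors wanted to avoid.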
Avoid slicing with subscript notation
Another answer says:
Why not simply use
for item in items[:limit]: # or limit+1, depends
Here are a few downsides:
- It only works for iterables that accept slicing, thus it is more limited.
- If they do accept slicing, it usually creates a new data structure in memory instead of iterating over the original one, so it wastes memory. (Built-in sequences such as list, tuple, and str make copies when sliced, whereas, for example, numpy arrays return a view.)
- Unsliceable iterables would require the other kind of handling. If you switch to a lazy evaluation model, you’ll have to change the code with slicing as well.
You should only use slicing with subscript notation when you understand the limitations and whether it makes a copy or a view.
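The first two downsides can be seen in a short sketch. It assumes a plain list for the copy behavior and a generator expression as the unsliceable iterable; the names `head`, `gen`, and `slice_error` are illustrative:

```python
items = list(range(10))
limit = 3

# Slicing a list builds a new list holding the first `limit` references.
head = items[:limit]
head[0] = "changed"
print(items[0])  # 0, because the original list is unaffected by edits to the copy

# A generator has no __getitem__, so subscript slicing fails outright.
gen = (x for x in items)
slice_error = None
try:
    gen[:limit]
except TypeError as exc:
    slice_error = exc
print("cannot slice a generator:", slice_error)
```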
Conclusion
I would presume that, now that the Python community knows how enumerate is used, the cost of any confusion would be outweighed by the value of the argument.
Until that time, you can use:
for index, element in zip(range(limit), items): ...
or
for index, item in enumerate(islice(items, limit)): ...
or, if you don’t need the index at all:
for element in islice(items, 0, limit): ...
And avoid slicing with subscript notation, unless you understand the limitations.