This is rather the inverse of What can you use Python generator functions for?: Python generators, generator expressions, and the itertools module are some of my favorite features of Python these days. They're especially useful for setting up chains of operations to perform on a big pile of data; I often use them when processing DSV files.
So when is it not a good time to use a generator, a generator expression, or an itertools function?
- When should I prefer zip() over itertools.izip(), or range() over xrange(), or [x for x in foo] over (x for x in foo)?
Obviously, we eventually need to “resolve” a generator into actual data, usually by creating a list or iterating over it with a non-generator loop. Sometimes we just need to know the length. This isn’t what I’m asking.
We use generators so that we're not allocating new lists in memory for interim data. This especially makes sense for large datasets. Does it make sense for small datasets too? Is there a noticeable memory/CPU trade-off?
I'm especially interested to know whether anyone has done some profiling on this, in light of the eye-opening discussion of list comprehension performance vs. map() and filter(). (alt link)
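To make the comparison concrete, a minimal timeit sketch along these lines would do (the sizes, repeat counts, and the squaring workload are arbitrary choices, and Python 3 is assumed):

import timeit

# Compare building an interim list versus consuming a generator expression
# directly, at a small and a large input size (sizes/repeats are arbitrary).
for n, reps in ((10, 100000), (1000000, 10)):
    list_time = timeit.timeit(lambda: sum([x * x for x in range(n)]), number=reps)
    gen_time = timeit.timeit(lambda: sum(x * x for x in range(n)), number=reps)
    print("n=%d: list comp %.4fs, genexpr %.4fs" % (n, list_time, gen_time))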
Answer
Use a list instead of a generator when:
1) You need to access the data multiple times (i.e. cache the results instead of recomputing them):
for i in outer:        # used once, okay to be a generator or return a list
    for j in inner:    # used multiple times, reusing a list is better
        ...
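A quick illustration (my own, not part of the original answer): a generator is exhausted after its first pass, so reusing one in a nested loop silently skips the later iterations, whereas a list can be iterated any number of times.

inner_gen = (x * x for x in range(3))
for i in range(2):
    for j in inner_gen:      # exhausted after the first outer iteration
        print(i, j)          # prints pairs for i == 0 only

inner_list = [x * x for x in range(3)]
for i in range(2):
    for j in inner_list:     # a list can be re-iterated
        print(i, j)          # prints all six pairs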
2) You need random access (or any access other than forward sequential order):
for i in reversed(data): ...     # generators aren't reversible
s[i], s[j] = s[j], s[i]          # generators aren't indexable
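For example (again my own sketch, not from the answer), both operations fail with TypeError on a generator, and materializing a list is the usual fix:

data = (x for x in range(5))

try:
    reversed(data)           # generators don't implement __reversed__
except TypeError as exc:
    print(exc)

try:
    data[0]                  # ...and aren't subscriptable either
except TypeError as exc:
    print(exc)

items = list(data)           # materialize when random access is needed
print(items[::-1], items[0])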
3) You need to join strings (which requires two passes over the data):
s = ''.join(data) # lists are faster than generators in this use case
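A rough way to see this for yourself (my own sketch; the numbers will vary by interpreter and version): in CPython, str.join builds a sequence from its argument before making its two passes, so handing it a generator just adds that conversion step.

import timeit

setup = "words = ['word'] * 1000"
from_list = timeit.timeit("''.join([w.upper() for w in words])", setup=setup, number=10000)
from_gen = timeit.timeit("''.join(w.upper() for w in words)", setup=setup, number=10000)
print("from list comp: %.3fs   from genexpr: %.3fs" % (from_list, from_gen))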
4) You are using PyPy, which sometimes can't optimize generator code as well as it can normal function calls and list manipulations.