I have such code:
from sys import getsizeof as sizer test_list = [i for i in range(10)] test_list2 = list([i for i in range(10)]) test_list3 = [] for i in range(10): test_list3.append(i) print(f"""{sizer(test_list)} bytes: {test_list} {sizer(test_list2)} bytes: {test_list2} {sizer(test_list3)} bytes: {test_list3}""")
With the next result:
184 bytes: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] 136 bytes: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] 184 bytes: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
The question is: Why does a list created with list([])
weigh less than a list created with just []
or for _ in condition
?
Advertisement
Answer
(The details in this answer depend on the implementation, they are written to match CPython 3.10.0. Other versions or other implementations of Python work differently.)
Lists in Python are dynamically sized, unfortunately our computers live in hardware land, where memory is physical and can’t be squished or stretched.
So if you make your list longer, by appending something to it, Python will have to allocate a new bit of memory and copy all the old cells to the new memory.
Now this is all pretty fast most of the time, but it all adds up. If you want to add a 1000 elements to an empty list one at a time, this means you will have to copy about a million cells in total.
So to make sure that it doesn’t have to do it too often, Python usually over-allocates the space needed. For example, if you add a single item to an empty list, it will allocate more than one cell, so Python doesn’t have to allocate any more cells for the next couple of items added. This is true whether you use list.append
to add items or you use list comprehensions.
Now there are other ways to create lists. Notably, you can create lists from existing iterable objects, using this syntax: list(obj)
. It works for any iterable, but if len(obj)
works, Python only has to allocate memory once, using the exact number of cells it’s going to need. So that’s the difference. You can also write it as list(range(10))
, in which case you don’t need to create the intermediate list, which does over-allocate.
As a bonus, try this:
lists = [[], [1], [1, 2], [1, 2, 3]] for lst in lists: print(sizer(lst), lst)
What is the pattern here? Do you know why the pattern is like that?