I expected array.array to be faster than lists, as arrays seem to be unboxed. However, I get the following result:
    In [1]: import array
    In [2]: L = list(range(100000000))
    In [3]: A = array.array('l', range(100000000))
    In [4]: %timeit sum(L)
    1 loop, best of 3: 667 ms per loop
    In [5]: %timeit sum(A)
    1 loop, best of 3: 1.41 s per loop
    In [6]: %timeit sum(L)
    1 loop, best of 3: 627 ms per loop
    In [7]: %timeit sum(A)
    1 loop, best of 3: 1.39 s per loop
What could be the cause of such a difference?
Answer
The storage is “unboxed”, but every time you access an element Python has to “box” it (embed it in a regular Python object) in order to do anything with it. For example, your sum(A) iterates over the array and boxes each integer, one at a time, in a regular Python int object. That costs time. In your sum(L), all the boxing was done at the time the list was created.
So, in the end, an array is generally slower, but requires substantially less memory.
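The memory side of that trade-off can be estimated with a rough sketch like the one below. It assumes CPython and uses sys.getsizeof, and the variable names (boxed, unboxed, n) are only illustrative. The list figure is an approximation: it counts every int object separately even though CPython shares some small ints.

```python
import array
import sys

n = 100_000  # kept small so the sketch runs quickly

boxed = list(range(n))                 # one pointer per element, each to a separate int object
unboxed = array.array('l', range(n))   # raw C longs stored inline in one buffer

# Approximate cost of the list: the pointer table plus every int object it refers to.
list_bytes = sys.getsizeof(boxed) + sum(sys.getsizeof(x) for x in boxed)

# Cost of the array: the array object including its buffer of C longs.
array_bytes = sys.getsizeof(unboxed)

print(f"list plus its int objects: {list_bytes} bytes")
print(f"array: {array_bytes} bytes")
```

The exact numbers depend on the platform (for example, the size of a C long) and on the Python version, so treat them as indicative only.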
Here’s the relevant code from a recent version of Python 3, but the same basic ideas apply to all CPython implementations since Python was first released.
Here’s the code to access a list item:
    PyObject *
    PyList_GetItem(PyObject *op, Py_ssize_t i)
    {
        /* error checking omitted */
        return ((PyListObject *)op) -> ob_item[i];
    }
There’s very little to it: somelist[i] just returns the i'th object in the list (and all Python objects in CPython are pointers to a struct whose initial segment conforms to the layout of a struct PyObject).
And here’s the __getitem__ implementation for an array with type code l:
    static PyObject *
    l_getitem(arrayobject *ap, Py_ssize_t i)
    {
        return PyLong_FromLong(((long *)ap->ob_item)[i]);
    }
The raw memory is treated as a vector of platform-native C long integers; the i'th C long is read; and then PyLong_FromLong() is called to wrap (“box”) the native C long in a Python long object (which is actually shown as type int in Python 3, since Python 3 eliminates Python 2’s distinction between int and long).
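The difference between the two access paths can be observed from Python itself. The sketch below assumes CPython and picks a value outside the small-int cache. The names value, list_same and array_same are illustrative only.

```python
import array

value = 10**6  # assumed to lie outside CPython's cache of small ints

L = [value]
A = array.array('l', [value])

# The list hands back the same stored object on every access.
list_same = L[0] is L[0]

# The array boxes the raw C long into a fresh int object on every access.
array_same = A[0] is A[0]

print(list_same)   # expected True on CPython
print(array_same)  # expected False on CPython
```

Identity checks like this are an implementation detail of CPython, so this is a way to see the boxing happen, not something to rely on in real code.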
This boxing has to allocate new memory for a Python int object, and spray the native C long’s bits into it. In the context of the original example, this object’s lifetime is very brief (just long enough for sum() to add the contents into a running total), and then more time is required to deallocate the new int object.
This is where the speed difference comes from, always has come from, and always will come from in the CPython implementation.
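To check the effect on your own machine without IPython, a rough sketch using the standard timeit module might look like this. The size n and the repeat count are arbitrary choices, and the ratio you see will vary with the Python version and hardware.

```python
import array
import timeit

n = 1_000_000  # smaller than in the question so the sketch finishes quickly
L = list(range(n))
A = array.array('l', range(n))

# sum(L) reuses int objects that already exist;
# sum(A) has to box each C long as it iterates.
t_list = timeit.timeit(lambda: sum(L), number=5)
t_array = timeit.timeit(lambda: sum(A), number=5)

print(f"sum(list):  {t_list:.3f} s for 5 runs")
print(f"sum(array): {t_array:.3f} s for 5 runs")
```

Both calls compute the same total; only the time spent creating and destroying temporary int objects differs.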