I’m storing a lot of complex data in tuples/lists, but I would prefer to use small wrapper classes to make the data structures easier to understand, e.g.
```python
class Person:
    def __init__(self, first, last):
        self.first = first
        self.last = last

p = Person('foo', 'bar')
print(p.last)
```
would be preferable over
```python
p = ['foo', 'bar']
print(p[1])
```
However, there seems to be a horrible memory overhead:

```python
l = [Person('foo', 'bar') for i in range(10000000)]
# ipython now takes 1.7 GB RAM
```
and
```python
del l
l = [('foo', 'bar') for i in range(10000000)]
# now just 118 MB RAM
```
Why? Is there any obvious alternative solution that I didn’t think of?
Thanks!
(I know the ‘wrapper’ class looks silly in this example, but it becomes more useful once the data is more complex and nested.)
Answer
As others have said in their answers, you’ll have to generate different objects for the comparison to make sense.
So, let’s compare some approaches.
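To see why the original comparison was misleading: in CPython a tuple literal made of constants, like `('foo', 'bar')`, is folded into a single compile-time constant, so the list comprehension stores 10 million references to one and the same tuple object. A quick check (a sketch; the identity behaviour is a CPython implementation detail):

```python
# Constant tuple literal: every element is the *same* object.
l = [('foo', 'bar') for i in range(3)]
assert l[0] is l[1]

# Tuples built from the loop variable are genuinely distinct objects.
l2 = [(i, i) for i in range(3)]
assert l2[0] is not l2[1]
```

That is why the benchmarks below build each element from the loop variable.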
tuple
```python
l = [(i, i) for i in range(10000000)]
# memory taken by Python3: 1.0 GB
```
class Person
```python
class Person:
    def __init__(self, first, last):
        self.first = first
        self.last = last

l = [Person(i, i) for i in range(10000000)]
# memory: 2.0 GB
```
namedtuple (tuple + __slots__)
```python
from collections import namedtuple

Person = namedtuple('Person', 'first last')
l = [Person(i, i) for i in range(10000000)]
# memory: 1.1 GB
```
namedtuple is basically a class that extends tuple and declares __slots__ (empty, since the values live in the tuple itself), adding a getter property for each named field and some other helper methods (in older Python versions you could inspect the exact generated code by passing verbose=True).
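To illustrate, a namedtuple instance supports attribute access by field name while still behaving as a plain tuple:

```python
from collections import namedtuple

Person = namedtuple('Person', 'first last')
p = Person('foo', 'bar')

assert p.first == 'foo'      # attribute access via the generated getter
assert p[1] == 'bar'         # still indexable like a plain tuple
assert isinstance(p, tuple)  # it really is a tuple subclass
```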
class Person + __slots__
```python
class Person:
    __slots__ = ['first', 'last']

    def __init__(self, first, last):
        self.first = first
        self.last = last

l = [Person(i, i) for i in range(10000000)]
# memory: 0.9 GB
```
This is a trimmed-down version of the namedtuple above. A clear winner, even better than pure tuples.
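The source of the overhead can also be probed per instance with sys.getsizeof (a rough sketch; exact byte counts vary by CPython version and platform, and getsizeof does not follow references, so the per-instance __dict__ of a regular class has to be counted separately):

```python
import sys
from collections import namedtuple

class PersonDict:
    def __init__(self, first, last):
        self.first = first
        self.last = last

class PersonSlots:
    __slots__ = ['first', 'last']
    def __init__(self, first, last):
        self.first = first
        self.last = last

PersonNT = namedtuple('PersonNT', 'first last')

d = PersonDict(1, 2)
# A regular instance drags along a separate per-instance __dict__:
print('dict class :', sys.getsizeof(d) + sys.getsizeof(d.__dict__))
print('slots class:', sys.getsizeof(PersonSlots(1, 2)))  # no __dict__ at all
print('namedtuple :', sys.getsizeof(PersonNT(1, 2)))
print('plain tuple:', sys.getsizeof((1, 2)))
```

The __slots__ version wins because it stores the attribute values directly in the instance, with no hash table per object.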