Consider the following:
    from scipy.spatial import ConvexHull
    import numpy as np

    pts = np.random.rand(30, 2)
    hull = ConvexHull(pts)

    foo = hull.points
    foo[0] = 4
    print(pts[0])  # -> [4. 4.]

    bar = foo[0]
    bar[0] = 8
    print(pts[0])  # -> [8. 4.]
How am I supposed to know that modifying hull.points (or foo, a reference to hull.points) also modifies pts?
The documentation only says:
points: ndarray of double, shape (npoints, ndim) Coordinates of input points.
The inspector in PyCharm also tells me that both foo and hull.points are ndarrays, and nothing in the code, documentation, or inspector tells me that my variables are, in fact, pointers referencing the same value (yes, I come from C, sorry).
It can go horribly wrong very quickly, because if I directly modify a single element of “pts” (the 2D array holding all the values referenced by my other variables/pointers), it modifies all my variables too, and my convex hull “bar” isn’t convex anymore:
    ...
    pts[0] = 16
    print(bar)  # -> [16. 16.]
unless I call this again:
    pts = np.random.rand(30, 2)
    print(bar)     # -> [16. 16.]
    print(pts[0])  # -> [some random value]
pts apparently became a whole different object in a new memory location, so in this specific case bar is no longer a reference to pts.
And it can go on forever: if I now reassign foo = hull.points, then bar isn’t a reference to foo anymore (well… it’s a reference to the “old foo”, which is not accessible anymore).
My use case: with all the argument passing and return values across different methods, I’m losing track of all my references and values. I end up unknowingly returning a list of values even though I (also unknowingly) modified a reference (and therefore the original ndarray), and I no longer know whether my returned value is a reference or a standalone object that can be safely modified without messing up everything else.
Full simplified use case:
    from scipy.spatial import ConvexHull
    import numpy as np

    pts = np.random.rand(30, 2)
    hull = ConvexHull(pts)

    foo = hull.points
    foo[0] = 4
    print(pts[0])  # -> [4. 4.]

    bar = foo[0]
    bar[0] = 8
    print(pts[0])  # -> [8. 4.]

    qux = bar[0]  # WOOPS /! qux isn't a reference to an ndarray element, it's just a "float" value
    bar[0] = 16   # BUT this is a reference so i end up modifying pts BUT not qux /!
    print(pts[0], foo[0], bar, qux)  # -> [16. 4.] [16. 4.] [16. 4.] 8.0

    qux = bar     # /! qux is now again a reference to pts
    qux[0] = 128
    print(pts[0], foo[0], bar, qux)  # -> [128. 4.] [128. 4.] [128. 4.] [128. 4.]

    qux = foo[0]  # remember that qux = bar[0] didn't create a reference ?
    qux[0] = 256  # but in this case, it is ! bar[0] is just a single float value while foo[0] is a reference to a ndarray
    print(pts[0], foo[0], bar, qux)  # -> [256. 4.] [256. 4.] [256. 4.] [256. 4.]
And since this is so much “fun”: now that I have qux and bar referencing foo[0], what happens to qux and bar if I say foo = None? Nothing… qux and bar are still referencing pts[0] even though I never explicitly said so… I’m so lost.
I’m also wondering whether I’m in a special case because it’s numpy/scipy/ndarray. I never struggled like this before. (Did I just get lucky?)
Answer
In Python, variables are always references. What differs is the behaviour of the object a variable refers to: objects can be mutable or immutable. Immutable types include int, float, str, tuple, etc. Mutable types are most collections: dict, list, set, etc.
Consider this example:
    >>> a, b, c = 1, 2, 3
    >>> my_list = [a, b, c]
    >>> new_list = my_list
    >>> new_list[0] = 0
    >>> print(a, b, c)
    1 2 3
    >>> print(my_list)
    [0, 2, 3]
    >>> print(new_list)
    [0, 2, 3]
What happens here: you change the 0th element of my_list. But since that element is an int, which is immutable, the assignment does not alter the int object itself; slot 0 of the list is simply given a new reference. Meanwhile, a still points to the same value as before.
This is basically the idea of immutable objects: they cannot be changed in place, so “changing” one actually creates a new object in memory and updates the reference to point to this new object. So when you do a += 1 you in fact create a new int object and set a to point to that new object.
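As a minimal sketch of this rebinding, using only built-ins (the names `alias`, `items` and `items_alias` are illustrative, not from the original post). It also shows that `+=` on a list behaves differently, because a list can be extended in place:

```python
# Immutable int: += builds a new object and rebinds the name.
a = 1000
alias = a             # both names refer to the same int object
a += 1                # a now refers to a different int object
print(alias)          # 1000: alias still refers to the old object
print(a is alias)     # False

# Mutable list: += extends the existing object in place.
items = [1, 2]
items_alias = items   # both names refer to the same list object
items += [3]          # the list object itself grows
print(items_alias)    # [1, 2, 3]
print(items is items_alias)  # True
```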
But my_list is a list and is mutable, so changing its contents does not change the reference. This way, when you do new_list = my_list you create a variable new_list that references the same object as my_list, so changing one changes the other.
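One way to tell an alias from an independent object is the `is` operator, and one way to get an independent list is `list.copy()`. A small sketch (the names `alias` and `snapshot` are made up for illustration):

```python
my_list = [1, 2, 3]

alias = my_list            # no copy: a second name for the same list
snapshot = my_list.copy()  # shallow copy: a new, independent list object

alias[0] = 0               # mutates the one shared list

print(alias is my_list)     # True
print(snapshot is my_list)  # False
print(my_list)              # [0, 2, 3]
print(snapshot)             # [1, 2, 3]
```

If a function must not alter its caller's list, copying on the way in (or on the way out) is a simple way to make ownership explicit.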
A variable never holds a value itself; it is always a reference. But changing a variable doesn’t mean changing the referenced object. For an immutable object, “changing the value” means rebinding the reference to a new object; for a mutable object, it means changing its contents. In practice, a mutable object is a container of some sort. So when you change an immutable item of a list, the list stays the same object, but the reference stored for that item is replaced by a new reference. The contents of the list change, but the actual list object stays the same.
Basically any data structure in Python can be drilled down to immutables. Is it a list of ints? Well, there are your immutables. Is it a list of lists of ints? One level deeper there are still immutables. Is it a class instance? I bet it has fields, and fields are no different from any other data structure. You get the point.
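The nesting matters when copying. As a rough illustration with the standard `copy` module (the `grid` data is invented for this example), a shallow copy duplicates only the outer container, while a deep copy duplicates every mutable level:

```python
import copy

grid = [[1, 2], [3, 4]]        # a list of lists of ints

shallow = copy.copy(grid)      # new outer list, same inner lists
deep = copy.deepcopy(grid)     # new outer list and new inner lists

grid[0][0] = 99                # mutate an inner list in place

print(shallow[0][0])           # 99: the shallow copy shares the inner list
print(deep[0][0])              # 1: the deep copy is fully independent
print(shallow[0] is grid[0])   # True
print(deep[0] is grid[0])      # False
```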
Here is another example:
    >>> a, b = 1, 2
    >>> my_list = [a, b]
    >>> print(id(a), id(b))
    4325931056 4325931088
    >>> print(id(my_list[0]), id(my_list[1]))
    4325931056 4325931088
    >>> my_list[0] += 10
    >>> print(id(a), id(b))
    4325931056 4325931088  # a still has the same reference
    >>> print(id(my_list[0]), id(my_list[1]))
    4325931376 4325931088  # my_list[0] now has a new reference
    >>> b += 10
    >>> print(id(b))
    4325931408  # b now has a new reference
    >>> print(id(my_list[1]))
    4325931088  # my_list[1] reference is still the same
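The same reasoning carries over to the ndarray case in the question. This sketch uses plain NumPy only (no SciPy), and the names `row`, `value` and `safe` are illustrative. Basic indexing such as `pts[0]` returns a view that shares memory with `pts`, indexing down to a single element returns a scalar, and `.copy()` gives independent memory. `np.shares_memory` and the `.base` attribute let you check which case you are in:

```python
import numpy as np

pts = np.zeros((3, 2))

row = pts[0]          # basic indexing returns a view into pts
value = pts[0, 0]     # indexing down to one element returns a scalar
safe = pts[0].copy()  # explicit copy with its own memory

print(np.shares_memory(row, pts))   # True
print(np.shares_memory(safe, pts))  # False
print(row.base is pts)              # True: row is a view of pts

row[0] = 8.0          # writes through to pts
print(pts[0])         # [8. 0.]
print(value)          # 0.0: the scalar is unaffected
print(safe)           # [0. 0.]
```

The output shown in the question suggests that hull.points shares memory with the input array; a check like `np.shares_memory(hull.points, pts)` would be one way to confirm that rather than rely on the documentation.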