I am learning Python and have encountered numpy.sum. It has an optional parameter, axis, which is used to get either column-wise or row-wise summation. When axis = 0, the sum is taken down each column. For example,
import numpy as np
a = np.array([[1, 2, 3], [4, 5, 6]])
np.sum(a, axis=0)
This snippet produces the output array([5, 7, 9]), which is fine. But if I do:
a = np.array([1, 2, 3])
np.sum(a, axis=0)
I get the result 6. Why is that? Shouldn’t I get array([1, 2, 3])?
Answer
All that is going on is that numpy is summing across the first (0th) and only axis. Consider the following:
In [2]: a = np.array([1, 2, 3])

In [3]: a.shape
Out[3]: (3,)

In [4]: len(a.shape)  # number of dimensions
Out[4]: 1

In [5]: a1 = a.reshape(3,1)

In [6]: a2 = a.reshape(1,3)

In [7]: a1
Out[7]:
array([[1],
       [2],
       [3]])

In [8]: a2
Out[8]: array([[1, 2, 3]])

In [9]: a1.sum(axis=1)
Out[9]: array([1, 2, 3])

In [10]: a1.sum(axis=0)
Out[10]: array([6])

In [11]: a2.sum(axis=1)
Out[11]: array([6])

In [12]: a2.sum(axis=0)
Out[12]: array([1, 2, 3])
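As a small sketch of the same point from the asker's side: a 1-D array has only axis 0, so summing over it leaves nothing but a scalar. If the expected array([1, 2, 3]) is wanted, one option (an assumption about the intent, not the only approach) is to promote the array to a single row first with np.atleast_2d, or to keep the summed axis with keepdims=True.

```python
import numpy as np

a = np.array([1, 2, 3])

# Summing over the only axis of a 1-D array leaves a 0-dimensional result.
total = np.sum(a, axis=0)
print(total)            # 6
print(np.shape(total))  # ()

# Treat the data as one row of a 2-D array; axis 0 then has length 1,
# so summing over it returns the row unchanged.
as_row = np.atleast_2d(a)          # shape (1, 3)
column_sums = as_row.sum(axis=0)   # array([1, 2, 3])

# keepdims=True keeps the summed axis as a length-1 axis.
kept = np.sum(a, axis=0, keepdims=True)  # array([6]), shape (1,)
```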
So, to be more explicit:
In [15]: a1.shape
Out[15]: (3, 1)
a1 is 2-dimensional, the “long” axis being the first.
In [16]: a1[:,0]  # everything along the first axis, and the first element of the second
Out[16]: array([1, 2, 3])
Now, sum along the first axis:
In [17]: a1.sum(axis=0)
Out[17]: array([6])
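To make "sum along the first axis" concrete, the following sketch adds up the slices taken along axis 0 by hand and checks that this matches a1.sum(axis=0). The name manual is only illustrative.

```python
import numpy as np

a1 = np.array([1, 2, 3]).reshape(3, 1)

# Indexing along axis 0 gives three slices, each of shape (1,).
# Adding them together is what sum(axis=0) does.
manual = a1[0] + a1[1] + a1[2]
print(manual)  # [6]

assert np.array_equal(manual, a1.sum(axis=0))
```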
Now, consider a less trivial two-dimensional case:
In [20]: b = np.array([[1,2,3],[4,5,6]])

In [21]: b
Out[21]:
array([[1, 2, 3],
       [4, 5, 6]])

In [22]: b.shape
Out[22]: (2, 3)
The first axis (axis 0) runs down the “rows”. Summing along it adds the rows together, giving one total per column:
In [23]: b.sum(axis=0)
Out[23]: array([5, 7, 9])
The second axis (axis 1) runs across the “columns”. Summing along it adds the columns together, giving one total per row:
In [24]: b.sum(axis=1)
Out[24]: array([ 6, 15])
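A rule of thumb that ties these cases together: the result has the input's shape with the summed axis removed. The sketch below checks this with a hypothetical helper, shape_after_sum, which is written here for illustration and is not part of NumPy. It assumes a non-negative integer axis.

```python
import numpy as np

def shape_after_sum(shape, axis):
    """Hypothetical helper: predicted shape after summing over one axis."""
    return shape[:axis] + shape[axis + 1:]

b = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
c = np.arange(24).reshape(2, 3, 4)     # shape (2, 3, 4)

for arr in (b, c):
    for axis in range(arr.ndim):
        # The summed axis disappears; the other axes keep their lengths.
        assert arr.sum(axis=axis).shape == shape_after_sum(arr.shape, axis)

print(shape_after_sum(b.shape, 0))  # (3,)
print(shape_after_sum(b.shape, 1))  # (2,)
print(shape_after_sum((3,), 0))     # ()
```

The last line is the 1-D case from the question: removing the only axis from (3,) leaves the empty shape (), which is a scalar.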