I have a pandas dataframe like this:
column1 | column2 | column3
1 | 4 | 10.4
4 | 7 | 11.1
3 | 3 | 3.3
How can I calculate the sum of the squared values for each column (I am trying something like deviation = df[columnName].pow(2).sum() in a loop, but other ideas are very welcome!), and then identify both the column that has the smallest of those sums and the smallest sum itself?
Edit: Adding desired output
Desired output in this case would be:
Minimum sum of squared values: 26
Column containing minimum sum of squared values: column1
Answer
You can calculate the sum of squares on the entire DataFrame at once, which returns a Series with the column names as its index. You can then find the minimum value and the label of the column that holds it using min and idxmin:
col_squares = df.pow(2).sum()

col_squares
#column1     26.00
#column2     74.00
#column3    242.26
#dtype: float64

col_squares.min(), col_squares.idxmin()
#(26.0, 'column1')
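For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the DataFrame is built from the three rows shown in the question. The names min_sum, min_col and loop_squares are just illustrative, and the loop variant is included only to show that it gives the same result as the vectorized call.

```python
import pandas as pd

# Sample data copied from the question (assumed to be the whole frame).
df = pd.DataFrame({
    "column1": [1, 4, 3],
    "column2": [4, 7, 3],
    "column3": [10.4, 11.1, 3.3],
})

# Vectorized: square every value, then sum down each column.
col_squares = df.pow(2).sum()

min_sum = col_squares.min()      # smallest sum of squares
min_col = col_squares.idxmin()   # label of the column holding it

# Equivalent loop-style version, closer to the approach in the question.
loop_squares = {name: df[name].pow(2).sum() for name in df.columns}
loop_min_col = min(loop_squares, key=loop_squares.get)

print(f"Minimum sum of squared values: {min_sum:g}")
print(f"Column containing minimum sum of squared values: {min_col}")
```

If the frame also contains non-numeric columns, pow(2) will likely fail on them, so one option is to select the numeric columns first with df.select_dtypes("number").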