How to identify the minimum sum of squared values in a pandas DataFrame, column by column?

I have a pandas dataframe like this:

column1 | column2  | column3
1       | 4        |   10.4  
4       | 7        |   11.1
3       | 3        |   3.3

How could I calculate the sum of the squared values for each column, and then identify both the smallest of those sums and the column it belongs to? I am currently trying something like deviation = df[columnName].pow(2).sum() in a loop, but other ideas are very welcome!

Edit: Adding desired output

Desired output in this case would be:

Minimum sum of squared values: 26
Column containing minimum sum of squared values: column1

Answer

You can calculate the sum of squares on the entire DataFrame at once: df.pow(2).sum() squares every value and sums down each column, returning a Series with the column names as its index. You can then find the minimum value with min and the column it belongs to with idxmin:

col_squares = df.pow(2).sum()

col_squares
#column1     26.00
#column2     74.00
#column3    242.26
#dtype: float64

col_squares.min(), col_squares.idxmin()
#(26.0, 'column1')
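
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the DataFrame is built from the sample values in the question and that every column is numeric. The names min_sum and min_col are only illustrative.

```python
import pandas as pd

# Sample data copied from the question (all columns assumed numeric).
df = pd.DataFrame({
    "column1": [1, 4, 3],
    "column2": [4, 7, 3],
    "column3": [10.4, 11.1, 3.3],
})

# Square every cell, then sum down each column (axis=0 is the default).
col_squares = df.pow(2).sum()

# Smallest sum, and the label of the column that produced it.
min_sum = col_squares.min()
min_col = col_squares.idxmin()

print(f"Minimum sum of squared values: {min_sum:g}")
print(f"Column containing minimum sum of squared values: {min_col}")
```

This does the same work as calling df[columnName].pow(2).sum() in a loop over the columns, but in one vectorized step, and it keeps the per-column sums in col_squares in case you need them later.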
User contributions licensed under: CC BY-SA