I have a pandas dataframe like this:
column1 | column2 | column3
1 | 4 | 10.4
4 | 7 | 11.1
3 | 3 | 3.3
How can I calculate the sum of the squared values for each column (I am trying something like deviation = df[columnName].pow(2).sum() in a loop, but other ideas are very welcome!), and then identify which column has the smallest of those sums, along with the smallest sum itself?
Edit: Adding desired output
Desired output in this case would be:
Minimum sum of squared values: 26
Column containing minimum sum of squared values: column1
Answer
You can calculate the sum of squares on the entire data frame at once, which returns a Series with the column names as its index. You can then find the minimum value and the label of the column that holds it using min and idxmin:
col_squares = df.pow(2).sum()
col_squares
#column1     26.00
#column2     74.00
#column3    242.26
#dtype: float64

col_squares.min(), col_squares.idxmin()
#(26.0, 'column1')
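For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the data frame from the question by hand and prints the result in the format requested there. The names min_sum and best_col are only illustrative, and it assumes every column is numeric.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "column1": [1, 4, 3],
    "column2": [4, 7, 3],
    "column3": [10.4, 11.1, 3.3],
})

# Square every value, then sum down each column.
# The result is a Series indexed by column name.
col_squares = df.pow(2).sum()

# Smallest sum, and the label of the column that produced it.
min_sum = col_squares.min()
best_col = col_squares.idxmin()

print(f"Minimum sum of squared values: {min_sum:g}")
print(f"Column containing minimum sum of squared values: {best_col}")
```

If the real data frame also contains non-numeric columns, one option is to restrict the calculation to the numeric ones first, for example with df.select_dtypes("number").pow(2).sum().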