Pandas – Compute z-score for all columns

Question

I have a dataframe containing a single column of IDs; all other columns hold numerical values for which I want to compute z-scores. Here's a subsection of it: Some of my columns contain NaN values which I do not want to include in the z-score calculations, so I intend to use a solution offered to thi…

Accepted Answer

Build a list from the columns and remove the column you don't want to calculate the Z score for:

In [66]:
cols = list(df.columns)
cols.remove('ID')
df[cols]

Out[66]:
   Age  BMI  Risk  Factor
0    6   48  19.3       4
1    8   43  20.9     NaN
2    2   39  18.1       3
3    9   41  19.5     NaN

In [68]:
# now iterate over the remaining columns and create a new zscore column
for col in cols:
    col_zscore = col + '_zscore'
    df[col_zscore] = (df[col] - df[col].mean())/df[col].std(ddof=0)
df

Out[68]:
   ID  Age  BMI  Risk  Factor  Age_zscore  BMI_zscore  Risk_zscore
0  PT    6   48  19.3       4   -0.093250    1.569614    -0.150946
1  PT    8   43  20.9     NaN    0.652753    0.074744     1.459148
2  PT    2   39  18.1       3   -1.585258   -1.121153    -1.358517
3  PT    9   41  19.5     NaN    1.025755   -0.523205     0.050315

   Factor_zscore
0              1
1            NaN
2             -1
3            NaN
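
The same calculation can also be written without an explicit loop. What follows is a minimal sketch, not part of the original answer: it assumes the sample frame shown in the output above, and the names zscores and result are made up for illustration. It relies on the fact that pandas mean() and std() skip NaN values by default (skipna=True), so missing entries do not enter the statistics and simply stay NaN in the result.

```python
import numpy as np
import pandas as pd

# Sample data copied from the output shown above (assumed, for illustration).
df = pd.DataFrame({
    "ID": ["PT", "PT", "PT", "PT"],
    "Age": [6, 8, 2, 9],
    "BMI": [48, 43, 39, 41],
    "Risk": [19.3, 20.9, 18.1, 19.5],
    "Factor": [4, np.nan, 3, np.nan],
})

# Every column except the identifier column.
cols = df.columns.drop("ID")

# Column-wise z-scores in one expression. mean() and std() ignore NaN
# by default; ddof=0 gives the population standard deviation, matching
# the loop version above.
zscores = (df[cols] - df[cols].mean()) / df[cols].std(ddof=0)

# Attach the new columns next to the originals with a "_zscore" suffix.
result = df.join(zscores.add_suffix("_zscore"))
print(result)
```

Note that the pandas default for std() is ddof=1 (sample standard deviation); passing ddof=0 is a deliberate choice here so the numbers agree with the loop version.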

Answer