I have a DataFrame that looks like this:
           AD1        AD2        AD3        AD4        AD5
1            0          0          0          0          0
2            0          0          0          0          0
3            0          0          0          0          0
4            0          0          0          0          0
5            0          0          0          0          0
..         ...        ...        ...        ...        ...
497  138099424  126282256          0  197637664          0
498   90184696   53508360   90254688   66803036   57421672
499   41476688   54889712   36677268   33858400   50282272
500   26322476   27609302   26245020   20566374   23664970
501  595136256  241994400  489766144  314901408  730741120
I need to find the mean of each row, ignoring instances of 0. My initial plan was to replace 0 with NaN and then get the mean excluding NaN.
I tried to replace 0 with NaN; however, this didn't work, and the DataFrame still contained 0. I tried:
df = df.replace(0, np.nan)
df = df.replace(0, np.nan, inplace=True)
The second issue is that when I tried to calculate the mean, even knowing 0 would be included, the mean could not be calculated. I used:
df = df.assign(mean=df.mean(axis=1))
The outcome was:
           AD1        AD2        AD3        AD4        AD5  mean
1            0          0          0          0          0   NaN
2            0          0          0          0          0   NaN
3            0          0          0          0          0   NaN
4            0          0          0          0          0   NaN
5            0          0          0          0          0   NaN
..         ...        ...        ...        ...        ...   ...
497  138099424  126282256          0  197637664          0   NaN
498   90184696   53508360   90254688   66803036   57421672   NaN
499   41476688   54889712   36677268   33858400   50282272   NaN
500   26322476   27609302   26245020   20566374   23664970   NaN
501  595136256  241994400  489766144  314901408  730741120   NaN
How can I accomplish this?
Answer
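You wrote: "I tried to replace 0 with NaN, however this didn't work, and the DataFrame still contained 0."
Your values appear to be stored as strings rather than numbers, which would explain both problems: replace(0, np.nan) finds no numeric 0 to match, and the row mean comes back as NaN. Convert your string values to numeric first: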
df['mean'] = df.astype(float).replace(0, np.nan).mean(axis=1)
print(df)

# Output
           AD1        AD2        AD3        AD4        AD5         mean
1            0          0          0          0          0          NaN
2            0          0          0          0          0          NaN
3            0          0          0          0          0          NaN
4            0          0          0          0          0          NaN
5            0          0          0          0          0          NaN
497  138099424  126282256          0  197637664          0  154006448.0
498   90184696   53508360   90254688   66803036   57421672   71634490.4
499   41476688   54889712   36677268   33858400   50282272   43436868.0
500   26322476   27609302   26245020   20566374   23664970   24881628.4
501  595136256  241994400  489766144  314901408  730741120  474507865.6
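To see why the conversion matters, here is a minimal, self-contained sketch. It assumes the columns were loaded as strings (the sample frame below is a hypothetical three-row excerpt built with dtype=object, not your actual data) and uses pd.to_numeric as an alternative to astype(float) for the conversion step.

```python
import numpy as np
import pandas as pd

# Hypothetical excerpt of the data, deliberately stored as strings
# (dtype=object) to mimic the situation assumed in the answer.
df = pd.DataFrame(
    {
        "AD1": ["0", "138099424", "90184696"],
        "AD2": ["0", "126282256", "53508360"],
        "AD3": ["0", "0", "90254688"],
        "AD4": ["0", "197637664", "66803036"],
        "AD5": ["0", "0", "57421672"],
    },
    index=[1, 497, 498],
    dtype=object,
)

# The integer 0 never equals the string "0", so nothing is replaced.
unchanged = df.replace(0, np.nan)
still_has_zero_strings = bool((unchanged == "0").any().any())

# Convert every column to a numeric dtype; errors="coerce" turns
# anything unparseable into NaN instead of raising.
numeric = df.apply(pd.to_numeric, errors="coerce")

# Now 0 can be replaced with NaN, and mean() skips NaN by default.
row_mean = numeric.replace(0, np.nan).mean(axis=1)
print(row_mean)
```

Checking df.dtypes on your real DataFrame is a quick way to confirm whether this assumption holds. Separately, note that replace(..., inplace=True) returns None, so assigning its result back to df, as in your second attempt, would overwrite the DataFrame with None; use either the assignment or inplace=True, not both.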