Skip to content
Advertisement

How to create sum of columns in Pandas based on a conditional of multiple columns?

I am trying to sum two columns of the DataFrame to create a third column where the value in the third column is equal to the sum of the positive elements of the other columns. I have tried the below and just receive a column of NaN values

df = pd.DataFrame(np.array([[-1, 2], [-2, 2], [1, -3], [1, -4], [ -2 , -2]]),
                   columns=['a', 'b'])

df['Sum of Positives'] = 0

df['Sum of Positives'] = df.loc[df.a > 0 ,'a'] +df.loc[df.b >0 , 'b']

DataFrame:

enter image description here

Advertisement

Answer

You can use df.mask here and fill value less than 0 i.e negative value with 0 and do df.sum over axis 1.

df['sum of pos'] = df.mask(df<0, 0).sum(axis=1)

   a  b  sum of pos
0 -1  2           2
1 -2  2           2
2  1 -3           1
3  1 -4           1
4 -2 -2           0

Few NumPy hacks that are useful here.

  • Using np.copyto

    t = np.copy(df.values)
    np.copyto(t, 0, where=df.values<0)
    df['sum of pos'] = t.sum(axis=1)
    
  • Using np.where

    df['sum of pos'] = np.where(df.values<0, 0, df.values).sum(axis=1)
    
  • Using np.clip

    df['sum of pos'] = np.clip(df.values, 0, None).sum(axis=1)
    
  • Using np.ma.array

    m = np.ma.array(df.values, mask=df.values<0, fill_value=0)
    df['sum of pos'] = m.filled().sum(axis=1)
    
User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement