Pandas: New column with values greater than 0 and operate with these values

I have a big dataframe with more than 2500 columns but the structure is very similar than this:

      A     B     C     D      E
0     1     0     8     0      0
1     0     0     0     0      5
2     1     2     3     0      0
3     0     2     0     1      0

JavaScript
​x
 
      A     B     C     D      E
0     1     0     8     0      0
1     0     0     0     0      5
2     1     2     3     0      0
3     0     2     0     1      0
​

I need to detect all the columns with a value greater than 0 and I have done with this:

df['X'] = df.gt(0).dot(df.columns + ',')

JavaScript
 
df['X'] = df.gt(0).dot(df.columns + ',')
​

And then I get this:

      A     B     C     D      E    X
0     1     0     8     0      0    A,C
1     0     0     0     0      5    E
2     1     2     9     0      0    A,B,C
3     0     3     0     1      0    B,D

JavaScript
 
      A     B     C     D      E    X
0     1     0     8     0      0    A,C
1     0     0     0     0      5    E
2     1     2     9     0      0    A,B,C
3     0     3     0     1      0    B,D
​

The problem is that I don’t need in ‘X’ the name of the columns but the value of the columns and I need to perform the following mathematical operation:

I want it to detect only the 2 lowest values and subtract the lowest value from the second lowest. In the case that there were not more than 2 values in ‘X’ it would be enough for me to show the value of the column only.

The final result in my example would be the following:

      A     B     C     D      E    X
0     1     0     8     0      0    7
1     0     0     0     0      5    5
2     1     2     9     0      0    1
3     0     3     0     1      0    2

JavaScript
 
      A     B     C     D      E    X
0     1     0     8     0      0    7
1     0     0     0     0      5    5
2     1     2     9     0      0    1
3     0     3     0     1      0    2
​

Any idea how to solve it or any orientation?

Answer

We can do nsmallest then follow by np.ptp and condition for those row only have one value not equal to 0

df['new'] = df.apply(lambda x :  np.ptp(pd.Series.nsmallest(x[x!=0],2)) if sum(x!=0) != 1 else x[x!=0].iloc[0],axis=1)
Out[520]: 
0    7
1    5
2    1
3    1
dtype: int64

JavaScript
 
df['new'] = df.apply(lambda x :  np.ptp(pd.Series.nsmallest(x[x!=0],2)) if sum(x!=0) != 1 else x[x!=0].iloc[0],axis=1)
Out[520]: 
0    7
1    5
2    1
3    1
dtype: int64
​

Or doing two steps

df['new'] = df[df.ne(0).sum(1)>1].apply(lambda x :  np.ptp(pd.Series.nsmallest(x,2)),axis=1)
df['new'].fillna(df.max(1),inplace=True)
df
Out[530]: 
   A  B  C  D  E  new
0  1  0  8  0  0  7.0
1  0  0  0  0  5  5.0
2  1  2  3  0  0  1.0
3  0  2  0  1  0  1.0

JavaScript
 
df['new'] = df[df.ne(0).sum(1)>1].apply(lambda x :  np.ptp(pd.Series.nsmallest(x,2)),axis=1)
df['new'].fillna(df.max(1),inplace=True)
df
Out[530]: 
   A  B  C  D  E  new
0  1  0  8  0  0  7.0
1  0  0  0  0  5  5.0
2  1  2  3  0  0  1.0
3  0  2  0  1  0  1.0
​

Advertisement

Answer