Skip to content
Advertisement

calculate number of non-missing counts in specific columns

I have a data like below:

Col1   F1    L1    F2    L2     F3    L3
A     Jon    Ro   NaN   NaN    NaN   NaN
B    Koss   Lon   Pet Gross    NaN   NaN
C      Lo   NaN   NaN   NaN    NaN   NaN

I want to get count of non-missing based on F and L:

Col1   F1    L1    F2    L2     F3    L3   Cnt
A     Jon    Ro   NaN   NaN    NaN   NaN     1
B    Koss   Lon   Pet Gross    NaN   NaN     2
C      Lo   Pho   NaN   NaN    NaN   NaN     1

I tried below’s code but got wrong result since it considered F and L separate instead of based on their suffix:

df["Cnt"] = df[df.filter(regex = "F|L").columns.tolist()].apply(lambda x: x.count(), axis=1)

Col1   F1    L1    F2    L2     F3    L3   Cnt
A     Jon    Ro   NaN   NaN    NaN   NaN     2
B    Koss   Lon   Pet Gross    NaN   NaN     4
C      Lo   Pho   NaN   NaN    NaN   NaN     1

Any idea?

Advertisement

Answer

df['Cnt'] = (
    df.filter(regex="L|F") # select only L or F columns 
      .groupby(lambda col: col[-1], axis=1)  # group the columns by suffix (assuming it's just the last character)
      .agg(lambda g: g.notna().any(axis=1))  # for each i check if  Fi or Li are not NaNs
      .sum(axis=1) # count the not-NaN groups row-wise (i.e. all the True values)
)

>>> df

  Col1    F1   L1   F2     L2  F3  L3  Cnt
0    A   Jon   Ro  NaN    NaN NaN NaN    1
1    B  Koss  Lon  Pet  Gross NaN NaN    2
2    C    Lo  Pho  NaN    NaN NaN NaN    1
User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement