Skip to content
Advertisement

Pandas Dataframe create new column with grouppy count with condition on count

I have this Dataframe

 df = pd.DataFrame({"A": [1, 1, 1, 1, 1, 2, 2, 2, 3], "B": [1, 4, 5, 6, 10, 7, 8, 9, 3], "C": ["Hello", "World", "How", "are", "you", "today", "miss", "?", "!"]})

    A    B      C
0  a1   a1  Hello
1  a1   a4  World
2  a1   a5    How
3  a1   a6    are
4  a1  a10    you
5  a2   a7  today
6  a2   a8   miss
7  a2   a9      ?
8  a3   a3      !

And I want something like this

    A    B      C   n
0  a1   a1  Hello   4
1  a1   a4  World   4
2  a1   a5    How   4
3  a1   a6    are   4
4  a1  a10    you   4
5  a2   a7  today   3
6  a2   a8   miss   3
7  a2   a9      ?   3
8  a3   a3      !   0

I tried this operation

df["n"] = df.loc[df.A != df.B].groupby("A")["B"].transform(len)

But I have this result

    A   B      C   n
0  a1  a1  Hello   NaN
1  a1  a4  World   4
2  a1  a5    How   4
3  a1  a6    are   4
4  a1  a10   you   4
5  a2  a7  today   3
6  a2  a8   miss   3
7  a2  a9      ?   3
8  a3  a3      !   NaN

Do you know i could set my condition df.A != df.B on the transform instead on the original dataframe ? Thanks

Advertisement

Answer

For count matched values (Trues) is possible pass mask with sum, Trues are processing like 1 and Falses like 0:

df["n"] = (df.A != df.B).groupby(df["A"]).transform('sum')
    
print (df)
   A   B      C  n
0  1   1  Hello  4
1  1   4  World  4
2  1   5    How  4
3  1   6    are  4
4  1  10    you  4
5  2   7  today  3
6  2   8   miss  3
7  2   9      ?  3
8  3   3      !  0

Or create helper column:

df["n"] = df.assign(B = df.A != df.B).groupby("A")['B'].transform('sum')
    
print (df)

   A   B      C  n
0  1   1  Hello  4
1  1   4  World  4
2  1   5    How  4
3  1   6    are  4
4  1  10    you  4
5  2   7  today  3
6  2   8   miss  3
7  2   9      ?  3
8  3   3      !  0
    
User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement