I have this DataFrame:
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 1, 1, 1, 2, 2, 2, 3],
                   "B": [1, 4, 5, 6, 10, 7, 8, 9, 3],
                   "C": ["Hello", "World", "How", "are", "you", "today", "miss", "?", "!"]})

    A    B      C
0  a1   a1  Hello
1  a1   a4  World
2  a1   a5    How
3  a1   a6    are
4  a1  a10    you
5  a2   a7  today
6  a2   a8   miss
7  a2   a9      ?
8  a3   a3      !
And I want to get something like this:
    A    B      C  n
0  a1   a1  Hello  4
1  a1   a4  World  4
2  a1   a5    How  4
3  a1   a6    are  4
4  a1  a10    you  4
5  a2   a7  today  3
6  a2   a8   miss  3
7  a2   a9      ?  3
8  a3   a3      !  0
I tried this:
df["n"] = df.loc[df.A != df.B].groupby("A")["B"].transform(len)
But I get this result:
    A    B      C    n
0  a1   a1  Hello  NaN
1  a1   a4  World  4.0
2  a1   a5    How  4.0
3  a1   a6    are  4.0
4  a1  a10    you  4.0
5  a2   a7  today  3.0
6  a2   a8   miss  3.0
7  a2   a9      ?  3.0
8  a3   a3      !  NaN
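A minimal sketch of where the NaN values come from, assuming the numeric A/B values from the constructor above: filtering before the groupby drops rows 0 and 8, so transform returns a Series indexed only by the remaining rows, and assigning it back to df aligns on the index, leaving NaN for the missing labels.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 1, 1, 1, 2, 2, 2, 3],
                   "B": [1, 4, 5, 6, 10, 7, 8, 9, 3]})

# Filtering first drops rows 0 and 8 (where A == B) before grouping
filtered = df.loc[df.A != df.B]
per_row = filtered.groupby("A")["B"].transform(len)

# The transformed Series only carries the surviving row labels
print(per_row.index.tolist())  # [1, 2, 3, 4, 5, 6, 7]

# Assignment aligns on the index, so rows 0 and 8 become NaN
df["n"] = per_row
print(df["n"].isna().tolist())
```

Filling the gaps afterwards with fillna(0) would not be enough: row 0 belongs to group 1, whose expected count is 4, not 0.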
Do you know how I could apply my condition df.A != df.B inside the transform instead of filtering the original DataFrame?
Thanks
Answer
To count matching values (the Trues), you can pass the boolean mask itself and aggregate it with sum, because True is treated as 1 and False as 0:
df["n"] = (df.A != df.B).groupby(df["A"]).transform('sum')
print(df)
   A   B      C  n
0  1   1  Hello  4
1  1   4  World  4
2  1   5    How  4
3  1   6    are  4
4  1  10    you  4
5  2   7  today  3
6  2   8   miss  3
7  2   9      ?  3
8  3   3      !  0
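A self-contained sketch of the mask approach, with the intermediate steps spelled out. The name mask is just an illustrative variable. Because the mask covers every row, the group totals are broadcast back to all rows, including those where the condition is False.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 1, 1, 1, 2, 2, 2, 3],
                   "B": [1, 4, 5, 6, 10, 7, 8, 9, 3],
                   "C": ["Hello", "World", "How", "are", "you", "today", "miss", "?", "!"]})

# Boolean mask over all rows: True where A != B
mask = df.A != df.B

# Summing booleans counts the Trues (True -> 1, False -> 0)
print(mask.sum())  # 7

# Group the mask by column A and broadcast each group's count to every row,
# so rows 0 and 8 (where the condition is False) still get their group's total
df["n"] = mask.groupby(df["A"]).transform("sum")
print(df["n"].tolist())  # counts per row: 4 for group 1, 3 for group 2, 0 for group 3
```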
Or create a helper column:
df["n"] = df.assign(B = df.A != df.B).groupby("A")['B'].transform('sum')
print(df)
   A   B      C  n
0  1   1  Hello  4
1  1   4  World  4
2  1   5    How  4
3  1   6    are  4
4  1  10    you  4
5  2   7  today  3
6  2   8   miss  3
7  2   9      ?  3
8  3   3      !  0
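If you prefer to keep the "filter first" idea from the question, one alternative (a different technique from the transform-based answer, shown only as a sketch) is to count per group with value_counts and look the counts up by group key with map. Since the lookup is by A value rather than by row index, filtered-out rows still receive their group's count; groups with no matching rows get NaN from map, which is filled with 0.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 1, 1, 1, 2, 2, 2, 3],
                   "B": [1, 4, 5, 6, 10, 7, 8, 9, 3]})

# Count, per value of A, only the rows where A != B
counts = df.loc[df.A != df.B, "A"].value_counts()

# Look up each row's group count; group 3 has no matching rows, so it is
# missing from `counts` and map returns NaN there, which we replace with 0
df["n"] = df["A"].map(counts).fillna(0).astype(int)
print(df)
```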