Pandas: Dataframe itertuples boolean series groupby optimization

Question

I'm new to Python. I have a DataFrame (DF), for example:

    id type
    1  A
    1  B
    2  C
    2  B

I would like to add a column, for example A_flag, grouped by id. In the end I want the DataFrame (DF):

    id type A_flag
    1  A    1
    1  B    1
    2  C    0
    2  B    0

I can do this in

Accepted Answer

Replace your slow iterative code with fast vectorized code. For the first step, generate a boolean Series with pandas built-in functions, e.g.

    df['type'].eq('A')

Then, for the second step, attach it to the groupby statement as follows:

    df['A_flag'] = df['type'].eq('A').groupby(df['id']).transform('max').astype(int)

Result

    print(df)

       id type  A_flag
    0   1    A       1
    1   1    B       1
    2   2    C       0
    3   2    B       0

In general, if you have more complicated conditions, you can also define them in a vectorized way, e.g. define the boolean Series m by:

    m = df['type'].eq('A') & df['type1'].gt(1)  | (df['type2'] != 0)

Then use it in step 2 as follows:

    m.groupby(df['id']).transform('max').astype(int)
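For anyone who wants to try the two steps end to end, here is a minimal self-contained sketch. The first frame is the sample data from the question. The second frame adds type1 and type2 columns with made-up values, since the answer names those columns but gives no data for them. The comparison with transform('any') is an extra check and not part of the original answer.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"id": [1, 1, 2, 2], "type": ["A", "B", "C", "B"]})

# Step 1: vectorized boolean Series, True where type == 'A'.
is_a = df["type"].eq("A")

# Step 2: broadcast the per-id maximum back to every row.
# True > False, so the group maximum is True if any row in the group matches.
df["A_flag"] = is_a.groupby(df["id"]).transform("max").astype(int)
print(df)

# Extra check (not in the original answer): 'any' states the same intent directly.
any_flag = is_a.groupby(df["id"]).transform("any").astype(int)

# Compound condition. The type1/type2 values below are hypothetical,
# chosen only to illustrate the mask from the answer.
df2 = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "type": ["A", "B", "C", "B"],
    "type1": [0, 0, 0, 0],
    "type2": [0, 0, 0, 5],
})
# & binds tighter than |, so this reads as (A and type1 > 1) or (type2 != 0).
m = df2["type"].eq("A") & df2["type1"].gt(1) | (df2["type2"] != 0)
df2["flag"] = m.groupby(df2["id"]).transform("max").astype(int)
print(df2)
```

With these made-up values, id 1 is not flagged in df2 (its 'A' row fails the type1 > 1 test), while id 2 is flagged because one of its rows has a nonzero type2.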

Answer