I have a pandas dataframe with 4 columns – A, B, C,D and E. I want create groups by column A and then find the max of column B and C and then populate E with the corresponding value present in D as shown in example below. The condition here is that Column E is to be populated with 0, if a group in column B has all 1’s.
My expected output
JavaScript
x
14
14
1
column A column B column C column D Column E
2
AA 1 a 12 15
3
AA 2 d 13 15
4
AA 3 b 14 15
5
AA 3 e 15 15
6
BB 1 c 15 17
7
BB 1 d 16 17
8
BB 2 b 17 17
9
BB 2 a 18 17
10
CC 1 a 11 0
11
CC 1 c 10 0
12
CC 1 b 11 0
13
CC 1 e 15 0
14
My code:
JavaScript
1
6
1
df1 = df1.sort_values(by=['column A', 'column B','column C','column D'],ascending=[True,False, False,False])
2
col_e_df = df1.groupby(['column A']).agg({'column D':['first']})
3
col_e__df = tranform_aggregated_data(col_e_df, {'first': 'column D'})
4
#print(col_e_df.sort_index())
5
df1 = df1.merge(col_e__df, on = ['column A'])
6
This code works fine when there is number greater than 2, but when when column B has all 1 for any group, it is not populating 0’s(which is what I need to do). When I add the if condition , the group with number greater than 2 also shows 0.
The line of code where it is getting messed up
JavaScript
1
17
17
1
col_e_df = (np.where(df1['column B'] >= 2, df1.groupby(['column A'])['columnD'].transform('first') , 0))
2
3
The actual output from my code is
4
column A column B column C column D Column E
5
AA 1 a 12 0
6
AA 2 d 13 15
7
AA 3 b 14 15
8
AA 3 e 15 15
9
BB 1 c 15 0
10
BB 1 d 16 0
11
BB 2 b 17 17
12
BB 2 a 18 17
13
CC 1 a 11 0
14
CC 1 c 10 0
15
CC 1 b 11 0
16
CC 1 e 15 0
17
Advertisement
Answer
Try:
JavaScript
1
15
15
1
df = df.sort_values(
2
by=["column A", "column B", "column C", "column D"],
3
ascending=[True, False, False, False],
4
)
5
6
df["Column E"] = df.groupby("column A")["column D"].transform("first")
7
8
# set "Column E" to 0 if all values in "column B" == 1
9
df["Column E"] = np.where(
10
df.groupby("column A")["column B"].transform(lambda x: x.eq(1).all()),
11
0,
12
df["Column E"],
13
)
14
print(df.sort_index())
15
Prints:
JavaScript
1
14
14
1
column A column B column C column D Column E
2
0 AA 1 a 12 15
3
1 AA 2 d 13 15
4
2 AA 3 b 14 15
5
3 AA 3 e 15 15
6
4 BB 1 c 15 17
7
5 BB 1 d 16 17
8
6 BB 2 b 17 17
9
7 BB 2 a 18 17
10
8 CC 1 a 11 0
11
9 CC 1 c 10 0
12
10 CC 1 b 11 0
13
11 CC 1 e 15 0
14