Skip to content
Advertisement

Find max of two columns and populate with value in third column based on a condition

I have a pandas dataframe with 4 columns – A, B, C,D and E. I want create groups by column A and then find the max of column B and C and then populate E with the corresponding value present in D as shown in example below. The condition here is that Column E is to be populated with 0, if a group in column B has all 1’s.

My expected output

column A    column B    column C    column D    Column E
AA         1           a            12          15
AA         2           d            13          15
AA         3           b            14          15
AA         3           e            15          15
BB         1           c            15          17
BB         1           d            16          17
BB         2           b            17          17
BB         2           a            18          17
CC         1           a            11          0
CC         1           c            10          0
CC         1           b            11          0
CC         1           e            15          0

My code:

df1 = df1.sort_values(by=['column A', 'column B','column C','column D'],ascending=[True,False, False,False])
col_e_df = df1.groupby(['column A']).agg({'column D':['first']})
col_e__df = tranform_aggregated_data(col_e_df, {'first':  'column D'})
#print(col_e_df.sort_index())
df1 = df1.merge(col_e__df, on = ['column A'])

This code works fine when there is number greater than 2, but when when column B has all 1 for any group, it is not populating 0’s(which is what I need to do). When I add the if condition , the group with number greater than 2 also shows 0.

The line of code where it is getting messed up

col_e_df = (np.where(df1['column B'] >= 2, df1.groupby(['column A'])['columnD'].transform('first') , 0))

The actual output from my code is
column A    column B    column C    column D    Column E
AA          1           a           12          0
AA          2           d           13          15
AA          3           b           14          15
AA          3           e           15          15
BB          1           c           15          0
BB          1           d           16          0
BB          2           b           17          17
BB          2           a           18          17
CC          1           a           11          0
CC          1           c           10          0
CC          1           b           11          0
CC          1           e           15          0

Advertisement

Answer

Try:

df = df.sort_values(
    by=["column A", "column B", "column C", "column D"],
    ascending=[True, False, False, False],
)

df["Column E"] = df.groupby("column A")["column D"].transform("first")

# set "Column E" to 0 if all values in "column B" == 1
df["Column E"] = np.where(
    df.groupby("column A")["column B"].transform(lambda x: x.eq(1).all()),
    0,
    df["Column E"],
)
print(df.sort_index())

Prints:

   column A  column B column C  column D  Column E
0        AA         1        a        12        15
1        AA         2        d        13        15
2        AA         3        b        14        15
3        AA         3        e        15        15
4        BB         1        c        15        17
5        BB         1        d        16        17
6        BB         2        b        17        17
7        BB         2        a        18        17
8        CC         1        a        11         0
9        CC         1        c        10         0
10       CC         1        b        11         0
11       CC         1        e        15         0
User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement