I have this DataFrame
lst = [['AAA',15,'BBB',20],['BBB',16,'AAA',12],['BBB',22,'CCC',15],['CCC',11,'AAA',31],['DDD',25,'EEE',35]] df = pd.DataFrame(lst,columns = ['name1','val1','name2','val2'])
which looks like this
name1 val1 name2 val2 0 AAA 15 BBB 20 1 BBB 16 AAA 12 2 BBB 22 CCC 15 3 CCC 11 AAA 31 4 DDD 25 EEE 35
I want this
name1 val1 name2 val2 0 AAA 31 BBB 22 1 BBB 22 AAA 31 2 BBB 22 CCC 15 3 CCC 15 AAA 31 4 DDD 25 EEE 35
replaced all values with the maximum value. we choose the maximum value from both val1
and val2
if i do this i will get the maximum from only val1
df["val1"] = df.groupby("name1")["val1"].transform("max")
Advertisement
Answer
Try using pd.wide_to_long
to melt that dataframe into a long form, then use groupby with transform to find the max value. Map that max value to ‘name’ and reshape back to four column (wide) dataframe:
df_long = pd.wide_to_long(df.reset_index(), ['name','val'], 'index', j='num',sep='',suffix='d+') mapper= df_long.groupby('name')['val'].max() df_long['val'] = df_long['name'].map(mapper) df_new = df_long.unstack() df_new.columns = [f'{i}{j}' for i,j in df_new.columns] df_new
Output:
name1 name2 val1 val2 index 0 AAA BBB 31 22 1 BBB AAA 22 31 2 BBB CCC 22 15 3 CCC AAA 15 31 4 DDD EEE 25 35