import pandas as pd df = pd.DataFrame({'cost_m1': [10, 11, 140], 'cost_m2': [130, 90, 110], 'cost_m3': [80, 110, 120], 'prob_m1': [0.1, 0.11, 0.25], 'prob_m2':[0.13, 0.19, 0.19], 'prob_m3':[0.21, 0.11, 0.14] }) df = df.reset_index() # make sure indexes pair with number of rows
I have got the dataframe above where costs and their corresponding probabilities are given. What I want is the following output. Any help would be greatly appreciated.
Advertisement
Answer
Here’s a way to do it:
df1 = ( pd.DataFrame( [sorted(((df.loc[i,f'cost_m{j}'], df.loc[i,f'prob_m{j}']) for j in range(1,4)), reverse=True) for i in range(len(df))]) ) df1 = ( pd.DataFrame( [df1.iloc[i,].explode().to_numpy() for i in range(len(df1))], columns=pd.Series([(f'{nth}_largest_cost', f'{nth}_largest_corres_prob') for nth in ('1st','2nd','3rd')]).explode()) ) df = pd.concat([df.drop(columns='index'), df1], axis=1)
Output:
cost_m1 cost_m2 cost_m3 prob_m1 prob_m2 prob_m3 1st_largest_cost 1st_largest_corres_prob 2nd_largest_cost 2nd_largest_corres_prob 3rd_largest_cost 3rd_largest_corres_prob 0 10 130 80 0.10 0.13 0.21 130 0.13 80 0.21 10 0.10 1 11 90 110 0.11 0.19 0.11 110 0.11 90 0.19 11 0.11 2 140 110 120 0.25 0.19 0.14 140 0.25 120 0.14 110 0.19
Explanation:
- create a new dataframe that pairs cost and prob for m1, m2 and m3 and sorts in reverse
- create another dataframe using a list of exploded rows (to get interleaved cost/prob columns) from the previous dataframe together with the desired new column labels, and concat it to the right of the original dataframe with the column named
index
dropped.