Pandas: creating columns with a function on groupby-sorted columns

Question

I have a dataframe like below. What I am trying to do is calculate columns E1 and F1 with a sort and a group by, then return the entire dataframe. The B1 column is incremental, but not necessarily by 1, but the sort on B1 will be if I only had one A1 value like my workflow is Which

Accepted Answer

I am using the following CSV (saved as test.csv):

    A1,B1,C1,D1
    A,8,0.4,1.3
    A,1,0.25,1.25
    B,5,0.5,1.02
    C,2,0.32,1.85
    B,1,0.15,1.22
    B,4,0.66,1.97
    B,3,0.29,1.87
    C,1,0.99,1.22
    C,3,0.78,1.39
    C,4,0.65,1.99
    A,2,0.32,1.02

The following code will group on A1 and sort on B1:

    import pandas as pd

    df = pd.read_csv('test.csv')

    # Collect one sorted chunk per A1 group
    chunks = []
    for _, df_chnk in df.groupby('A1', sort=True):
        df_chnk = df_chnk.reset_index(drop=True).sort_values(by=['B1'], ascending=True)
        chunks.append(df_chnk)

    # DataFrame.append was removed in pandas 2.0, so concatenate the chunks instead
    res = pd.concat(chunks).reset_index(drop=True)
    print(res)

The above code will generate the following dataframe:

       A1 B1    C1    D1
    0   A  1  0.25  1.25
    1   A  2  0.32  1.02
    2   A  8  0.40  1.30
    3   B  1  0.15  1.22
    4   B  3  0.29  1.87
    5   B  4  0.66  1.97
    6   B  5  0.50  1.02
    7   C  1  0.99  1.22
    8   C  2  0.32  1.85
    9   C  3  0.78  1.39
    10  C  4  0.65  1.99

To perform the following operation:

    tempZ = df.loc[l[i], 'B1'] - df.loc[l[i-1], 'B1']
    tempJ = (df.loc[l[i], 'C1'] - df.loc[l[i-1], 'C1'])
    tempI = (df.loc[l[i], 'D1'] - df.loc[l[i-1], 'D1'])

check out pandas.DataFrame.diff. You can use it on each df_chnk!
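As a rough illustration of the diff suggestion, the sketch below sorts by A1 and B1 and then takes row-to-row differences within each A1 group, without an explicit loop. The helper name add_group_diffs is hypothetical, and the output columns tempZ, tempJ and tempI simply reuse the names from the snippet above. The formulas for E1 and F1 are not given here, so they are not computed. The CSV is embedded as a string only to keep the example self-contained.

```python
import io

import pandas as pd

# Same data as test.csv, embedded so the sketch runs on its own
CSV = """A1,B1,C1,D1
A,8,0.4,1.3
A,1,0.25,1.25
B,5,0.5,1.02
C,2,0.32,1.85
B,1,0.15,1.22
B,4,0.66,1.97
B,3,0.29,1.87
C,1,0.99,1.22
C,3,0.78,1.39
C,4,0.65,1.99
A,2,0.32,1.02
"""


def add_group_diffs(df):
    """Hypothetical helper: sort by A1 then B1 and add per-group differences."""
    out = df.sort_values(["A1", "B1"]).reset_index(drop=True)
    # Difference between each row and the previous row of the same A1 group;
    # the first row of every group has no predecessor, so it becomes NaN
    diffs = out.groupby("A1")[["B1", "C1", "D1"]].diff()
    out["tempZ"] = diffs["B1"]
    out["tempJ"] = diffs["C1"]
    out["tempI"] = diffs["D1"]
    return out


df = pd.read_csv(io.StringIO(CSV))
res = add_group_diffs(df)
print(res)
```

The first row of each group holds NaN in the three new columns. Whether to keep those rows, drop them, or fill them with 0 depends on how E1 and F1 are meant to be defined.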

Answer