Pandas columns created function on Groupby sorted columns

I have a dataframe like below. What i am trying to do is calculate a column E1 and F1 with a sort and group by then return the entire data frame. The B1 column is incremental, but not necessarily by 1, but the sort on B1 will be

A1  B1    C1     D1  
A    8    0.4    1.3
A    1    0.25   1.25   
B    5    0.5    1.02 
C    2    0.32   1.85  
B    1    0.15   1.22  
B    4    0.66   1.97  
B    3    0.29   1.87  
C    1    0.99   1.22  
C    3    0.78   1.39  
C    4    0.65   1.99  
A    2    0.32   1.02

JavaScript
​x
 
A1  B1    C1     D1  
A    8    0.4    1.3
A    1    0.25   1.25   
B    5    0.5    1.02 
C    2    0.32   1.85  
B    1    0.15   1.22  
B    4    0.66   1.97  
B    3    0.29   1.87  
C    1    0.99   1.22  
C    3    0.78   1.39  
C    4    0.65   1.99  
A    2    0.32   1.02  
​

if i only had one A1 value like

A1  B1    C1     D1  
A    1    0.25   1.25   
A    8    0.4    1.3
A    2    0.32   1.02

JavaScript
 
A1  B1    C1     D1  
A    1    0.25   1.25   
A    8    0.4    1.3
A    2    0.32   1.02  
​

my workflow is

df.sort_values(by=['B1'])
df = df.reset_index(drop=True)

l = df.index.values
for i in range(len(l)):
    tempZ = df.loc[l[i], 'B1'] - df.loc[l[i-1], 'B1']
    tempJ = (df.loc[l [i], 'C1'] - df.loc[l[i-1], 'C1'])
    tempI = (df.loc[l [i], 'D1'] - df.loc[l[i-1], 'D1'])
    *** DO a whole series of mathematical calculations ***
    X = *** Final mathematical calculations ***
    Y = *** Final mathematical calculations ***
    df.loc[l[i], 'E1'] = tempE
    df.loc[l[i], 'F1'] = tempF

JavaScript
 
df.sort_values(by=['B1'])
df = df.reset_index(drop=True)
​
l = df.index.values
for i in range(len(l)):
    tempZ = df.loc[l[i], 'B1'] - df.loc[l[i-1], 'B1']
    tempJ = (df.loc[l [i], 'C1'] - df.loc[l[i-1], 'C1'])
    tempI = (df.loc[l [i], 'D1'] - df.loc[l[i-1], 'D1'])
    *** DO a whole series of mathematical calculations ***
    X = *** Final mathematical calculations ***
    Y = *** Final mathematical calculations ***
    df.loc[l[i], 'E1'] = tempE
    df.loc[l[i], 'F1'] = tempF
​

Which would then output my dataframe as expected

A1  B1    C1     D1      E1    F1
A    1    0.25   1.25   7.33   0.33  
A    2    0.32   1.02   7.89   0.12
A    8    0.4    1.3    8.65   0.22

JavaScript
 
A1  B1    C1     D1      E1    F1
A    1    0.25   1.25   7.33   0.33  
A    2    0.32   1.02   7.89   0.12
A    8    0.4    1.3    8.65   0.22
​

But i want to do it for the entire dataframe

A1  B1    C1     D1      E1    F1
A    1    0.25   1.25   7.33   0.33  
A    2    0.32   1.02   7.89   0.12
A    8    0.4    1.3    8.65   0.22
B    1    0.15   1.22   12.32  0.77  
B    3    0.29   1.87   1.11   0.12
B    4    0.66   1.97   7.12   0.99
B    5    0.5    1.02   9.25   0.55
C    1    0.99   1.22   2.35   0.123
C    2    0.32   1.85   6.32   1.03 
C    3    0.78   1.39   6.33   1.00 
C    4    0.65   1.99   12.02  0.33

JavaScript
 
A1  B1    C1     D1      E1    F1
A    1    0.25   1.25   7.33   0.33  
A    2    0.32   1.02   7.89   0.12
A    8    0.4    1.3    8.65   0.22
B    1    0.15   1.22   12.32  0.77  
B    3    0.29   1.87   1.11   0.12
B    4    0.66   1.97   7.12   0.99
B    5    0.5    1.02   9.25   0.55
C    1    0.99   1.22   2.35   0.123
C    2    0.32   1.85   6.32   1.03 
C    3    0.78   1.39   6.33   1.00 
C    4    0.65   1.99   12.02  0.33  
​

So my dataframe needs be grouped by A1 and sorted on B1. the calculations are done on C1 and D1 columns, but also the previous columns.

So i guess my question is, what would be the most efficient way to accomplish this with a large dataframe where i would need to do some type of groupby then output the complete dataframe with the new columns?

Data

data = [['A',8,0.4,1.3],['A',1,0.25,1.25],['B',5,0.5,1.02],['C',2,0.32,1.85], 
 ['B',1,0.15,1.22],['B',4,0.66,1.97],['B',3,0.29,1.87],['C',1,0.99,1.22], 
 ['C',3,0.78,1.39],['C',4,0.65,1.99],['A',2,0.32,1.02]]

df = pd.DataFrame(data, columns = ['A1', 'B1', 'C1', 'D1'])

JavaScript
 
data = [['A',8,0.4,1.3],['A',1,0.25,1.25],['B',5,0.5,1.02],['C',2,0.32,1.85], 
 ['B',1,0.15,1.22],['B',4,0.66,1.97],['B',3,0.29,1.87],['C',1,0.99,1.22], 
 ['C',3,0.78,1.39],['C',4,0.65,1.99],['A',2,0.32,1.02]]
​
df = pd.DataFrame(data, columns = ['A1', 'B1', 'C1', 'D1'])
​

EDIT: The calculations are quite a bit more complex than i show, i do various operations with the sin,cos, acos, etc and apply calcs on these. do not believe that diff would really work for me

Answer

I am using the following CSV (saved as test.csv):

A1,B1,C1,D1
A,8,0.4,1.3
A,1,0.25,1.25
B,5,0.5,1.02
C,2,0.32,1.85
B,1,0.15,1.22
B,4,0.66,1.97
B,3,0.29,1.87
C,1,0.99,1.22
C,3,0.78,1.39
C,4,0.65,1.99
A,2,0.32,1.02

JavaScript
 
A1,B1,C1,D1
A,8,0.4,1.3
A,1,0.25,1.25
B,5,0.5,1.02
C,2,0.32,1.85
B,1,0.15,1.22
B,4,0.66,1.97
B,3,0.29,1.87
C,1,0.99,1.22
C,3,0.78,1.39
C,4,0.65,1.99
A,2,0.32,1.02
​

The following code will group on A1 and sort on B1:

import pandas as pd

df = pd.read_csv('test.csv')

res = pd.DataFrame(columns = df.columns.tolist())
for _, df_chnk in df.groupby('A1', sort=True):
    df_chnk = df_chnk.reset_index(drop=True).sort_values(by=['B1'], ascending=True)
    res = res.append(df_chnk)
res = res.reset_index(drop=True)
print(res)

JavaScript
 
import pandas as pd
​
df = pd.read_csv('test.csv')
​
res = pd.DataFrame(columns = df.columns.tolist())
for _, df_chnk in df.groupby('A1', sort=True):
    df_chnk = df_chnk.reset_index(drop=True).sort_values(by=['B1'], ascending=True)
    res = res.append(df_chnk)
res = res.reset_index(drop=True)
print(res)
​

The above code will generate the following dataframe:

   A1 B1    C1    D1
0   A  1  0.25  1.25
1   A  2  0.32  1.02
2   A  8  0.40  1.30
3   B  1  0.15  1.22
4   B  3  0.29  1.87
5   B  4  0.66  1.97
6   B  5  0.50  1.02
7   C  1  0.99  1.22
8   C  2  0.32  1.85
9   C  3  0.78  1.39
10  C  4  0.65  1.99

JavaScript
 
   A1 B1    C1    D1
 A  1  0.25  1.25
 A  2  0.32  1.02
 A  8  0.40  1.30
 B  1  0.15  1.22
 B  3  0.29  1.87
 B  4  0.66  1.97
 B  5  0.50  1.02
 C  1  0.99  1.22
 C  2  0.32  1.85
 C  3  0.78  1.39
C  4  0.65  1.99
​

To perform the following operation:

tempZ = df.loc[l[i], 'B1'] - df.loc[l[i-1], 'B1']
tempJ = (df.loc[l [i], 'C1'] - df.loc[l[i-1], 'C1'])
tempI = (df.loc[l [i], 'D1'] - df.loc[l[i-1], 'D1'])

JavaScript
 
tempZ = df.loc[l[i], 'B1'] - df.loc[l[i-1], 'B1']
tempJ = (df.loc[l [i], 'C1'] - df.loc[l[i-1], 'C1'])
tempI = (df.loc[l [i], 'D1'] - df.loc[l[i-1], 'D1'])
​

Check out pandas.DataFrame.diff. You can use it on each of the df_chnk!

Advertisement

Answer