JavaScript
x
8
1
df=pd.read_csv('https://raw.githubusercontent.com/amanaroratc/hello-world/master/test_df.csv')
2
id rank date
3
1991513 FCWFKZVFAHFK7WP4 32 2021-06-01
4
1991514 FCWEUHFSM2BSQY2N 33 2021-06-01
5
1991515 FCWFV6T2GGPM8T2P 34 2021-06-01
6
1991516 FCWEQ8B4QDJJUNEH 35 2021-06-01
7
1991517 FCWFAUSPJFGDUBRG 36 2021-06-01
8
I have the above data for 1 month and I want to create a new column delta_rank_7 which tells me the change in rank in last 7 days for each id (NaNs for 2021-06-01 to 2021-06-07)
I can do something like mentioned here Calculating difference between two rows in Python / Pandas
JavaScript
1
2
1
df.set_index('date').diff(periods=7)
2
but I have multiple entries for each date and I want to do this for each id.
Advertisement
Answer
If there are duplicated id
use:
JavaScript
1
3
1
df = df.set_index('date')
2
df['delta_rank_7'] = df.groupby('id')['rank'].diff(periods=7)
3
If need differencies by 7 days use DataFrameGroupBy.shift
and subtract:
JavaScript
1
24
24
1
file = 'https://raw.githubusercontent.com/amanaroratc/hello-world/master/test_df.csv'
2
df=pd.read_csv(file, parse_dates=['date'])
3
4
df = df.sort_values(['id','date'])
5
df = df.merge((df.set_index(['id','date'])['rank']
6
.sub(df.set_index('date').groupby('id')['rank'].shift(7, freq='d'))
7
.reset_index(name='delta_rank_7'))
8
)
9
print (df)
10
id rank date delta_rank_7
11
0 CBKFGPBZMG48K5SF 2 2021-06-15 NaN
12
1 CBKFGPBZMG48K5SF 19 2021-06-19 NaN
13
2 CBKFGPBZMG48K5SF 2 2021-06-21 NaN
14
3 CBKFGPBZMG48K5SF 2 2021-06-22 0.0
15
4 CBKFGPBZMG48K5SF 48 2021-06-24 NaN
16
17
10010 FRNEUJZRVQGT94SP 112 2021-06-23 38.0
18
10011 FRNEUJZRVQGT94SP 109 2021-06-24 35.0
19
10012 FRNEUJZRVQGT94SP 68 2021-06-27 -73.0
20
10013 FRNEUJZRVQGT94SP 85 2021-06-28 NaN
21
10014 FRNEUJZRVQGT94SP 133 2021-06-30 21.0
22
23
[10015 rows x 4 columns]
24