My dataset is similar to the below:
JavaScript
x
4
1
data = [['Jane', 10,10.5,11,45,66,21,88,99,77,41,22], ['John',11,22,55,34,44,22,44,55,88,56,47],['Tom',23,32,43,12,11,44,77,85,99,45,63]]
2
df = pd.DataFrame(data, columns = ['Name', '09-Aug-21', 'Aug-21', '02-Sep-21', 'Sep-21', '18-Oct-21', 'Oct-21', '02-Nov-21','Nov-21','14-Dec-21', 'Dec-21', '15-Jan-22'])
3
df
4
How can I add columns to this which show the quarterly figure, which is an average of the preceding three months? Eg, suppose we started at adding a column after ‘Dec-21’ called Q4 2021 which took the average of the columns called ‘Oct-21’, ‘Nov-21’ and ‘Dec-21’.
Will I need to create a function which takes the preceding three values and returns an average, and then concatenate this to my dataframe? It does not have to be directly after each period, eg I am also happy to add all of the Quarterly averages right at the end.
Advertisement
Answer
JavaScript
1
27
27
1
from datetime import datetime
2
3
def get_quarter_name(timestamp):
4
"""Convert '2021-12-01' to 'Q4-2021'
5
"""
6
return f"Q{timestamp.quarter}-{timestamp.year}"
7
8
# your data
9
data = [['Jane', 10,10.5,11,45,66,21,88,99,77,41,22], ['John',11,22,55,34,44,22,44,55,88,56,47],['Tom',23,32,43,12,11,44,77,85,99,45,63]]
10
df = pd.DataFrame(data, columns = ['Name', '09-Aug-21', 'Aug-21', '02-Sep-21', 'Sep-21', '18-Oct-21', 'Oct-21', '02-Nov-21','Nov-21','14-Dec-21', 'Dec-21', '15-Jan-22'])
11
12
# filter only relevant columns, which start with an alphabetical character
13
cols = [col for col in df.columns if not col[0].isdigit()]
14
15
# extract only relevant columns and transpose
16
df_T = df[cols].set_index("Name").T
17
18
# convert index values to dates
19
df_T.index = pd.Index([pd.Timestamp(datetime.strptime(d,'%b-%y').strftime('%Y-%m-%d')) for d in df_T.index])
20
21
# resample by Quarters and transpose again to original format
22
df_quarter = df_T.resample("Q").mean().T
23
24
# rename columns to quarter-like descriptions
25
df_quarter.columns = [get_quarter_name(col) for col in df_quarter.columns]
26
27
df_quarter
is your final answer which you can merge back to original df
Output:
JavaScript
1
6
1
Q3-2021 Q4-2021
2
Name
3
Jane 27.75 53.666667
4
John 28.00 44.333333
5
Tom 22.00 58.000000
6