First of all, that’s my first code and question, so sorry for the begginer level here and lack of vocabulary.
I would like to calculate and store in a dataframe the average of the first 5 rows in a column “returns” with column “N” numbered as 1, and afterwards proceeding to calculate the average return of next 5 rows using the same column N numbered as 2, and so on (N goes up to 77). See table below as an example.
Actual data has more than 10.000 lines and column N goes from 1 to 77.
I did prepare a poor code (below as well), but I have two problems with it:
1 – I cannot reference the column N as a loop. I have to type 1 to 77 (so, 77 times) to get all the averages from samples 1 to 77
2 – I cannot write the code to store the output, given also I cannot write the code to repeat itself from N = 1 to 77
In the table below, the desired outcome (ie: average of the top 5 rows for each N), stored in a dataframe, would be: 0,1 (for N = 1) and 0,15 (for N = 2)
N Return 1 0.23 1 0.08 1 0.02 1 0.16 1 0.01 1 0.01 1 -0.19 1 -0.14 1 -0.04 1 0.03 2 0.29 2 0.27 2 0.08 2 0.07 2 0.07 2 0.03 2 0.04 2 0.12 2 0.00 2 0.11
import pandas as pd
df = pd.read_csv(arq_csv)
ndf = df.loc[df["N"] == 1].head(5)
average = ndf["Return"].mean()
print(average)
Advertisement
Answer
Try this code:
import pandas as pd, random
# make dummy data
src = []
for i in range(77):
for k in range(10):
src.append([i + 1, random.randint(-10, 10)])
df = pd.DataFrame(src, columns=('N', 'Return'))
print(df)
# process data
df = df.groupby('N').head(5).groupby('N').mean().reset_index()
print(df)
Output
N Return
0 1 -1.4
1 2 -2.6
2 3 2.0
3 4 -0.6
4 5 -1.0
.. ..
72 73 -2.0
73 74 -0.2
74 75 -2.0
75 76 -7.0
76 77 1.8
[77 rows x 2 columns]