First of all, that’s my first code and question, so sorry for the begginer level here and lack of vocabulary.
I would like to calculate and store in a dataframe the average of the first 5 rows in a column “returns” with column “N” numbered as 1, and afterwards proceeding to calculate the average return of next 5 rows using the same column N numbered as 2, and so on (N goes up to 77). See table below as an example.
Actual data has more than 10.000 lines and column N goes from 1 to 77.
I did prepare a poor code (below as well), but I have two problems with it:
1 – I cannot reference the column N as a loop. I have to type 1 to 77 (so, 77 times) to get all the averages from samples 1 to 77
2 – I cannot write the code to store the output, given also I cannot write the code to repeat itself from N = 1 to 77
In the table below, the desired outcome (ie: average of the top 5 rows for each N), stored in a dataframe, would be: 0,1 (for N = 1) and 0,15 (for N = 2)
N Return 1 0.23 1 0.08 1 0.02 1 0.16 1 0.01 1 0.01 1 -0.19 1 -0.14 1 -0.04 1 0.03 2 0.29 2 0.27 2 0.08 2 0.07 2 0.07 2 0.03 2 0.04 2 0.12 2 0.00 2 0.11
import pandas as pd df = pd.read_csv(arq_csv) ndf = df.loc[df["N"] == 1].head(5) average = ndf["Return"].mean() print(average)
Advertisement
Answer
Try this code:
import pandas as pd, random # make dummy data src = [] for i in range(77): for k in range(10): src.append([i + 1, random.randint(-10, 10)]) df = pd.DataFrame(src, columns=('N', 'Return')) print(df) # process data df = df.groupby('N').head(5).groupby('N').mean().reset_index() print(df)
Output
N Return 0 1 -1.4 1 2 -2.6 2 3 2.0 3 4 -0.6 4 5 -1.0 .. .. ... 72 73 -2.0 73 74 -0.2 74 75 -2.0 75 76 -7.0 76 77 1.8 [77 rows x 2 columns]