Skip to content
Advertisement

Write a loop code to calculate average 77 different times, using another column as criteria

First of all, that’s my first code and question, so sorry for the begginer level here and lack of vocabulary.

I would like to calculate and store in a dataframe the average of the first 5 rows in a column “returns” with column “N” numbered as 1, and afterwards proceeding to calculate the average return of next 5 rows using the same column N numbered as 2, and so on (N goes up to 77). See table below as an example.

Actual data has more than 10.000 lines and column N goes from 1 to 77.

I did prepare a poor code (below as well), but I have two problems with it:

1 – I cannot reference the column N as a loop. I have to type 1 to 77 (so, 77 times) to get all the averages from samples 1 to 77

2 – I cannot write the code to store the output, given also I cannot write the code to repeat itself from N = 1 to 77

In the table below, the desired outcome (ie: average of the top 5 rows for each N), stored in a dataframe, would be: 0,1 (for N = 1) and 0,15 (for N = 2)

Sample table

N Return 1 0.23 1 0.08 1 0.02 1 0.16 1 0.01 1 0.01 1 -0.19 1 -0.14 1 -0.04 1 0.03 2 0.29 2 0.27 2 0.08 2 0.07 2 0.07 2 0.03 2 0.04 2 0.12 2 0.00 2 0.11

import pandas as pd
df = pd.read_csv(arq_csv)
ndf = df.loc[df["N"] == 1].head(5)
average = ndf["Return"].mean()
print(average)

Advertisement

Answer

Try this code:

import pandas as pd, random

# make dummy data
src = []
for i in range(77):
    for k in range(10):
        src.append([i + 1, random.randint(-10, 10)])
df = pd.DataFrame(src, columns=('N', 'Return'))
print(df)

# process data
df = df.groupby('N').head(5).groupby('N').mean().reset_index()
print(df)

Output

     N  Return
0    1    -1.4
1    2    -2.6
2    3     2.0
3    4    -0.6
4    5    -1.0
..  ..     ...
72  73    -2.0
73  74    -0.2
74  75    -2.0
75  76    -7.0
76  77     1.8

[77 rows x 2 columns]
Advertisement