Skip to content
Advertisement

Calculating the average value for every 10 cells in each column by pandas

In my excel csv files, there are around 1500 rows and 30 columns. I believe I can use python to complete it. so here is my target:

  1. How to let python read my excel file correctly.
  2. I want to reduce the number of rows to 1/10, so How can I calculate the average value for every 10 rows in each column?
  3. At the same time, I hope to keep the timeslot column so I understand what period it indicates.

Here is my excel file in short. enter image description here

I have uploaded the file on Google Drive, please try to have a look https://drive.google.com/file/d/1EDmSgsEoNQYZeRD_JiR33WNv7ENW4cp3/view?usp=sharing

The code I used is shown below

    import numpy as np
    import pandas as pd 
import glob
location='C:\Users\Poon\Downloads\20211014_SBS_BEMS\20211014_SBS_BEMS\1043 succeed.csv'
csvfiles=glob.glob(location)

df1=pd.DataFrame()

for file_new_2 in csvfiles: 
    df2=pd.read_csv(file_new_2)
    df1=pd.concat([df1,df2],ignore_index=True)
    df1.mean(axis=0)#average for each column
    df1.mean(axis=1)
    n = 100 # the number of rows
    df1.groupby(np.arange(len(df1))//n).mean()

print(df1)

Advertisement

Answer

This code would clean your data and take the mean for every 10th row.

df = df.iloc[1:, :]
df = pd.concat([pd.to_datetime(df.iloc[:,0], errors = "coerce"), df.iloc[:, 1:].apply(pd.to_numeric)], axis = 1)
df.dropna(inplace = True)
df["index"] = df.index//10
df.groupby("index").mean()
User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement