In my excel csv files, there are around 1500 rows and 30 columns. I believe I can use python to complete it. so here is my target: How to let python read my excel file correctly. I want to reduce the number of rows to 1/10, so How can I calculate the average value for every 10 rows in each

Calculating the average value for every 10 cells in each column by pandas

In my excel csv files, there are around 1500 rows and 30 columns. I believe I can use python to complete it. so here is my target:

How to let python read my excel file correctly.
I want to reduce the number of rows to 1/10, so How can I calculate the average value for every 10 rows in each column?
At the same time, I hope to keep the timeslot column so I understand what period it indicates.

Here is my excel file in short. enter image description here

I have uploaded the file on Google Drive, please try to have a look https://drive.google.com/file/d/1EDmSgsEoNQYZeRD_JiR33WNv7ENW4cp3/view?usp=sharing

The code I used is shown below

    import numpy as np
    import pandas as pd 
import glob
location='C:\Users\Poon\Downloads\20211014_SBS_BEMS\20211014_SBS_BEMS\1043 succeed.csv'
csvfiles=glob.glob(location)

df1=pd.DataFrame()

for file_new_2 in csvfiles: 
    df2=pd.read_csv(file_new_2)
    df1=pd.concat([df1,df2],ignore_index=True)
    df1.mean(axis=0)#average for each column
    df1.mean(axis=1)
    n = 100 # the number of rows
    df1.groupby(np.arange(len(df1))//n).mean()

print(df1)

Answer

This code would clean your data and take the mean for every 10th row.

df = df.iloc[1:, :]
df = pd.concat([pd.to_datetime(df.iloc[:,0], errors = "coerce"), df.iloc[:, 1:].apply(pd.to_numeric)], axis = 1)
df.dropna(inplace = True)
df["index"] = df.index//10
df.groupby("index").mean()

Advertisement

Answer