In my excel csv files, there are around 1500 rows and 30 columns. I believe I can use python to complete it. so here is my target:
- How to let python read my excel file correctly.
- I want to reduce the number of rows to 1/10, so How can I calculate the average value for every 10 rows in each column?
- At the same time, I hope to keep the timeslot column so I understand what period it indicates.
Here is my excel file in short. enter image description here
I have uploaded the file on Google Drive, please try to have a look https://drive.google.com/file/d/1EDmSgsEoNQYZeRD_JiR33WNv7ENW4cp3/view?usp=sharing
The code I used is shown below
import numpy as np import pandas as pd import glob location='C:\Users\Poon\Downloads\20211014_SBS_BEMS\20211014_SBS_BEMS\1043 succeed.csv' csvfiles=glob.glob(location) df1=pd.DataFrame() for file_new_2 in csvfiles: df2=pd.read_csv(file_new_2) df1=pd.concat([df1,df2],ignore_index=True) df1.mean(axis=0)#average for each column df1.mean(axis=1) n = 100 # the number of rows df1.groupby(np.arange(len(df1))//n).mean() print(df1)
Advertisement
Answer
This code would clean your data and take the mean for every 10th row.
df = df.iloc[1:, :] df = pd.concat([pd.to_datetime(df.iloc[:,0], errors = "coerce"), df.iloc[:, 1:].apply(pd.to_numeric)], axis = 1) df.dropna(inplace = True) df["index"] = df.index//10 df.groupby("index").mean()