My Excel CSV files have around 1,500 rows and 30 columns. I believe I can use Python to process them. Here are my goals:
- How do I get Python to read my Excel file correctly?
- I want to reduce the number of rows to 1/10. How can I calculate the average of every 10 rows in each column?
- At the same time, I want to keep the timeslot column so I know which period each averaged row covers.
Here is a short excerpt of my Excel file. [image]
I have uploaded the file to Google Drive; please have a look: https://drive.google.com/file/d/1EDmSgsEoNQYZeRD_JiR33WNv7ENW4cp3/view?usp=sharing
The code I used is shown below:
```python
import numpy as np
import pandas as pd
import glob
location='C:\Users\Poon\Downloads\20211014_SBS_BEMS\20211014_SBS_BEMS\1043 succeed.csv'
csvfiles=glob.glob(location)

df1=pd.DataFrame()

for file_new_2 in csvfiles:
    df2=pd.read_csv(file_new_2)
    df1=pd.concat([df1,df2],ignore_index=True)
df1.mean(axis=0)#average for each column
df1.mean(axis=1)
n = 100 # the number of rows
df1.groupby(np.arange(len(df1))//n).mean()

print(df1)
```
Answer
This code cleans your data and takes the mean of every 10 rows.
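```python
# df is the DataFrame loaded with pd.read_csv (df1 in the question)
df = df.iloc[1:, :]  # drop the first row
df = pd.concat([pd.to_datetime(df.iloc[:, 0], errors="coerce"),  # first column to datetime; unparseable values become NaT
                df.iloc[:, 1:].apply(pd.to_numeric)], axis=1)    # remaining columns to numbers
df.dropna(inplace=True)  # drop rows that contain missing values
df["index"] = df.index // 10  # integer-divide the row label by 10 to get a group id
averaged = df.groupby("index").mean()  # groupby returns a new DataFrame, so keep the result
print(averaged)
```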
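As a rough sketch of the same idea wrapped in a reusable function, the example below averages each block of 10 consecutive rows and keeps the first timestamp of each block, so the timeslot column stays readable. It assumes the first column holds the timestamps and every other column is numeric; the function name `average_every_n_rows`, the column names `time` and `a`, and the file path in the comment are hypothetical. It numbers the rows by position after cleaning, because `df.index // 10` can produce uneven groups once `dropna` has left gaps in the index.

```python
import numpy as np
import pandas as pd


def average_every_n_rows(df: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Average each block of n consecutive rows, keeping each block's first timestamp.

    Assumption: column 0 holds timestamps, all other columns hold numbers.
    """
    time_col = df.columns[0]
    value_cols = list(df.columns[1:])

    # Convert types; values that cannot be parsed become NaT / NaN.
    times = pd.to_datetime(df.iloc[:, 0], errors="coerce")
    values = df.iloc[:, 1:].apply(pd.to_numeric, errors="coerce")

    # Drop incomplete rows, then renumber so blocks are based on position.
    clean = pd.concat([times, values], axis=1).dropna().reset_index(drop=True)

    # Block id: 0 for the first n rows, 1 for the next n rows, and so on.
    block = np.arange(len(clean)) // n

    averaged = clean.groupby(block)[value_cols].mean()
    # Put the first timestamp of each block back as the leading column.
    averaged.insert(0, time_col, clean.groupby(block)[time_col].first())
    return averaged


# With a real file, a raw string keeps the backslashes in a Windows path literal:
# df = pd.read_csv(r"C:\path\to\file.csv")  # hypothetical path

# Small synthetic demo: 25 rows, one value column stored as text.
demo = pd.DataFrame({
    "time": pd.date_range("2021-10-14", periods=25, freq="min").astype(str),
    "a": [str(i) for i in range(25)],
})
print(average_every_n_rows(demo, n=10))
```

The last block may contain fewer than `n` rows when the cleaned row count is not a multiple of `n`; it is still averaged over the rows it has.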