I have the data from several sensors stored in a CSV file.
The time resolution is one minute.
The daily average for each sensor was easily calculated.
I need to calculate the hourly average of the data.
CSV data set format:
5/27/2016 0:00:00 Temperature 39 25 5/27/2016 0:00:00 Pressure 12 39 5/27/2016 0:00:00 Humidity 19 79 . . . 5/27/2016 0:01:00 Temperature 39 25
- First is date,
- second is time,
- third is parameter name,
- fourth is parameter code, and
- last one is value.
Relevant part of my code:
import pandas as pd import numpy as np df = pd.read_csv('2016-05-27.csv') Count_Row=df.shape[0] #gives number of row count print("The number of rows is", Count_Row) Parameter_code = 11201 timestamp= [] Par_value = [] n = 0 for rown in range(0,Count_Row): if df.iloc[rown,3] == Parameter_code: temp = df.iloc[rown, 4] if temp != 'nan': Par_value.append(float (temp)) timestamp.append(df.iloc[rown, 1]) n += 1 print("total sapmles", n) print (timestamp) daily_avg = np.average(Par_value) print("The daily average ", daily_avg)
Question: Is there any method to this hourly averaging?
Advertisement
Answer
I think you need avoid loops in pandas, because slow and use:
- add parameter
name
for columns names andparse_date
for datetimes toread_csv
- filter by
boolean indexing
resample
and aggregatemean
,NaN
s are omit by default
import pandas as pd temp=u""" 5/27/2016,0:00:00,Temperature,39,25 5/27/2016,0:00:00,Pressure,12,39 5/27/2016,0:00:00,Temperature,39,NaN 5/27/2016,0:01:00,Temperature,39,25""" #after testing replace 'pd.compat.StringIO(temp)' to 'filename.csv' names = ['date','time','name','parameter code','value'] df = pd.read_csv(pd.compat.StringIO(temp), parse_dates=[['date','time']], names=names) print (df) date_time name parameter code value 0 2016-05-27 00:00:00 Temperature 39 25.0 1 2016-05-27 00:00:00 Pressure 12 39.0 2 2016-05-27 00:00:00 Temperature 39 NaN 3 2016-05-27 00:01:00 Temperature 39 25.0 df = df[df['parameter code'] == 39] print (df) date_time name parameter code value 0 2016-05-27 00:00:00 Temperature 39 25.0 2 2016-05-27 00:00:00 Temperature 39 NaN 3 2016-05-27 00:01:00 Temperature 39 25.0 df1 = df.resample('H', on='date_time')['value'].mean().reset_index(name='mean_val') print (df1) date_time mean_val 0 2016-05-27 25.0