I have the data from several sensors stored in a CSV file.
The time resolution is one minute.
The daily average for each sensor was easily calculated.
I need to calculate the hourly average of the data.
CSV data set format:
5/27/2016 0:00:00 Temperature 39 25 5/27/2016 0:00:00 Pressure 12 39 5/27/2016 0:00:00 Humidity 19 79 . . . 5/27/2016 0:01:00 Temperature 39 25
- First is date,
- second is time,
- third is parameter name,
- fourth is parameter code, and
- last one is value.
Relevant part of my code:
import pandas as pd
import numpy as np
df = pd.read_csv('2016-05-27.csv')
Count_Row=df.shape[0] #gives number of row count
print("The number of rows is", Count_Row)
Parameter_code = 11201
timestamp= []
Par_value = []
n = 0
for rown in range(0,Count_Row):
if df.iloc[rown,3] == Parameter_code:
temp = df.iloc[rown, 4]
if temp != 'nan':
Par_value.append(float (temp))
timestamp.append(df.iloc[rown, 1])
n += 1
print("total sapmles", n)
print (timestamp)
daily_avg = np.average(Par_value)
print("The daily average ", daily_avg)
Question: Is there any method to this hourly averaging?
Advertisement
Answer
I think you need avoid loops in pandas, because slow and use:
- add parameter
namefor columns names andparse_datefor datetimes toread_csv - filter by
boolean indexing resampleand aggregatemean,NaNs are omit by default
import pandas as pd
temp=u"""
5/27/2016,0:00:00,Temperature,39,25
5/27/2016,0:00:00,Pressure,12,39
5/27/2016,0:00:00,Temperature,39,NaN
5/27/2016,0:01:00,Temperature,39,25"""
#after testing replace 'pd.compat.StringIO(temp)' to 'filename.csv'
names = ['date','time','name','parameter code','value']
df = pd.read_csv(pd.compat.StringIO(temp), parse_dates=[['date','time']], names=names)
print (df)
date_time name parameter code value
0 2016-05-27 00:00:00 Temperature 39 25.0
1 2016-05-27 00:00:00 Pressure 12 39.0
2 2016-05-27 00:00:00 Temperature 39 NaN
3 2016-05-27 00:01:00 Temperature 39 25.0
df = df[df['parameter code'] == 39]
print (df)
date_time name parameter code value
0 2016-05-27 00:00:00 Temperature 39 25.0
2 2016-05-27 00:00:00 Temperature 39 NaN
3 2016-05-27 00:01:00 Temperature 39 25.0
df1 = df.resample('H', on='date_time')['value'].mean().reset_index(name='mean_val')
print (df1)
date_time mean_val
0 2016-05-27 25.0