Skip to content
Advertisement

How to calculate the hourly average of my data?

I have the data from several sensors stored in a CSV file.

The time resolution is one minute.

The daily average for each sensor was easily calculated.

I need to calculate the hourly average of the data.

CSV data set format:

5/27/2016 0:00:00 Temperature 39 25
5/27/2016 0:00:00 Pressure    12 39
5/27/2016 0:00:00 Humidity    19 79
.
.
.
5/27/2016 0:01:00 Temperature 39 25
  • First is date,
  • second is time,
  • third is parameter name,
  • fourth is parameter code, and
  • last one is value.

Relevant part of my code:

import pandas as pd
import numpy as np

df = pd.read_csv('2016-05-27.csv')

Count_Row=df.shape[0] #gives number of row count

print("The number of rows is", Count_Row)

Parameter_code = 11201
timestamp= []
Par_value = []

n = 0
for rown in range(0,Count_Row):
   if df.iloc[rown,3] == Parameter_code:
       temp = df.iloc[rown, 4]
       if temp != 'nan':
           Par_value.append(float (temp))
           timestamp.append(df.iloc[rown, 1])
           n += 1

print("total sapmles", n) 

print (timestamp)
daily_avg = np.average(Par_value)        
print("The daily average ", daily_avg)

Question: Is there any method to this hourly averaging?

Advertisement

Answer

I think you need avoid loops in pandas, because slow and use:

  • add parameter name for columns names and parse_date for datetimes to read_csv
  • filter by boolean indexing
  • resample and aggregate mean, NaNs are omit by default

import pandas as pd

temp=u"""
5/27/2016,0:00:00,Temperature,39,25
5/27/2016,0:00:00,Pressure,12,39
5/27/2016,0:00:00,Temperature,39,NaN
5/27/2016,0:01:00,Temperature,39,25"""
#after testing replace 'pd.compat.StringIO(temp)' to 'filename.csv'
names = ['date','time','name','parameter code','value']
df = pd.read_csv(pd.compat.StringIO(temp), parse_dates=[['date','time']], names=names)
print (df)
            date_time         name  parameter code  value
0 2016-05-27 00:00:00  Temperature              39   25.0
1 2016-05-27 00:00:00     Pressure              12   39.0
2 2016-05-27 00:00:00  Temperature              39    NaN
3 2016-05-27 00:01:00  Temperature              39   25.0

df = df[df['parameter code'] == 39]
print (df)
            date_time         name  parameter code  value
0 2016-05-27 00:00:00  Temperature              39   25.0
2 2016-05-27 00:00:00  Temperature              39    NaN
3 2016-05-27 00:01:00  Temperature              39   25.0

df1 = df.resample('H', on='date_time')['value'].mean().reset_index(name='mean_val')
print (df1)
   date_time  mean_val
0 2016-05-27      25.0
User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement