I am trying to use pandas to compute daily climatology. My code is:
import random
import pandas as pd

dates = pd.date_range('1950-01-01', '1953-12-31', freq='D')
# One random integer in [0, 1000) per day
rand_data = [int(1000 * random.random()) for _ in range(len(dates))]
cum_data = pd.Series(rand_data, index=dates)
# Write the series as tab-separated text
cum_data.to_csv('test.csv', sep="\t")
cum_data is a Series indexed by daily dates from 1 January 1950 to 31 December 1953. I want to create a new vector of length 365 whose first element is the average of rand_data for January 1st across 1950, 1951, 1952 and 1953, whose second element is the same average for January 2nd, and so on.
Any suggestions how I can do this using pandas?
Answer
You can group by the day of the year and then calculate the mean for these groups:
cum_data.groupby(cum_data.index.dayofyear).mean()
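However, you have to be aware of leap years, which cause problems with this approach: in a leap year, 29 February is day 60, so every date from 1 March onward gets a day-of-year number one higher than in other years, and the groups end up mixing different calendar dates. As an alternative, you can group by the month and the day: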
In [13]: cum_data.groupby([cum_data.index.month, cum_data.index.day]).mean()
Out[13]:
1   1     462.25
    2     631.00
    3     615.50
    4     496.00
...
12  28    378.25
    29    427.75
    30    528.50
    31    678.50
Length: 366, dtype: float64
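Note that the result above has 366 entries, because 29 February forms its own group. If you need exactly 365 values, one option is to drop 29 February before grouping. The following is only a sketch under a few assumptions: the data is regenerated with NumPy using an arbitrary seed, and the names `doy_normal`, `doy_leap`, `no_feb29` and `climatology_365` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Same synthetic setup as in the question; the seed is arbitrary.
rng = np.random.default_rng(0)
dates = pd.date_range('1950-01-01', '1953-12-31', freq='D')
cum_data = pd.Series(rng.integers(0, 1000, size=len(dates)), index=dates)

# 1 March is day 60 in an ordinary year but day 61 in the leap year 1952,
# which is why grouping by dayofyear mixes different calendar dates.
doy_normal = pd.Timestamp('1951-03-01').dayofyear
doy_leap = pd.Timestamp('1952-03-01').dayofyear

# Remove 29 February so that every remaining calendar date occurs in all years.
is_feb29 = (cum_data.index.month == 2) & (cum_data.index.day == 29)
no_feb29 = cum_data[~is_feb29]

# Group by (month, day): one mean per calendar date, 365 in total.
climatology_365 = no_feb29.groupby(
    [no_feb29.index.month, no_feb29.index.day]
).mean()

print(doy_normal, doy_leap)
print(len(climatology_365))
```

The result is indexed by (month, day) pairs, so `climatology_365.loc[(1, 1)]` is the 1 January mean. If a plain array is needed, `climatology_365.to_numpy()` returns the values in calendar order.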