How to count pandas datetime months by continuous season

Question

I have a large time-series dataframe. The column has already been formatted as datetime. Such as I want to plot the sample numbers for each season. Such as the following. The values are the count number of samples in that season. I do make a little search and realize I can create a dictionary to convert the months into seasons.

Accepted Answer

There is a fast way to solve it, but it’s not very orthodox…You create a column ‘season’, and with a np.where(), you assign the season. At the beginning, you say winter for the first 3 month, spring, for the 3 next, and so. And then, you apply a shift(-1) on the column to shift it by one row back. Then, you’ve got your seasons (just ffill the las nan). You can then solve your problem in a lazy way.If you’re not confortable with the code, tell me, I will edit it.EDIT:I assume that the dates are in the index. If not, you should apply a dt.month instead of .month.I decompose it to make it clear_condtion_spring = (df.index.month>=4)&(df.index.month<=6)_condition_summer = (df.index.month>7)&(df.index.month<=9)_condition_automn = (df.index.month>=10)@(df.index.month<=12)df['Season'] = np.where(_condition_winter,'Winter',np.where(_condtion_spring,'Spring',np.where(_condition_summer,'Summer',np.where(_condition_automn,'Automn',np.nan))))df['Season'] = df['Season'].shift(-1).fillna(method='ffill')EDIT 2:Here a complete example :dates = pd.date_range("1983-09-01","1985-12-31",freq="1M")df = pd.DataFrame(np.random.randint(100, 200,size=28)/100,index =dates,columns=["Sample"])df = df.sort_index()_condition_winter = (df.index.month>=1)&(df.index.month<=3)_condtion_spring = (df.index.month>=4)&(df.index.month<=6)_condition_summer = (df.index.month>=7)&(df.index.month<=9)_condition_automn = (df.index.month>=10)@(df.index.month<=12)df['Season'] = np.where(_condition_winter,'Winter',np.where(_condtion_spring,'Spring',np.where(_condition_summer,'Summer',np.where(_condition_automn,'Automn',np.nan))))df['Season'] = df['Season']+'_'+df.index.strftime(date_format='%Y')df['Season'] = df['Season'].shift(-1).fillna(method='ffill')print('Sample for winter 1984 = ',df[df.Season=='Winter_1984'].Sample.sum())EDIT 3:Here the complete example if you have several rows for same month:#### Build our df#### This is just to make it clear that we will have 2 rows of each month. It could be more or less.dates = pd.date_range("1983-09-01","1985-12-31",freq="1M")dates2 = pd.date_range("1983-09-01","1985-12-31",freq="1M")df1 = pd.DataFrame(np.random.randint(100, 200,size=28)/100,index =dates,columns=["Sample"]).append(pd.DataFrame(np.random.randint(100, 200,size=28)/100,index =dates2,columns=["Sample"]))df1 = df1.sort_index()#### Now, to keep it clear, even if we could do faster, let's do a dataframe with 1 row per month with total of sample each timedf = pd.DataFrame()df = df1.groupby(df1.index).sum()#### Let's sort by date to be sure that it won't me messy#### If you've got a 'Date' column and not the index, apply a .sort_values('Date') instead of sort_indexdf = df.sort_index()#### If youve got a 'Date' column, it will be df.Date.dt.month istead of df.index.month_condition_winter = (df.index.month>=1)&(df.index.month<=3)_condtion_spring = (df.index.month>=4)&(df.index.month<=6)_condition_summer = (df.index.month>=7)&(df.index.month<=9)_condition_automn = (df.index.month>=10)@(df.index.month<=12)df['Season'] = np.where(_condition_winter,'Winter',np.where(_condtion_spring,'Spring',np.where(_condition_summer,'Summer',np.where(_condition_automn,'Automn',np.nan))))df['Season'] = df['Season']+'_'+df.index.strftime(date_format='%Y')df['Season'] = df['Season'].shift(-1).fillna(method='ffill')print('Sample for winter 1984 = ',df[df.Season=='Winter_1984'].Sample.sum())

Advertisement

Answer