Count number of days in each continuous period pandas

Question

Suppose I have the following df N03_zero (date_code is already datetime): Millions of rows, each with a date_code assigned to some item_code. I am trying to get the number of days in each continuous period for each item_code; none of the similar questions helped me. The expected df should be: Once a sequence of days breaks, it should count the days in that sequence and then

Accepted Answer

To find runs of consecutive days, take the row-to-row difference with Series.diff, convert it to days with Series.dt.days, flag rows where it is not equal to 1 with Series.ne, and take the cumulative sum with Series.cumsum so that each run gets its own label. Then count the rows per run with GroupBy.size, remove the second index level with Series.droplevel, and create a DataFrame:

    df['date_code'] = pd.to_datetime(df['date_code'])

    df1 = (df.groupby(['item_code', df['date_code'].diff().dt.days.ne(1).cumsum()], sort=False)
             .size()
             .droplevel(1)
             .reset_index(name='continuous_days'))
    print(df1)

           item_code  continuous_days
    0  8028558104973                3
    1  8028558104973                2
    2  7622300443269                1
    3  7622300443269                2
    4         513082                3

Then aggregate the values per item_code using named aggregations with GroupBy.agg:

    df2 = (df1.groupby('item_code', sort=False, as_index=False)
              .agg(**{'no. periods': ('continuous_days', 'size'),
                      'min': ('continuous_days', 'min'),
                      'max': ('continuous_days', 'max'),
                      'mean': ('continuous_days', 'mean')}))
    print(df2)

           item_code  no. periods  min  max  mean
    0  8028558104973            2    2    3   2.5
    1  7622300443269            2    1    2   1.5
    2         513082            1    3    3   3.0
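Since the question's input table is not shown here, the following is a minimal, self-contained sketch of the same approach on made-up data. The dates and the names run_id, runs and summary are assumptions for illustration only, chosen so that the run lengths match the output above. It also assumes the rows are already ordered by date within each item_code, which this diff-based approach relies on.

```python
import pandas as pd

# Hypothetical sample data (not from the original question):
# rows are ordered by date within each item_code.
df = pd.DataFrame({
    "item_code": [8028558104973] * 5 + [7622300443269] * 3 + [513082] * 3,
    "date_code": pd.to_datetime([
        "2022-01-01", "2022-01-02", "2022-01-03",  # run of 3 days
        "2022-01-10", "2022-01-11",                # run of 2 days
        "2022-01-05",                              # run of 1 day
        "2022-01-08", "2022-01-09",                # run of 2 days
        "2022-02-01", "2022-02-02", "2022-02-03",  # run of 3 days
    ]),
})

# A new run starts wherever the gap to the previous row is not exactly 1 day.
# The first row has a NaN gap, which also counts as "not equal to 1".
run_id = df["date_code"].diff().dt.days.ne(1).cumsum()

# Count rows per (item_code, run) pair, then drop the run label.
runs = (
    df.groupby(["item_code", run_id], sort=False)
      .size()
      .droplevel(1)
      .reset_index(name="continuous_days")
)

# Summarise the run lengths per item_code with named aggregations.
summary = runs.groupby("item_code", sort=False, as_index=False).agg(
    n_periods=("continuous_days", "size"),
    min=("continuous_days", "min"),
    max=("continuous_days", "max"),
    mean=("continuous_days", "mean"),
)

print(runs)
print(summary)
```

Grouping by item_code together with run_id keeps runs from different items apart even when the last date of one item happens to fall exactly one day before the first date of the next.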

Answer