Pandas dataframe: Sum up rows by date and keep only one row per day without timestamp

I have a dataframe like this:

ds  y
2018-07-25 22:00:00 1
2018-07-25 23:00:00 2
2018-07-26 00:00:00 3
2018-07-26 01:00:00 4
2018-07-26 02:00:00 5

What I want is a new dataframe that looks like this:

ds  y
2018-07-25 3
2018-07-26 12

I want a new dataframe df1 in which all entries of one day are summed up in y, keeping only one row per day, with the date but without the time.

What I did so far is this:

df1 = df.groupby(df.index.date).transform(lambda x: x[:24].sum()) 

I use 24 because I have 24 entries per day (one per hour). This gives the correct sum for every day, but I still get 24 rows per day, each with its original timestamp. How can I achieve what I want?

Answer

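If you need the sum of all values per day, slicing the first 24 rows is not necessary. Use an aggregation (sum) instead of transform: transform returns one value per original row, while sum returns one row per group: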

df1 = df.groupby(df.index.date)['y'].sum().reset_index()
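
For reference, here is a minimal self-contained sketch of the same idea on the sample data from the question. It assumes the timestamps are the index of df (as the use of df.index.date implies) and that the result column should be named ds. The rename_axis call is an addition not in the original answer; without it, reset_index would name the date column "index".

```python
import pandas as pd

# Sample data from the question, with the timestamps as a DatetimeIndex
# (assumption: "ds" is the index, since the question uses df.index.date).
df = pd.DataFrame(
    {"y": [1, 2, 3, 4, 5]},
    index=pd.to_datetime(
        [
            "2018-07-25 22:00:00",
            "2018-07-25 23:00:00",
            "2018-07-26 00:00:00",
            "2018-07-26 01:00:00",
            "2018-07-26 02:00:00",
        ]
    ),
)
df.index.name = "ds"

# Group by calendar date and aggregate: one row per day.
df1 = (
    df.groupby(df.index.date)["y"]
    .sum()
    .rename_axis("ds")  # name the date index so reset_index yields a "ds" column
    .reset_index()
)

print(df1)  # y is 3 for 2018-07-25 and 12 for 2018-07-26
```

A possible alternative is df.resample("D")["y"].sum(), which also gives one row per day, but it keeps a DatetimeIndex (at midnight) and adds rows with a sum of 0 for days that have no data.
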
User contributions licensed under: CC BY-SA