Given a dataframe like:
```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'Date': pd.date_range('1/1/2011', periods=5, freq='3675S'),
     'Num': np.random.rand(5)})
```
```
                 Date       Num
0 2011-01-01 00:00:00  0.580997
1 2011-01-01 01:01:15  0.407332
2 2011-01-01 02:02:30  0.786035
3 2011-01-01 03:03:45  0.821792
4 2011-01-01 04:05:00  0.807869
```
I would like to remove the ‘minutes’ and ‘seconds’ information.
The following (mostly stolen from: How to remove the ‘seconds’ of Pandas dataframe index?) works okay,
```python
df = df.assign(Date=lambda x: pd.to_datetime(x['Date'].dt.strftime('%Y-%m-%d %H')))
```
```
                 Date       Num
0 2011-01-01 00:00:00  0.580997
1 2011-01-01 01:00:00  0.407332
2 2011-01-01 02:00:00  0.786035
3 2011-01-01 03:00:00  0.821792
4 2011-01-01 04:00:00  0.807869
```
but it feels strange to convert a datetime to a string then back to a datetime. Is there a way to do this more directly?
Answer
dt.round
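This is how it should be done: use the rounding methods on the dt accessor, such as dt.round. Note that dt.round('H') rounds to the nearest hour, so it matches the desired output here only because every timestamp is less than 30 minutes past the hour. To always drop the minutes and seconds, use dt.floor('H') instead.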
```python
df.assign(Date=df.Date.dt.round('H'))
```
```
                 Date       Num
0 2011-01-01 00:00:00  0.577957
1 2011-01-01 01:00:00  0.995748
2 2011-01-01 02:00:00  0.864013
3 2011-01-01 03:00:00  0.468762
4 2011-01-01 04:00:00  0.866827
```
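Because dt.round goes to the nearest hour, a timestamp such as 01:45:00 would become 02:00:00 rather than 01:00:00. Here is a minimal sketch of the difference, with dt.floor doing the truncation. The sample timestamps are made up for illustration, and the lowercase 'h' alias is used on the assumption that you may be on a recent pandas release, where the uppercase 'H' is deprecated.

```python
import pandas as pd

# Hypothetical timestamps, chosen so that rounding and truncating disagree.
dates = pd.Series(pd.to_datetime([
    '2011-01-01 01:01:15',
    '2011-01-01 01:45:00',
    '2011-01-01 23:59:59',
]))

rounded = dates.dt.round('h')  # nearest hour: the last two values move forward
floored = dates.dt.floor('h')  # start of the hour: minutes and seconds dropped

print(pd.DataFrame({'original': dates, 'round': rounded, 'floor': floored}))
```

With dt.floor the last row stays at 23:00:00 on the same day, whereas dt.round pushes it to midnight of the next day.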
OLD ANSWER
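One approach is to set the index and use resample. Be aware that resample aggregates: rows that fall within the same hour are collapsed into one (here, the last), and hours with no rows show up as NaN.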
```python
df.set_index('Date').resample('H').last().reset_index()
```
```
                 Date       Num
0 2011-01-01 00:00:00  0.577957
1 2011-01-01 01:00:00  0.995748
2 2011-01-01 02:00:00  0.864013
3 2011-01-01 03:00:00  0.468762
4 2011-01-01 04:00:00  0.866827
```
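Since resample groups rows into hourly bins, it only reproduces the input row-for-row when there is exactly one row per hour, as in the question's data. A small sketch with hypothetical data (two rows in one hour, none in the next) shows how it differs from truncating each timestamp:

```python
import pandas as pd

# Hypothetical data: two rows in hour 00, none in hour 01, one in hour 02.
df = pd.DataFrame({
    'Date': pd.to_datetime(['2011-01-01 00:10:00',
                            '2011-01-01 00:50:00',
                            '2011-01-01 02:05:00']),
    'Num': [1.0, 2.0, 3.0],
})

# resample: one row per hourly bin, keeping the last value seen in each bin.
resampled = df.set_index('Date').resample('h').last().reset_index()

# floor: one output row per input row, with only the timestamp truncated.
floored = df.assign(Date=df['Date'].dt.floor('h'))

print(resampled)
print(floored)
```

Here resampled keeps only the value 2.0 for hour 00 and gains a NaN row for hour 01, while floored keeps all three original values.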
Another alternative is to extract the date and hour components and add them back together
```python
df.assign(
    Date=pd.to_datetime(df.Date.dt.date) +
         pd.to_timedelta(df.Date.dt.hour, unit='H'))
```
```
                 Date       Num
0 2011-01-01 00:00:00  0.577957
1 2011-01-01 01:00:00  0.995748
2 2011-01-01 02:00:00  0.864013
3 2011-01-01 03:00:00  0.468762
4 2011-01-01 04:00:00  0.866827
```
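A possible variation on the same idea, offered as a sketch rather than as part of the original answer: dt.normalize() resets each timestamp to midnight while staying in the datetime dtype, which avoids the round trip through Python date objects that dt.date implies. The sample timestamps below are hypothetical.

```python
import pandas as pd

# Hypothetical timestamps for illustration.
dates = pd.Series(pd.to_datetime(['2011-01-01 01:01:15',
                                  '2011-01-01 04:05:00']))

# Midnight of each day plus the whole hours elapsed since midnight.
rebuilt = dates.dt.normalize() + pd.to_timedelta(dates.dt.hour, unit='h')

print(rebuilt)
```

For these inputs the result is the same as dates.dt.floor('h'), which is the shorter spelling.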