I have been working on a dataframe where one of the columns (flight_time) contains flight durations. The strings come in three different formats, for example:
- "07 h 05 m"
- "13h 55m"
- "2h 23m"
I would like to change them all to HH:MM format and finally change the data type from object to time.
Can somebody tell me how to do this?
Answer
pandas has no dedicated time dtype. You can have a datetime64 (pd.DatetimeIndex) or a timedelta64 (pd.TimedeltaIndex). Since a flight duration is a span of time rather than a clock time, I think a TimedeltaIndex fits better here, so you can use the pd.to_timedelta function:
df['flight_time2'] = pd.to_timedelta(df['flight_time'])
print(df)

# Output
  flight_time    flight_time2
0   07 h 05 m 0 days 07:05:00
1     13h 55m 0 days 13:55:00
2      2h 23m 0 days 02:23:00
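If you would rather not depend on how the string parser handles the spacing variants, a minimal sketch is to extract the hour and minute digits explicitly and build the timedelta from numbers. The sample dataframe and the group names `hours` and `minutes` are assumptions made for illustration, and the pattern assumes every value contains both an `h` and an `m` part.

```python
import pandas as pd

# Sample data mirroring the three formats from the question (assumed for illustration).
df = pd.DataFrame({"flight_time": ["07 h 05 m", "13h 55m", "2h 23m"]})

# Capture the digits before "h" and before "m", tolerating optional spaces.
parts = (
    df["flight_time"]
    .str.extract(r"(?P<hours>\d+)\s*h\s*(?P<minutes>\d+)\s*m")
    .astype(int)
)

# Combine the numeric parts into a single timedelta64 column.
df["flight_time2"] = (
    pd.to_timedelta(parts["hours"], unit="h")
    + pd.to_timedelta(parts["minutes"], unit="m")
)

print(df)
```

Rows that do not match the pattern would produce NaN in `parts` and make `astype(int)` fail, so this version surfaces malformed input instead of silently converting it.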
If you want individual datetime.time values instead, use:
# Extract the digit groups (hours, minutes), join them as "HH:MM", then parse
df['flight_time2'] = pd.to_datetime(df['flight_time'].str.findall(r'\d+')
                                                     .str.join(':')).dt.time
print(df)
# Output
flight_time flight_time2
0 07 h 05 m 07:05:00
1 13h 55m 13:55:00
2 2h 23m 02:23:00
In this case, flight_time2 still has object dtype:
>>> df.dtypes
flight_time     object
flight_time2    object
dtype: object
But each value is an instance of datetime.time:
>>> df.loc[0, 'flight_time2']
datetime.time(7, 5)
In the first case, you can use vectorized methods, while in the second this is not possible. Furthermore, you lose the dt accessor.
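To illustrate what the timedelta version gives you, here is a small sketch that aggregates the durations and then renders them as HH:MM strings, as asked in the question. The series is built from the three example values, and the names `flight`, `total_minutes` and `hhmm` are made up for this example.

```python
import pandas as pd

# The three example durations as a timedelta64 series (assumed sample data).
flight = pd.Series(pd.to_timedelta(["07:05:00", "13:55:00", "02:23:00"]))

# Vectorized aggregations work directly on timedelta64.
total = flight.sum()
longest = flight.max()

# Render each duration as an "HH:MM" string for display.
total_minutes = (flight.dt.total_seconds() // 60).astype(int)
hhmm = (
    (total_minutes // 60).astype(str).str.zfill(2)
    + ":"
    + (total_minutes % 60).astype(str).str.zfill(2)
)

print(total, longest)
print(hhmm.tolist())
```

The formatted column is plain strings again, so it is probably best kept for display only while the timedelta column is used for calculations. Durations of 24 hours or more would show as hour counts above 23 rather than rolling over into days.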