
Converting a column with various string formats in pandas

I have been working on a DataFrame where one of the columns (flight_time) contains flight durations. The strings come in 3 different formats, for example:

  • 07 h 05 m
  • 13h 55m
  • 2h 23m

I would like to change them all to HH:MM format and finally change the data type from object to time.

Can somebody tell me how to do this?


Answer

pandas has no dedicated time dtype. A column can be datetime64 (as in a pd.DatetimeIndex) or timedelta64 (as in a pd.TimedeltaIndex). Since your values are durations, I think timedelta64 fits better, so you can use the pd.to_timedelta function:

df['flight_time2'] = pd.to_timedelta(df['flight_time'])
print(df)

# Output
  flight_time       flight_time2
0   07 h 05 m    0 days 07:05:00
1     13h 55m    0 days 13:55:00
2      2h 23m    0 days 02:23:00
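
If you also need the HH:MM text the question asks for, one way is to rebuild it from the timedelta components. This is only a sketch: the sample frame repeats the three strings from the question, the column name flight_time_hhmm is made up for illustration, and the result is plain strings (object dtype), not a time type.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({'flight_time': ['07 h 05 m', '13h 55m', '2h 23m']})
td = pd.to_timedelta(df['flight_time'])

# Split each duration into day/hour/minute fields.
parts = td.dt.components

# Fold whole days into hours so durations of 24 h or more stay correct.
hours = parts['days'] * 24 + parts['hours']

# Zero-pad to two digits and join as HH:MM (hypothetical column name).
df['flight_time_hhmm'] = (hours.astype(str).str.zfill(2) + ':'
                          + parts['minutes'].astype(str).str.zfill(2))
print(df)
```

Keeping the timedelta column for calculations and using the string column only for display avoids having to parse the text again later.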

If you want each value to be a datetime.time instance instead, use:

df['flight_time2'] = pd.to_datetime(df['flight_time'].str.findall(r'\d+')
                                .str.join(':')).dt.time
print(df)

# Output
  flight_time    flight_time2
0   07 h 05 m        07:05:00
1     13h 55m        13:55:00
2      2h 23m        02:23:00

In this case, flight_time2 still has object dtype:

>>> df.dtypes
flight_time     object
flight_time2    object
dtype: object

But each value is an instance of datetime.time:

>>> df.loc[0, 'flight_time2']
datetime.time(7, 5)

With the first approach (timedelta64), you can use vectorized methods; with the second (datetime.time objects), you cannot. You also lose the dt accessor.
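
To make that difference concrete, here is a small sketch using the three sample strings from the question. The variable names are illustrative, and the explicit format='%H:%M' is my addition so the parsing does not depend on format inference.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({'flight_time': ['07 h 05 m', '13h 55m', '2h 23m']})

# timedelta64 column: arithmetic and the .dt accessor work on the whole Series.
td = pd.to_timedelta(df['flight_time'])
total = td.sum()
minutes = td.dt.total_seconds() / 60

# datetime.time values (object dtype): there is no .dt accessor, so
# fall back to an element-wise function.
times = pd.to_datetime(
    df['flight_time'].str.findall(r'\d+').str.join(':'),
    format='%H:%M',
).dt.time
minutes_from_time = times.apply(lambda t: t.hour * 60 + t.minute)

print(total)
print(minutes.tolist())
print(minutes_from_time.tolist())
```

Both routes give the same minute counts here, but only the timedelta64 column supports sums, means, and comparisons directly.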

User contributions licensed under: CC BY-SA