Change a column format while ignoring (or keeping) NaN

I want to convert a DataFrame column whose values are in hh:mm:ss format into a column containing the number of minutes, while keeping the NaN values.

I can’t change it directly in the Excel file, so I’ve tried to do it with pandas (I’m working on an ML model with a health database):

38      00:35:00
39      00:50:00
40      00:45:00
41      01:32:00
42      00:29:00
43           NaN
44      00:45:00
45      00:13:00
46      00:20:00
47      00:31:00
48      00:54:00
49      00:43:00
50      02:33:00

I tried to separate the values from the NaN values using a mask, then convert them to minutes with str.split():

df1 = df['delay'][df['delay'].notnull()].astype(str).str.split(':').apply(lambda x: int(x[0]) * 60 + int(x[1]))
df2 = df['delay'][df['delay'].isnull()]

But then I cannot merge the two series without losing the order (the NaN values keep their correct indexes but end up at the end of the merged series):

39        50
40        45
41        92
42        29
44        45
45        13
46        20
47        31
48        54
49        43
50       153
43       NaN
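
The ordering problem with the mask approach can be sidestepped by reindexing the converted values against the original index instead of merging two series. The following is a minimal sketch under assumptions: the sample values are a hypothetical subset of the column above, and the names `delay`, `minutes` and `restored` are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mirroring a few rows of the column above
delay = pd.Series(
    ["00:50:00", "00:29:00", np.nan, "02:33:00"],
    index=[39, 42, 43, 50],
    name="delay",
)

# Convert only the non-missing rows, as in the mask approach
mask = delay.notnull()
minutes = delay[mask].str.split(":").apply(lambda p: int(p[0]) * 60 + int(p[1]))

# Reindex against the original index: the missing label (43) comes back
# as NaN in its original position, so no merge or re-sort is needed
restored = minutes.reindex(delay.index)
```

Because the result now contains NaN, pandas stores it as float64 rather than as integers.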

I also tried to go from hh:mm:ss to minutes with datetime.time and timedelta using a loop (without a mask), but I still can’t get a column (Series or DataFrame) with all the values in minutes while keeping the NaN …
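
For reference, an element-wise conversion of this kind can keep the NaN values if the missing case is checked first. This is only a sketch, assuming the column may hold either datetime.time objects (as Excel imports sometimes produce) or hh:mm:ss strings; the helper name `to_minutes` and the sample data are hypothetical.

```python
import datetime as dt

import numpy as np
import pandas as pd


def to_minutes(value):
    """Return the duration in minutes, or NaN when the value is missing."""
    if pd.isna(value):
        return np.nan
    if isinstance(value, dt.time):
        return value.hour * 60 + value.minute + value.second / 60
    # Otherwise assume an 'hh:mm:ss' string
    h, m, s = (int(part) for part in str(value).split(":"))
    return dt.timedelta(hours=h, minutes=m, seconds=s).total_seconds() / 60


# Hypothetical sample mixing both representations and a missing value
delay = pd.Series([dt.time(0, 35), "01:32:00", np.nan], index=[38, 41, 43])
minutes = delay.map(to_minutes)
```

Since `map` works row by row, the index and the order of the original column are preserved.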

Answer

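You can use pd.to_timedelta to convert the delay column to a pandas timedelta series, then divide it by a Timedelta of 1 minute to get the total minutes. With errors='coerce', values that cannot be parsed become NaT, so missing entries come out as NaN and the original order is preserved: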

pd.to_timedelta(df['delay'], errors='coerce') / pd.Timedelta(1, 'min')

39     50.0
40     45.0
41     92.0
42     29.0
43      NaN
44     45.0
45     13.0
46     20.0
47     31.0
48     54.0
49     43.0
50    153.0
Name: delay, dtype: float64
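
A self-contained version of the same idea, as a minimal sketch: the sample frame is hypothetical (only the `delay` column name comes from the question), and the new column name `delay_min` is an assumption chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame; only the 'delay' column name comes from the question
df = pd.DataFrame(
    {"delay": ["00:50:00", "01:32:00", np.nan, "02:33:00"]},
    index=[39, 41, 43, 50],
)

# Parse hh:mm:ss strings; unparseable or missing values become NaT
td = pd.to_timedelta(df["delay"], errors="coerce")

# Dividing by a one-minute Timedelta gives minutes as floats, NaT becomes NaN
df["delay_min"] = td / pd.Timedelta(minutes=1)

# An equivalent formulation via total seconds
alt = td.dt.total_seconds() / 60
```

Both forms return floats because NaN cannot be stored in a plain integer column.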
