How to datetime parse a non-standardized time format

I would like to create datetime objects from a list of string timecodes like the ones below. However, dateutil's parser.parse interprets them incorrectly for my use case.

from datetime import datetime
from dateutil import parser

timecodes = ['0:00', '0:01', '1:01', '10:01', '1:10:01']

# Parse each timecode in turn and print the result
for timecode in timecodes:
    dt = parser.parse(timecode)
    print(dt)

The list above comes from YouTube’s transcript timecodes. When copied from the site, they use a variable format to designate hours, minutes, and seconds, based on elapsed time:

0:00     # 0 minutes, 0 seconds
0:01     # 0 minutes, 1 seconds
1:01     # 1 minutes, 1 seconds
10:01    # 10 minutes, 1 seconds
1:10:01  # 1 hours, 10 minutes, 1 seconds

parser.parse then produces the following (the comments are my reading of each result):

2022-10-24 00:00:00    #0 minutes, 0 seconds
2022-10-24 00:01:00    #1 minutes, 0 seconds
2022-10-24 01:01:00    #1 hours, 1 minutes, 0 seconds
2022-10-24 10:01:00    #10 hours, 1 minutes, 0 seconds
2022-10-24 01:10:01    #1 hours, 10 minutes, 1 seconds

That is, if a string does not contain a full timecode with hours, minutes, and seconds, parse appears to treat the minutes as hours and the seconds as minutes.

How can I either parse the list so that two-field timecodes default to minutes and seconds instead of hours and minutes, or alternatively adjust the timecodes so that they conform to the format parse expects?
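
To make the second option concrete, here is a minimal sketch of padding each timecode to three fields before parsing. The helper name pad_timecode is hypothetical, and the standard library's datetime.strptime stands in for dateutil here so the snippet has no third-party dependency. It assumes every timecode has two or three colon-separated fields and fewer than 24 hours.

```python
from datetime import datetime

def pad_timecode(timecode: str) -> str:
    """Prepend '0:' fields until the timecode has hours, minutes, and seconds.

    Hypothetical helper; assumes the input is 'M:SS' or 'H:MM:SS'.
    """
    missing = 2 - timecode.count(':')  # 1 for 'M:SS', 0 for 'H:MM:SS'
    return '0:' * missing + timecode

timecodes = ['0:00', '0:01', '1:01', '10:01', '1:10:01']
padded = [pad_timecode(tc) for tc in timecodes]

# strptime fills in 1900-01-01 as the date, so only the time part is meaningful.
parsed = [datetime.strptime(tc, '%H:%M:%S') for tc in padded]

for tc, dt in zip(timecodes, parsed):
    print(tc, '->', dt.time())
```

Because %H only accepts values up to 23, this sketch would raise a ValueError for a video longer than a day.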

Answer

This is a little tricky but should work:

import datetime
timecodes = ['0:00', '0:01', '1:01', '10:01', '1:10:01']
zeroes = ['0','0','0']
dt = []
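for timecode in timecodes:
    sep = timecode.split(':')
    # Left-pad with zeros so sep is always [hours, minutes, seconds]
    sep = zeroes[:3-len(sep)] + sep
    # Weight each field by its position: hours * 3600, minutes * 60, seconds * 1
    total_seconds = sum(int(s) * 60**(2 - pos) for pos, s in enumerate(sep))
    dt.append(str(datetime.timedelta(seconds=total_seconds)))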

Output:

dt = ['0:00:00', '0:00:01', '0:01:01', '0:10:01', '1:10:01']
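
The output above is a list of strings. If objects are needed instead, one possible variation is to keep the timedelta values and, where a full datetime is required, add them to a reference point. This is only a sketch: the function name timecode_to_timedelta and the video_start date are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta

def timecode_to_timedelta(timecode: str) -> timedelta:
    """Convert 'M:SS' or 'H:MM:SS' into a timedelta (hypothetical helper)."""
    fields = [int(part) for part in timecode.split(':')]
    # Left-pad with zeros so the fields are always (hours, minutes, seconds).
    hours, minutes, seconds = [0] * (3 - len(fields)) + fields
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)

timecodes = ['0:00', '0:01', '1:01', '10:01', '1:10:01']
offsets = [timecode_to_timedelta(tc) for tc in timecodes]

# Assumed reference point; any datetime could serve as the start of the video.
video_start = datetime(2022, 10, 24)
stamps = [video_start + offset for offset in offsets]

for tc, stamp in zip(timecodes, stamps):
    print(tc, '->', stamp)
```

A timedelta represents elapsed time directly, so it also copes with durations of 24 hours or more, which a time-of-day value cannot hold.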

Answer