I have two columns of data in a DataFrame containing a date and a time. Both start as strings. I want them to end up merged as a single column in datetime format.
The head of the DataFrame is:
           Date  variable    value
0  '04/10/2020'   '00:30'   81.310
1  '05/10/2020'   '00:30'  121.245
2  '06/10/2020'   '00:30'   77.020
3  '07/10/2020'   '00:30'  100.705
4  '08/10/2020'   '00:30'  114.370
The data is in a DataFrame called df_flattened, which has about 20k rows. The code I am currently using is:
df_flattened['DateTime'] = df_flattened.apply(lambda x: x['Date'] + ' ' + x['variable'], axis=1)
df_flattened['DateTime'] = pd.to_datetime(df_flattened['DateTime'])
However, this takes about 2.6s to run and the dataset is going to get a lot bigger in the future. Can anyone suggest a fast way of doing this?
Answer
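You can join the columns with + instead of apply. This concatenates the strings for the whole column at once rather than calling a Python function for every row: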
df_flattened['DateTime'] = pd.to_datetime(df_flattened['Date'] + ' ' + df_flattened['variable'])
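A minimal, self-contained sketch of the column join, using rows copied from the question. It assumes the strings do not contain literal quote characters (the quotes in the printed head are taken to be display only). The names `joined` and `joined_apply` are introduced here purely for illustration.

```python
import pandas as pd

# Sample rows from the question; assumes no literal quotes inside the strings.
df_flattened = pd.DataFrame({
    "Date": ["04/10/2020", "05/10/2020", "06/10/2020"],
    "variable": ["00:30", "00:30", "00:30"],
    "value": [81.310, 121.245, 77.020],
})

# Vectorized concatenation: one operation over the whole column.
joined = df_flattened["Date"] + " " + df_flattened["variable"]

# Row-wise version from the question, kept only to compare the results.
joined_apply = df_flattened.apply(lambda x: x["Date"] + " " + x["variable"], axis=1)

# Both approaches build the same strings.
print(joined.tolist() == joined_apply.tolist())
```

Both approaches produce identical strings, so the speed difference comes only from how the concatenation is executed.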
It is also possible to specify the format of the joined datetimes. This skips format inference and makes it explicit that the dates are day-first:
df_flattened['DateTime'] = pd.to_datetime(df_flattened['Date'] + ' ' + df_flattened['variable'], format='%d/%m/%Y %H:%M')
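The explicit format matters for correctness as well as speed. The sketch below assumes the dates in the question are day-first (the sample rows run from 04/10 to 08/10 on consecutive days, which suggests October); the variable names are illustrative only. Without a format, pandas reads an ambiguous date such as 04/10/2020 as month-first.

```python
import pandas as pd

# Two joined strings as produced by the concatenation step.
s = pd.Series(["04/10/2020 00:30", "05/10/2020 00:30"])

# Explicit day-first format, as in the answer.
parsed = pd.to_datetime(s, format="%d/%m/%Y %H:%M")

# No format: ambiguous dates are interpreted month-first by default.
default = pd.to_datetime(s)

# dayfirst=True is an alternative when no explicit format is given.
day_first = pd.to_datetime(s, dayfirst=True)

print(parsed.iloc[0])     # 4 October 2020
print(default.iloc[0])    # 10 April 2020
print(day_first.iloc[0])  # 4 October 2020
```

If the data really is day-first, the version without `format` would silently produce the wrong dates for these rows, so the format string is worth keeping even apart from the timing gain.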
Performance for 20k rows:
# 20k rows
df_flattened = pd.concat([df_flattened] * 4000, ignore_index=True)

In [44]: %%timeit
    ...: df_flattened['DateTime'] = df_flattened.apply(lambda x: x['Date'] + ' ' + x['variable'], axis=1)
    ...: df_flattened['DateTime'] = pd.to_datetime(df_flattened['DateTime'])
    ...:
325 ms ± 26.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [45]: %timeit df_flattened['DateTime'] = pd.to_datetime(df_flattened['Date'] + ' ' + df_flattened['variable'])
11.9 ms ± 1.51 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [46]: %timeit df_flattened['DateTime'] = pd.to_datetime(df_flattened['Date'] + ' ' + df_flattened['variable'], format='%d/%m/%Y %H:%M')
9.55 ms ± 96.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
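To reproduce a comparison like this outside IPython, a rough sketch with the standard `timeit` module could look as follows. The function names and the `base` frame are hypothetical, the functions return the parsed Series instead of assigning a column, and the absolute timings will differ by machine and pandas version.

```python
import timeit

import pandas as pd

# Five sample rows from the question, repeated to reach 20k rows.
base = pd.DataFrame({
    "Date": ["04/10/2020", "05/10/2020", "06/10/2020", "07/10/2020", "08/10/2020"],
    "variable": ["00:30"] * 5,
})
df = pd.concat([base] * 4000, ignore_index=True)

def with_apply():
    # Row-wise join, as in the question.
    joined = df.apply(lambda x: x["Date"] + " " + x["variable"], axis=1)
    return pd.to_datetime(joined)

def with_plus():
    # Vectorized join, format inferred.
    return pd.to_datetime(df["Date"] + " " + df["variable"])

def with_plus_format():
    # Vectorized join with an explicit format.
    return pd.to_datetime(df["Date"] + " " + df["variable"], format="%d/%m/%Y %H:%M")

# Best of three single runs for each variant, in seconds.
t_apply = min(timeit.repeat(with_apply, number=1, repeat=3))
t_plus = min(timeit.repeat(with_plus, number=1, repeat=3))
t_format = min(timeit.repeat(with_plus_format, number=1, repeat=3))

print(f"apply:         {t_apply * 1000:.1f} ms")
print(f"plus:          {t_plus * 1000:.1f} ms")
print(f"plus + format: {t_format * 1000:.1f} ms")
```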