My data consists of 1 million rows. A sample looks like this:
_id: object("603678958a6eade21c0790b8")
id1: 3758
date2: 2010-01-01
time3: 00:05:00
date4: 2009-12-31
time5: 19:05:00
id6: 2
id7: -79.09
id8: 35.97
id9: 5.5
id10: 0
id11: -99999
id12: 0
id13: -9999
c14: "U"
id15: 0
id16: 99
id17: 0
id18: -99
id19: -9999
id20: 33
id21: 0
id22: -99
id23: 0
The problem is that date2 and date4 are in the format I want, but they are strings and I want to convert them to dates. The code I used looks like this:
import datetime

import arrow

# Parse the date columns into datetime64 values.
df['date4'] = df['date4'].astype('datetime64[ns]')
df['date2'] = df['date2'].astype('datetime64[ns]')

# Take the first four characters of each time value as HHMM and parse them.
df['time3'] = df['time3'].apply(lambda x: datetime.datetime.strptime(x[0] + x[1] + ":" + x[2] + x[3], '%H:%M'))
df['time5'] = df['time5'].apply(lambda x: datetime.datetime.strptime(x[0] + x[1] + ":" + x[2] + x[3], '%H:%M'))

# Format every column with arrow; note that format() returns strings.
df['date2'] = df['date2'].apply(lambda x: arrow.get(x).format("YYYY-MM-DD"))
df['date4'] = df['date4'].apply(lambda x: arrow.get(x).format("YYYY-MM-DD"))
df['time3'] = df['time3'].apply(lambda x: arrow.get(x).format("HH:mm:ss"))
df['time5'] = df['time5'].apply(lambda x: arrow.get(x).format("HH:mm:ss"))
Do I need to convert them before inserting or after? Does anyone know how I can do that?
Answer
If it were me, I’d want to combine date2/time3 into one column, and date4/time5, as in:
# Use an explicit unit: recent pandas versions reject a unit-less 'datetime64'.
df['date2'] = (df['date2']+'T'+df['time3']).astype('datetime64[ns]')
df['date4'] = (df['date4']+'T'+df['time5']).astype('datetime64[ns]')
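As a rough sketch of the same idea, the date and time strings can also be combined with pd.to_datetime and an explicit format, which makes the expected layout visible and fails loudly on malformed rows. The two-row DataFrame, the combine helper, and the new column names datetime2 and datetime4 are assumptions made up for illustration; only date2, time3, date4, and time5 come from the question.

```python
import pandas as pd

# Hypothetical two-row sample that reuses the column names from the question.
df = pd.DataFrame({
    "date2": ["2010-01-01", "2010-01-02"],
    "time3": ["00:05:00", "13:30:00"],
    "date4": ["2009-12-31", "2010-01-01"],
    "time5": ["19:05:00", "08:30:00"],
})


def combine(dates: pd.Series, times: pd.Series) -> pd.Series:
    """Join date and time strings, then parse them with an explicit format."""
    return pd.to_datetime(dates + " " + times, format="%Y-%m-%d %H:%M:%S")


# Illustrative column names; overwrite date2/date4 instead if preferred.
df["datetime2"] = combine(df["date2"], df["time3"])
df["datetime4"] = combine(df["date4"], df["time5"])

# The new columns should report a datetime64 dtype rather than object/string.
print(df.dtypes)
```

If the rows are later inserted with a driver such as PyMongo, converting before the insert is probably the simpler route, since Python datetime values are stored as BSON dates while strings stay strings. Missing values (NaT) may need separate handling first.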