My data consists of 1 million rows. A sample looks like this:
_id:object("603678958a6eade21c0790b8") id1:3758 date2:2010-01-01 time3:00:05:00 date4:2009-12-31 time5:19:05:00 id6:2 id7:-79.09 id8:35.97 id9:5.5 id10:0 id11:-99999 id12:0 id13:-9999 c14:"U" id15:0 id16:99 id17:0 id18:-99 id19:-9999 id20:33 id21:0 id22:-99 id23:0
date2 and date4 are already in the format I want, but they are stored as strings and I want to convert them to dates. The code I have used looks like this:
df['date4'] = df['date4'].astype('datetime64[ns]')
df['date2'] = df['date2'].astype('datetime64[ns]')
df['time3'] = df['time3'].apply(lambda x: datetime.datetime.strptime(x[0] + x[1] + ":" + x[2] + x[3], '%H:%M'))
df['time5'] = df['time5'].apply(lambda x: datetime.datetime.strptime(x[0] + x[1] + ":" + x[2] + x[3], '%H:%M'))
df['date2'] = df['date2'].apply(lambda x: arrow.get(x).format("YYYY-MM-DD"))
df['date4'] = df['date4'].apply(lambda x: arrow.get(x).format("YYYY-MM-DD"))
df['time3'] = df['time3'].apply(lambda x: arrow.get(x).format("HH:mm:ss"))
df['time5'] = df['time5'].apply(lambda x: arrow.get(x).format("HH:mm:ss"))
Do I need to convert them before inserting or after? Does anyone know how I can do that?
Answer
If it were me, I’d want to combine date2/time3 into one column, and date4/time5, as in:
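# Recent pandas versions reject the unit-less 'datetime64' dtype, so the unit is spelled out.
df['date2'] = (df['date2'] + 'T' + df['time3']).astype('datetime64[ns]')
df['date4'] = (df['date4'] + 'T' + df['time5']).astype('datetime64[ns]')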
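For anyone who wants to try the same idea end to end, here is a minimal sketch. It swaps astype for pd.to_datetime with an explicit format, which is a different call from the one above but does the same combining. The two-row sample frame and the combine helper are made up for illustration, and dropping time3/time5 afterwards is my assumption about what you would want, not something the question asks for.

```python
import pandas as pd

# Hypothetical two-row sample mirroring the string columns in the question.
df = pd.DataFrame({
    "date2": ["2010-01-01", "2010-01-01"],
    "time3": ["00:05:00", "00:10:00"],
    "date4": ["2009-12-31", "2009-12-31"],
    "time5": ["19:05:00", "19:10:00"],
})


def combine(dates: pd.Series, times: pd.Series) -> pd.Series:
    """Join date and time strings, then parse them in one vectorized call."""
    return pd.to_datetime(dates + " " + times, format="%Y-%m-%d %H:%M:%S")


df["date2"] = combine(df["date2"], df["time3"])
df["date4"] = combine(df["date4"], df["time5"])

# Optional (an assumption): the time columns are now redundant, so drop them.
df = df.drop(columns=["time3", "time5"])
```

Passing an explicit format keeps the parsing strict, so a malformed value raises an error instead of being guessed at, and it avoids the row-by-row apply calls, which should matter on a million rows.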