I’m trying to convert HTTP timestamps into standard timestamp for complete data framing and getting time-series plots. I’m looking for an efficient way to do this for the large dataset. My actual data frame is as follows:
# +----------------------+-------+ # | timestamp | col1 | # +----------------------+-------+ # | 10/May/2021:19:48:36 | 714 | # | 10/May/2021:00:28:12 | None | # | 10/May/2021:15:34:26 | 634 | # | 10/May/2021:00:20:25 | 8453 | # | 10/May/2021:14:31:32 | None | # +---------------------+-------+
I have tried couple of the following methods and get errors:
df["timestamp"] = pd.to_datetime(df['timestamp'], errors='coerce', format='%Y-%m-%d %H:%M:%S') df["timestamp"]
This returns me NaT
which is strange!
0 NaT 1 NaT 2 NaT 3 NaT 4 NaT ..
I updated the format and change it from format='%Y-%m-%d %H:%M:%S'
to format='%d/%b/%Y:%H:%M:%S'
and worked and I can use df.plot(x='timestamp', y='col1')
. Additionally I don’t see any changes/formatting in timestamps in plot:
The problem is when I plot it, timestamps related xticks
in x axis will be plotted as they are but I couldn’t limited time stamps to hours in the same day. This is not practical for my plot when events are belong to a same day. I tried to access/filter hours by this:
df['timestamp'].dt.hour
but it returns this error: AttributeError: Can only use .dt accessor with datetimelike values
I searched this error and checks previous posts in this regard but didn’t work for my case like this answer.
Any help will be highly appreciated
Advertisement
Answer
Try via split()
, join()
and to_datetime()
method:
df['timestamp']=pd.to_datetime(df['timestamp'].str.split(':',1).str.join(' '),errors='coerce')
output of df
:
timestamp col1 0 2021-05-10 19:48:36 714 1 2021-05-10 00:28:12 None 2 2021-05-10 15:34:26 634 3 2021-05-10 00:20:25 8453 4 2021-05-10 14:31:32 None