How can I convert a struct timestamp column with start and end into a normal Python timestamp column?

Question

I have a time-series pivot table with a struct timestamp column that holds the start and end of the time frame of each record. Since I will later use the timestamps as the index for time-series analysis, I need to convert it into timestamps with just the end/start. I have tried to find the solution using regex maybe unsuc…

Accepted Answer

You can extract both values with a single str.extract call:

    df[["start_timestamp", "end_timestamp"]] = df["timestamp"].str.extract(r'"start":"([^"]*)","end":"([^"]+)')

The "start":"([^"]*)","end":"([^"]+) regex matches "start":", then captures zero or more characters other than " into Group 1 (the start column value), then matches ","end":", and finally captures one or more characters other than " into Group 2 (the end column value).

Also, if the data you have is valid JSON, you can parse the JSON instead of using a regex:

    import json
    import pandas as pd

    def extract_startend(x):
        # Parse one JSON string and return its start and end values
        j = json.loads(x)
        return pd.Series([j["start"], j["end"]])

    df[["start_timestamp", "end_timestamp"]] = df["timestamp"].apply(extract_startend)

Output of print(df.to_string()):

                                                                       timestamp  X1  X2               start_timestamp                 end_timestamp
    0  {"start":"2022-01-19T00:00:00.000+0000","end":"2022-01-20T00:00:.........  25  33  2022-01-19T00:00:00.000+0000  2022-01-20T00:00:00.000+0000
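Note that the extracted start_timestamp and end_timestamp columns are still plain strings. Since the question mentions using the timestamps as the index, here is a minimal sketch of one way to finish the conversion. The sample row is hypothetical (modelled on the output shown in the answer), and choosing end_timestamp as the index is only an assumption; swap in start_timestamp if that fits your analysis better.

```python
import pandas as pd

# Hypothetical one-row sample modelled on the output shown in the answer.
df = pd.DataFrame({
    "timestamp": ['{"start":"2022-01-19T00:00:00.000+0000","end":"2022-01-20T00:00:00.000+0000"}'],
    "X1": [25],
    "X2": [33],
})

# Same extraction as in the answer: two capture groups -> two string columns.
df[["start_timestamp", "end_timestamp"]] = df["timestamp"].str.extract(
    r'"start":"([^"]*)","end":"([^"]+)'
)

# Convert the strings to timezone-aware datetimes (normalized to UTC).
for col in ["start_timestamp", "end_timestamp"]:
    df[col] = pd.to_datetime(df[col], format="%Y-%m-%dT%H:%M:%S.%f%z", utc=True)

# Assumption: index by the end of each time frame and drop the raw struct column.
df = df.drop(columns="timestamp").set_index("end_timestamp")
```

With a DatetimeIndex in place, the usual time-series tools such as df.resample(...) or date-based slicing with df.loc[...] become available.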

Answer