
How can I convert a struct timestamp column with start and end into a normal Pythonic timestamp column?

I have a time-series pivot table with a struct timestamp column that holds the start and end of each record's time frame, as follows:

JavaScript

Since I will later use the timestamps as the index for time-series analysis, I need to convert this column into plain timestamps holding just the start or the end. I tried a regex-based solution based on this post, without success:

JavaScript

but I get:

ValueError: Columns must be same length as key

I am trying to get the following expected dataframe:

JavaScript


Answer

You can extract both values with an extract call:

JavaScript

The "start":"([^"]*)","end":"([^"]+) regex matches "start":", then captures zero or more chars other than " into Group 1 (the start column value), then matches ","end":", and finally captures one or more chars other than " into Group 2 (the end column value).
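As a rough illustration of the regex described above, here is a minimal sketch. The column name timestamp, the count column, and the sample values are assumptions made up for this example, since the original data is not shown here; only the pattern itself comes from the answer.

```python
import pandas as pd

# Hypothetical sample data: the column names and values are assumptions.
df = pd.DataFrame({
    "timestamp": [
        '{"start":"2022-01-19T00:00:00.000Z","end":"2022-01-20T00:00:00.000Z"}',
        '{"start":"2022-01-20T00:00:00.000Z","end":"2022-01-21T00:00:00.000Z"}',
    ],
    "count": [12, 7],
})

# Pattern from the answer: Group 1 is the start value, Group 2 the end value.
pattern = r'"start":"([^"]*)","end":"([^"]+)'

# str.extract returns one column per capture group (named 0 and 1 here).
extracted = df["timestamp"].str.extract(pattern)
extracted.columns = ["start", "end"]
df = df.join(extracted)

# Convert the end value to a real datetime and use it as the index.
df["end"] = pd.to_datetime(df["end"])
df = df.set_index("end")
```

Because the pattern has two capture groups, the result has two columns, which avoids the length mismatch that assigning a single extracted column to two target names would cause.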

Also, if the data you have is valid JSON, you can parse the JSON instead of using a regex:

JavaScript

Output of print(df.to_string()):

JavaScript
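The JSON-parsing alternative mentioned in the answer could look roughly like the sketch below. It assumes every cell is a valid JSON string with start and end keys; the column name timestamp and the sample values are again hypothetical.

```python
import json

import pandas as pd

# Hypothetical sample data: the column names and values are assumptions.
df = pd.DataFrame({
    "timestamp": [
        '{"start":"2022-01-19T00:00:00.000Z","end":"2022-01-20T00:00:00.000Z"}',
        '{"start":"2022-01-20T00:00:00.000Z","end":"2022-01-21T00:00:00.000Z"}',
    ],
    "count": [12, 7],
})

# Parse each JSON string into a dict, then expand the dicts into columns,
# keeping the original index so the join lines up row by row.
parsed = pd.DataFrame(df["timestamp"].map(json.loads).tolist(), index=df.index)
df = df.join(parsed[["start", "end"]])
```

This route does not depend on the key order or spacing inside the string, but json.loads raises an error on any cell that is not valid JSON, so the regex may be the more forgiving choice for messy data.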