
Spark: How to parse a JSON string of nested lists to a Spark data frame?

How can I parse a JSON string of nested lists into a Spark data frame in PySpark?

Input data frame:

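A minimal sketch of such an input frame; the id column col_1 and the single sample row are assumptions, while the json column name and the sample values are taken from the answer's explanation below:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# One id column plus a string column holding the nested-list JSON text.
df = spark.createDataFrame(
    [(1, "[[1572393600000, 1.000],[1572480000000, 1.007]]")],
    ["col_1", "json"],
)
df.show(truncate=False)
```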

Expected output:

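One row per inner [timestamp, value] pair, with the timestamp in col_2 and the value in col_3. For the sample string above that would be roughly:

```
+-------------+-----+
|col_2        |col_3|
+-------------+-----+
|1572393600000|1.000|
|1572480000000|1.007|
+-------------+-----+
```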

Example code:

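The attempt would typically be something like the following hypothetical sketch (from_json and the array<array<double>> schema are illustrative guesses, not the original snippet); it parses the string, but leaves each [timestamp, value] pair nested in a single array column instead of two separate columns:

```python
from pyspark.sql import functions as F

# Hypothetical attempt: parse the JSON text as an array of arrays of doubles.
# The parse succeeds, but each pair stays nested inside one array column.
parsed = df.withColumn("parsed", F.from_json("json", "array<array<double>>"))
parsed.show(truncate=False)
```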

There are a few examples out there, but I cannot figure out how to do it.


Answer

With some replacements in the strings and by splitting, you can get the desired result:

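A sketch of the transformation that the steps below describe; the id column col_1 is an assumption, while the json column and the col_2/col_3 field names come from the explanation:

```python
from pyspark.sql import functions as F

result = df.withColumn(
    "s",
    F.explode(F.expr(r"""
        transform(
            split(trim(BOTH '][' FROM json), '\\],\\['),
            x -> struct(
                split(x, ',')[0] as col_2,
                split(x, ',')[1] as col_3
            )
        )
    """)),
).select("col_1", "s.*")

result.show(truncate=False)
# col_2 and col_3 come out as strings here; cast them if numeric types
# are needed, e.g. result.withColumn("col_2", result["col_2"].cast("long")).
```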

Explanation:

  1. trim(both '][' from json): removes the leading and trailing [ and ] characters, giving something like: 1572393600000, 1.000],[1572480000000, 1.007

  2. Now you can split by \],\[ (the \ escapes the brackets, since split takes a regular expression); steps 1 and 2 are sketched in isolation after this list

  3. transform takes the array from the split and, for each element, splits it by comma and builds a struct with the fields col_2 and col_3

  4. explode the array of structs you get from transform and star-expand the struct column
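
Steps 1 and 2 in isolation, using the column names assumed above:

```python
from pyspark.sql import functions as F

# Strip the outer brackets, then split on "],[" to get one string per pair.
df.select(
    F.expr(r"split(trim(BOTH '][' FROM json), '\\],\\[')").alias("pairs")
).first()["pairs"]
# -> ['1572393600000, 1.000', '1572480000000, 1.007']
```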
