Parse JSON string from Pyspark Dataframe

I have a nested JSON dict that I need to convert to a Spark dataframe. The dict is stored in a dataframe column. I have been trying to parse it using “from_json” and “get_json_object”, but have been unable to read the data. Here’s the smallest snippet of the source data that I’ve been trying to read:

JavaScript

I need to extract the nested dict value. I used the code below to clean the data and read it into a dataframe:

JavaScript

I get a null dataframe each time I run the above code. Please help.

I tried the approach from the following question, and it didn’t work: PySpark: Read nested JSON from a String Type Column and create columns

I also tried writing the data to a JSON file and reading it back. That didn’t work either: reading a nested JSON file in pyspark

Answer

The null characters (\u0000) break the parsing of the JSON. You can replace them as well:

JavaScript
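
As a rough illustration of that idea (not the original answer's code), here is a minimal sketch that strips null characters before parsing. The names `NULL_CHAR_PATTERN`, `strip_null_chars`, `parse_json_column`, the output column `parsed`, and the sample string are all made up for this example. It assumes the nulls appear either as raw NUL characters or as the literal text `\u0000`, and that removing them outright is acceptable for your data. The schema has to be supplied by you.

```python
import json
import re

# Matches a raw NUL character or the literal six-character text \u0000.
# Assumption: the same pattern text should also be accepted by the Java
# regex engine behind Spark's regexp_replace.
NULL_CHAR_PATTERN = r"\x00|\\u0000"


def strip_null_chars(raw):
    """Plain-Python version: remove null characters from a JSON string."""
    return re.sub(NULL_CHAR_PATTERN, "", raw)


def parse_json_column(df, column, schema):
    """PySpark version: clean `column`, then parse it with from_json.

    `schema` is whatever StructType or DDL string describes your JSON;
    it is not known from the question, so it is left as a parameter.
    """
    # Imported here so the plain-Python helper works without Spark installed.
    from pyspark.sql import functions as F

    cleaned = F.regexp_replace(F.col(column), NULL_CHAR_PATTERN, "")
    # "parsed" is a hypothetical output column name.
    return df.withColumn("parsed", F.from_json(cleaned, schema))


# Made-up sample: a raw NUL inside a string value makes json.loads fail,
# but the cleaned string parses normally.
sample = '{"outer": {"inner": "a\x00b"}}'
parsed = json.loads(strip_null_chars(sample))
```

If `from_json` still returns null after the cleanup, the schema probably does not match the JSON structure, so it is worth checking the cleaned string with `json.loads` on a single row first.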
User contributions licensed under: CC BY-SA