Parse JSON string from PySpark DataFrame

Question

I have a nested JSON dict that I need to convert to a Spark DataFrame. This JSON dict is stored in a DataFrame column. I have been trying to parse the dict in that column using "from_json" and "get_json_object", but have been unable to read the data. Here's the smalle…

Accepted Answer

The null characters (\u0000) break the JSON parsing. You can strip them, along with the leading slash, before calling from_json:

    from pyspark.sql import functions as F

    df = spark.read.json('path')

    # Remove null characters and slashes, then parse the cleaned string.
    df2 = df.withColumn(
        'cleansed_value',
        F.regexp_replace('value', '[\u0000/]', '')
    ).withColumn(
        'parsed',
        F.from_json('cleansed_value', 'context string')
    )

    df2.show(20, 0)
    +-----------------------+------------------+------+
    |value                  |cleansed_value    |parsed|
    +-----------------------+------------------+------+
    |/{"context":"data"}|{"context":"data"}|[data]|
    +-----------------------+------------------+------+
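To see the effect outside Spark, here is a minimal sketch that uses only Python's standard library. It applies the same character class as the answer to a single string. The sample value and the helper names `clean` and `try_parse` are assumptions made for illustration. They are not part of the original question, and the standard `json` module only stands in for Spark's parser here.

```python
import json
import re

# Hypothetical sample: a leading slash and null characters in front of
# an otherwise valid JSON object, similar to the row shown in the answer.
raw = '/\u0000\u0000\u0000\u0000{"context":"data"}'


def clean(value: str) -> str:
    """Remove null characters and forward slashes (same class as the answer's regex)."""
    return re.sub('[\u0000/]', '', value)


def try_parse(value: str):
    """Return the parsed object, or None when the string is not valid JSON."""
    try:
        return json.loads(value)
    except json.JSONDecodeError:
        return None


parsed_raw = try_parse(raw)           # fails: the string does not start with valid JSON
parsed_clean = try_parse(clean(raw))  # succeeds once the stray characters are gone

print(parsed_raw)
print(parsed_clean)
```

Note that the character class removes every forward slash in the value, including slashes inside JSON strings such as URLs or paths. If the real data contains those, a narrower pattern that strips only the null characters and the leading slash may be safer.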


Answer