I have a spark dataframe like
key value 1111 {'fruit1': 'apple', 'fruit2': 'mango'} 2222 {'fruit1': 'orange'}
I want to convert the JSON (string) to new columns
key fruit1 fruit2 1111 apple mango 2222 orange null
I don’t want to manually specify the keys from JSON as there are more than 100 keys
Advertisement
Answer
json_keys = df.selectExpr('explode(json_object_keys(value))').select(F.collect_set('col')).first()[0] df.select('key', F.json_tuple('value', *json_keys).alias(*json_keys)).show()
+----+------+------+ | key|fruit1|fruit2| +----+------+------+ |1111| apple| mango| |2222|orange| null| +----+------+------+