
PySpark, iteratively get values from a column containing a JSON string

I wonder how you would iteratively get the values from a JSON string in PySpark. My data has the following format, and I would like to create the "value" column:

id_1  id_2  json_string               value
1     1001  {"1001":106, "2200":101}  106
1     2200  {"1001":106, "2200":101}  101
My attempt builds the JSON path from the id_2 column, which gives the error "Column is not iterable".

However, inserting the key manually works, i.e. hard-coding a single key such as "1001" in the path.

Any tips on solving this problem? It is not possible to insert the "id_2" values manually, since there are many thousands of keys in the dataset, and the real json_string is much longer, with many more key-value pairs.

Super thankful for any suggestions!
Regards


Answer

You can do this within expr(), which allows you to concatenate the path string and id_2.

User contributions licensed under: CC BY-SA