PySpark: iteratively get values from a column containing a JSON string

I am wondering how to iteratively get values out of a JSON string in PySpark. My data has the following format, and I would like to create the "value" column:

id_1  id_2  json_string               value
1     1001  {"1001":106, "2200":101}  106
1     2200  {"1001":106, "2200":101}  101
from pyspark.sql.functions import col, concat, get_json_object, lit

df_2 = df.withColumn(
    'value',
    get_json_object(col('json_string'), concat(lit('$.'), col('id_2')))
)

This fails with the error Column is not iterable, since get_json_object() only accepts a literal string for its path argument, not a Column.

However, inserting the key manually as a literal works, e.g.:

df_2 = df.withColumn(
    'value',
    get_json_object(col('json_string'), '$.1001')
)

Any tips on solving this problem? It is not possible to manually insert the "id_2" values, since there are many thousands of keys within the dataset and the json_string is in reality much longer, with many more key-value pairs.

Super thankful for any suggestions!
Regards


Answer

You can run get_json_object() within expr(), which lets you concatenate the "$." prefix with the id_2 column in Spark SQL and pass the result as the path.

from pyspark.sql import functions as func

data_ls = [
    ("1", "1001", '''{"1001":106, "2200":101}'''),
    ("1", "2200", '''{"1001":106, "2200":101}''')
]

data_sdf = spark.createDataFrame(data_ls, ("id1", "id2", "jstr"))

# +---+----+--------------------+
# |id1| id2|                jstr|
# +---+----+--------------------+
# |  1|1001|{"1001":106, "220...|
# |  1|2200|{"1001":106, "220...|
# +---+----+--------------------+

data_sdf. \
    withColumn('val', func.expr('get_json_object(jstr, concat("$.", id2))')). \
    show(truncate=False)

# +---+----+------------------------+---+
# |id1|id2 |jstr                    |val|
# +---+----+------------------------+---+
# |1  |1001|{"1001":106, "2200":101}|106|
# |1  |2200|{"1001":106, "2200":101}|101|
# +---+----+------------------------+---+
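If you prefer to stay in the DataFrame API end to end, here is a minimal sketch of an alternative, assuming Spark 2.4+ and that every value in the JSON object is an integer: parse the string into a map column with from_json(), then look up the key with element_at(), which (unlike get_json_object()) accepts a column as the lookup key.

# minimal sketch, assuming Spark 2.4+ and integer-only JSON values
alt_sdf = data_sdf \
    .withColumn('jmap', func.from_json('jstr', 'map<string,int>')) \
    .withColumn('val', func.element_at('jmap', func.col('id2')))

alt_sdf.select('id1', 'id2', 'val').show(truncate=False)

The trade-off is that from_json() needs a fixed value type for the map, whereas the get_json_object() approach works on arbitrarily shaped JSON.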
