Spark: calculate standard deviation row-wise

Question

I need to calculate the standard deviation row-wise, assuming that I already have a column with the calculated mean per row. I tried this but I got the following error

Accepted Answer

Your code is completely mixed up (in its current state it won't even cause the exception you described in the question). sqrt should be placed outside the reduce call:

    from pyspark.sql.functions import col, sqrt
    from operator import add
    from functools import reduce

    # `spark` is assumed to be an existing SparkSession.
    # Only three column names are given, so the remaining
    # columns are auto-named _4, _5, _6.
    df = spark.createDataFrame([("_", "_", 2, 1, 2, 3)], ("_1", "_2", "mean"))
    cols = df.columns[3:]

    # Sample standard deviation: sum the squared deviations from the
    # precomputed mean, divide by (n - 1), then take the square root.
    sd = sqrt(
        reduce(add, ((col(x) - col("mean")) ** 2 for x in cols)) / (len(cols) - 1)
    )

    sd
    # Column<b'SQRT((((POWER((_4 - mean), 2) + POWER((_5 - mean), 2)) + POWER((_6 - mean), 2)) / 2))'>

    df.withColumn("sd", sd).show()
    # +---+---+----+---+---+---+---+
    # | _1| _2|mean| _4| _5| _6| sd|
    # +---+---+----+---+---+---+---+
    # |  _|  _|   2|  1|  2|  3|1.0|
    # +---+---+----+---+---+---+---+
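To sanity-check the formula without a Spark session, the same reduce/add expression can be applied to plain Python numbers and compared against the standard library. This is only a sketch: row_sd is a hypothetical helper name that does not appear in the answer, and it assumes the row has at least two values so that n - 1 is non-zero.

```python
from functools import reduce
from math import isclose, sqrt
from operator import add
from statistics import stdev


def row_sd(values, mean):
    """Sample standard deviation of one row, given its precomputed mean.

    Hypothetical helper mirroring the Column expression in the answer:
    sqrt(sum((x - mean) ** 2) / (n - 1)).
    """
    squared_diffs = ((v - mean) ** 2 for v in values)
    return sqrt(reduce(add, squared_diffs) / (len(values) - 1))


row = [1, 2, 3]              # the _4, _5, _6 values from the example row
mean = sum(row) / len(row)   # 2.0, matching the "mean" column

sd = row_sd(row, mean)
print(sd)  # 1.0

# statistics.stdev also computes the sample (n - 1) standard deviation.
assert isclose(sd, stdev(row))
```

Dividing by len(values) - 1 gives the sample standard deviation; dividing by len(values) instead would give the population standard deviation.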

