PySpark: write a function to count non-zero values of given columns

I want a function that takes column names and grouping columns as input and, for each given column, returns the count of non-zero values per group.

Something like this, but including the non-zero condition as well:

def count_non_zero(df, features, grouping):
    # map each feature column to the 'count' aggregation
    exp_count = {x: 'count' for x in features}
    df = df.groupBy(*grouping).agg(exp_count)
    # rename column names to exclude brackets and name of applied aggregation,
    # i.e. turn 'count(colname)' back into 'colname'
    for item in df.columns:
        df = df.withColumnRenamed(item, item[item.find('(')+1: None if item.find(')') == -1 else item.find(')')])
    return df


Answer

You can use a list comprehension to generate the list of aggregation expressions:

import pyspark.sql.functions as F

def count_non_zero(df, features, grouping):
    # F.when(..., 1) returns NULL when the value is 0 (or NULL), and F.count ignores NULLs,
    # so each expression counts only the non-zero values of its column
    return df.groupBy(*grouping).agg(
        *[F.count(F.when(F.col(c) != 0, 1)).alias(c) for c in features]
    )
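
A quick usage sketch; the sample DataFrame, the group column grp, and the feature columns f1/f2 are made up for illustration:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# hypothetical sample data: group column 'grp', feature columns 'f1' and 'f2'
df = spark.createDataFrame(
    [("a", 0, 3), ("a", 5, 4), ("a", 2, 0), ("b", 0, 0), ("b", 7, 1)],
    ["grp", "f1", "f2"],
)

count_non_zero(df, ["f1", "f2"], ["grp"]).show()
# expected counts: grp=a -> f1=2, f2=2; grp=b -> f1=1, f2=1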