I have a column named info, defined as follows:
+-------------------+------+
|     Timestamp     | info |
+-------------------+------+
|2016-01-01 17:54:30|    0 |
|2016-02-01 12:16:18|    0 |
|2016-03-01 12:17:57|    0 |
|2016-04-01 10:05:21|    0 |
|2016-05-11 18:58:25|    1 |
|2016-06-11 11:18:29|    1 |
|2016-07-01 12:05:21|    0 |
|2016-08-11 11:58:25|    0 |
|2016-09-11 15:18:29|    1 |
+-------------------+------+
I would like to count the consecutive occurrences of 1s and insert 0 otherwise. The final column would be:
+-------------------+------+-----+
|     Timestamp     | info | res |
+-------------------+------+-----+
|2016-01-01 17:54:30|    0 |   0 |
|2016-02-01 12:16:18|    0 |   0 |
|2016-03-01 12:17:57|    0 |   0 |
|2016-04-01 10:05:21|    0 |   0 |
|2016-05-11 18:58:25|    1 |   1 |
|2016-06-11 11:18:29|    1 |   2 |
|2016-07-01 12:05:21|    0 |   0 |
|2016-08-11 11:58:25|    0 |   0 |
|2016-09-11 15:18:29|    1 |   1 |
+-------------------+------+-----+
I tried using the following function, but it didn’t work.
df_input = df_input.withColumn(
    "res",
    F.when(
        df_input.info == F.lag(df_input.info).over(w1),
        F.sum(F.lit(1)).over(w1)
    ).otherwise(0)
)
Answer
From Adding a column counting cumulative previous repeating values, credits to @blackbishop. The idea is a gaps-and-islands trick: the difference between a global row_number and a row_number partitioned by info stays constant within each run of consecutive identical values, so it can serve as a group key, and a row_number within that group counts the consecutive 1s.
from pyspark.sql import functions as F, Window

df = spark.createDataFrame([0, 0, 0, 0, 1, 1, 0, 0, 1], 'int').toDF('info')

(df
 # Stable ordering key for the window functions.
 .withColumn("ID", F.monotonically_increasing_id())
 # Gaps-and-islands: the difference of the two row numbers is constant
 # within each run of consecutive identical info values.
 .withColumn("group",
             F.row_number().over(Window.orderBy("ID"))
             - F.row_number().over(Window.partitionBy("info").orderBy("ID")))
 # Number the rows inside each run; rows with info == 0 get 0.
 .withColumn("Result",
             F.when(F.col('info') != 0,
                    F.row_number().over(Window.partitionBy("group").orderBy("ID")))
              .otherwise(F.lit(0)))
 .orderBy("ID")
 .drop("ID", "group")
 .show())
+----+------+
|info|Result|
+----+------+
|   0|     0|
|   0|     0|
|   0|     0|
|   0|     0|
|   1|     1|
|   1|     2|
|   0|     0|
|   0|     0|
|   1|     1|
+----+------+
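Applied to the DataFrame from the question, here is a minimal sketch of the same approach, assuming the DataFrame is named df_input and that the existing Timestamp column can be used as the ordering key instead of a generated ID:

from pyspark.sql import functions as F, Window

# Same gaps-and-islands technique as above, ordered by Timestamp
# (assumes df_input has the Timestamp and info columns shown in the question).
w_all = Window.orderBy("Timestamp")
w_info = Window.partitionBy("info").orderBy("Timestamp")

result = (
    df_input
    # Constant within each run of consecutive identical info values.
    .withColumn("group",
                F.row_number().over(w_all) - F.row_number().over(w_info))
    # Count rows inside each run of 1s; 0 rows get 0.
    .withColumn("res",
                F.when(F.col("info") != 0,
                       F.row_number().over(Window.partitionBy("group").orderBy("Timestamp")))
                 .otherwise(F.lit(0)))
    .drop("group")
)
result.orderBy("Timestamp").show()

As with the answer above, the unpartitioned ordering window pulls all rows into a single partition, which is fine for small data but worth keeping in mind on large tables.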