
PySpark: how to duplicate a row n times in a dataframe?

I’ve got a dataframe like this, and I want to duplicate each row n times when the value in column n is greater than one:


And transform it like this:


I think I should use explode, but I don’t understand how it works…
Thanks


Answer

The explode function returns a new row for each element in the given array or map.

One way to exploit this function is to use a udf that builds a list of size n for each row, and then explode the resulting array so that each row is emitted n times.
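As a rough sketch of that idea (not the original snippet), something like the following should work. The helper names `n_to_array` and `duplicate_rows`, the temporary column `_copy`, and the choice to keep a single copy when n is null or below one are all assumptions made for illustration; only the column name `n` comes from the question.

```python
def n_to_array(n):
    """Build a list with one element per desired copy of the row.

    Assumption: a null n or an n below 1 keeps the row once, because
    explode would otherwise drop a row whose array is empty.
    """
    return list(range(max(n or 1, 1)))


def duplicate_rows(df, count_col="n"):
    """Return df with each row repeated according to the value in count_col."""
    # Imported here so that n_to_array above stays usable without Spark.
    from pyspark.sql.functions import col, explode, udf
    from pyspark.sql.types import ArrayType, IntegerType

    # Wrap the plain Python helper as a udf returning an array of integers.
    to_array = udf(n_to_array, ArrayType(IntegerType()))

    # explode emits one output row per array element; the temporary
    # column "_copy" (hypothetical name) is dropped afterwards.
    return (
        df.withColumn("_copy", explode(to_array(col(count_col))))
          .drop("_copy")
    )
```

To try it, create a small dataframe with a column named n, for example `spark.createDataFrame([("a", 1), ("b", 3)], ["id", "n"])` on an existing `SparkSession` called `spark`, and call `duplicate_rows(df).show()`. The row with n equal to 3 should then appear three times and the row with n equal to 1 once.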

User contributions licensed under: CC BY-SA