
Python fast DataFrame concatenation

I wrote some code that concatenates parts of a DataFrame back onto the same DataFrame, in order to normalize the number of rows for each value of a certain column.

JavaScript

and this is unbelievably slow. Is there a fast way to concatenate DataFrames without creating copies?


Answer

There are a couple of things that stand out.


To begin with, the loop

JavaScript

is going to be very slow. Pandas is not built for this kind of incremental concatenation: each concat call builds a new DataFrame and copies everything accumulated so far, so I suspect the total cost is quadratic in the number of pieces.
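To make the quadratic suspicion concrete, here is a minimal sketch that counts how many rows get copied when a frame is grown one piece at a time. The chunk sizes, the column name `x`, and the helper `rows_copied_by_repeated_concat` are made up for illustration; they are not taken from the question's code.

```python
import pandas as pd

def rows_copied_by_repeated_concat(chunks):
    """Grow a DataFrame one chunk at a time and count the rows copied along the way."""
    result = chunks[0]
    copied = 0
    for chunk in chunks[1:]:
        # Each call allocates a new frame holding every row accumulated so far.
        result = pd.concat([result, chunk], ignore_index=True)
        copied += len(result)
    return result, copied

# Hypothetical data: 100 chunks of 10 rows each.
chunks = [pd.DataFrame({"x": range(10)}) for _ in range(100)]
result, copied = rows_copied_by_repeated_concat(chunks)

print(len(result))  # rows in the final frame
print(copied)       # rows copied in total across all concat calls
```

With these numbers the final frame has 1,000 rows, but the loop copies 20 + 30 + ... + 1000 = 50,490 rows to get there. A single concat over the whole list would copy each row once.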

Instead, perhaps you could try

JavaScript

which just builds a list first and then calls concat once on the entire list. This should bring the complexity down to linear, and I suspect it will have lower constant factors in any case.
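As a self-contained sketch of that list-then-concat pattern: the example below assumes that "normalizing" means oversampling each group, with replacement, up to the size of the largest group. That interpretation, the column names `label` and `value`, and the helper `balance_by_column` are my assumptions, not the asker's actual code.

```python
import pandas as pd

def balance_by_column(df, column, random_state=0):
    """Oversample so every value of `column` has as many rows as the largest group."""
    target = df[column].value_counts().max()
    parts = [df]  # keep all original rows
    for _, group in df.groupby(column):
        missing = target - len(group)
        if missing > 0:
            # Draw the extra rows with replacement from the group itself.
            parts.append(group.sample(n=missing, replace=True, random_state=random_state))
    # One concat over the whole list instead of one concat per iteration.
    return pd.concat(parts, ignore_index=True)

# Hypothetical data: three rows labelled "a", one row labelled "b".
df = pd.DataFrame({"label": ["a", "a", "a", "b"], "value": [1, 2, 3, 4]})
balanced = balance_by_column(df, "label")
print(balanced["label"].value_counts())
```

The loop only collects references to small frames; the single `pd.concat` call at the end is the only place where rows are copied into the result.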


Another way to avoid these small concats is groupby-apply. Instead of iterating over the result of groupby, write the loop body as a function and pass it to apply. Let Pandas figure out how best to combine all of the results into a single DataFrame.
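A rough sketch of the groupby-apply variant, under the same assumed goal of padding every group up to the size of the largest one. The function `pad_group` and the columns `label` and `value` are hypothetical. Grouping by the column's values as a NumPy array rather than by its name is a choice I made so that the `label` column stays inside each group across pandas versions, since the handling of grouping columns in apply has changed between releases.

```python
import pandas as pd

def pad_group(group, target, random_state=0):
    """Return the group padded with resampled rows until it has `target` rows."""
    missing = target - len(group)
    if missing <= 0:
        return group
    extra = group.sample(n=missing, replace=True, random_state=random_state)
    return pd.concat([group, extra])

# Hypothetical data: three rows labelled "a", one row labelled "b".
df = pd.DataFrame({"label": ["a", "a", "a", "b"], "value": [1, 2, 3, 4]})
target = df["label"].value_counts().max()

balanced = (
    # Group by the raw values so "label" is kept as an ordinary column in each group.
    df.groupby(df["label"].to_numpy(), group_keys=False)
      .apply(pad_group, target=target)
      .reset_index(drop=True)
)
print(balanced["label"].value_counts())
```

There is still one small concat per group inside `pad_group`, but each touches only that group's rows; combining the per-group results into the final frame is left to Pandas.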

However, even if you prefer to keep the loop, I'd append things to a list and concat everything once at the end:

JavaScript
User contributions licensed under: CC BY-SA