
Most efficient way to split up a dataframe into smaller dataframes

I am writing a Python program that splits a large dataframe (tens of thousands of rows) into smaller dataframes based on a column value. It needs to be fairly efficient: the user can change how the dataframe is broken up, and I would like the output to update dynamically.

Example input:

id Column_1 Column_2
1 Oct 10000$
1 Dec 9000$
2 Oct 3400$
3 Dec 20000$
2 Nov 9000$
1 Nov 15000$

Example Output:

id Column_1 Column_2
1 Oct 10000$
1 Nov 15000$
1 Dec 9000$

id Column_1 Column_2
2 Oct 3400$
2 Nov 9000$

id Column_1 Column_2
3 Dec 20000$

The naïve way, in my mind, is to do something like this:

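The snippet itself is missing here, so this is only a minimal sketch of the kind of loop being described, assuming the naive approach means filtering the full dataframe once per distinct value. The names `split_naive` and `naive_pieces` are made up for illustration.

```python
import pandas as pd

# Example data from the question.
df = pd.DataFrame({
    "id": [1, 1, 2, 3, 2, 1],
    "Column_1": ["Oct", "Dec", "Oct", "Dec", "Nov", "Nov"],
    "Column_2": ["10000$", "9000$", "3400$", "20000$", "9000$", "15000$"],
})


def split_naive(frame, column):
    """Build one sub-frame per distinct value by filtering the whole frame each time."""
    pieces = {}
    for value in frame[column].unique():
        # Each boolean mask compares every row, so the frame is scanned
        # once per distinct value in the column.
        pieces[value] = frame[frame[column] == value]
    return pieces


naive_pieces = split_naive(df, "id")
for key, part in naive_pieces.items():
    print(part, end="\n\n")
```

With k distinct values and n rows this does roughly k passes over n rows, which is the repeated work the question is worried about.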

But I believe this would be looping over the same data more times than is necessary, which is inefficient. Is there a fast way of doing this?


Update

Did a little software drag racing. Here are the results:

Solution 1:

9.96 ms ± 1.26 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

Solution 2:

1.28 ms ± 92.2 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Solution 3:

9.19 ms ± 885 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Does anyone know why the second solution would be so much faster?
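The three timed snippets are not shown, so the comparison cannot be reproduced exactly. For anyone who wants to run a similar race, here is a rough sketch using the standard `timeit` module. It assumes two candidate approaches (a boolean-mask loop and `groupby`), and the data sizes and function names are invented for the example, not taken from the post.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark data: 50,000 rows and up to 100 distinct ids.
rng = np.random.default_rng(0)
bench_df = pd.DataFrame({
    "id": rng.integers(0, 100, size=50_000),
    "value": rng.random(50_000),
})


def split_with_masks(frame):
    # One full-frame comparison per distinct id.
    return {v: frame[frame["id"] == v] for v in frame["id"].unique()}


def split_with_groupby(frame):
    # groupby computes the row partition once, then hands out the pieces.
    return {key: part for key, part in frame.groupby("id")}


# Check that both approaches produce the same pieces before timing them.
masks = split_with_masks(bench_df)
groups = split_with_groupby(bench_df)

t_masks = timeit.timeit(lambda: split_with_masks(bench_df), number=5)
t_groupby = timeit.timeit(lambda: split_with_groupby(bench_df), number=5)
print(f"mask loop: {t_masks / 5 * 1e3:.2f} ms per run")
print(f"groupby:   {t_groupby / 5 * 1e3:.2f} ms per run")
```

The numbers will depend on the machine, the pandas version, and above all on how many distinct values the column has, so treat any single result with caution.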


Answer

Here is one way to do it:
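The answer's code is not preserved here, so what follows may not be what was originally posted. It is a sketch of one common approach, `DataFrame.groupby`, which partitions the rows once instead of filtering per value. The ordered month categorical is an assumption, added only because the example output lists months in calendar order. `grouped_pieces` is a name made up for this example.

```python
import pandas as pd

# Example data from the question.
df = pd.DataFrame({
    "id": [1, 1, 2, 3, 2, 1],
    "Column_1": ["Oct", "Dec", "Oct", "Dec", "Nov", "Nov"],
    "Column_2": ["10000$", "9000$", "3400$", "20000$", "9000$", "15000$"],
})

# Assumption: months should sort in calendar order, as in the example output.
month_order = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
ordered = df.assign(
    Column_1=pd.Categorical(df["Column_1"], categories=month_order, ordered=True)
)

# Iterating over a groupby yields (key, sub-frame) pairs, one per distinct id.
grouped_pieces = {
    key: part.sort_values("Column_1")
    for key, part in ordered.groupby("id")
}

for key, part in grouped_pieces.items():
    print(part.to_string(index=False), end="\n\n")
```

If the user switches to a different split column, only the argument to `groupby` needs to change, which suits the "update dynamically" requirement.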

User contributions licensed under: CC BY-SA