Pandas: efficiently inserting a large number of rows

I have a large dataframe in this format, call this df:

index val1 val2
0 0.2 0.1
1 0.5 0.7
2 0.3 0.4

I have a row I will be inserting, call this myrow:

index val1 val2
-1 0.9 0.9

I wish to insert this row 3 times after every row in the original dataframe, i.e.:

index val1 val2
0 0.2 0.1
-1 0.9 0.9
-1 0.9 0.9
-1 0.9 0.9
1 0.5 0.7
-1 0.9 0.9
-1 0.9 0.9
-1 0.9 0.9
2 0.3 0.4
-1 0.9 0.9
-1 0.9 0.9
-1 0.9 0.9

This is straightforward with a bit of looping. TLDR: how do I do this more efficiently?

Let’s make a repeat rows function, and create our set of 3 repeats:
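
The original snippet did not survive on this page. A minimal sketch of such a helper, assuming `myrow` is a one-row DataFrame and that `index` is an ordinary column (the name `repeat_rows` is hypothetical):

```python
import pandas as pd


def repeat_rows(frame, n):
    """Stack n copies of frame on top of each other."""
    return pd.concat([frame] * n)


# The row to insert, with `index` assumed to be a regular column.
myrow = pd.DataFrame({"index": [-1], "val1": [0.9], "val2": [0.9]})

# Three copies of myrow, reused for every row of df.
repeats = repeat_rows(myrow, 3)
```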

Now we have our 3 repeats:

index val1 val2
-1 0.9 0.9
-1 0.9 0.9
-1 0.9 0.9

Finally, we can loop over the original df's rows, concat repeats to each row, and concat all of those results together:
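
The loop itself is also missing from this page. A sketch of what it might look like, assuming the sample frames from the question with `index` as a plain column (the names `chunks` and `result` are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"index": [0, 1, 2],
                   "val1": [0.2, 0.5, 0.3],
                   "val2": [0.1, 0.7, 0.4]})
myrow = pd.DataFrame({"index": [-1], "val1": [0.9], "val2": [0.9]})
repeats = pd.concat([myrow] * 3)

# One small concat per original row: the row itself followed by the repeats.
chunks = [pd.concat([df.iloc[[i]], repeats]) for i in range(len(df))]

# Glue all the per-row chunks together into the final frame.
result = pd.concat(chunks, ignore_index=True)
```

Each pass through the loop builds a new small DataFrame, which is likely where most of the time goes on a large df.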

We now have the desired result!

The problem is, this is very slow, and I’m looking for a faster solution.

I’m guessing a better solution would follow this pattern:

However, I’m not sure how to do such a loc assignment. How can I make my solution more efficient?
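
The pattern the question had in mind is not preserved here. One way such a loc assignment could look, as a sketch only: pre-fill an expanded frame with copies of myrow, then overwrite every fourth row with the original data. It assumes df and myrow share the same columns; the names `step` and `expanded` are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"index": [0, 1, 2],
                   "val1": [0.2, 0.5, 0.3],
                   "val2": [0.1, 0.7, 0.4]})
myrow = pd.DataFrame({"index": [-1], "val1": [0.9], "val2": [0.9]})

n = 3          # copies of myrow after each original row
step = n + 1   # distance between consecutive original rows in the output

# Start from a frame that holds myrow in every position.
expanded = myrow.iloc[np.zeros(len(df) * step, dtype=int)].reset_index(drop=True)

# Overwrite rows 0, step, 2*step, ... with the original rows, one column at a
# time so that each column keeps its own dtype.
for col in df.columns:
    expanded.loc[::step, col] = df[col].to_numpy()
```

The loop here runs over columns rather than rows, so its cost does not grow with the length of df.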

Answer

Call reset_index so that df has a simple RangeIndex. Then use tiling and repeats to build an Index that, when sorted, places 3 copies of myrow after each row of your DataFrame. Finally, drop this Index to get back to a normal RangeIndex.

Sample Data
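
The answer's sample-data snippet is missing from this page. Frames matching the tables in the question could be built as follows, assuming `index` is an ordinary column rather than the DataFrame's index:

```python
import pandas as pd

# The original frame from the question.
df = pd.DataFrame({"index": [0, 1, 2],
                   "val1": [0.2, 0.5, 0.3],
                   "val2": [0.1, 0.7, 0.4]})

# The single row to insert after every row of df.
myrow = pd.DataFrame({"index": [-1], "val1": [0.9], "val2": [0.9]})
```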

Code
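
The answer's own code is not preserved here, so the following is a sketch of the described idea rather than the original implementation. It labels each copy of myrow with the position of the row it should follow and relies on a stable sort to keep the original row first. The function name `insert_after_each_row` is hypothetical, and myrow is assumed to be a one-row DataFrame with the same columns as df.

```python
import numpy as np
import pandas as pd


def insert_after_each_row(df, myrow, n=3):
    """Return df with n copies of the one-row frame myrow after every row."""
    # Make sure the original rows are labelled 0, 1, 2, ...
    base = df.reset_index(drop=True)

    # Build all n * len(df) filler rows in one vectorised selection.
    filler = myrow.iloc[np.zeros(n * len(base), dtype=int)]

    # Give each filler row the label of the original row it follows:
    # 0, 0, 0, 1, 1, 1, ... for n = 3.
    filler.index = np.repeat(np.arange(len(base)), n)

    # mergesort is stable, so within each label the original row (which
    # comes first in the concat) stays ahead of its filler rows.
    out = pd.concat([base, filler]).sort_index(kind="mergesort")

    # Drop the helper labels and return to a plain RangeIndex.
    return out.reset_index(drop=True)
```

Calling `insert_after_each_row(df, myrow, 3)` on the question's data should give the 12-row frame shown in the question, with no Python-level loop over the rows.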

User contributions licensed under: CC BY-SA