I have a large dataframe in this format, call this df:
| index | val1 | val2 |
|---|---|---|
| 0 | 0.2 | 0.1 |
| 1 | 0.5 | 0.7 |
| 2 | 0.3 | 0.4 |
I have a row I will be inserting, call this myrow:
| index | val1 | val2 |
|---|---|---|
| -1 | 0.9 | 0.9 |
I wish to insert this row 3 times after every row in the original dataframe, i.e.:
| index | val1 | val2 |
|---|---|---|
| 0 | 0.2 | 0.1 |
| -1 | 0.9 | 0.9 |
| -1 | 0.9 | 0.9 |
| -1 | 0.9 | 0.9 |
| 1 | 0.5 | 0.7 |
| -1 | 0.9 | 0.9 |
| -1 | 0.9 | 0.9 |
| -1 | 0.9 | 0.9 |
| 2 | 0.3 | 0.4 |
| -1 | 0.9 | 0.9 |
| -1 | 0.9 | 0.9 |
| -1 | 0.9 | 0.9 |
This is straightforward with a bit of looping. TLDR: how do I do this more efficiently?
Let’s make a repeat rows function, and create our set of 3 repeats:
import pandas as pd
import numpy as np
def repeat_rows(df, n):
newdf = pd.DataFrame(np.repeat(df.values, n, axis=0))
newdf.columns = df.columns
return newdf
repeats = repeat_rows(myrow.to_frame().T, 3)
Now we have our 3 repeats:
| index | val1 | val2 |
|---|---|---|
| -1 | 0.9 | 0.9 |
| -1 | 0.9 | 0.9 |
| -1 | 0.9 | 0.9 |
Finally, we can loop over the original df‘s rows, and concat repeats to the row, and concat the result of all of those together:
blocks = []
for _, row in df.iterrows():
blocks.append(pd.concat([row.to_frame().T, repeats]))
result = pd.concat(blocks)
We now have the desired result!
The problem is, this is very slow, and I’m looking for a faster solution.
I’m guessing a better solution would follow this pattern:
result = repeat_rows(df, 4) result.loc[LAST_3_ROWS_IN_EACH_BLOCK_OF_4] = myrow
However, I’m not sure how to do such a loc assignment. How can I make my solution more efficient?
Advertisement
Answer
reset_index so that df has a simple RangeIndex. Then we can do math with tiling and repeats to create an Index that when sorted will place 3 of the myrow rows between each row of your DataFrame. Finally remove this Index and get back to a normal RangeIndex.
Sample Data
import pandas as pd
import numpy as np
myrow = pd.DataFrame({'index': [-1], 'val1': [0.9], 'val2': [0.9]})
df = pd.DataFrame({'index': [0,1,2],
'val1': [0.2, 0.5, 0.3],
'val2': [0.1, 0.7, 0.4]})
Code
# Ensure starting from a RangeIndex df = df.reset_index(drop=True) NR = 3 # Number of repeats mr = pd.concat([myrow]*len(df)*NR, ignore_index=True) mr.index = df.index.repeat(NR) + np.tile(np.arange(0, 1, 1/NR), len(df)) # `mr` second in the `concat` so rows go below df = pd.concat([df, mr]).sort_index().reset_index(drop=True)
index val1 val2 0 0 0.2 0.1 1 -1 0.9 0.9 2 -1 0.9 0.9 3 -1 0.9 0.9 4 1 0.5 0.7 5 -1 0.9 0.9 6 -1 0.9 0.9 7 -1 0.9 0.9 8 2 0.3 0.4 9 -1 0.9 0.9 10 -1 0.9 0.9 11 -1 0.9 0.9