Fastest way to append a row to an existing data frame?

Question

I know this question has been asked many a time, but none of the solutions already posted on this site is ideal. I have tested various methods found here and timed them using IPython; I will post the results below. songs is a DataFrame with 4464 rows (initially) and 15 columns. I am fully aware DataFrame indexes are IMMUTABLE, so

Accepted Answer

First we establish the time needed to create a dataframe:

    %%timeit
    songs = pd.DataFrame(index=np.arange(4464), columns=np.arange(15))

    100 loops, best of 5: 5.21 ms per loop

It takes around 5.2 ms to create this dataframe, so we use it as a reference for the following cases, which recreate the dataframe in every run (e.g. to prevent caching).

Case with append:

    %%timeit
    songs = pd.DataFrame(index=np.arange(4464), columns=np.arange(15))
    s = pd.Series([1] * 15, index=songs.columns)
    songs.append(s, ignore_index=True)

    100 loops, best of 5: 10.1 ms per loop

That is about 10 ms in total, so the append itself takes around 5 ms once the creation time is subtracted, which is similar to your results.

With loc:

    %%timeit
    songs = pd.DataFrame(index=np.arange(4464), columns=np.arange(15))
    songs.loc[4464] = [1] * 15

    100 loops, best of 5: 10.2 ms per loop

Again, around 5 ms for the row operation itself.

Solution: use iloc

    %%timeit
    songs = pd.DataFrame(index=np.arange(4464), columns=np.arange(15))
    songs.iloc[-1] = [1] * 15

    100 loops, best of 5: 5.25 ms per loop

This brings the row assignment down to around 70 µs, as confirmed by timing it alone (creating the dataframe first, outside the timed cell):

    %%timeit
    songs.iloc[-1] = [1] * 15

    10000 loops, best of 5: 67.1 µs per loop
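A minimal sketch for reproducing this comparison as a plain script instead of IPython cells, assuming pandas 2.x and NumPy; the helper names make_songs, with_loc, with_concat, with_iloc and time_row_op are hypothetical. It checks what each approach actually does to the frame and times only the row operation, building the frame outside the timed region instead of subtracting a baseline. Two facts are worth keeping in mind when reading the numbers: DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so pd.concat stands in for it here; and iloc is purely positional and cannot enlarge a frame, so songs.iloc[-1] = ... overwrites the existing last row (the frame stays at 4464 rows), whereas songs.loc[4464] = ... adds a new one.

```python
import time

import numpy as np
import pandas as pd


def make_songs() -> pd.DataFrame:
    # Same shape as in the answer: 4464 rows x 15 columns, every cell NaN.
    return pd.DataFrame(index=np.arange(4464), columns=np.arange(15))


def with_loc(df: pd.DataFrame) -> pd.DataFrame:
    # Setting with enlargement: the label len(df) does not exist yet, so a row is added.
    df.loc[len(df)] = [1] * 15
    return df


def with_concat(df: pd.DataFrame) -> pd.DataFrame:
    # Stand-in for the removed DataFrame.append: concatenate a one-row frame.
    row = pd.DataFrame([[1] * 15], columns=df.columns)
    return pd.concat([df, row], ignore_index=True)


def with_iloc(df: pd.DataFrame) -> pd.DataFrame:
    # Purely positional: overwrites the existing last row, the frame does not grow.
    df.iloc[-1] = [1] * 15
    return df


def time_row_op(fn, repeats: int = 20) -> float:
    # Best-of-N wall time in seconds for fn alone; the frame is built outside the timed region.
    best = float("inf")
    for _ in range(repeats):
        df = make_songs()
        start = time.perf_counter()
        fn(df)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    for fn in (with_loc, with_concat, with_iloc):
        out = fn(make_songs())
        print(f"{fn.__name__}: shape={out.shape}, best={time_row_op(fn) * 1e6:.1f} µs")
```

Because every run gets a freshly built frame, the printed timings play the same role as the answer's baseline-subtracted estimates, though absolute numbers will vary with the machine and pandas version.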


Answer