
Best method for sending large pandas dataframe to SQL database?

I have a pandas dataframe which has 10 columns and 10 million rows.

I have created an empty table in pgAdmin 4 (a management tool for PostgreSQL databases, similar to SQL Server Management Studio for MSSQL) for this data to be stored.

However, when running the following command:

my_dataframe.to_sql('name_of_sql_table', connection, index=False, if_exists='append', method='multi')

It takes a very long time to run and often crashes my Jupyter kernel, since the process takes so long and/or runs out of memory.

Are there any advisable methods for speeding up the “send pandas dataframe to SQL table” step?

One thing I can think of would be to split the data into, say, 1-million-row chunks and send them one at a time, appending the rows on each to_sql() call (sketched just below).
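
Roughly, the chunking idea I have in mind would look something like this (same dataframe and table names as above):

chunk_size = 1_000_000  # roughly 1 million rows per batch
for start in range(0, len(my_dataframe), chunk_size):
    chunk = my_dataframe.iloc[start:start + chunk_size]
    chunk.to_sql('name_of_sql_table', connection, index=False, if_exists='append')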

I do not have the option of loading the data directly through pgAdmin 4; my only route is to send it from Python to the database.


Answer

Have a look at https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html

If this applies to your version of pandas, use

df.to_sql("table_name", 
          connection, 
          index=False, 
          if_exists='append',
          chunksize=25000,
          method=None)
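
Here connection is assumed to be a SQLAlchemy engine. A minimal sketch of creating one, assuming a PostgreSQL database behind pgAdmin 4, the psycopg2 driver, and placeholder credentials:

from sqlalchemy import create_engine

# Placeholder connection string; substitute your own user, password, host, port and database name
connection = create_engine("postgresql+psycopg2://user:password@localhost:5432/dbname")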

Your query might be crashing because you’re using method='multi', as this does the following:

method : {None, 'multi', callable}, default None

Controls the SQL insertion clause used:

- None: Uses standard SQL INSERT clause (one per row).
- 'multi': Pass multiple values in a single INSERT clause.
- callable with signature (pd_table, conn, keys, data_iter). Details and a sample callable implementation can be found in the "insert method" section of the docs.

This means that with method='multi', pandas builds a single INSERT statement in memory covering all of the rows. Using chunksize together with the default method=None (one INSERT statement per row) lets pandas break the save into smaller batches instead of holding everything in memory at once.
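
If the target database is PostgreSQL, the "insert method" section of the pandas I/O docs also shows a callable that loads each batch via COPY, which is usually much faster than plain INSERTs. A lightly adapted sketch, assuming psycopg2 as the driver:

import csv
from io import StringIO

def psql_insert_copy(table, conn, keys, data_iter):
    # Callable for to_sql(method=...): streams each chunk through
    # PostgreSQL's COPY instead of row-by-row INSERTs. Adapted from the
    # example in the pandas "insertion method" docs; assumes psycopg2.
    dbapi_conn = conn.connection  # raw DBAPI connection
    with dbapi_conn.cursor() as cur:
        buf = StringIO()
        csv.writer(buf).writerows(data_iter)
        buf.seek(0)

        columns = ', '.join('"{}"'.format(k) for k in keys)
        if table.schema:
            table_name = '{}.{}'.format(table.schema, table.name)
        else:
            table_name = table.name

        cur.copy_expert('COPY {} ({}) FROM STDIN WITH CSV'.format(table_name, columns), buf)

You would then pass it as method=psql_insert_copy in the to_sql() call above.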
