Skip to content
Advertisement

Sqlite executemany and DELETE

Execute many seems to be very slow with deletion (Insertion is fine) and I was wondering if anyone knows why it takes so long.

Consider the code below:

import sqlite3

db = sqlite3.connect("mydb")
c = db.cursor()
c.execute("DROP TABLE IF EXISTS testing ")
c.execute("CREATE TABLE testing (val INTEGER);")
my_vals2 = [[x] for x in range(1,10000)]

def insertmany(vals):
    c.executemany("INSERT INTO testing (val) VALUES (?)",vals)
    db.commit()

def deletemany1(vals):
    c.executemany("DELETE FROM testing WHERE val=?",vals)
    db.commit()

def deletemany2(vals): #this is fastest even though im looping over to convert to strings and again to join ...
    vals = ["'%s'"%v[0] for v in vals] 
    c.execute("DELETE FROM testing WHERE val IN (%s)"%",".join(vals))
    #DELETE FROM TABLE WHERE x in (1,2,3...)

And the following time results (timeit was giving funny data so :/) from IPython:

%time insertmany(my_vals2) 
#CPU times: user 0.60 s, sys: 0.00 s, total: 0.60 s Wall time: 0.60 s

%time deletemany1(my_vals2)
#CPU times: user 3.58 s, sys: 0.00 s, total: 3.58 s Wall time: 3.58 s

%time deletemany2(my_vals2)
#CPU times: user 0.02 s, sys: 0.00 s, total: 0.02 s Wall time: 0.02 s

And just for the sake of completeness here are the timeit results (but I think timeit is broken on the second test(that or the ms is a different unit then the first test))

%timeit insertmany(my_vals2) 
#1 loops, best of 3: 358 ms per loop

%timeit deletemany1(my_vals2)
#1 loops, best of 3: 8.34 ms per loop  <- this is not faster than the above!!!! (timeit lies?)

%timeit deletemany2(my_vals2)
#100 loops, best of 3: 2.3 ms per loop  

So why is executemany soooooo slow with deletes?

Advertisement

Answer

SQLites stores table records in a B+ tree, sorted by rowid.

When you are inserting with an automatically generated rowid, all records are just appended at the end of the table. However, when deleting, SQLite has to search for the record first. This is slow if the id column is not indexed; either create an explicit index (as proposed by John), or declare the column as INTEGER PRIMARY KEY to make it the rowid.

Inserting with an index becomes faster if you don’t use the index, i.e., if you create the index only after bulk inserts.

Your last delete command deletes all records at once. If you know that you’re deleting all records in the table, you could speed it up even further by using just DELETE FROM testing, which doesn’t need to look at any records at all.

User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement