I want to remove duplicate data from the database, This statement can be executed in navicat.
delete from Proxy_Main
where (Proxy_Main.ip,Proxy_Main.port)
in (select ip,port from Proxy_Main group by ip,port
having count(*) > 1)
and rowid not in (select min(rowid) from Proxy_Main
group by ip,port having count(*)>1)
error info:
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) near “,”: syntax error [SQL: ‘delete from Proxy_Main where (Proxy_Main.ip,Proxy_Main.port) in (select ip,port from Proxy_Main group by ip,port having count() > 1) and rowid not in (select min(rowid) from Proxy_Main group by ip,port having count()>1)’] (Background on this error at: http://sqlalche.me/e/e3q8)
@staticmethod
def execute(sql):
conn = engine.connect()
conn.execute(sql)
conn.close()
@staticmethod
def deduplication():
SqlHelper.execute('delete from Proxy_Main where (Proxy_Main.ip,Proxy_Main.port) in (select ip,port from Proxy_Main group by ip,port having count(*) > 1) and rowid not in (select min(rowid) from Proxy_Main group by ip,port having count(*)>1)')
Advertisement
Answer
Row value comparison as being attempted in IN
clause is only available in SQLite 3.15+. As mentioned at very bottom of linked docs page:
Row values were added to SQLite version 3.15.0 (2016-10-14). Attempts to use row values in prior versions of SQLite will generate syntax errors.
Check your version (SELECT sqlite_version();
). Upgrade as needed since your query does work with an updated version (run the example SQL Fiddle).
Alternatively, consider a more ANSI-SQL solution (i.e., portable across RDBMS’s) using an aggregate join subquery:
DELETE FROM Proxy_Main
WHERE rowid IN
(SELECT p.rowid
FROM Proxy_Main p
INNER JOIN
(SELECT ip, port, MIN(rowid) As min_id
FROM Proxy_Main
GROUP BY ip, port
HAVING COUNT(*) > 1) AS agg
ON p.ip = agg.ip AND p.port = agg.port
AND p.rowid <> agg.min_id);
Fiddle Demo (press Run at top)
Do note you can pass multi-line queries in Python with triple quote strings.
@staticmethod
def deduplication():
sql = """DELETE FROM Proxy_Main
WHERE rowid IN
(SELECT p.rowid
FROM Proxy_Main p
INNER JOIN
(SELECT ip, port, MIN(rowid) As min_id
FROM Proxy_Main
GROUP BY ip, port
HAVING COUNT(*) > 1) AS agg
ON p.ip = agg.ip AND p.port = agg.port
AND p.rowid <> agg.min_id);"""
SqlHelper.execute(sql)