I ran a query which will eventually return roughly 17M rows in chunks of 500,000. Everything seemed to be going just fine, but I ran into the following error:
Traceback (most recent call last):
  File "sql_csv.py", line 22, in <module>
    for chunk in pd.read_sql_query(hours_query, db.conn, chunksize = 500000):
  File "/Users/michael.chirico/anaconda2/lib/python2.7/site-packages/pandas/io/sql.py", line 1424, in _query_iterator
    data = cursor.fetchmany(chunksize)
  File "/Users/michael.chirico/anaconda2/lib/python2.7/site-packages/jaydebeapi/__init__.py", line 546, in fetchmany
    row = self.fetchone()
  File "/Users/michael.chirico/anaconda2/lib/python2.7/site-packages/jaydebeapi/__init__.py", line 526, in fetchone
    if not self._rs.next():
jpype._jexception.SQLExceptionPyRaisable: java.sql.SQLException: Query failed (#20171013_015410_01255_8pff8): **Query exceeded maximum time limit of 60.00m**
Obviously such a query can be expected to take some time; I’m fine with that (and chunking means I know I won’t hit any RAM limits; in fact, the file output shows the query had finished 16M of the 17M rows before crashing!).
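For reference, the failing script does something roughly like this (a minimal sketch, not the original code; the connection details, query, and output file name are all placeholders):

import jaydebeapi
import pandas as pd

# Hypothetical connection details; the real driver jar, host, and catalog differ.
conn = jaydebeapi.connect(
    "com.facebook.presto.jdbc.PrestoDriver",
    "jdbc:presto://presto-host:8080/hive/default",
    {"user": "your_user"},
    "/path/to/presto-jdbc.jar",
)

# Placeholder for the real ~17M-row query.
hours_query = "SELECT * FROM some_large_table"

# Stream the result in 500,000-row chunks, appending each chunk to a CSV,
# so the full result never has to fit in memory at once.
with open("hours.csv", "w") as out:
    for i, chunk in enumerate(pd.read_sql_query(hours_query, conn, chunksize=500000)):
        chunk.to_csv(out, header=(i == 0), index=False)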
But I don’t see any direct option for this in read_sql_query. params seems like a decent candidate, but I can’t find any hint in the jaydebeapi documentation of what the right parameter to pass to execute might be.
How can I overcome this and run my full query?
Answer
When executing queries, Presto constrains each one by CPU, memory, execution time, and other limits. You hit the execution-time limit. Before raising it, make sure the query itself is sound; a runaway query can overload the cluster.
To allow more query execution time, set a larger value for the query_max_execution_time session property. The new value must be higher than the limit you are hitting (here 60 minutes), for example:

SET SESSION query_max_execution_time = '2h';
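If you are driving Presto through jaydebeapi as in the question, one approach (a sketch, assuming your Presto JDBC driver applies SET SESSION statements issued on a connection to later queries on that same connection; if it does not, the limit has to be raised on the server side instead) is to run the SET SESSION statement on a cursor before starting the long read:

# Raise the per-query time limit for this session before streaming the big query.
# db.conn and hours_query are the objects from the question; the '4h' value is
# just an example and must exceed the limit being hit.
cur = db.conn.cursor()
cur.execute("SET SESSION query_max_execution_time = '4h'")
cur.close()

for chunk in pd.read_sql_query(hours_query, db.conn, chunksize=500000):
    ...  # write each chunk out as before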