Load an Oracle table into a dask dataframe

Until now I have worked with pandas and cx_Oracle, but I have to switch to dask because of RAM limitations.

import pandas as pd
from dask import dataframe as dd
import os
import cx_Oracle as cx


con = cx.connect('USER', 'userpw', 'oracle_db', encoding='utf-8')
cursor = con.cursor()

query_V_Branchen = '''SELECT * FROM DBOWNER.V_BRANCHEN vb'''

daskdf = dd.read_sql_table(query_V_Branchen, con, index_col='RECID')

I tried to do it the same way I used cx_Oracle with pandas, but I get the following AttributeError:

'cx_Oracle.Connection' object has no attribute '_instantiate_plugins'

Any idea whether this is just a problem with the package?

Answer

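Please read the dask documentation on SQL:

  • you should provide a connection string (a SQLAlchemy URI), not a connection object. dask hands this argument to SQLAlchemy, which is where the '_instantiate_plugins' error comes from when it receives a cx_Oracle connection instead.

  • you should give a table name, not a query, or phrase your query using SQLAlchemy's expression syntax.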

e.g.,

df = dd.read_sql_table('DBOWNER.V_BRANCHEN',
    'oracle+cx_oracle://USER:userpw@oracle_db', index_col='RECID')
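
If the credentials live in variables rather than in a literal string, the URI can be assembled from its parts. The following is only a minimal sketch: the helper name oracle_uri is hypothetical (it is not part of dask, SQLAlchemy or cx_Oracle), and it assumes the credentials may contain characters such as '@' or '/' that have to be percent-encoded. The dask call itself is left as a comment because it needs a reachable database.

```python
from urllib.parse import quote_plus


def oracle_uri(user, password, dsn):
    """Build a SQLAlchemy-style URI for the cx_Oracle driver (hypothetical helper)."""
    # Percent-encode the credentials so that characters such as '@' or '/'
    # cannot be mistaken for URI separators.
    return f"oracle+cx_oracle://{quote_plus(user)}:{quote_plus(password)}@{dsn}"


uri = oracle_uri("USER", "userpw", "oracle_db")
print(uri)

# With a reachable database, this string goes where the question passed the
# connection object (requires dask, SQLAlchemy and cx_Oracle to be installed):
# import dask.dataframe as dd
# daskdf = dd.read_sql_table("DBOWNER.V_BRANCHEN", uri, index_col="RECID")
```

For the values used in the question this yields the same string as in the answer above; the encoding only matters once a password contains reserved characters.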
User contributions licensed under: CC BY-SA