How do I even start a basic query in Databricks using Python?
The data I need is in Databricks, and so far I have been using JupyterHub to pull the data and modify a few things. Now I want to eliminate the step of pulling the data into JupyterHub, move my Python code directly into Databricks, and then schedule the job.
I started with the following:
%python
import pandas as pd
df = pd.read_sql('select * from databasename.tablename')
and got the following error:
TypeError: read_sql() missing 1 required positional argument: 'con'
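The TypeError comes from pandas itself: pd.read_sql needs a connection object (con) in addition to the query, because pandas only reaches a database through such a connection. A minimal sketch of the call shape, assuming an in-memory SQLite database as a stand-in; it cannot see Databricks tables, and the table and column names here are hypothetical:

```python
import sqlite3

import pandas as pd

# Hypothetical stand-in database; this is NOT a connection to Databricks.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO tablename VALUES (?, ?)", [(1, "a"), (2, "b")])
conn.commit()

# read_sql takes the query AND a connection; leaving out con raises the TypeError above.
df = pd.read_sql("SELECT * FROM tablename", con=conn)
conn.close()
print(df)
```

Reaching Databricks tables this way would require a driver-backed connection, which is what the pyodbc attempt below was aiming for.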
So I updated it to:
%python
import pandas as pd
import pyodbc
odbc_driver = pyodbc.drivers()[0]
conn = pyodbc.connect(odbc_driver)
df = pd.read_sql('select * from databasename.tablename', con=conn)
and got this error:
ModuleNotFoundError: No module named 'pyodbc'
Can anyone please help? I can use SQL to pull the data, but I already have a lot of Python code that I don't know how to convert to SQL, so for now I just want my Python code to work in Databricks.
Answer
You should use Spark's SQL facilities directly:
my_df = spark.sql('select * FROM databasename.tablename')
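spark.sql returns a Spark DataFrame, not a pandas one. If the existing code expects pandas, one option is to convert the result with toPandas(), which collects every row onto the driver and so suits tables that fit in driver memory. A minimal sketch, assuming spark is the SparkSession that Databricks notebooks predefine; the helper name read_table_as_pandas is hypothetical:

```python
def read_table_as_pandas(spark, table_name):
    """Query a table through Spark SQL and return the rows as a pandas DataFrame.

    spark: a SparkSession (predefined as `spark` in Databricks notebooks).
    table_name: a trusted identifier such as "databasename.tablename"
    (interpolated directly into the SQL, so it must not come from user input).
    """
    # spark.sql builds a lazy Spark DataFrame; nothing is pulled yet.
    spark_df = spark.sql(f"SELECT * FROM {table_name}")
    # toPandas() collects all rows to the driver as a pandas DataFrame.
    return spark_df.toPandas()


# In a Databricks notebook cell (not run here, since `spark` is only defined there):
# pdf = read_table_as_pandas(spark, "databasename.tablename")
```

After that call, the existing pandas code can operate on pdf unchanged. In a standalone Python script, as opposed to a notebook, spark may not be predefined; it can usually be obtained with SparkSession.builder.getOrCreate() from pyspark.sql.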