
Python in Databricks

How do I even start a basic query in Databricks using Python?

The data I need is in Databricks, and so far I have been using JupyterHub to pull the data and modify a few things. Now I want to eliminate the step of pulling the data into JupyterHub, move my Python code directly into Databricks, and then schedule it as a job.

I started with the following:

%python
import pandas as pd
df = pd.read_sql('select * from databasename.tablename')

and got this error:

TypeError: read_sql() missing 1 required positional argument: 'con'

So I tried updating it:

%python
import pandas as pd
import pyodbc

odbc_driver = pyodbc.drivers()[0]
conn = pyodbc.connect(odbc_driver) 

df = pd.read_sql('select * from databasename.tablename', con=conn)

and got this error:

ModuleNotFoundError: No module named 'pyodbc'

Can anyone please help? I can use SQL to pull the data, but I already have a lot of Python code that I don't know how to convert to SQL. For now, I just want my Python code to work in Databricks.


Answer

You should use Spark's SQL facilities directly:

my_df = spark.sql('select * FROM databasename.tablename') 
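Note that `spark.sql()` returns a Spark DataFrame, not a pandas one. Since the goal is to keep existing pandas code working, one option is to convert the result with `toPandas()`. The sketch below wraps that in a helper; the name `load_table_as_pandas` is hypothetical, and it assumes the `spark` session that Databricks notebooks predefine. Because `toPandas()` pulls every row onto the driver, this is only suitable for tables that fit in driver memory.

```python
def load_table_as_pandas(spark, table_name):
    """Hypothetical helper: query a table with Spark SQL, return pandas.

    Assumes `spark` is the SparkSession that Databricks notebooks provide.
    """
    # spark.sql() returns a lazy, distributed Spark DataFrame.
    spark_df = spark.sql(f"SELECT * FROM {table_name}")
    # toPandas() collects all rows onto the driver as a pandas DataFrame,
    # so only use it when the table fits in driver memory.
    return spark_df.toPandas()


# Usage in a Databricks notebook cell:
# df = load_table_as_pandas(spark, "databasename.tablename")
# Existing pandas code can then operate on `df` as before.
```

No `pyodbc` connection or `con` argument is needed here, because the notebook's Spark session already has access to the Databricks tables.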