How do I start a basic query in Databricks using Python?
The data I need is in Databricks, and so far I have been using JupyterHub to pull the data and modify a few things. Now I want to eliminate the step of pulling the data into JupyterHub, move my Python code directly into Databricks, and then schedule the job.
I started with the following:
%python
import pandas as pd
df = pd.read_sql('select * from databasename.tablename')
and got the error below:
TypeError: read_sql() missing 1 required positional argument: 'con'
So I tried this update:
%python
import pandas as pd
import pyodbc

odbc_driver = pyodbc.drivers()[0]
conn = pyodbc.connect(odbc_driver)

df = pd.read_sql('select * databasename.tablename', con=conn)
and I got the error below:
ModuleNotFoundError: No module named 'pyodbc'
Can anyone please help? I can use SQL to pull the data, but I already have a lot of code in Python that I don't know how to convert to SQL. So for now I just want my Python code to work in Databricks.
Answer
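You should use Spark's SQL facilities directly; in a Databricks notebook the `spark` session is already defined, so no separate connection is needed: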
my_df = spark.sql('select * FROM databasename.tablename')
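Note that spark.sql returns a Spark DataFrame, not a pandas one. If the goal is to keep the existing pandas code unchanged, one option is to convert the result with toPandas(). Below is a minimal sketch, assuming the notebook's predefined `spark` session and a table small enough to fit in the driver's memory; `query_to_pandas` is a hypothetical helper name, not part of any library:

```python
def query_to_pandas(spark_session, query):
    """Run a SQL query through Spark and return the result as a pandas DataFrame.

    Assumption: the result is small enough to be collected onto the driver.
    """
    spark_df = spark_session.sql(query)  # Spark DataFrame; nothing is collected yet
    return spark_df.toPandas()           # pulls all rows to the driver as pandas


# In a Databricks notebook, `spark` already exists, so the call would look like:
# df = query_to_pandas(spark, 'select * from databasename.tablename')
```

For large tables it is probably safer to filter or aggregate in the SQL query first, since toPandas() collects every row onto a single machine.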