The code below is in Python, and I want to convert it to PySpark. Specifically, I'm not sure what the PySpark equivalent of the statement `pd.read_sql(query, connect_to_hive)` would be.
I need to extract data from the Enterprise Data Lake (EDL), so I make a connection to the EDL using pyodbc and then extract the data with a SQL query.
pyodbc connection to the Enterprise Data Lake:
connect_to_hive = pyodbc.connect("DSN=Hive", autocommit=True)
transaction = pd.read_sql(query, connect_to_hive)
connect_to_hive.close()
# Query: below is just a basic SQL query to replicate this problem.
query = f'''
with trans as (
    SELECT a.employee_name, a.employee_id
    FROM EMP a
)
SELECT * FROM trans
'''
Answer
The above code can be converted to Spark SQL as follows:
from pyspark.sql import SparkSession

# A Hive-enabled session lets spark.sql() query Hive tables directly.
spark = SparkSession.builder.enableHiveSupport().getOrCreate()

query = f'''
with trans as (
    SELECT a.employee_name, a.employee_id
    FROM EMP a
)
SELECT * FROM trans
'''

employeeDF = spark.sql(query)
employeeDF.show(truncate=False)
The query runs as-is against Hive, and the result is returned to you as a Spark DataFrame, so no pyodbc connection is needed.