I want to rename a Redshift table from within a Python Databricks notebook.
Currently I have a query that pulls in data and creates a table:
```python
redshiftUrl = 'jdbc:redshift://myredshifturl'
redshiftOptions = {
    'url': redshiftUrl,
    'aws_iam_role': 'myredshift_iam_role',
    'tempdir': 's3bucket',
}

def run_query(query_string):
    # returns a DataFrameReader; call .load() to get the DataFrame
    return (spark.read.format('com.databricks.spark.redshift')
            .options(**redshiftOptions)
            .option('query', query_string))

query = """
SELECT TOP 100 *
FROM public.mytable
"""

df = run_query(query).load()

def write_table(src_table, table_name=None, write_mode='overwrite',
                diststyle=None, distkey=None, sortkeyspec=None,
                preactions=None, postactions=None):
    srcDf = src_table
    loadOptions = dict(redshiftOptions)  # copy so the shared options dict isn't mutated
    loadOptions['tempformat'] = 'CSV GZIP'
    loadOptions['extracopyoptions'] = 'truncatecolumns'
    if diststyle:
        loadOptions['diststyle'] = diststyle
    if distkey:
        loadOptions['diststyle'] = 'key'  # a distkey implies key distribution
        loadOptions['distkey'] = distkey
    if sortkeyspec:
        loadOptions['sortkeyspec'] = sortkeyspec
    if preactions:
        loadOptions['preactions'] = preactions
    if postactions:
        loadOptions['postactions'] = postactions
    if table_name is None:
        table_name = src_table
    (srcDf.write.format('com.databricks.spark.redshift')
        .options(**loadOptions)
        .option('dbtable', table_name)
        .mode(write_mode)
        .save())

write_table(df, table_name='public.test')
```
I want to take this table I created and rename it. I referenced this doc but find it hard to follow.
I want to run this SQL command in my PySpark Databricks notebook (it works in my SQL IDE):

```sql
alter table public.test rename to test_table_to_be_dropped
```
Here’s what I have tried:
test_query = """ alter table public.test rename to test_table_to_be_dropped """ run_query(test_query).load()
This gives me an error:

```
java.sql.SQLException: [Amazon](500310) Invalid operation: syntax error at or near "table"
```
This leads me to believe that my `run_query()` function is only for reading data, not for modifying it. I also saw that I can use the `%sql` magic command, but it looks like it only references my data lake, not my Redshift database.
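That would make sense: as far as I can tell, the connector embeds the `query` option as a subquery of a `SELECT` it generates, so anything that isn't itself a `SELECT` fails to parse, which matches the syntax error at `"table"` above. One workaround I believe would work is to open a plain JDBC connection and execute the DDL directly. A minimal sketch, assuming the Redshift JDBC driver is on the cluster classpath and user/password authentication (`'my_user'`/`'my_password'` are placeholders; my real setup uses an IAM role):

```python
# Sketch: execute DDL over a raw JDBC connection via the driver JVM,
# bypassing the read-only spark-redshift 'query' path.
def run_ddl(statement):
    jvm = spark.sparkContext._gateway.jvm
    conn = jvm.java.sql.DriverManager.getConnection(
        redshiftUrl, 'my_user', 'my_password')  # placeholder credentials
    try:
        stmt = conn.createStatement()
        try:
            stmt.execute(statement)  # execute() accepts DDL, unlike a query
        finally:
            stmt.close()
    finally:
        conn.close()

run_ddl('alter table public.test rename to test_table_to_be_dropped')
```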
TL;DR: Is there a way to write a function that issues SQL write commands against Redshift, or to point the `%sql` magic command at my Redshift database? I want to run `alter table public.test rename to test_table_to_be_dropped` from my PySpark Databricks notebook; the statement works in my SQL IDE.
Answer
The spark-redshift connector you are already using has `preactions` and `postactions` parameters that let you run SQL statements before or after the write itself.
```python
def write_table(src_table, table_name=None, diststyle=None, distkey=None,
                sortkeyspec=None, preactions=None, postactions=None):
    srcDf = src_table
    loadOptions = dict(redshiftOptions)  # copy so the shared options dict isn't mutated
    loadOptions['tempformat'] = 'CSV GZIP'
    loadOptions['extracopyoptions'] = 'truncatecolumns compupdate off'
    if diststyle:
        loadOptions['diststyle'] = diststyle
    if distkey:
        loadOptions['diststyle'] = 'key'  # a distkey implies key distribution
        loadOptions['distkey'] = distkey
    if sortkeyspec:
        loadOptions['sortkeyspec'] = sortkeyspec
    if preactions:
        loadOptions['preactions'] = preactions
    if postactions:
        loadOptions['postactions'] = postactions
    if table_name is None:
        table_name = src_table
    (srcDf.write.format('com.databricks.spark.redshift')
        .options(**loadOptions)
        .option('dbtable', table_name)
        .mode('overwrite')
        .save())
```
sql_query = """ ALTER TABLE some sql actions; DROP TABLE mytable """ write_table(df,table_name=mytable,postactions=sql_query)