
Rename a Redshift SQL table within PySpark Databricks

I want to rename a Redshift table from within a Python Databricks notebook.

Currently I have a query that pulls in data and creates a table:

redshiftUrl = 'jdbc:redshift://myredshifturl'

redshiftOptions = {
  'url':          redshiftUrl,
  'aws_iam_role': 'myredshift_iam_role',
  'tempdir':      's3bucket',
}

def run_query(query_string):
  # returns a DataFrameReader; call .load() on it to get a DataFrame
  return spark.read.format('com.databricks.spark.redshift').options(**redshiftOptions).option('query', query_string)

query = """
SELECT TOP 100 * FROM public.mytable
"""

df = run_query(query).load()

def write_table(src_table, table_name=None, write_mode='overwrite', diststyle=None, distkey=None, sortkeyspec=None, preactions=None, postactions=None):
  srcDf = src_table
  loadOptions = dict(redshiftOptions)  # copy so the shared redshiftOptions dict is not mutated
  loadOptions['tempformat'] = 'CSV GZIP'
  loadOptions['extracopyoptions'] = 'truncatecolumns'
  if diststyle:
    loadOptions['diststyle'] = diststyle
  if distkey:
    loadOptions['diststyle'] = 'key'
    loadOptions['distkey'] = distkey
  if sortkeyspec:
    loadOptions['sortkeyspec'] = sortkeyspec
  if preactions:
    loadOptions['preactions'] = preactions
  if postactions:
    loadOptions['postactions'] = postactions
  if table_name is None: raise ValueError('table_name is required')  # src_table is a DataFrame, not a usable table name
  srcDf.write.format('com.databricks.spark.redshift').options(**loadOptions).option('dbtable', table_name).mode(write_mode).save()

write_table(df, table_name = 'public.test')

I want to take this table I created and rename it. I referenced this doc but find it hard to follow.

I want to run the SQL command alter table public.test rename to test_table_to_be_dropped from my PySpark Databricks notebook. (The command works in my SQL IDE.)

Here’s what I have tried:

test_query = """
alter table public.test rename to test_table_to_be_dropped
"""
run_query(test_query).load()

This gives me an error:

java.sql.SQLException: [Amazon](500310) Invalid operation: syntax error at or near "table" 

This leads me to believe that my run_query() function can only read data, not modify it. I also saw that I can use the %sql magic command, but it looks like it only references my data lake and not my Redshift database.

TL;DR: Is there a way to write a function that runs SQL write commands, or to point the %sql magic command at my Redshift database?

Answer

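The Redshift connector for Spark (com.databricks.spark.redshift) accepts preactions and postactions options on write. Each takes a semicolon-separated list of SQL commands that the connector runs on Redshift: preactions before the data is loaded, postactions after the load succeeds. Passing the ALTER TABLE statement as a postaction lets you rename the table from the notebook.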

def write_table(src_table, table_name=None, diststyle=None, distkey=None, sortkeyspec=None, preactions=None, postactions=None):
  if table_name is None:
    raise ValueError('table_name is required')
  srcDf = src_table
  # Copy the shared options so repeated calls do not mutate redshiftOptions
  loadOptions = dict(redshiftOptions)
  loadOptions['tempformat'] = 'CSV GZIP'
  loadOptions['extracopyoptions'] = 'truncatecolumns compupdate off'
  if diststyle:
    loadOptions['diststyle'] = diststyle
  if distkey:
    loadOptions['diststyle'] = 'key'
    loadOptions['distkey'] = distkey
  if sortkeyspec:
    loadOptions['sortkeyspec'] = sortkeyspec
  if preactions:
    loadOptions['preactions'] = preactions
  if postactions:
    loadOptions['postactions'] = postactions
  srcDf.write.format('com.databricks.spark.redshift').options(**loadOptions).option('dbtable', table_name).mode('overwrite').save()

# Runs on Redshift after the write succeeds; separate multiple statements with ';'
sql_query = """
ALTER TABLE public.test RENAME TO test_table_to_be_dropped
"""

write_table(df, table_name='public.test', postactions=sql_query)
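
If you need several statements, or want to avoid modifying the shared redshiftOptions dict between calls, the option handling can be split into small helpers. This is only a sketch: join_sql_actions and build_load_options are hypothetical names, not part of the connector, and the option values are the placeholders from the question.

```python
def join_sql_actions(statements):
    """Join SQL statements into one ';'-separated string.

    Hypothetical helper: strips whitespace and trailing semicolons and
    drops empty entries, so the result can be passed as the connector's
    'preactions' or 'postactions' option.
    """
    cleaned = [s.strip().rstrip(';').strip() for s in statements]
    return ';'.join(s for s in cleaned if s)


def build_load_options(base_options, preactions=None, postactions=None):
    """Return a copy of base_options with the action options added.

    Hypothetical helper: copying first means the shared dict is never
    changed, so actions from one write cannot leak into the next.
    """
    options = dict(base_options)
    if preactions:
        options['preactions'] = join_sql_actions(preactions)
    if postactions:
        options['postactions'] = join_sql_actions(postactions)
    return options


# Placeholder values taken from the question
base = {'url': 'jdbc:redshift://myredshifturl', 'tempdir': 's3bucket'}
opts = build_load_options(
    base,
    postactions=['ALTER TABLE public.test RENAME TO test_table_to_be_dropped;'],
)
print(opts['postactions'])
```

The resulting dict can then be passed to the writer with .options(**opts), exactly as loadOptions is in write_table above.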
User contributions licensed under: CC BY-SA