I am using Python to send a query to Athena and get table DDL, via the start_query_execution and get_query_execution functions in the awswrangler package. The code above creates a dict object that stores the query results at an S3 link. The link can be accessed via res['ResultConfiguration']['OutputLocation']. It's a text link: s3://…..txt. Can someone help me figure out how to access…
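A minimal sketch of that flow, under the assumption of a hypothetical database/table (my_database.my_table) and default AWS credentials: awswrangler's wait_query returns the same execution dict the question describes, and the plain-text DDL can then be fetched from the OutputLocation with boto3.

```python
import boto3
import awswrangler as wr

# Kick off the DDL query (database/table names are placeholders).
query_id = wr.athena.start_query_execution(
    sql="SHOW CREATE TABLE my_database.my_table",
    database="my_database",
)

# Block until the query finishes; the returned dict has the same shape
# as the question's `res`, including ResultConfiguration.
res = wr.athena.wait_query(query_execution_id=query_id)
output_location = res["ResultConfiguration"]["OutputLocation"]  # s3://....txt

# The DDL result is a plain-text object, so download and decode it directly.
bucket, key = output_location.removeprefix("s3://").split("/", 1)
obj = boto3.client("s3").get_object(Bucket=bucket, Key=key)
print(obj["Body"].read().decode("utf-8"))
```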
Correct Method to Delete a Delta Lake Partition on AWS S3
I need to delete a Delta Lake partition, along with the associated AWS S3 files, and then make sure AWS Athena reflects this change. This is because I need to rerun some code to re-populate the data. I tried this, and it completed with no errors, but the files on S3 still exist and Athena still shows the data.
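A sketch of one way to do this with the deltalake (delta-rs) Python package; the table URI and partition predicate are placeholders. The likely explanation for the symptom above is that a Delta delete only writes a new table version, so the old files stay on S3 until a VACUUM removes them, and Athena must read the new snapshot (engine v3 reads the Delta log directly; older symlink-manifest registrations need their manifests regenerated).

```python
from deltalake import DeltaTable

# Hypothetical table location and partition predicate.
dt = DeltaTable("s3://my-bucket/my_delta_table/")

# Logical delete: writes a new table version; the old data files remain on S3.
dt.delete("event_date = '2023-01-01'")

# Physical delete: VACUUM with zero retention removes the now-unreferenced
# files. Disabling the retention check is required for retention_hours=0.
dt.vacuum(retention_hours=0, dry_run=False, enforce_retention_duration=False)
```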
How to run Airflow tasks synchronously
I have an Airflow DAG comprising three steps: PythonOperator –> runs a query on AWS Athena and stores the generated file at a specific S3 path; BashOperator –> increments an Airflow variable for tracking; BashOperator –> takes the output (response) of task 1 and runs some code on top of it. What happens here is the Airflow run gets completed within
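A sketch of how the first task can be made to block until Athena actually finishes, so the downstream tasks only start once the S3 file exists; the DAG id, query, and bucket below are placeholders. The key point is that start_query_execution returns immediately, so without polling, task 1 "succeeds" before the result file is written.

```python
import time
from datetime import datetime

import boto3
from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def run_athena_query():
    client = boto3.client("athena")
    qid = client.start_query_execution(
        QueryString="SELECT 1",  # placeholder query
        ResultConfiguration={"OutputLocation": "s3://my-bucket/athena-results/"},
    )["QueryExecutionId"]
    # Poll until the query reaches a terminal state before letting
    # downstream tasks run.
    while True:
        state = client.get_query_execution(QueryExecutionId=qid)[
            "QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(5)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query finished in state {state}")


with DAG("athena_pipeline", start_date=datetime(2023, 1, 1),
         schedule_interval=None, catchup=False) as dag:
    run_query = PythonOperator(task_id="run_query",
                               python_callable=run_athena_query)
    bump_counter = BashOperator(task_id="bump_counter",
                                bash_command="echo 'increment tracking variable'")
    process_output = BashOperator(task_id="process_output",
                                  bash_command="echo 'process the S3 output'")
    # Chaining with >> already runs the tasks sequentially; the polling loop
    # above is what makes task 1 truly synchronous with Athena.
    run_query >> bump_counter >> process_output
```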
PyAthena is super slow compared to querying from Athena
I run a query from the AWS Athena console and it takes 10s. The same query run from SageMaker using PyAthena takes 155s. Is PyAthena slowing it down, or is the data transfer from Athena to SageMaker so time-consuming? What could I do to speed this up? Answer: Just figured out a way of boosting the queries. Before, I was trying:
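One common fix for this gap (a sketch; the staging bucket, region, and query are placeholders) is PyAthena's PandasCursor, which downloads the CSV result object from S3 in bulk instead of paging rows through the Athena GetQueryResults API, which is often the bottleneck for large result sets.

```python
from pyathena import connect
from pyathena.pandas.cursor import PandasCursor

# PandasCursor fetches the result file from S3 in one go rather than
# paging rows through GetQueryResults. Bucket and region are placeholders.
cursor = connect(
    s3_staging_dir="s3://my-athena-results/",
    region_name="us-east-1",
    cursor_class=PandasCursor,
).cursor()

df = cursor.execute("SELECT * FROM my_database.my_table LIMIT 1000").as_pandas()
print(df.head())
```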
Athena query fails with boto3 (S3 location invalid)
I'm trying to execute a query in Athena, but it fails. Code: But it raises the following exception: However, if I go to the Athena console, go to Settings, and enter the same S3 location (for example): the query runs fine. What's wrong with my code? I've used the APIs of several of the other services (e.g., S3) successfully, but in
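Without the original snippet it's hard to be definitive, but this error commonly means the OutputLocation passed to boto3 is not a full s3:// URI. A minimal working sketch (the bucket and database names are placeholders):

```python
import boto3

client = boto3.client("athena", region_name="us-east-1")

# OutputLocation must be a complete s3:// URI; a bare bucket name or a
# local-style path is rejected as an invalid S3 location.
response = client.start_query_execution(
    QueryString="SELECT 1",
    QueryExecutionContext={"Database": "my_database"},  # placeholder database
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
print(response["QueryExecutionId"])
```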