Skip to content
Advertisement

Query S3 from Python

I am using python to send a query to Athena and get table DDL. I am using start_query_execution and get_query_execution functions in the awswrangler package.

import boto3
import awswrangler as wr
import time
import pandas as pd

boto3.setup_default_session(region_name="us-east-1")

sql="show create table 'table-name'"
query_exec_id = wr.athena.start_query_execution(sql=sql, database='database-name')

time.sleep(20)

res=wr.athena.get_query_execution(query_execution_id=query_exec_id)

The code above creates a dict object that stores query results in an s3 link. The link can be accessed by

res['ResultConfiguration']['OutputLocation']. It’s a text link: s3://…..txt

Can someone help me figure how to access the output in the link. I tried using readlines() but it seemes to error out.

Here is what I did

import urllib3
target_url = res['ResultConfiguration']['OutputLocation']
f = urllib3.urlopen(target_url)
for l in f.readlines():
    print (l)

Or if someone can suggest an easier way to get table DDL in python.

Advertisement

Answer

Keep in mind that the returned link will time out after a short while… and make sure your credentials allow you to get the data from the URL specified. If you drop the error message here we can help you better. –

Oh… “It’s a text link: s3://…..txt” is not a standard URL. You cant read that with urllib3. You can use awswrangler to read the bucket. –

I think the form is wr.s3.read_fwf(…)

10 People found this is helpful
Advertisement