I am trying to create a pretty basic Glue job. I have two different AWS RDS MariaDB instances, with two similar tables (the field names are different). I would like to transform the data from table A so it fits the schema of table B (this seems pretty trivial and is working). Then I would like to update all exist…
Tag: aws-glue
Query S3 from Python
I am using Python to send a query to Athena and get the table DDL. I am using the start_query_execution and get_query_execution functions in the awswrangler package. The code above creates a dict object that stores the query results in an S3 link. The link can be accessed by res['ResultConfiguration']['…
How to Bulk insert data into MSSQL database in a AWS Glue python shell job?
I have large sets of data in S3. In my Python Glue job, I will be extracting data from those files into a pandas data frame, applying the necessary transformations to the data frame, and then loading it into a Microsoft SQL Server database using the pymssql library. The final data frame contains an average of 100-200K r…
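One possible way to structure such a load is sketched below: the rows are sent in fixed-size batches through the DB-API `executemany` call, with a commit after each batch. The helper name `insert_in_batches`, the batch size, and the table and column names in the usage note are hypothetical. The sketch only bounds the size of each call and transaction; whether it is fast enough for 100-200K rows is not something the excerpt settles.

```python
from itertools import islice


def insert_in_batches(connection, sql, rows, batch_size=1000):
    """Insert rows in batches via DB-API executemany; return the row count.

    `connection` is assumed to be a DB-API connection (for example one
    returned by pymssql.connect); `rows` is any iterable of tuples.
    """
    cursor = connection.cursor()
    iterator = iter(rows)
    total = 0
    while True:
        # Take at most batch_size rows from the iterator.
        batch = list(islice(iterator, batch_size))
        if not batch:
            break
        cursor.executemany(sql, batch)
        connection.commit()  # keep each transaction bounded in size
        total += len(batch)
    return total
```

With a pandas data frame, the rows could be produced with `df.itertuples(index=False, name=None)` and the statement could look like `INSERT INTO my_table (a, b) VALUES (%s, %s)`, where `my_table`, `a`, and `b` are placeholders.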
AWS Glue python shell – Using multiple libraries
I was using the AWS Glue Python shell. The program uses multiple Python libraries which are not natively available in AWS. Glue can take .egg or .whl files for external library references. All we need to do is put these .egg or .whl files in some S3 location and point to them using their full paths. I tried with one…
How to make a connection from AWS Glue Catalog tables to a custom Python shell script?
I have some tables in the AWS Glue Data Catalog which have been created by crawling data from S3 buckets. I am writing my own Python shell script to perform some data transformations on the data in those tables. But how can I make the connection to those tables in the Data Catalog from the Python script? Answer If you want …
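The answer is cut off above, so the following is only one possible approach, not necessarily the one the answer gives: look up the table in the Data Catalog and read its S3 location, which the script can then load with whatever reader it prefers. The helper name `table_location` is hypothetical; `get_table` and the `Table` / `StorageDescriptor` / `Location` response keys are part of the boto3 Glue client.

```python
def table_location(glue_client, database_name, table_name):
    """Return the S3 location recorded in the Glue Data Catalog for a table.

    `glue_client` is assumed to be a boto3 Glue client,
    e.g. boto3.client("glue").
    """
    response = glue_client.get_table(DatabaseName=database_name, Name=table_name)
    return response["Table"]["StorageDescriptor"]["Location"]
```

In a Python shell job this might be called as `table_location(boto3.client("glue"), "my_db", "my_table")`, where the database and table names are placeholders.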
Col names not detected – AnalysisException: Cannot resolve ‘Name’ given input columns ‘col10’
I’m trying to run a transformation function in a pyspark script: My dataset looks like this: My desired output is something like this: However, the last code line gives me an error similar to this: When I check: I see ‘col1’, ‘col2’ etc in the first row instead of the actual labe…
Get tables from AWS Glue using boto3
I need to harvest table and column names from the AWS Glue crawler metadata catalogue. I used boto3 but always get only 100 tables even though there are more. Setting NextToken doesn’t help. Please help if possible. The desired result is a list as follows: lst = [table_one.col_one, table_one.col…
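A minimal sketch of one way around the 100-table page limit, assuming a boto3 Glue client is passed in: the client's `get_tables` paginator follows `NextToken` automatically, so every page is visited. The helper name `list_table_columns` is hypothetical, and the output format follows the `table.column` pattern from the question.

```python
def list_table_columns(glue_client, database_name):
    """Collect 'table.column' names for every table in a Glue database.

    `glue_client` is assumed to be a boto3 Glue client,
    e.g. boto3.client("glue").
    """
    paginator = glue_client.get_paginator("get_tables")
    names = []
    # The paginator requests further pages until no NextToken is returned.
    for page in paginator.paginate(DatabaseName=database_name):
        for table in page.get("TableList", []):
            columns = table.get("StorageDescriptor", {}).get("Columns", [])
            names.extend(f"{table['Name']}.{col['Name']}" for col in columns)
    return names
```

Only the columns under `StorageDescriptor` are listed here; partition keys are stored separately under `PartitionKeys` and would need to be added if they are wanted.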
AWS Glue python install – Could not find a version
I am trying to use the AWSGlue module in Python, but cannot install the module in the terminal. Is there a way around this or is there a way I can download this from a third-party? Does anyone have this AWSGlue module working? Any help would be appreciated. Answer I believe the awsglue package is only availab…
AWS region in AWS Glue
How can I get the region in which the current Glue job is executing? When the Glue job starts executing, I see the output Detected region eu-central-1. In AWS Lambda, I can use the following lines to fetch the current region: However, it seems like the AWS_REGION environment variable is not present in Glue an…
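One possible fallback chain is sketched below. It checks the usual environment variables first and, if neither is set, asks a default boto3 session for its configured region. The helper name `detect_region` is hypothetical, and whether the boto3 session reports a region inside a given Glue job is an assumption that should be checked in that environment.

```python
import os


def detect_region(env=None):
    """Best-effort lookup of the current AWS region; returns None if unknown."""
    env = os.environ if env is None else env
    # Lambda sets AWS_REGION; AWS_DEFAULT_REGION is a common alternative.
    for key in ("AWS_REGION", "AWS_DEFAULT_REGION"):
        if env.get(key):
            return env[key]
    try:
        import boto3  # imported lazily so the env lookup works without it
    except ImportError:
        return None
    # region_name is None when the session has no region configured.
    return boto3.session.Session().region_name
```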