I am trying to create a pretty basic Glue job. I have two different AWS RDS MariaDB instances, each with a similar table (the field names differ). I would like to transform the data from table A so that it fits the table B schema (this seems pretty trivial and is working). Then I would like to update all existing entries (on
Tag: aws-glue
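A minimal sketch of the mapping-and-write part of such a job, assuming hypothetical column names, a placeholder JDBC URL for the source, and a pre-existing Glue connection for the target database:

```python
import sys
from awsglue.transforms import ApplyMapping
from awsglue.context import GlueContext
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())

# Read table A from the source MariaDB over JDBC (placeholder connection details)
source = glue_context.create_dynamic_frame.from_options(
    connection_type="mysql",  # MariaDB speaks the MySQL protocol
    connection_options={
        "url": "jdbc:mysql://source-host:3306/source_db",
        "dbtable": "table_a",
        "user": "user",
        "password": "password",
    },
)

# Rename table A fields to match the table B schema (hypothetical columns)
mapped = ApplyMapping.apply(
    frame=source,
    mappings=[
        ("a_id", "int", "b_id", "int"),
        ("a_name", "string", "b_name", "string"),
    ],
)

# Note: a plain JDBC write appends rows; updating existing entries usually
# means writing to a staging table and running INSERT ... ON DUPLICATE KEY
# UPDATE against table B afterwards.
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=mapped,
    catalog_connection="mariadb-b-connection",  # assumed Glue connection name
    connection_options={"dbtable": "table_b", "database": "target_db"},
)
```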
Query S3 from Python
I am using Python to send a query to Athena and get table DDL. I am using the start_query_execution and get_query_execution functions in the awswrangler package. The code above creates a dict object that stores query results in an S3 link. The link can be accessed by res['ResultConfiguration']['OutputLocation']. It's a text link: s3://…..txt Can someone help me figure out how to access
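A small sketch of one way to read that output file, assuming a hypothetical database and DDL query; the bucket/key split mirrors the s3://… text link returned in ResultConfiguration:

```python
import boto3
import awswrangler as wr

# Kick off the query, then poll until Athena has finished writing the results
query_id = wr.athena.start_query_execution(
    sql="SHOW CREATE TABLE my_table",   # assumed DDL query
    database="my_database",             # assumed database name
)
res = wr.athena.wait_query(query_execution_id=query_id)

# The results file lives at the S3 text link from the question
output_location = res["ResultConfiguration"]["OutputLocation"]

# Split "s3://bucket/key" into bucket and key, then download the body
bucket, key = output_location.replace("s3://", "").split("/", 1)
body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()
print(body.decode("utf-8"))
```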
How to Bulk insert data into MSSQL database in an AWS Glue Python shell job?
I have large sets of data in S3. In my Python Glue job, I extract data from those files into a pandas data frame, apply the necessary transformations on the data frame, and then load it into a Microsoft SQL database using the pymssql library. The final data frame contains an average of 100-200K rows and 180
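A rough sketch of a chunked executemany load with pymssql, assuming placeholder connection details and a target table whose columns match the data frame:

```python
import pymssql
import pandas as pd

def bulk_insert(df: pd.DataFrame, table: str, chunk_size: int = 10_000) -> None:
    # Placeholder connection details
    conn = pymssql.connect(
        server="my-sql-server.example.com",
        user="my_user",
        password="my_password",
        database="my_database",
    )
    columns = ", ".join(df.columns)
    placeholders = ", ".join(["%s"] * len(df.columns))
    sql = f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"

    rows = list(df.itertuples(index=False, name=None))
    cursor = conn.cursor()
    try:
        # Insert in chunks to keep memory use and round trips manageable
        for start in range(0, len(rows), chunk_size):
            cursor.executemany(sql, rows[start:start + chunk_size])
            conn.commit()
    finally:
        conn.close()
```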
AWS Glue python shell – Using multiple libraries
I was using the AWS Glue Python shell. The program uses multiple Python libraries which are not natively available in AWS. Glue can take .egg or .whl files for external library references. All we need to do is put these .egg or .whl files in some S3 location and point to them using their full paths. I tried with one external library
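A sketch of wiring several wheels into a Python shell job via the --extra-py-files argument, here shown with boto3 create_job; the bucket, role and job names are placeholders:

```python
import boto3

glue = boto3.client("glue")

glue.create_job(
    Name="my-python-shell-job",                          # assumed job name
    Role="arn:aws:iam::123456789012:role/GlueJobRole",   # assumed role ARN
    Command={
        "Name": "pythonshell",
        "ScriptLocation": "s3://my-bucket/scripts/job.py",
        "PythonVersion": "3",
    },
    DefaultArguments={
        # Comma-separated list of the .whl/.egg files uploaded to S3
        "--extra-py-files": "s3://my-bucket/libs/lib_one.whl,s3://my-bucket/libs/lib_two.whl",
    },
)
```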
How to make a connection from AWS Glue Catalog tables to a custom Python shell script?
I have some tables in the AWS Glue Data Catalog which have been created by crawling the data from S3 buckets. I am writing my own Python shell script to perform some data transformations on the data in those tables. But how can I make the connection to those tables in the Data Catalog via a Python script? Answer If you want to access Glue catalog
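A minimal sketch, assuming a hypothetical catalog database and table: boto3 fetches the table metadata, and the S3 location it points at can then be read into pandas (here via awswrangler) for the transformations:

```python
import boto3
import awswrangler as wr

glue = boto3.client("glue")

# Look up the catalog entry created by the crawler (placeholder names)
table = glue.get_table(DatabaseName="my_database", Name="my_table")["Table"]
s3_location = table["StorageDescriptor"]["Location"]

# Read the underlying data straight from S3 into a pandas DataFrame
df = wr.s3.read_parquet(path=s3_location)  # or read_csv, depending on the format
```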
Col names not detected – AnalysisException: Cannot resolve 'Name' given input columns 'col10'
I'm trying to run a transformation function in a PySpark script: My dataset looks like this: My desired output is something like this: However, the last code line gives me an error similar to this: When I check: I see 'col1', 'col2', etc. in the first row instead of the actual labels (["Name", "Type"]). Should I separately remove and
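A small sketch of the usual fix, assuming the data comes from a CSV read without header=True; in that case Spark assigns generic column names and keeps the label row ("Name", "Type", …) as ordinary data:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.read.csv(
    "s3://my-bucket/path/to/data.csv",  # assumed input path
    header=True,        # promote the first row to column names
    inferSchema=True,   # infer column types instead of treating everything as string
)

# The real labels are now resolvable column names
df.select("Name", "Type").show()
```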
Get tables from AWS Glue using boto3
I need to harvest table and column names from the AWS Glue crawler metadata catalogue. I used boto3 but keep getting only 100 tables even though there are more. Setting up NextToken doesn't help. Please help if possible. The desired result is a list as follows: lst = [table_one.col_one, table_one.col_two, table_two.col_one….table_n.col_n] UPDATED code, still need to have tablename+columnname: Answer Adding sub-loop did
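A sketch using the boto3 paginator for get_tables, which follows NextToken automatically and so gets past the 100-table page size; the database name is a placeholder:

```python
import boto3

glue = boto3.client("glue")
paginator = glue.get_paginator("get_tables")

lst = []
for page in paginator.paginate(DatabaseName="my_database"):
    for table in page["TableList"]:
        # Collect "tablename.columnname" for every column of every table
        for column in table["StorageDescriptor"]["Columns"]:
            lst.append(f"{table['Name']}.{column['Name']}")

print(lst)  # e.g. ['table_one.col_one', 'table_one.col_two', ...]
```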
AWS Glue python install – Could not find a version
I am trying to use the AWSGlue module in Python, but cannot install the module in the terminal. Is there a way around this, or is there a way I can download this from a third party? Does anyone have this AWSGlue module working? Any help would be appreciated. Answer I believe the awsglue package is only available in the images
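One hedged workaround sketch for local development: guard the import so the same script can be opened and tested locally, where awsglue cannot be pip-installed, and still run unchanged inside the Glue job environment (or the aws-glue-libs images):

```python
# Guarded import: awsglue exists inside the Glue environment but not on PyPI
try:
    from awsglue.context import GlueContext
    HAVE_AWSGLUE = True
except ImportError:
    GlueContext = None
    HAVE_AWSGLUE = False
```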
AWS region in AWS Glue
How can I get the region in which the current Glue job is executing? When the Glue job starts executing, I see the output Detected region eu-central-1. In AWS Lambda, I can use the following lines to fetch the current region: However, it seems like the AWS_REGION environment variable is not present in Glue and therefore a KeyError is raised:
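A minimal sketch of one workaround: instead of reading the AWS_REGION environment variable (which raises KeyError in Glue), ask boto3 for the region of the default session:

```python
import boto3

# boto3 resolves the region the job is running in from its own configuration
region = boto3.session.Session().region_name
print(f"Detected region {region}")
```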