I am trying to create a pretty basic Glue job.
I have two different AWS RDS Mariadb’s, with two similar tables (field names are different).
I would like to transform the data from table A so it fits with table B schema (this seems pretty trivial and is working).
And then i would like to update all existing entries (on a specific key) and insert all non existing ones.
Ive used the basic transform job which works fine if table b is empty and the insertion works fine (AWS roles / permissions / ports are fine)
But i get an expected duplicate key error, because it is just trying to insert.
I am very unsure what the simplest solution is, and where i can read about it.
The key on which table B should be updated is central_requisition_id (which is the pk in table A, but not in table B)
schemaapplymapping= ApplyMapping.apply( frame=some_frame, mappings=[ ("supplier_id", "int", "central_parent_supplier_id", "int"), ("description", "string", "description", "string"), ("id", "int", "central_requisition_id", "int"), ], transformation_ctx="schemaapplymapping",)
Advertisement
Answer
I’m not sure on your specific needs but you can get around the duplicate key error by setting the write mode to overwrite
df.write.format('jdbc').options(url = dest_jdbc_url, user = username, password = password, dbtable = dest ).mode("overwrite").save()