There is this S3 notification feature described here:
Amazon S3 event notifications are designed to be delivered at least once. Typically, event notifications are delivered in seconds but can sometimes take a minute or longer.
and discussed here.
I thought I could mitigate the duplicates somewhat by deleting files I have already processed. The problem is that when a second event for the same file arrives (a minute later) and I try to access the file, I don’t get an HTTP 404; I get an ugly AccessDenied:
[ERROR] ClientError: An error occurred (AccessDenied) when calling the GetObject operation: Access Denied
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 111, in lambda_handler
    raise e
  File "/var/task/lambda_function.py", line 104, in lambda_handler
    response = s3.get_object(Bucket=bucket, Key=key)
  File "/var/runtime/botocore/client.py", line 391, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/var/runtime/botocore/client.py", line 719, in _make_api_call
    raise error_class(parsed_response, operation_name)
which is unexpected and not acceptable.
I don’t want my lambda to suppress AccessDenied errors, for obvious reasons. Is there an easy way to find out whether the file has already been processed in the past, or whether the notification service is playing tricks?
EDIT:
For those who think this is “an indication of some bug in my application”, here is the relevant piece of code:
bucket = event['Records'][0]['s3']['bucket']['name']
key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
logger.info(f'Requesting file from bucket {bucket} with key {key}')
try:
    response = s3.get_object(Bucket=bucket, Key=key)
except ClientError as e:
    error_code = e.response["Error"]["Code"]
    if error_code == 'NoSuchKey':
        logger.info('Object does not exist any more')
        return
    else:
        raise e
It rather smells like an ugly issue on AWS side to me.
Answer
On the duplicate delivery of notifications: yes, this can happen as documented, but it is relatively rare:
Amazon S3 event notifications are designed to be delivered at least once. Typically, event notifications are delivered in seconds but can sometimes take a minute or longer.
One possible mechanism to deal with this is to build an idempotent workflow, for example one that uses DynamoDB to record actions taken against an object at a given time, which can then be queried to prevent running the same workflow twice on the same object. There are idempotency features in the AWS Lambda Powertools suite, as well as third-party options, that you might consider.
More discussion on the duplicate event topic can be found here.
On the AccessDenied error when attempting to download an absent object that you have GetObject permission for: this is actually a security feature designed to prevent the leakage of information. If you have ListBucket permission on the bucket, then you will get a 404 Not Found response indicating the absence of the object; if you don’t have ListBucket, then you will get a 403 Forbidden response. To correct this, add s3:ListBucket on arn:aws:s3:::mybucket to your IAM policy.
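For illustration, the combined policy statements could look like the following (using the mybucket name from above; note that s3:ListBucket applies to the bucket ARN itself, while s3:GetObject applies to the object ARNs under it):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::mybucket"
    },
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::mybucket/*"
    }
  ]
}
```

With ListBucket in place, the `get_object` call on a deleted key raises NoSuchKey (HTTP 404), which the error-handling branch in your code already handles.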
More discussion on the AccessDenied topic can be found here.