I have an S3 bucket where my application saves some final result DataFrames as .csv files. I would like to download the latest 1000 files in this bucket, but I don’t know how to do it.
I cannot do it manually, as the S3 console doesn’t let me sort the files by date once the bucket has more than 1000 objects.
I’ve seen some questions that could be solved with the AWS CLI, but I don’t have enough user permissions to use it, so I have to do it with a boto3 Python script that I’m going to upload into a Lambda function.
How can I do this?
Answer
If your application uploads files periodically, you could try this:
import boto3
import datetime

last_n_days = 250

s3 = boto3.client('s3')
paginator = s3.get_paginator('list_objects_v2')
pages = paginator.paginate(Bucket='bucket', Prefix='processed')

# LastModified values returned by S3 are timezone-aware (UTC), so the
# cutoff must be timezone-aware too, or the comparison raises a TypeError.
date_limit = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=last_n_days)

for page in pages:
    for obj in page.get('Contents', []):
        # Skip "folder" placeholder keys; download everything newer than the cutoff.
        if obj['LastModified'] >= date_limit and obj['Key'][-1] != '/':
            s3.download_file('bucket', obj['Key'], obj['Key'].split('/')[-1])
With the script above, every file modified in the last 250 days will be downloaded. If your application uploads four files per day, that works out to roughly the 1000 most recent files.
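If the upload rate varies and you need exactly the newest 1000 files rather than a date-based cut, you can also list every object under the prefix first, sort by LastModified, and download only the top 1000. A minimal sketch, assuming the same 'bucket' name and 'processed' prefix as above:

import os
import boto3

s3 = boto3.client('s3')
paginator = s3.get_paginator('list_objects_v2')

# Collect every object under the prefix; the paginator transparently
# handles the 1000-keys-per-response limit of list_objects_v2.
objects = []
for page in paginator.paginate(Bucket='bucket', Prefix='processed'):
    for obj in page.get('Contents', []):
        if obj['Key'][-1] != '/':  # skip "folder" placeholder keys
            objects.append(obj)

# Sort newest first and keep exactly the latest 1000.
objects.sort(key=lambda o: o['LastModified'], reverse=True)
for obj in objects[:1000]:
    filename = obj['Key'].split('/')[-1]
    # Inside Lambda, /tmp is the only writable path.
    s3.download_file('bucket', obj['Key'], os.path.join('/tmp', filename))

Listing everything is slower on large buckets, but sorting in memory guarantees you get exactly the latest files regardless of how many are uploaded per day. Note the os.path.join('/tmp', ...): in Lambda, /tmp is the only directory your code can write to.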