
How to download latest n items from AWS S3 bucket using boto3?

I have an S3 bucket where my application saves some final result DataFrames as .csv files. I would like to download the latest 1000 files in this bucket, but I don’t know how to do it.

I cannot do it manually, because the S3 console doesn’t let me sort the files by date once the bucket holds more than 1000 objects.

[Screenshot: S3 console sorting limit for buckets with more than 1000 objects]

I’ve seen some answers that use the AWS CLI, but I don’t have the user permissions to run it, so I need a boto3 Python script that I can upload to a Lambda function.

How can I do this?


Answer

If your application uploads files periodically, you could try this:

import boto3
import datetime

# Download everything modified in the last 250 days.
last_n_days = 250
s3 = boto3.client('s3')

# list_objects_v2 returns at most 1000 keys per call, so use a paginator.
paginator = s3.get_paginator('list_objects_v2')
pages = paginator.paginate(Bucket='bucket', Prefix='processed')

# LastModified is timezone-aware, so the cutoff must be aware as well.
date_limit = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=last_n_days)

for page in pages:
    for obj in page.get('Contents', []):  # 'Contents' is absent on empty pages
        # Skip "directory" placeholder keys that end with '/'.
        if obj['LastModified'] >= date_limit and not obj['Key'].endswith('/'):
            s3.download_file('bucket', obj['Key'], obj['Key'].split('/')[-1])

With the script above, every file modified in the last 250 days is downloaded. If your application uploads roughly 4 files per day, that window covers about the latest 1000 files.
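If you need exactly the latest N files rather than a date window, you can list every object under the prefix, sort by `LastModified`, and download only the newest N. A sketch under the same assumptions as above (`'bucket'` and `'processed'` are placeholders for your own bucket name and prefix):

```python
def newest_keys(objects, n):
    """Return the keys of the n most recently modified objects,
    skipping "directory" placeholder keys that end with '/'."""
    files = [o for o in objects if not o['Key'].endswith('/')]
    files.sort(key=lambda o: o['LastModified'], reverse=True)
    return [o['Key'] for o in files[:n]]

def download_newest(bucket, prefix, n):
    # boto3 is imported here so newest_keys() stays usable (and testable)
    # without AWS access.
    import boto3

    s3 = boto3.client('s3')
    paginator = s3.get_paginator('list_objects_v2')

    # Collect every object under the prefix, then keep only the newest n.
    objects = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        objects.extend(page.get('Contents', []))

    for key in newest_keys(objects, n):
        s3.download_file(bucket, key, key.split('/')[-1])

if __name__ == '__main__':
    download_newest('bucket', 'processed', 1000)
```

Note that this still has to list the whole prefix first (S3 has no server-side sort by date), but it avoids having to guess an upload cadence.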

User contributions licensed under: CC BY-SA