AWS Lambda – Combine multiple CSV files from S3 into one file

I am trying to learn how to combine all the files in a specific bucket into one CSV file. The files are logs, always in the same format, and are all kept in the same bucket. I have this code to access and read them:

bucket = s3_resource.Bucket(bucket_name)
for obj in bucket.objects.all():
    x = obj.get()['Body'].read().decode('utf-8')
    print(x)

It prints the contents of every file, each with its own column headers, with a separation between files.

My question is: how can I modify my loop to write them all into a single CSV file?

Answer

You should create a file in /tmp/ and write the contents of each object into that file.

Then, when all files have been read, upload the file (or do whatever you want to do with it).

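# The with block closes the file automatically, even if a read fails.
with open('/tmp/outfile.txt', 'w') as output:
    bucket = s3_resource.Bucket(bucket_name)
    for obj in bucket.objects.all():
        # Append the decoded contents of each object to the output file.
        output.write(obj.get()['Body'].read().decode('utf-8'))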

Please note that the /tmp/ directory in Lambda is limited to 512 MB by default. This ephemeral storage can be configured up to 10,240 MB in the function settings.
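
Since every file carries its own column headers, plain concatenation repeats the header row once per file. Here is a minimal sketch of one way to keep only the first header, assuming all files share the same single-line header, as the question describes. The names merge_csv_texts and merge_bucket and the path /tmp/outfile.csv are hypothetical and not part of the original answer.

```python
import io


def merge_csv_texts(texts, out):
    """Write each CSV text to `out`, keeping only the first header row.

    Assumes every non-empty text starts with the same one-line header.
    """
    header = None
    for text in texts:
        if not text.strip():
            continue  # skip empty objects
        first, _, rest = text.partition("\n")
        if header is None:
            # Keep the header from the first non-empty file only.
            header = first.rstrip("\r")
            out.write(header + "\n")
        out.write(rest)
        # Make sure the next file starts on a fresh line.
        if rest and not rest.endswith("\n"):
            out.write("\n")


def merge_bucket(bucket, out_path="/tmp/outfile.csv"):
    """Hypothetical helper: `bucket` is a boto3 Bucket resource, as in the question."""
    texts = (
        obj.get()["Body"].read().decode("utf-8")
        for obj in bucket.objects.all()
    )
    with open(out_path, "w", newline="") as out:
        merge_csv_texts(texts, out)


# Small in-memory demonstration (no S3 access needed).
demo = io.StringIO()
merge_csv_texts(["id,msg\n1,a\n", "id,msg\n2,b\n"], demo)
print(demo.getvalue())
```

In the Lambda handler this could be called as merge_bucket(s3_resource.Bucket(bucket_name)), followed by an upload of the resulting file. If the merged file is uploaded back to the same bucket, it would be read again on the next run, so a different bucket may be the safer target.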
