I am trying to understand how to get all the files from a specific bucket into one CSV file. The files are log-like, always in the same format, and kept in the same bucket. I have this code to access and read them:
bucket = s3_resource.Bucket(bucket_name)
for obj in bucket.objects.all():
    x = obj.get()['Body'].read().decode('utf-8')
    print(x)
It does print them, with separation between the individual files, and also the column headers.
My question is: how can I modify my loop to write them all into just one CSV file?
Answer
You should create a file in /tmp/ and write the contents of each object into that file. Then, when all the objects have been read, upload the file (or do whatever else you want to do with it).
output = open('/tmp/outfile.txt', 'w')
bucket = s3_resource.Bucket(bucket_name)
for obj in bucket.objects.all():
    output.write(obj.get()['Body'].read().decode('utf-8'))
output.close()
Please note that there is a 512 MB limit on the /tmp/ directory.
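Since each file carries its own column headers, simply concatenating them repeats the header row once per file. A small helper, sketched here independently of S3 (the function name is my own invention), keeps only the first header:

```python
def merge_csv_chunks(chunks):
    """Merge decoded CSV file contents, keeping the header row only once.

    `chunks` is an iterable of strings, each the full text of one CSV file,
    all starting with the same header line.
    """
    merged_lines = []
    for chunk in chunks:
        lines = chunk.splitlines()
        if not lines:
            continue  # skip empty objects
        if merged_lines:
            lines = lines[1:]  # drop the repeated header
        merged_lines.extend(lines)
    return '\n'.join(merged_lines) + '\n'
```

In the loop above, you would collect `obj.get()['Body'].read().decode('utf-8')` for each object, pass the list to this helper, and write the result to `/tmp/outfile.csv` before uploading it.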