Writing a pickle file to an s3 bucket in AWS

I’m trying to write a pandas DataFrame as a pickle file to an S3 bucket in AWS. I know that I can write the DataFrame new_df as a CSV to an S3 bucket as follows:

from io import StringIO
import boto3

bucket = 'mybucket'
key = 'path'

csv_buffer = StringIO()
s3_resource = boto3.resource('s3')

new_df.to_csv(csv_buffer, index=False)
s3_resource.Object(bucket, key).put(Body=csv_buffer.getvalue())


I’ve tried using the same code as above with to_pickle() but with no success.

Answer

I’ve found the solution: the buffer needs to be a BytesIO for pickle files instead of a StringIO. Pickle output is binary, whereas StringIO only holds text, which is why it works for CSV files.

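import io
import boto3

# bucket and key are the same variables defined in the question
pickle_buffer = io.BytesIO()  # binary buffer, since pickle writes bytes
s3_resource = boto3.resource('s3')

new_df.to_pickle(pickle_buffer)
s3_resource.Object(bucket, key).put(Body=pickle_buffer.getvalue())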
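
To illustrate why the buffer type matters, here is a minimal sketch that uses only the standard library. It assumes a plain dict as a stand-in for the DataFrame, and the helper names to_pickle_bytes and text_buffer_fails are made up for this example. It does not touch S3; it only shows that pickling into a StringIO fails while a BytesIO works.

```python
import io
import pickle


def to_pickle_bytes(obj):
    """Pickle obj into an in-memory binary buffer and return the raw bytes."""
    buffer = io.BytesIO()
    pickle.dump(obj, buffer)
    return buffer.getvalue()


def text_buffer_fails(obj):
    """Return True if pickling into a text buffer (StringIO) raises TypeError."""
    try:
        pickle.dump(obj, io.StringIO())
    except TypeError:
        return True
    return False


# Hypothetical stand-in for new_df; any picklable object behaves the same way.
record = {"a": [1, 2, 3]}

payload = to_pickle_bytes(record)   # bytes that could be passed as Body=...
restored = pickle.loads(payload)    # round trip back to the original object
```

The bytes in payload play the same role as pickle_buffer.getvalue() in the answer above, so they are what would be handed to put(Body=...).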



Answer