Pandas read_pickle from s3 bucket

I am working on a Jupyter notebook from AWS EMR.

I am able to do this: pd.read_csv("s3://mypath/xyz.csv").

However, if I try to open a pickle file the same way with pd.read_pickle("s3://mypath/xyz.pkl"), I get this error:

[Errno 2] No such file or directory: 's3://pvarma1/users/users/candidate_users.pkl'
Traceback (most recent call last):
  File "/usr/local/lib64/python2.7/site-packages/pandas/io/pickle.py", line 179, in read_pickle
    return try_read(path)
  File "/usr/local/lib64/python2.7/site-packages/pandas/io/pickle.py", line 177, in try_read
    lambda f: pc.load(f, encoding=encoding, compat=True))
  File "/usr/local/lib64/python2.7/site-packages/pandas/io/pickle.py", line 146, in read_wrapper
    is_text=False)
  File "/usr/local/lib64/python2.7/site-packages/pandas/io/common.py", line 421, in _get_handle
    f = open(path_or_buf, mode)
IOError: [Errno 2] No such file or d

However, I can see both xyz.csv and xyz.pkl in the same path! Can anyone help?

Answer

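In the pandas version you are running (the traceback shows Python 2.7), read_pickle accepts only local file paths, unlike read_csv, which can read directly from S3. The s3:// URL is passed straight to open(), which treats it as a local filename and fails. So you need to copy the pickle file to your machine first and then read the local copy with pandas. (Newer pandas releases, from 1.0 on, also accept URLs in read_pickle.)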
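A minimal sketch of that approach, assuming boto3 is installed and the EMR instance has AWS credentials that can read the bucket: download the object to a temporary local file with the boto3 S3 client's download_file, then pass the local path to pd.read_pickle. The helper names split_s3_url and read_pickle_from_s3 are hypothetical, and the optional s3_client parameter exists only so the helper can be exercised without AWS. The imports are written to run on both Python 2.7 (as in the traceback) and Python 3.

```python
import os
import shutil
import tempfile

try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2.7, as in the traceback

import pandas as pd


def split_s3_url(s3_url):
    """Split 's3://bucket/key' into (bucket, key); reject anything else."""
    parsed = urlparse(s3_url)
    key = parsed.path.lstrip("/")
    if parsed.scheme != "s3" or not parsed.netloc or not key:
        raise ValueError("expected s3://bucket/key, got %r" % (s3_url,))
    return parsed.netloc, key


def read_pickle_from_s3(s3_url, s3_client=None):
    """Copy a pickle from S3 to a local temp file, then load it with pandas.

    Hypothetical helper: s3_client is any object with boto3's
    download_file(Bucket, Key, Filename) method; by default a boto3 client
    is created, which assumes boto3 is installed and credentials are set up.
    """
    if s3_client is None:
        import boto3  # imported lazily so the helper can be tested without boto3
        s3_client = boto3.client("s3")
    bucket, key = split_s3_url(s3_url)
    tmp_dir = tempfile.mkdtemp()
    try:
        local_path = os.path.join(tmp_dir, "downloaded.pkl")
        # Step 1: copy the object from S3 to the local disk.
        s3_client.download_file(bucket, key, local_path)
        # Step 2: read_pickle now receives a plain local path it can open().
        return pd.read_pickle(local_path)
    finally:
        # Remove the temporary copy whether or not the load succeeded.
        shutil.rmtree(tmp_dir)
```

On the cluster you would then call it as df = read_pickle_from_s3("s3://pvarma1/users/users/candidate_users.pkl"). Downloading into a temporary directory that is deleted afterwards avoids leaving stray copies of the data on the notebook host.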

Answer