
TensorFlow reading data from an AWS S3 bucket

I want to stream my data files from an AWS S3 bucket. I’m following the setup described here, but using TensorFlow 2.

The setup specifies that you can use an AWS credentials file in ~/.aws/credentials, but I also tried using the environment variables. However, the smoke test below keeps giving the following error: tensorflow.python.framework.errors_impl.UnimplementedError: File system scheme 's3' not implemented.

from tensorflow.python.lib.io import file_io
print(file_io.stat('s3://bucketname/key/'))
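For reference, setting the credentials through the environment looks roughly like this; the values below are placeholders, and these are the standard AWS variable names rather than anything TensorFlow-specific:

import os

# Standard AWS credential/region variables; the values here are placeholders.
os.environ["AWS_ACCESS_KEY_ID"] = "<access-key-id>"
os.environ["AWS_SECRET_ACCESS_KEY"] = "<secret-access-key>"
os.environ["AWS_REGION"] = "us-east-1"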


Answer

I found myself back at my own question a few times. Generally, when you hit this error, you are trying to talk to S3 from TensorFlow on Windows, where the 's3' file system scheme is not implemented; it is only implemented in the Linux builds.

My main advice is not to stream from S3 at all when it comes to training data. If possible, download the files first; it will be worth the overhead, especially when using tools like SageMaker, which has very convenient methods for downloading the data at start-up. If you really need quick start-up and fast reads, it is worth looking at FSx for Lustre.
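As a sketch of the download-first approach, assuming boto3 is available and using placeholder bucket, prefix, and directory names:

import os
import boto3

def download_prefix(bucket, prefix, local_dir):
    # Copy every object under an S3 prefix into a local directory.
    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            if obj["Key"].endswith("/"):  # skip folder marker objects
                continue
            target = os.path.join(local_dir, os.path.relpath(obj["Key"], prefix))
            os.makedirs(os.path.dirname(target), exist_ok=True)
            s3.download_file(bucket, obj["Key"], target)

download_prefix("bucketname", "key/", "/tmp/training-data")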

A few times I have used smart_open, which works well on both Windows and Linux.

from smart_open import open
with open("s3://bucket/key") as f:  # bucket/key are placeholders
    ...
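Depending on the version, the S3 extra may need to be installed first (e.g. pip install smart_open[s3]). Since smart_open just streams the object over HTTPS, it can also feed a tf.data pipeline; a minimal sketch, assuming a plain-text file and placeholder bucket/key, using tf.data.Dataset.from_generator:

import tensorflow as tf
from smart_open import open

def line_generator():
    # Stream the object from S3 line by line; bucket/key are placeholders.
    with open("s3://bucket/key") as f:
        for line in f:
            yield line.rstrip("\n")

dataset = tf.data.Dataset.from_generator(
    line_generator,
    output_signature=tf.TensorSpec(shape=(), dtype=tf.string),
)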