
Unable to load S3-hosted CSV into Spark Dataframe on Jupyter Notebook


I believe I loaded the two required packages with the os.environ line below. If I did that incorrectly, please show me how to install them correctly. The Jupyter Notebook is hosted on an EC2 instance, which is why I’m trying to pull the CSV from an S3 bucket.

Here is my code:


Output:


Then I do:


And I get this error:



Answer

Here is an example using s3a.

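As a minimal sketch of reading through s3a (not the original snippet, which is not preserved on this page): the helper names `s3a_uri` and `read_csv_from_s3` are made up for illustration, and `spark` is assumed to be an existing SparkSession that already has the hadoop-aws package on its classpath.

```python
def s3a_uri(bucket: str, key: str) -> str:
    """Build an s3a:// URI, the scheme handled by Hadoop's S3A connector."""
    return f"s3a://{bucket}/{key.lstrip('/')}"


def read_csv_from_s3(spark, bucket: str, key: str):
    """Read a CSV object from S3 into a Spark DataFrame.

    `spark` is assumed to be a SparkSession with hadoop-aws available.
    """
    return (
        spark.read
        .option("header", "true")       # treat the first row as column names
        .option("inferSchema", "true")  # let Spark guess the column types
        .csv(s3a_uri(bucket, key))
    )
```

In a notebook this would be called as `df = read_csv_from_s3(spark, "<your-bucket>", "<path/to/file.csv>")`, with your own bucket and key filled in. The important part is the `s3a://` scheme rather than `s3://` or `s3n://`.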

Here is a more complete example that builds the Spark session with the required config.

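A hedged sketch of such a session setup follows. It is a reconstruction, not the original snippet. The hadoop-aws version (3.3.4) is an assumption and should match the Hadoop version bundled with your Spark install. The names `S3A_CONF` and `build_session` are illustrative, and the credentials provider shown assumes the default AWS chain (which on EC2 can pick up an instance profile) is what you want.

```python
# Assumed version: pick the one matching the Hadoop build shipped with your Spark.
HADOOP_AWS_PACKAGE = "org.apache.hadoop:hadoop-aws:3.3.4"

S3A_CONF = {
    # Downloads hadoop-aws (and its AWS SDK dependency) when the JVM starts.
    "spark.jars.packages": HADOOP_AWS_PACKAGE,
    # Route s3a:// paths to the S3A filesystem implementation.
    "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
    # Assumption: use the default AWS credentials chain (env vars, profile, instance role).
    "spark.hadoop.fs.s3a.aws.credentials.provider":
        "com.amazonaws.auth.DefaultAWSCredentialsProviderChain",
}


def build_session(app_name: str = "s3-csv-example", conf: dict = S3A_CONF):
    """Create (or reuse) a SparkSession with the S3A settings applied."""
    # Imported here so the config above can be inspected without Spark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

Usage in the notebook would be `spark = build_session()` followed by `spark.read.csv("s3a://<your-bucket>/<path/to/file.csv>", header=True)`. Note that `spark.jars.packages` only takes effect if it is set before the session's JVM starts, so if a session already exists in the kernel, restart the kernel first.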
User contributions licensed under: CC BY-SA