I’m new to mrjob and EMR and still trying to figure out how things work. I get an error when I run my script:
python3 MovieSimilarities.py -r emr --items=ml-100k/u.item ml-100k/u.data > sims2t.txt
Here’s the code:
Here’s the link to get the data: files.grouplens.org/datasets/movielens/ml-100k.zip
I have exported my aws_access_key_id and aws_secret_access_key in my .bashrc and restarted my shell.
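Since the error itself is not shown, one thing that can be ruled out quickly is whether the exported credentials are actually visible to Python. The sketch below is only a sanity check, not a diagnosis: it assumes the credentials are supplied through the uppercase environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY (the names mrjob's documentation describes), and the helper name missing_aws_vars is made up for this example.

```python
import os

# Assumption: credentials are passed via these standard uppercase variables.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_vars(environ=None):
    """Return the names of the required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_aws_vars()
    if missing:
        # Only names are printed, never the secret values themselves.
        print("Not visible to Python:", ", ".join(missing))
    else:
        print("Both AWS credential variables are set.")
```

If a variable is reported as missing, it may have been exported under a lowercase name, or the shell that launches the job may not have sourced .bashrc.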
Tag: amazon-emr
I can’t install a spaCy model in an EMR PySpark notebook
I currently have an AWS EMR cluster with a notebook linked to it. I would like to load a spaCy model (en_core_web_sm), but first I need to download it, which is usually done with python -m spacy download en_core_web_sm. I can’t find how to do that in a PySpark session. Here is my config : I’m
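One possible approach, sketched here under several assumptions, is to call spaCy's own download helper from Python instead of the command line. spacy.cli.download is the function behind python -m spacy download; the names is_installed and load_model are invented for this example. The sketch assumes spaCy is already installed in the session's Python environment and that the node has internet access. It only affects the process it runs in (typically the driver), so any code that runs on executors would still need the model installed there.

```python
import importlib
import importlib.util

MODEL_NAME = "en_core_web_sm"


def is_installed(package_name):
    """Return True if a top-level package with this name can be imported."""
    return importlib.util.find_spec(package_name) is not None


def load_model(name=MODEL_NAME):
    """Download the spaCy model if it is missing, then load it."""
    # Imported lazily so the helper above works even without spaCy installed.
    import spacy
    from spacy.cli import download

    if not is_installed(name):
        download(name)  # Python equivalent of: python -m spacy download <name>
        # Make the freshly installed package visible to this interpreter.
        importlib.invalidate_caches()
    return spacy.load(name)
```

Calling nlp = load_model() in a notebook cell would then return the loaded pipeline. If the load still fails right after the download, restarting the session is sometimes needed before the new package is picked up.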