I'm a newbie to mrjob and EMR and I'm still trying to figure out how things work. I'm getting an error when I run my script: python3 MovieSimilarities.py -r emr --items=ml-100k/u.item ml-100k/u.data > sims2t.txt Here's the code: Here's the link to get the data: files.grouplens.org/datasets/movielens/ml-100k.zip I have exported my aws_access_key_id and aws_secret_access_key in my .bashrc and restarted my shell.
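Two things are worth checking first. mrjob reads credentials from the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY; the lowercase aws_access_key_id / aws_secret_access_key names are only valid inside ~/.aws/credentials, so confirm the exact names exported in .bashrc. Also make sure the option is spelled with two plain hyphens (--items); a pasted en dash (–items) will break argument parsing. For reference, here is a minimal sketch of how a job with a --items file option is typically wired up in mrjob 0.6+ (the class body is an illustrative assumption, not the asker's actual code):

```python
# A minimal mrjob skeleton, assuming mrjob 0.6+ where file options are
# declared with add_file_arg. The mapper/reducer below are placeholders,
# not the asker's MovieSimilarities logic.
from mrjob.job import MRJob


class MovieSimilarities(MRJob):

    def configure_args(self):
        super(MovieSimilarities, self).configure_args()
        # Ship u.item to every task; on EMR, mrjob uploads it to S3.
        self.add_file_arg('--items')

    def mapper(self, _, line):
        # u.data lines look like: user_id<TAB>movie_id<TAB>rating<TAB>timestamp
        user_id, movie_id, rating, _timestamp = line.split('\t')
        yield movie_id, float(rating)

    def reducer(self, movie_id, ratings):
        ratings = list(ratings)
        yield movie_id, sum(ratings) / len(ratings)


if __name__ == '__main__':
    MovieSimilarities.run()
```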
Tag: amazon-web-services
AWS SageMaker Deployment for Batch Transform
I am trying to use an XGBoost model in SageMaker to score a large dataset stored in S3 using Batch Transform. I built the model using an existing SageMaker container as follows: The following code is used to do the Batch Transform: The above code works fine in the development environment (Jupyter notebook) when the model is built
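For reference, here is a minimal sketch of the two steps with the sagemaker Python SDK: building a Model from the prebuilt XGBoost container and running Batch Transform over an S3 input. Bucket names, paths, the container version, and instance types are placeholders, not the asker's actual values:

```python
# A minimal sketch of Batch Transform with the built-in XGBoost container.
# All S3 paths, versions, and instance types below are assumptions.
import sagemaker
from sagemaker import image_uris
from sagemaker.model import Model

session = sagemaker.Session()
role = sagemaker.get_execution_role()

# Resolve the prebuilt XGBoost image for this region.
image = image_uris.retrieve("xgboost", session.boto_region_name, version="1.5-1")

model = Model(
    image_uri=image,
    model_data="s3://my-bucket/xgb/model.tar.gz",  # output of the training job
    role=role,
    sagemaker_session=session,
)

transformer = model.transformer(
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/xgb/batch-output/",
)

# Split the S3 input by line so each CSV row is scored independently.
transformer.transform(
    data="s3://my-bucket/xgb/batch-input/data.csv",
    content_type="text/csv",
    split_type="Line",
)
transformer.wait()
```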
Update values in DynamoDB by modifying the previous values
I created a DynamoDB table and I am using a Lambda function to update the values in it. The problem is that my function does not modify the older values that are already present in DynamoDB; it just overwrites them. My code snippet for updating the data item in the DynamoDB table: Here Key is
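The overwrite happens because a plain SET with a fresh value replaces whatever is stored. To fold the new value into the existing one, the UpdateExpression has to reference the stored attribute, e.g. with ADD or SET #a = #a + :inc arithmetic. A minimal boto3 sketch (table, key, and attribute names are placeholders, not the asker's schema):

```python
# A minimal sketch, assuming a numeric attribute that should be
# incremented rather than replaced. Names below are placeholders.
import boto3

table = boto3.resource("dynamodb").Table("my-table")

table.update_item(
    Key={"id": "item-1"},
    # ADD (or "SET #count = #count + :inc") arithmetic uses the value
    # already stored in the item instead of overwriting it.
    UpdateExpression="ADD #count :inc",
    ExpressionAttributeNames={"#count": "count"},
    ExpressionAttributeValues={":inc": 5},
)
```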
Unable to import module ‘lambda_function’: No module named *
I am trying to run a Python Lambda function that uses additional packages. However, whenever I upload the .zip file to the Lambda console I get the error: I followed these instructions: https://docs.aws.amazon.com/lambda/latest/dg/python-package.html#python-package-dependencies which told me to make sure my packages were in a directory local to my Lambda function: I am not using Pillow. This is sample code from
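For context, "No module named" at import time usually means one of two things: the Handler setting does not match the file and function names, or the dependencies were zipped inside a subfolder instead of at the archive root (zip from inside the package directory, e.g. zip -r ../function.zip .). A sketch of the expected layout, using an illustrative dependency that is not from the question:

```python
# lambda_function.py must sit at the ROOT of the .zip, next to the
# installed packages (pip install requests -t .), so the archive looks like:
#
#   lambda_function.py
#   requests/
#   urllib3/
#   ...
#
# and the function's Handler setting is "lambda_function.lambda_handler".
# "requests" here is just an illustrative dependency.
import json

import requests


def lambda_handler(event, context):
    resp = requests.get("https://checkip.amazonaws.com/")
    return {"statusCode": 200, "body": json.dumps(resp.text.strip())}
```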
PyAthena is super slow compared to querying from Athena
I run a query from the AWS Athena console and it takes 10 seconds. The same query run from SageMaker using PyAthena takes 155 seconds. Is PyAthena slowing it down, or is the data transfer from Athena to SageMaker so time-consuming? What can I do to speed this up? Answer Just figured out a way of boosting the queries. Before, I was trying:
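The answer's code is not shown above; one widely used way to close this gap (an assumption here, not necessarily the asker's exact fix) is PyAthena's PandasCursor, which downloads the query's CSV result straight from S3 instead of paging rows through the Athena API:

```python
# A minimal sketch using PandasCursor from a recent PyAthena (2.x).
# The staging bucket, region, and table names are placeholders.
from pyathena import connect
from pyathena.pandas.cursor import PandasCursor

cursor = connect(
    s3_staging_dir="s3://my-athena-results/",
    region_name="us-east-1",
    cursor_class=PandasCursor,
).cursor()

# The result CSV is read directly from the S3 staging location,
# which is much faster than row-by-row fetching for large results.
df = cursor.execute("SELECT * FROM my_db.my_table LIMIT 100000").as_pandas()
print(len(df))
```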
Include only .gz extension files from S3 bucket
I want to process/download .gz files from an S3 bucket. There are more than 10,000 files on S3, so I am using This also lists .txt files, which I want to avoid. How can I do that? Answer The easiest way to filter objects by name or suffix is to do it within Python, such as using .endswith() to include/exclude objects. You
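A minimal sketch of that approach with boto3: paginate past the 1,000-key-per-call limit of list_objects_v2 and keep only keys ending in .gz (bucket and prefix are placeholders):

```python
# List every object with a paginator (list_objects_v2 returns at most
# 1,000 keys per call) and keep only keys ending in .gz.
import boto3

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

gz_keys = []
for page in paginator.paginate(Bucket="my-bucket", Prefix="logs/"):
    for obj in page.get("Contents", []):
        if obj["Key"].endswith(".gz"):  # skips .txt and everything else
            gz_keys.append(obj["Key"])

print(f"{len(gz_keys)} .gz objects found")
```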
AWS Lambda "expected an indented block" error in Python
I am following the document below to connect CloudWatch Logs to ELK: https://medium.com/@sohit_kumar/streaming-aws-cloudwatch-logs-to-your-own-elk-logging-solution-2bbd32f25100 I get an "expected an indented block" syntax error in Python for this line: try: logs = awslogs_handler(s, event) Can someone help me figure this out? Not sure what I'm missing. Thanks! Answer You need to use something like this: An indented block (four spaces of whitespace) is required after the try: statement.
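Concretely, the statement following try: must be indented one level, and an exception handler must follow. The answer's code block is not shown above; the stubs below (not from the article) keep this sketch runnable:

```python
# Python raises "expected an indented block" when the line after try:
# is not indented. awslogs_handler, s, and event come from the linked
# article; the stubs here only stand in for them.
def awslogs_handler(s, event):
    return event  # stub for the article's real handler

s, event = None, {"awslogs": {"data": "..."}}

try:
    logs = awslogs_handler(s, event)  # indented one level: this is the fix
except Exception as exc:
    print(f"awslogs_handler failed: {exc}")
    raise
```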
How to restore PostgreSQL from a dump file to AWS?
I have a PostgreSQL dump file in my local environment, and I want to restore it on the AWS server where my Django app is deployed. I think I should upload the dump file to the AWS server, but I don't know where it should be uploaded to or how to restore it. Answer First, copy your file from your local environment to AWS
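A minimal sketch of those two steps, assuming the app runs on an EC2 instance reachable over SSH and the dump was made with pg_dump -Fc; the host, user, paths, and database name are all placeholders:

```python
# Copy the dump to the server, then restore it there. Assumes scp/ssh
# are available locally and pg_restore is installed on the server.
import subprocess

HOST = "ubuntu@ec2-xx-xx-xx-xx.compute.amazonaws.com"  # placeholder host

# 1. Upload the dump to any writable path on the server, e.g. /tmp.
subprocess.run(["scp", "mydb.dump", f"{HOST}:/tmp/mydb.dump"], check=True)

# 2. Restore it into the Django database on the server.
subprocess.run(
    ["ssh", HOST,
     "pg_restore --clean --no-owner -U myuser -d mydb /tmp/mydb.dump"],
    check=True,
)
```

If the dump is plain SQL rather than pg_dump's custom format, swap pg_restore for psql -d mydb -f /tmp/mydb.dump; and if the database lives on RDS rather than on the instance itself, point the restore at the RDS endpoint with -h.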
I can't install a spaCy model in an EMR PySpark notebook
I currently have an AWS EMR cluster with a notebook linked to it. I would like to load a spaCy model (en_core_web_sm), but first I need to download the model, which is usually done with python -m spacy download en_core_web_sm. I really can't find how to do that in a PySpark session. Here is my config: I'm
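EMR Notebooks expose a notebook-scoped install helper on the Spark context, and spaCy models are ordinary pip packages published on the explosion/spacy-models GitHub releases page, so the spacy download step can be replaced by installing the model tarball directly. A sketch under those assumptions; the versions are guesses, and whether your EMR release's helper accepts a direct URL may vary (if not, install the same tarball with pip in a bootstrap action):

```python
# Run inside the EMR notebook's PySpark session. Versions are assumptions;
# the model version must match the installed spaCy version. If your EMR
# release's install_pypi_package does not accept a direct URL, install the
# same tarball via pip in a bootstrap action instead.
sc.install_pypi_package("spacy==3.4.4")
sc.install_pypi_package(
    "https://github.com/explosion/spacy-models/releases/download/"
    "en_core_web_sm-3.4.1/en_core_web_sm-3.4.1.tar.gz"
)

import spacy

nlp = spacy.load("en_core_web_sm")
print(nlp("Amazon EMR runs Spark clusters.").ents)
```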
Serving static files in Elastic Beanstalk
I'm deploying a Python 3 Flask application on AWS Elastic Beanstalk (Amazon Linux 2 platform). The folder structure is as follows: In the template files, static resources are referenced as defined below, e.g. a JS file: In the EB configuration, I've defined the static resources as below. But the problem is that these resources are not loading and return 404, e.g.: https://example.com/static/js/jquery.js cannot
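Two details commonly cause this on the Amazon Linux 2 platform: the static-file mapping must live under the aws:elasticbeanstalk:environment:proxy:staticfiles namespace (the older aws:elasticbeanstalk:container:python:staticfiles namespace only works on Amazon Linux 1), and the mapped path is resolved relative to the project root. On the application side, Flask can serve the same folder as a fallback; a minimal sketch, assuming the layout implied by the /static/js/jquery.js URL above:

```python
# application.py: a minimal sketch. EB's Python platform looks for a
# module named "application" exposing a WSGI callable also named
# "application". The static folder layout is an assumption based on
# the /static/js/jquery.js URL in the question.
from flask import Flask

application = Flask(
    __name__,
    static_folder="static",      # project-root/static/...
    static_url_path="/static",   # so /static/js/jquery.js resolves
)


@application.route("/")
def index():
    return "ok"
```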