I am trying to update pandas from a SageMaker lifecycle configuration. Following the AWS example, I have the following script:
```bash
#!/bin/bash

set -e

# OVERVIEW
# This script installs a single pip package in a single SageMaker conda environments.

sudo -u ec2-user -i <<EOF
# PARAMETERS
PACKAGE=pandas
ENVIRONMENT=python3

source /home/ec2-user/anaconda3/bin/activate "$ENVIRONMENT"
pip install --upgrade "$PACKAGE"==0.25.3
source /home/ec2-user/anaconda3/bin/deactivate
EOF
```
I then attach it to a notebook instance, but when I open a notebook file, pandas has not been updated. Running `!pip show pandas` gives:
```
Name: pandas
Version: 0.24.2
Summary: Powerful data structures for data analysis, time series, and statistics
Home-page: http://pandas.pydata.org
Author: None
Author-email: None
License: BSD
Location: /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages
Requires: pytz, python-dateutil, numpy
Required-by: sparkmagic, seaborn, odo, hdijupyterutils, autovizwidget
```
So I am indeed in the python3 environment, yet the version is still 0.24.2.
However, the CloudWatch log shows that the new version was installed:
```
Collecting pandas==0.25.3
Downloading https://files.pythonhosted.org/packages/52/3f/f6a428599e0d4497e1595030965b5ba455fd8ade6e977e3c819973c4b41d/pandas-0.25.3-cp36-cp36m-manylinux1_x86_64.whl (10.4MB)
2020-02-03T12:33:09.065+01:00 Requirement already satisfied, skipping upgrade: pytz>=2017.2 in ./anaconda3/lib/python3.6/site-packages (from pandas==0.25.3) (2018.4)
2020-02-03T12:33:09.065+01:00 Requirement already satisfied, skipping upgrade: python-dateutil>=2.6.1 in ./anaconda3/lib/python3.6/site-packages (from pandas==0.25.3) (2.7.3)
2020-02-03T12:33:09.065+01:00 Requirement already satisfied, skipping upgrade: numpy>=1.13.3 in ./anaconda3/lib/python3.6/site-packages (from pandas==0.25.3) (1.16.4)
2020-02-03T12:33:09.065+01:00 Requirement already satisfied, skipping upgrade: six>=1.5 in ./anaconda3/lib/python3.6/site-packages (from python-dateutil>=2.6.1->pandas==0.25.3) (1.13.0)
2020-02-03T12:33:09.065+01:00 Installing collected packages: pandas
Found existing installation: pandas 0.24.2
Uninstalling pandas-0.24.2:
Successfully uninstalled pandas-0.24.2
2020-02-03T12:33:12.066+01:00 Successfully installed pandas-0.25.3
```
What could be the problem?
Answer
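The likely cause is that the here-document delimiter in the question's script is unquoted (`<<EOF`). With an unquoted delimiter, the shell that runs the lifecycle script expands `$ENVIRONMENT` before the body is handed to ec2-user's shell. Because `ENVIRONMENT` is only assigned inside the body, it expands to an empty string, so `activate` receives `""` instead of `python3` and pip does not run in the python3 environment. The CloudWatch log is consistent with this: pip reports `./anaconda3/lib/python3.6/site-packages`, which, relative to ec2-user's home directory, is the base environment rather than `envs/python3`.

If you want to install the package only in the python3 environment, use the following script when you create your SageMaker lifecycle configuration. Its delimiter is quoted (`<<'EOF'`), so the body reaches ec2-user's shell unchanged: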
```bash
#!/bin/bash

sudo -u ec2-user -i <<'EOF'

# This will affect only the Jupyter kernel called "conda_python3".
source activate python3

# Replace pandas==0.25.3 with the package (and version) you want to install.
pip install pandas==0.25.3
# You can also run "conda install" here.

source deactivate

EOF
```
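To see the difference between `<<EOF` and `<<'EOF'` without a notebook instance, here is a minimal dry run you can execute in any bash shell. It assumes, purely for illustration, that `cat` stands in for `sudo -u ec2-user -i`: `cat` simply prints the here-document exactly as the inner shell would receive it. The body is trimmed to the `ENVIRONMENT` assignment and the `activate` line from the question's script.

```bash
#!/bin/bash
# Dry run: 'cat' is a stand-in for 'sudo -u ec2-user -i' (illustration only).
# It prints the here-document body exactly as the inner shell would receive it.
unset ENVIRONMENT   # make sure the outer shell has no value of its own

# Unquoted delimiter (as in the question): the OUTER shell expands $ENVIRONMENT
# while building the body. The variable is only assigned inside the body,
# so here it expands to an empty string.
unquoted=$(cat <<EOF
ENVIRONMENT=python3
source /home/ec2-user/anaconda3/bin/activate "$ENVIRONMENT"
EOF
)

# Quoted delimiter (as in the answer): the body is passed on verbatim, so the
# inner shell assigns ENVIRONMENT first and expands it afterwards.
quoted=$(cat <<'EOF'
ENVIRONMENT=python3
source /home/ec2-user/anaconda3/bin/activate "$ENVIRONMENT"
EOF
)

printf '%s\n' "--- unquoted <<EOF ---" "$unquoted"
printf '%s\n' "--- quoted <<'EOF' ---" "$quoted"
```

With the unquoted delimiter the `activate` line arrives as `activate ""`; with the quoted delimiter it arrives as `activate "$ENVIRONMENT"`, which the inner shell then expands to `python3`. Escaping each reference inside an unquoted here-document (`\$ENVIRONMENT`) would have the same effect as quoting the delimiter.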
Reference: “Lifecycle Configuration Best Practices”