Sagemaker lifecycle configuration for installing pandas not working

Tags: , , , ,



I am trying to update pandas within a lifecycle configuration, and following the example of AWS I have the next code:

#!/bin/bash

set -e

# OVERVIEW
# This script installs a single pip package in a single SageMaker conda environments.

sudo -u ec2-user -i <<EOF
# PARAMETERS
PACKAGE=pandas
ENVIRONMENT=python3
source /home/ec2-user/anaconda3/bin/activate "$ENVIRONMENT"
pip install --upgrade "$PACKAGE"==0.25.3
source /home/ec2-user/anaconda3/bin/deactivate
EOF

Then I attach it to a notebook and when I enter the notebook and open a notebook file, I see that pandas have not been updated. Using !pip show pandas I get:

Name: pandas
Version: 0.24.2
Summary: Powerful data structures for data analysis, time series, and statistics
Home-page: http://pandas.pydata.org
Author: None
Author-email: None
License: BSD
Location: /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages
Requires: pytz, python-dateutil, numpy
Required-by: sparkmagic, seaborn, odo, hdijupyterutils, autovizwidget

So we can see that I am indeed in the python3 env although the version is 0.24.

However, the log in cloudwatch shows that it has been installed:

Collecting pandas==0.25.3 Downloading https://files.pythonhosted.org/packages/52/3f/f6a428599e0d4497e1595030965b5ba455fd8ade6e977e3c819973c4b41d/pandas-0.25.3-cp36-cp36m-manylinux1_x86_64.whl (10.4MB)
2020-02-03T12:33:09.065+01:00
Requirement already satisfied, skipping upgrade: pytz>=2017.2 in ./anaconda3/lib/python3.6/site-packages (from pandas==0.25.3) (2018.4)
2020-02-03T12:33:09.065+01:00
Requirement already satisfied, skipping upgrade: python-dateutil>=2.6.1 in ./anaconda3/lib/python3.6/site-packages (from pandas==0.25.3) (2.7.3)
2020-02-03T12:33:09.065+01:00
Requirement already satisfied, skipping upgrade: numpy>=1.13.3 in ./anaconda3/lib/python3.6/site-packages (from pandas==0.25.3) (1.16.4)
2020-02-03T12:33:09.065+01:00
Requirement already satisfied, skipping upgrade: six>=1.5 in ./anaconda3/lib/python3.6/site-packages (from python-dateutil>=2.6.1->pandas==0.25.3) (1.13.0)
2020-02-03T12:33:09.065+01:00
Installing collected packages: pandas Found existing installation: pandas 0.24.2 Uninstalling pandas-0.24.2: Successfully uninstalled pandas-0.24.2
2020-02-03T12:33:12.066+01:00
Successfully installed pandas-0.25.3

What could be the problem?

Answer

if you want to install the packages only in for the python3 environment, use the following script in your Create Sagemaker Lifecycle configurations.

#!/bin/bash
sudo -u ec2-user -i <<'EOF'

# This will affect only the Jupyter kernel called "conda_python3".
source activate python3

# Replace myPackage with the name of the package you want to install.
pip install pandas==0.25.3
# You can also perform "conda install" here as well.
source deactivate
EOF

Reference : “Lifecycle Configuration Best Practices



Source: stackoverflow