Looking at the following source code taken from here (SDK v2):
import boto3
import sagemaker
from sagemaker.xgboost.estimator import XGBoost
from sagemaker.session import Session
from sagemaker.inputs import TrainingInput

# initialize hyperparameters
hyperparameters = {
    "max_depth": "5",
    "eta": "0.2",
    "gamma": "4",
    "min_child_weight": "6",
    "subsample": "0.7",
    "verbosity": "1",
    "objective": "reg:linear",
    "num_round": "50"}

# set an output path where the trained model will be saved
bucket = sagemaker.Session().default_bucket()
prefix = 'DEMO-xgboost-as-a-framework'
output_path = 's3://{}/{}/{}/output'.format(bucket, prefix, 'abalone-xgb-framework')

# construct a SageMaker XGBoost estimator
# specify the entry_point to your xgboost training script
estimator = XGBoost(entry_point="your_xgboost_abalone_script.py",
                    framework_version='1.2-2',
                    hyperparameters=hyperparameters,
                    role=sagemaker.get_execution_role(),
                    instance_count=1,
                    instance_type='ml.m5.2xlarge',
                    output_path=output_path)

# define the data type and paths to the training and validation datasets
content_type = "libsvm"
train_input = TrainingInput("s3://{}/{}/{}/".format(bucket, prefix, 'train'), content_type=content_type)
validation_input = TrainingInput("s3://{}/{}/{}/".format(bucket, prefix, 'validation'), content_type=content_type)

# execute the XGBoost training job
estimator.fit({'train': train_input, 'validation': validation_input})
Where does the your_xgboost_abalone_script.py file have to be placed? So far I have used XGBoost as a built-in algorithm from my local machine with similar code (i.e. I spun up a training job remotely). Thanks!
PS:
Looking at this, and at source_dir, I wonder if one can upload Python files to S3. In that case, I take it they have to be packaged as tar.gz? Thanks!
Answer
your_xgboost_abalone_script.py can be created locally. The path you provide is relative to where the code is running, i.e. your_xgboost_abalone_script.py can be located in the same directory where you are running the SageMaker SDK (“source code”).
For example, if you have your_xgboost_abalone_script.py in the same directory as the source code:
.
├── source_code.py
└── your_xgboost_abalone_script.py
Then you can point to this file exactly as the documentation shows:
estimator = XGBoost(entry_point = "your_xgboost_abalone_script.py", . . . )
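Since the entry_point path is resolved relative to the directory the code runs from, it can help to check that the script is actually there before constructing the estimator. The following is only a minimal sketch using the standard library; resolve_entry_point is a hypothetical helper name, not part of the SageMaker SDK.

```python
import tempfile
from pathlib import Path


def resolve_entry_point(entry_point, base_dir=None):
    """Resolve entry_point against base_dir (default: current working directory).

    Hypothetical helper for illustration only; not a SageMaker SDK function.
    Raises FileNotFoundError if the script is not where we expect it.
    """
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    path = (base / entry_point).resolve()
    if not path.is_file():
        raise FileNotFoundError(
            "entry point {!r} not found relative to {}".format(entry_point, base)
        )
    return path


if __name__ == "__main__":
    # Demo in a throwaway directory that mimics the layout shown above.
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "your_xgboost_abalone_script.py"
        script.write_text("# training code goes here\n")
        found = resolve_entry_point("your_xgboost_abalone_script.py", base_dir=workdir)
        print("entry point resolved to:", found)
```

In your own source_code.py you would call resolve_entry_point("your_xgboost_abalone_script.py") with no base_dir, so the check uses the same working directory as the estimator.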
The SDK will take your_xgboost_abalone_script.py, package it into a tarball, and upload it to S3 on your behalf, so you do not need to create or upload a tar.gz yourself.
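To make the packaging step less mysterious, here is a rough sketch of what "put a script into a tar.gz" looks like with the Python standard library. It is not the SDK's actual implementation, it does not upload anything to S3, and the function names and the archive name are chosen only for illustration.

```python
import tarfile
import tempfile
from pathlib import Path


def package_script(script_path, archive_path):
    """Pack a single script into a gzip-compressed tar archive.

    Illustrative only: the SageMaker SDK does its own packaging internally.
    """
    script = Path(script_path)
    with tarfile.open(str(archive_path), "w:gz") as tar:
        # Store the file at the archive root, without its local directory path.
        tar.add(str(script), arcname=script.name)
    return Path(archive_path)


def list_archive(archive_path):
    """Return the member names stored in a tar.gz archive."""
    with tarfile.open(str(archive_path), "r:gz") as tar:
        return tar.getnames()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "your_xgboost_abalone_script.py"
        script.write_text("# training code goes here\n")
        # The archive name below is an assumption made for this example.
        archive = package_script(script, Path(workdir) / "example_source.tar.gz")
        print(list_archive(archive))
```

With the estimator from the question, none of this has to be done by hand; the sketch only shows the kind of archive that ends up being created from the local script.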