Lab 3 - Train your model

Download the training script

Start by creating a folder for your training scripts.

# Create a directory
mkdir train
cd train

# Download the training script

# Go back to the project directory
cd .. 

This training script is a slightly modified version of the Transfer Learning for Computer Vision Tutorial on the PyTorch website.

Create a training job

Now that we have a training script, we need to configure how the training job runs in the cloud.

We start by creating an empty YAML file.

code job.yml

In this file, we configure how the training script is executed. Copy and paste the content below into job.yml.

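experiment_name: SimpsonsClassification
code:
  local_path: ./train
command: python <training_script> --data {inputs.training_data} --num-epochs 8 --model-name SimpsonsClassification
environment: azureml:AzureML-pytorch-1.7-ubuntu18.04-py37-cuda11-gpu:3
compute:
  target: azureml:gpu-cluster
inputs:
  training_data:
    mode: mount
    data: azureml:LegoSimpsons:1

Replace <training_script> with the file name of the training script you downloaded into the train folder.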
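Before submitting the job, it can help to confirm that the compute target and the data asset referenced in job.yml actually exist in your workspace. The bash sketch below assumes your version of the Azure ML CLI extension provides "az ml compute show" and "az ml data show" (as CLI v2 does) and that your default resource group and workspace are already configured; preflight_check is a hypothetical helper name, not part of the CLI.

```shell
# Hypothetical helper: checks that the compute target and data asset used in
# job.yml exist before the job is submitted.
# Assumes "az ml compute show" and "az ml data show" are available in your
# ml extension version and that workspace defaults are configured.
preflight_check() {
  local compute="${1:-gpu-cluster}"      # compute target from job.yml
  local data_name="${2:-LegoSimpsons}"   # data asset name from job.yml
  local data_version="${3:-1}"           # data asset version from job.yml
  local ok=0

  # "-o none" suppresses normal output; az still reports errors on stderr.
  if ! az ml compute show -n "$compute" -o none; then
    echo "Compute target '$compute' not found" >&2
    ok=1
  fi
  if ! az ml data show -n "$data_name" -v "$data_version" -o none; then
    echo "Data asset '$data_name:$data_version' not found" >&2
    ok=1
  fi
  return "$ok"
}
```

Running preflight_check with no arguments checks the names used in job.yml; it returns a non-zero status if either resource is missing.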

Now we can create the job with the command below. The job takes around 5-10 minutes to complete.

az ml job create --file job.yml --query name -o tsv

The "--query name -o tsv" options make the command print only the name of the run in the console. Copy this name and use it in place of <run_name> in the commands below.
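Instead of copying the name by hand, you can keep it in a shell variable. The bash sketch below wraps the same "az ml job create" command in a hypothetical helper, submit_job, which also fails loudly if no job name comes back.

```shell
# Hypothetical helper: submits a job file and prints only the job name,
# using the same az ml job create options as the command above.
submit_job() {
  local job_file="${1:-job.yml}"
  local name

  name=$(az ml job create --file "$job_file" --query name -o tsv) || return 1
  if [ -z "$name" ]; then
    echo "No job name returned for $job_file" >&2
    return 1
  fi
  printf '%s\n' "$name"
}
```

Usage: run_name=$(submit_job job.yml), then pass "$run_name" wherever the commands below expect <run_name>.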

While the job is running, you can stream the live output of the job using the command below.

az ml job stream -n <run_name>

In the current version of the Azure ML CLI extension, the command above does not work.

If you just want to see the status of the job, use the command below.

az ml job show -n <run_name> --query status -o tsv
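Because streaming does not work in this version, polling the status is one way to wait for the job to finish. The bash sketch below repeats the status command above until the job reaches a final state. wait_for_job is a hypothetical helper, and it assumes Completed, Failed, and Canceled are the terminal status values reported by "az ml job show".

```shell
# Hypothetical helper: polls the job status until it reaches a terminal state.
# Returns 0 when the job completes, 1 if it fails, is canceled, or the
# status cannot be read.
wait_for_job() {
  local run_name="$1"
  local interval="${2:-30}"   # seconds between polls (assumed default)
  local status

  while true; do
    status=$(az ml job show -n "$run_name" --query status -o tsv) || return 1
    echo "$(date +%H:%M:%S) $run_name: $status"
    case "$status" in
      Completed) return 0 ;;
      Failed|Canceled) return 1 ;;
      "") echo "Empty status for $run_name" >&2; return 1 ;;
    esac
    sleep "$interval"
  done
}
```

For example, wait_for_job <run_name> 60 checks the status once a minute and prints each result until the job finishes.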

The final step in the training script registers the trained PyTorch model and a version of it converted to ONNX. The models are named SimpsonsClassification-pytorch and SimpsonsClassification-onnx. List the models in your workspace with the command below.

az ml model list -o table
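To confirm that both models from the training script were registered, a bash sketch like the one below checks the model names returned by "az ml model list". check_models is a hypothetical helper, and it assumes each entry in the list output has a name field.

```shell
# Hypothetical helper: reports whether both models registered by the
# training script appear in the workspace model list.
check_models() {
  local names model
  local missing=0

  names=$(az ml model list --query "[].name" -o tsv) || return 1
  for model in SimpsonsClassification-pytorch SimpsonsClassification-onnx; do
    # -x matches the whole line, -F treats the name as a fixed string.
    if printf '%s\n' "$names" | grep -qxF "$model"; then
      echo "found:   $model"
    else
      echo "missing: $model"
      missing=1
    fi
  done
  return "$missing"
}
```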


You now have two versioned models in your Azure Machine Learning workspace that can classify Simpsons images.

You have:
