Lab 3 - Train your model

Download the training script

Start by creating a folder for your training script.
# Create a directory
mkdir train
cd train

# Download the training script
wget https://raw.githubusercontent.com/GlobalAICommunity/back-together-2021/main/workshop-assets/amls/train.py

# Go back to the project directory
cd ..
This training script is a slightly modified version of the Transfer Learning for Computer Vision Tutorial on the PyTorch website.

Create a training job

Now that we have a training script, we need to configure how the training job will run in the cloud.
Start by creating an empty YAML file:
code job.yml
In this file we configure how the training script is executed. Copy and paste the content below into job.yml.
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
experiment_name: SimpsonsClassification
code:
  local_path: ./train
command: python train.py --data {inputs.training_data} --num-epochs 8 --model-name SimpsonsClassification
environment: azureml:AzureML-pytorch-1.7-ubuntu18.04-py37-cuda11-gpu:3
compute:
  target: azureml:gpu-cluster
inputs:
  training_data:
    mode: mount
    data: azureml:LegoSimpsons:1
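code: local_path
The folder that contains train.py and any other files your job needs to run successfully. Everything in this folder is copied to the experiment artifacts.
command
The command that runs on the compute target to start the training. At run time, {inputs.training_data} is replaced with the location where the training_data input is mounted.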
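Before submitting, it can help to confirm that the files the job refers to are where job.yml expects them. Below is a minimal bash sketch, assuming you run it from the project directory and that the code folder is ./train as in the job.yml above; the function name preflight is hypothetical and not part of the Azure ML CLI.

```shell
# Hypothetical pre-submission check (not an Azure ML command):
# confirm that the job file exists and that the code folder
# (./train by default, matching code: local_path) contains train.py.
preflight() {
  local job_file="${1:-job.yml}"
  local code_dir="${2:-./train}"
  local ok=0
  if [ ! -f "$job_file" ]; then
    echo "missing job file: $job_file" >&2
    ok=1
  fi
  if [ ! -f "$code_dir/train.py" ]; then
    echo "missing $code_dir/train.py (is code: local_path correct?)" >&2
    ok=1
  fi
  return "$ok"
}
```

Running preflight with no arguments checks job.yml and ./train/train.py. It returns 0 when both are present and 1 otherwise, so it can guard the submit command, for example preflight && az ml job create --file job.yml --query name -o tsv.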
Now we can create the job with the command below. The job takes around 5-10 minutes to complete.
az ml job create --file job.yml --query name -o tsv
The "--query name -o tsv" options make the command print only the name of the run to the console. Copy this name and use it in place of <run_name> in the commands below.
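Instead of copying the name by hand, you can capture it in a shell variable. Below is a minimal bash sketch; the function name submit_job is hypothetical and only wraps the same az ml job create call used in this lab.

```shell
# Hypothetical helper: submit a job from a YAML file and print only its name,
# using the same "az ml job create ... --query name -o tsv" call as the lab.
submit_job() {
  local job_file="${1:-job.yml}"   # defaults to job.yml in the current directory
  az ml job create --file "$job_file" --query name -o tsv
}
```

For example, run_name=$(submit_job job.yml) stores the name, after which az ml job show -n "$run_name" --query status -o tsv works without any copy and paste.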
While the job is running, you can stream the live output of the job using the command below.
az ml job stream -n <run_name>
At the time of writing, this command does not work in the current version of the az ml CLI extension.
If you only want to check the status of the job, use the command below.
az ml job show -n <run_name> --query status -o tsv
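Because streaming does not work, one workaround is to run the status command in a loop until the job finishes. Below is a minimal bash sketch, assuming the terminal status values are Completed, Failed and Canceled (anything else, such as Running, is treated as still in progress). The function name wait_for_job and the default 30-second interval are hypothetical choices.

```shell
# Hypothetical helper: poll "az ml job show" until the job reaches a terminal state.
# Assumes the terminal statuses are Completed, Failed and Canceled.
wait_for_job() {
  local run_name="$1"
  local interval="${2:-30}"   # seconds to wait between status checks
  local status
  while true; do
    # Return 2 if the status cannot be read (e.g. wrong run name or not logged in).
    status=$(az ml job show -n "$run_name" --query status -o tsv) || return 2
    echo "$(date +%T) $run_name: $status" >&2   # progress goes to stderr
    case "$status" in
      Completed) return 0 ;;
      Failed|Canceled) return 1 ;;
    esac
    sleep "$interval"
  done
}
```

Call it as wait_for_job <run_name> 60 to check once a minute. It returns 0 once the job has completed, and a non-zero value if the job failed, was canceled, or its status could not be read.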
The final step in the training script registers two models: the PyTorch model and the same model converted to ONNX. They are named SimpsonsClassification-pytorch and SimpsonsClassification-onnx. To verify that both models were registered, list the models in your workspace:
az ml model list -o table
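To check for the two expected names automatically instead of reading the table, you can request only the model names with the Azure CLI's global --query option. Below is a minimal bash sketch, assuming each entry returned by az ml model list has a name field. The helper check_models is hypothetical.

```shell
# Hypothetical helper: verify that every model name passed as an argument
# appears in the output of "az ml model list".
check_models() {
  local names model
  local missing=0
  names=$(az ml model list --query "[].name" -o tsv) || return 2
  for model in "$@"; do
    # -F: fixed string, -x: match the whole line, -q: no output
    if printf '%s\n' "$names" | grep -Fxq "$model"; then
      echo "found:   $model"
    else
      echo "missing: $model"
      missing=1
    fi
  done
  return "$missing"
}
```

For this lab, check_models SimpsonsClassification-onnx SimpsonsClassification-pytorch prints one line per model. It returns a non-zero value if either model is missing.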

Checklist

You now have two versioned models in your Azure Machine Learning workspace that can classify Simpsons images.
You have:
  • Downloaded the training script
  • Created a job configuration
  • Ran the job
  • Monitored the output
  • Validated that the models have been registered