
Monitor Fine-Tuning Jobs

This article explains how to monitor fine-tuning jobs in Hyperstack AI Studio using both the API and the user interface. It covers how to check job status, track training and validation loss, and interpret visualizations to evaluate model performance throughout the training process.

Monitoring Using the API

This endpoint retrieves training details for a fine-tuned model, such as training and validation loss, training status, and performance history. It's useful for monitoring model performance and debugging training runs.

Replace the following variables before running the command:

  • API_KEY: Your AI Studio API key.
  • {MODEL_NAME}: The name of the model to retrieve training details for, included in the request path.
curl -X GET "https://api.genai.hyperstack.cloud/tailor/v1/named-training-info-log/{MODEL_NAME}" \
-H "X-API-Key: API_KEY" \
-H "Content-Type: application/json"
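
If you prefer Python, the same call can be sketched with the standard library. The endpoint and headers match the curl command above; `get_training_info` is an illustrative helper, not part of an official SDK:

```python
import json
import urllib.request

API_BASE = "https://api.genai.hyperstack.cloud/tailor/v1"

def build_request(model_name: str, api_key: str) -> urllib.request.Request:
    # Build the GET request for the named-training-info-log endpoint.
    url = f"{API_BASE}/named-training-info-log/{model_name}"
    return urllib.request.Request(
        url,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )

def get_training_info(model_name: str, api_key: str) -> dict:
    # Perform the request and decode the JSON response body.
    with urllib.request.urlopen(build_request(model_name, api_key)) as resp:
        return json.load(resp)
```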

Response: Success

{
  "metrics": {
    "end_train_message": [
      "Training job ended"
    ],
    "end_train_status": [
      "dormant"
    ],
    "eval_loss": [
      3.7874512672424316,
      2.4864907264709473
    ],
    "eval_perplexity": [],
    "loss": [
      4.3348,
      4.3847,
      4.6717,
      3.1481,
      2.3838
    ],
    "perplexity": []
  },
  "status": "success"
}
Response field descriptions
metrics object

Contains the training and evaluation metrics recorded during the fine-tuning process.

end_train_message array

A message indicating how the training job concluded. Typically includes phrases like "Training job ended" or error descriptions if training was interrupted.


end_train_status array

The final status of the training pod. Common values include:

  • "dormant" – Training completed and resources were released.
  • "failed_training" – Training encountered an error and was terminated.

eval_loss array

An array of validation loss values. Typically includes:

  • First value: loss before fine-tuning.
  • Second value: loss after fine-tuning.

Lower values generally indicate better generalization performance.

loss array

An array of training loss values recorded during different steps of the fine-tuning process. This sequence shows how the model's performance improved over time. The last value is the final training loss.


eval_perplexity array

(optional) Perplexity values on the validation set before and after fine-tuning. A lower value indicates more confident and accurate predictions. This field may be empty if perplexity is not computed.


perplexity array

(optional) Training perplexity recorded over steps. Not populated in all training runs.


status string

Indicates the result of the API call. "success" confirms that the training information was retrieved correctly.
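
To put the fields above to work, the metrics object can be condensed into a few headline numbers. This is a minimal sketch; `summarize_metrics` is an illustrative helper, and the sample payload is taken from the success response shown earlier:

```python
def summarize_metrics(metrics: dict) -> dict:
    """Condense a training-info metrics payload into headline numbers."""
    loss = metrics.get("loss", [])
    eval_loss = metrics.get("eval_loss", [])
    return {
        # Last recorded training loss value.
        "final_train_loss": loss[-1] if loss else None,
        # First minus last value: how far training loss fell.
        "train_loss_reduction": round(loss[0] - loss[-1], 4) if len(loss) > 1 else None,
        # Same for validation loss (before vs. after fine-tuning).
        "eval_loss_reduction": round(eval_loss[0] - eval_loss[-1], 4) if len(eval_loss) > 1 else None,
        # Final pod status, e.g. "dormant".
        "end_status": (metrics.get("end_train_status") or [None])[0],
    }

# Sample metrics from the success response above.
sample = {
    "end_train_status": ["dormant"],
    "eval_loss": [3.7874512672424316, 2.4864907264709473],
    "loss": [4.3348, 4.3847, 4.6717, 3.1481, 2.3838],
}
print(summarize_metrics(sample))
```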

Response: Failure

Note that "status" describes the API request itself, not the training run: even when training fails, the call returns "success", and the failure is reported through end_train_status:

{
  "metrics": {
    "end_train_status": ["failed_training"],
    "loss": []
  },
  "status": "success"
}
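
Since the endpoint reports the outcome through end_train_status, one way to wait for a run to finish is to poll until that field is populated. This is a hedged sketch, not a documented pattern; the poll interval is arbitrary, and `fetch` is any zero-argument callable returning the decoded response body (for example, a wrapper around the curl request above):

```python
import time

def wait_for_training(fetch, poll_seconds: float = 60, max_polls: int = 120) -> str:
    """Poll training info until an end status is reported.

    `fetch` returns the decoded JSON body of the
    named-training-info-log endpoint.
    """
    for _ in range(max_polls):
        info = fetch()
        status_log = info.get("metrics", {}).get("end_train_status", [])
        if status_log:
            return status_log[0]  # e.g. "dormant" or "failed_training"
        time.sleep(poll_seconds)
    raise TimeoutError("training did not finish within the polling window")
```

Because the fetcher is injected, the loop is easy to exercise with canned responses before pointing it at the live API.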

Monitoring Using the UI

To track the progress and performance of your fine-tuned models, follow these steps in Hyperstack AI Studio:

  1. Open the Model Details Page

    Go to the My Models page and click on the fine-tuned model you want to monitor. This will take you to the model’s training details view.

  2. Check Job Status

    The status panel shows:

    • Training Jobs – Jobs currently in progress.
    • Completed Jobs – Finished jobs with full metrics available.
    • Failed Jobs – Jobs that encountered errors during training.
  3. Review Metrics

    You’ll see metrics such as:

    • Training Loss – How well the model fits your training data.
    • Validation Loss – How well the model performs on unseen data.
  4. Analyze Visualizations

    The following charts help evaluate model performance:

    • Performance Comparison Chart – Compares pre- and post-fine-tuning loss values.
    • Model Performance Over Steps – Displays training loss reduction over time.

Example Metrics Display

  • Training Details

    • Model Name: Legal-1.0
    • Tags: legal, tax
    • Base Model: Mistral-7B
    • Training Status: Completed
    • Training Duration: 1 hour 30 minutes
  • Current Metrics

    • Training Loss: 5.6282
    • Validation Loss: 10.8603
  • Hyperparameters Used

    • Learning Rate: auto
    • Batch Size: auto
    • Epochs: auto
    • Eval Split: 5% (auto)
  • Performance Comparison

    • Training Loss Reduction: 5.2321
    • Validation Loss Reduction: 0.0000
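
For reference, the reduction figures are simply the starting loss minus the final loss. Taking the final training loss from Current Metrics and assuming a starting value of 10.8603 (consistent with the Performance Comparison chart):

```python
start_train_loss = 10.8603  # assumed starting value, per the comparison chart
final_train_loss = 5.6282   # "Training Loss" under Current Metrics
print(round(start_train_loss - final_train_loss, 4))  # 5.2321
```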

Example Charts

Performance Comparison (Start vs. End)

| Metric          | Before Fine-Tuning | After Fine-Tuning |
|-----------------|--------------------|-------------------|
| Training Loss   | 10.860             | 5.628             |
| Validation Loss | 10.861             | 5.860             |
Model learning analysis

Tracking both training and validation loss helps you understand if the model is improving. A steady decrease in these values generally reflects successful training.
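
As a rough heuristic (not an official check), you can flag runs where training loss falls but validation loss does not, which can signal overfitting. `is_improving` is an illustrative helper using the loss arrays from the API response:

```python
def is_improving(loss_history, min_delta: float = 0.0) -> bool:
    """True when loss trends downward overall (first vs. last value)."""
    return len(loss_history) >= 2 and (loss_history[0] - loss_history[-1]) > min_delta

# Loss arrays as returned by the training-info endpoint.
train_ok = is_improving([4.3348, 4.3847, 4.6717, 3.1481, 2.3838])
val_ok = is_improving([3.7875, 2.4865])
if train_ok and not val_ok:
    print("possible overfitting: training loss falls but validation loss does not")
```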

Key Benefits

  • Monitor regularly – Keep an eye on training progress and outcomes.
  • Compare fine-tuning runs – Analyze trends across different model runs.
  • Tune intelligently – Use the insights to improve data selection and hyperparameter settings.