
Monitor Fine-Tuning Jobs

This article explains how to monitor fine-tuning jobs in Hyperstack AI Studio using both the API and the user interface. It covers how to check job status, track training and validation loss, and interpret visualizations to evaluate model performance throughout the training process.

Monitoring Using the API

This endpoint retrieves training details for a fine-tuned model, such as training and validation loss, training status, and performance history. It's useful for monitoring model performance and debugging training runs.

Replace the following variables before running the command:

  • API_KEY: Your AI Studio API key.
  • {MODEL_NAME}: The name of the model to retrieve training details for, included in the request path.
curl -X GET "https://api.genai.hyperstack.cloud/tailor/v1/named-training-info-log/{MODEL_NAME}" \
-H "X-API-Key: API_KEY" \
-H "Content-Type: application/json"
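
If you prefer Python, the same call can be sketched with the standard library. The endpoint and headers match the curl command above; `get_training_info` is an illustrative helper, not part of an official SDK:

```python
import json
import urllib.request

API_BASE = "https://api.genai.hyperstack.cloud/tailor/v1"

def build_request(model_name: str, api_key: str) -> urllib.request.Request:
    # Build the GET request for the named-training-info-log endpoint.
    url = f"{API_BASE}/named-training-info-log/{model_name}"
    return urllib.request.Request(
        url,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )

def get_training_info(model_name: str, api_key: str) -> dict:
    # Perform the request and decode the JSON response body.
    with urllib.request.urlopen(build_request(model_name, api_key)) as resp:
        return json.load(resp)
```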

Response: Success

{
  "metrics": {
    "end_train_message": [
      "Training job ended"
    ],
    "end_train_status": [
      "dormant"
    ],
    "eval_loss": [
      3.7874512672424316,
      2.4864907264709473
    ],
    "eval_perplexity": [],
    "loss": [
      4.3348,
      4.3847,
      4.6717,
      3.1481,
      2.3838
    ],
    "perplexity": []
  },
  "status": "success"
}
Response field descriptions
metrics object

Contains the training and evaluation metrics recorded during the fine-tuning process.

end_train_message array

A message indicating how the training job concluded. Typically includes phrases like "Training job ended" or error descriptions if training was interrupted.


end_train_status array

The final status of the training pod. Common values include:

  • "dormant" – Training completed and resources were released.
  • "failed_training" – Training encountered an error and was terminated.

eval_loss array

An array of validation loss values. Typically includes:

  • First value: loss before fine-tuning.
  • Second value: loss after fine-tuning.

Lower values generally indicate better generalization performance.

loss array

An array of training loss values recorded during different steps of the fine-tuning process. This sequence shows how the model's performance improved over time. The last value is the final training loss.


eval_perplexity array

(optional) Perplexity values on the validation set before and after fine-tuning. A lower value indicates more confident and accurate predictions. This field may be empty if perplexity is not computed.


perplexity array

(optional) Training perplexity recorded over steps. Not populated in all training runs.


status string

Indicates the result of the API call. "success" confirms that the training information was retrieved correctly.
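
To put the fields above to work, the metrics object can be condensed into a few headline numbers. This is a minimal sketch; `summarize_metrics` is an illustrative helper, and the sample payload is taken from the success response shown earlier:

```python
def summarize_metrics(metrics: dict) -> dict:
    """Condense a training-info metrics payload into headline numbers."""
    loss = metrics.get("loss", [])
    eval_loss = metrics.get("eval_loss", [])
    return {
        # Last recorded training loss value.
        "final_train_loss": loss[-1] if loss else None,
        # First minus last value: how far training loss fell.
        "train_loss_reduction": round(loss[0] - loss[-1], 4) if len(loss) > 1 else None,
        # Same for validation loss (before vs. after fine-tuning).
        "eval_loss_reduction": round(eval_loss[0] - eval_loss[-1], 4) if len(eval_loss) > 1 else None,
        # Final pod status, e.g. "dormant".
        "end_status": (metrics.get("end_train_status") or [None])[0],
    }

# Sample metrics from the success response above.
sample = {
    "end_train_status": ["dormant"],
    "eval_loss": [3.7874512672424316, 2.4864907264709473],
    "loss": [4.3348, 4.3847, 4.6717, 3.1481, 2.3838],
}
print(summarize_metrics(sample))
```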

Response: Failure

Note that "status" describes the API request itself, not the training run: even when training fails, the call returns "success", and the failure is reported through end_train_status:

{
  "metrics": {
    "end_train_status": ["failed_training"],
    "loss": []
  },
  "status": "success"
}
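
Since the endpoint reports the outcome through end_train_status, one way to wait for a run to finish is to poll until that field is populated. This is a hedged sketch, not a documented pattern; the poll interval is arbitrary, and `fetch` is any zero-argument callable returning the decoded response body (for example, a wrapper around the curl request above):

```python
import time

def wait_for_training(fetch, poll_seconds: float = 60, max_polls: int = 120) -> str:
    """Poll training info until an end status is reported.

    `fetch` returns the decoded JSON body of the
    named-training-info-log endpoint.
    """
    for _ in range(max_polls):
        info = fetch()
        status_log = info.get("metrics", {}).get("end_train_status", [])
        if status_log:
            return status_log[0]  # e.g. "dormant" or "failed_training"
        time.sleep(poll_seconds)
    raise TimeoutError("training did not finish within the polling window")
```

Because the fetcher is injected, the loop is easy to exercise with canned responses before pointing it at the live API.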

Monitoring Using the UI

To track the progress and performance of your fine-tuned models, follow these steps in Hyperstack AI Studio:

  1. Open the Model Details Page

    Go to the My Models page and click on the fine-tuned model you want to monitor. This will take you to the model’s training details view.

  2. Check Job Status

    The status panel shows:

    • Training Jobs – Jobs currently in progress.
    • Completed Jobs – Finished jobs with full metrics available.
    • Failed Jobs – Jobs that encountered errors during training.
  3. Review Metrics

    You’ll see metrics such as:

    • Training Loss – How well the model fits your training data.
    • Validation Loss – How well the model performs on unseen data.
  4. Analyze Visualizations

    The following charts help evaluate model performance:

    • Performance Comparison Chart – Compares pre- and post-fine-tuning loss values.
    • Model Performance Over Steps – Displays training loss reduction over time.

Example Metrics Display

  • Training Details

    • Model Name: Legal-1.0
    • Tags: legal, tax
    • Base Model: Mistral-7B
    • Training Status: Completed
    • Training Duration: 1 hour 30 minutes
  • Current Metrics

    • Training Loss: 5.6282
    • Validation Loss: 10.8603
  • Hyperparameters Used

    • Learning Rate: auto
    • Batch Size: auto
    • Epochs: auto
    • Eval Split: 5% (auto)
  • Performance Comparison

    • Training Loss Reduction: 5.2321
    • Validation Loss Reduction: 0.0000
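
For reference, the reduction figures are simply the starting loss minus the final loss. Taking the final training loss from Current Metrics and assuming a starting value of 10.8603 (consistent with the Performance Comparison chart):

```python
start_train_loss = 10.8603  # assumed starting value, per the comparison chart
final_train_loss = 5.6282   # "Training Loss" under Current Metrics
print(round(start_train_loss - final_train_loss, 4))  # 5.2321
```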

Example Charts

Performance Comparison (Start vs. End)

| Metric          | Before Fine-Tuning | After Fine-Tuning |
|-----------------|--------------------|-------------------|
| Training Loss   | 10.860             | 5.628             |
| Validation Loss | 10.861             | 5.860             |
Model learning analysis

Tracking both training and validation loss helps you understand if the model is improving. A steady decrease in these values generally reflects successful training.
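
As a rough heuristic (not an official check), you can flag runs where training loss falls but validation loss does not, which can signal overfitting. `is_improving` is an illustrative helper using the loss arrays from the API response:

```python
def is_improving(loss_history, min_delta: float = 0.0) -> bool:
    """True when loss trends downward overall (first vs. last value)."""
    return len(loss_history) >= 2 and (loss_history[0] - loss_history[-1]) > min_delta

# Loss arrays as returned by the training-info endpoint.
train_ok = is_improving([4.3348, 4.3847, 4.6717, 3.1481, 2.3838])
val_ok = is_improving([3.7875, 2.4865])
if train_ok and not val_ok:
    print("possible overfitting: training loss falls but validation loss does not")
```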

Key Benefits

  • Monitor regularly – Keep an eye on training progress and outcomes.
  • Compare fine-tuning runs – Analyze trends across different model runs.
  • Tune intelligently – Use the insights to improve data selection and hyperparameter settings.