Monitor Fine-Tuning & Training Metrics
This article explains how to monitor and access training metrics for fine-tuning jobs in Hyperstack AI Studio using both the API and the user interface. It covers how to check job status, track training and validation loss, and interpret visualizations to evaluate model performance throughout the training process.
View Training Metrics Using the UI
You can monitor training metrics in Hyperstack AI Studio during and after the fine-tuning process. While training is in progress, metrics are displayed on the model’s detail page. Once training completes, the full set of data becomes available under the Training Metrics tab in the Model Evaluations section.
The Training Metrics page includes the following:
- Final Metrics – Reports your model’s final training and validation loss values.
- Hyperparameters Used – Lists the configuration settings used during training, such as learning rate and batch size.
- Performance Comparison (Start/End) – Summarizes the change in loss values from before and after fine-tuning.
- Performance Comparison (Loss Chart) – A bar chart showing how much loss decreased through training.
- Model Performance Over Steps – A line graph tracking the training loss reduction over time.
See the Interpreting Training Metrics section for guidance on understanding your model’s fine-tuning results.
Monitor Training Metrics During Fine-Tuning
To view training metrics while a fine-tuning job is in progress, follow these steps:
1. Navigate to the Models page and select the fine-tuned model you want to monitor.
2. During training, real-time progress and metrics are shown directly on the model's detail page.

For help understanding these metrics, refer to the Interpreting Training Metrics section below.
Access Metrics After Training Completes
To review training metrics after a fine-tuning job has completed, follow these steps:
1. Once training finishes, navigate to the Models page and select the fine-tuned model you want to review.
2. Under the Model Evaluations section on the model's detail page, click Training Metrics to access the full set of training results.

For help understanding these metrics, refer to the Interpreting Training Metrics section below.
Interpreting Training Metrics
After training completes, the Training Metrics page displays a comprehensive summary of how your model performed during fine-tuning. The metrics and visualizations are organized into several key sections, each helping you assess different aspects of model behavior.
Final Metrics
- Training Loss: Indicates how well the model fit your training data. Lower values reflect better performance. In many cases, values below 1.0 suggest strong learning.
- Validation Loss: Measures how well the model generalizes to unseen data. Ideally, this should be close to the training loss. A large gap between the two may suggest overfitting.
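As a rough illustration of the gap check described above, the snippet below compares a final training loss against a final validation loss. The example values and the threshold are assumptions for illustration, not values produced by AI Studio.

```python
# Rough overfitting heuristic: compare final training and validation loss.
# The example values and the 0.5 threshold are illustrative assumptions.
final_train_loss = 0.82  # hypothetical final training loss
final_val_loss = 1.45    # hypothetical final validation loss

gap = final_val_loss - final_train_loss
if gap > 0.5:  # assumed threshold; adjust for your task
    print(f"Large train/validation gap ({gap:.2f}): possible overfitting")
else:
    print(f"Train/validation gap looks reasonable ({gap:.2f})")
```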
Hyperparameters Used
These settings define the training configuration and can help explain why the model performed a certain way:
- Learning Rate: The step size for model weight updates. Typical values are around 0.0001 for stable training.
- Batch Size: Number of examples processed in one step. Smaller values (e.g., 4) are common in constrained environments.
- Epochs: The number of full passes over the training data. More epochs can improve learning, but excessive values may overfit.
- Percentage of Dataset for Eval: Fraction of data held out for validation, commonly 5%.
- LoRA Rank (r): Controls the rank of the inserted low-rank adapters. Values of 32–64 are standard for balancing performance and resource usage.
- LoRA Alpha: A scaling factor for LoRA updates. Larger values increase the effect of the fine-tuned weights.
- LoRA Dropout: Helps prevent overfitting by adding noise. A value of 0.05 is commonly used.
- Gradient Accumulation Steps: Number of steps before backpropagation. Useful for simulating larger batch sizes without increasing memory usage.
- Micro Batch Size: Size of sub-batches within an accumulated step. Smaller values reduce memory load.
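For reference, the typical values above can be collected into a starting configuration like the sketch below. This is purely illustrative; the field names are not an AI Studio API schema, and any value not mentioned above is an assumption.

```python
# Illustrative starting configuration based on the typical values above.
# Field names are for readability only and do not mirror the AI Studio API.
typical_finetune_config = {
    "learning_rate": 1e-4,             # around 0.0001 for stable training
    "batch_size": 4,                   # small batches for constrained environments
    "epochs": 3,                       # assumed example; too many epochs may overfit
    "eval_dataset_percentage": 0.05,   # 5% of data held out for validation
    "lora_rank": 32,                   # 32-64 balances quality and resource usage
    "lora_alpha": 64,                  # assumed example scaling factor
    "lora_dropout": 0.05,              # commonly used to reduce overfitting
    "gradient_accumulation_steps": 4,  # assumed example value
    "micro_batch_size": 1,             # smaller values reduce memory load
}
```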
Performance Comparison (Start/End of Fine-Tuning)
This section summarizes the change in loss before and after training:
- Training Loss Reduction: Indicates how much better the model performs on its training data post-fine-tuning.
- Validation Loss Reduction: Reflects improved generalization. A strong decrease is desirable.
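One common way to express these reductions is as a percentage of the starting loss. The sketch below uses the validation loss values from the sample API response later in this article; the UI may compute its summary differently.

```python
# Percentage reduction in validation loss from start to end of fine-tuning.
# Values are taken from the sample API response shown later in this article.
val_loss_before = 3.79  # validation loss before fine-tuning
val_loss_after = 2.49   # validation loss after fine-tuning

reduction = (val_loss_before - val_loss_after) / val_loss_before * 100
print(f"Validation loss reduced by {reduction:.1f}%")  # ~34.3%
```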
Performance Comparison (Loss Chart)
Bar chart showing pre- and post-training loss values:
- Before Fine-Tuning (Gray): Baseline loss levels.
- After Fine-Tuning (Blue): Final loss values after model updates.
Interpretation:
- A visible drop in both bars indicates successful fine-tuning.
- Minimal change may indicate ineffective training or data mismatch.
Model Performance Over Steps (Loss Curve)
Line chart visualizing how training loss changed over time:
- A downward-sloping curve signals successful learning progression.
- Spikes or instability can suggest noisy data or poor learning rates.
- A flat or plateauing curve might indicate early convergence or underfitting.
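If you want to inspect the curve outside the UI, the per-step loss values returned by the Retrieve Training Metrics API (described in the next section) can be plotted locally. The sketch below assumes matplotlib is installed and uses the loss values from the sample response.

```python
# Plot per-step training loss to reproduce a simple loss curve locally.
# Assumes matplotlib is installed; the values mirror the sample API response.
import matplotlib.pyplot as plt

loss = [4.3348, 4.3847, 4.6717, 3.1481, 2.3838]

plt.plot(range(1, len(loss) + 1), loss, marker="o")
plt.xlabel("Logged step")
plt.ylabel("Training loss")
plt.title("Training loss over steps")
plt.show()
```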
Retrieve Training Metrics API
GET https://api.ai.hyperstack.cloud/api/v1/named-training-info-log/{MODEL_NAME}
This endpoint retrieves training details for a fine-tuned model, such as training and validation loss, training status, and performance history. It is helpful for monitoring model performance and debugging training runs.
Replace the following variables before running the command:
- API_KEY: Your API key.
- {MODEL_NAME}: The name of the model to retrieve training details for, included in the path of the request.
curl -X GET "https://api.ai.hyperstack.cloud/api/v1/named-training-info-log/{MODEL_NAME}" \
-H "X-API-Key: API_KEY" \
-H "Content-Type: application/json"
Required Parameters
- model_name (string) – Name of the model to retrieve training logs for.
Response
{
"metrics": {
"end_train_message": [
"Training job ended"
],
"end_train_status": [
"dormant"
],
"eval_loss": [
3.7874512672424316,
2.4864907264709473
],
"eval_perplexity": [],
"loss": [
4.3348,
4.3847,
4.6717,
3.1481,
2.3838
],
"perplexity": []
},
"status": "success"
}
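To pull the headline numbers out of a response with this shape, a small summary like the sketch below works. The sample payload mirrors the example response above; how empty arrays are handled is an assumption about edge cases.

```python
# Summarize headline metrics from a training-info response of the shape above.
# The sample payload mirrors the example response; empty-array handling is an
# assumption about edge cases.
sample_payload = {
    "metrics": {
        "end_train_status": ["dormant"],
        "eval_loss": [3.7874512672424316, 2.4864907264709473],
        "loss": [4.3348, 4.3847, 4.6717, 3.1481, 2.3838],
    },
    "status": "success",
}

metrics = sample_payload.get("metrics", {})
eval_loss = metrics.get("eval_loss", [])
loss = metrics.get("loss", [])

if len(eval_loss) >= 2:
    print(f"Validation loss: {eval_loss[0]:.3f} -> {eval_loss[-1]:.3f}")
if loss:
    print(f"Final training loss: {loss[-1]:.4f}")
print(f"End status: {metrics.get('end_train_status', ['unknown'])[0]}")
```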
Response field descriptions:

metrics (object) – Contains the training and evaluation metrics recorded during the fine-tuning process. It includes the following child attributes:
- end_train_message (array) – A message indicating how the training job concluded. Typically includes phrases like "Training job ended" or error descriptions if training was interrupted.
- end_train_status (array) – The final status of the training pod. Common values include "dormant" (training completed and resources were released) and "failed" (training encountered an error and was terminated).
- eval_loss (array) – Validation loss values. Typically the first value is the loss before fine-tuning and the second is the loss after fine-tuning. Lower values generally indicate better generalization performance.
- loss (array) – Training loss values recorded at different steps of the fine-tuning process. This sequence shows how the model's performance improved over time; the last value is the final training loss.
- eval_perplexity (array, optional) – Perplexity values on the validation set before and after fine-tuning. A lower value indicates more confident and accurate predictions. This field may be empty if perplexity is not computed.
- perplexity (array, optional) – Training perplexity recorded over steps. Not populated in all training runs.

status (string) – Indicates the result of the API call. "success" confirms that the training information was retrieved correctly.
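If the perplexity fields are empty, you can still estimate perplexity yourself: perplexity is conventionally the exponential of the cross-entropy loss. Whether AI Studio uses exactly this definition is an assumption.

```python
# Estimate perplexity as exp(loss). Whether AI Studio computes its perplexity
# fields exactly this way is an assumption.
import math

eval_loss_after = 2.4865  # post-fine-tuning validation loss from the sample response
print(f"Estimated perplexity: {math.exp(eval_loss_after):.1f}")  # ~12.0
```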
If a training job fails, the response reports a failure status and the metrics arrays may be empty, for example:

{
"metrics": {
"end_train_status": ["failed_training"],
"loss": []
},
"status": "success"
}
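A small guard like the one below can distinguish a failed run from a successful one before reading the metrics. The documented failure values are "failed" and "failed_training"; treating any status beginning with "failed" as a failure is an assumption.

```python
# Detect a failed training run from the response before reading metrics.
# "failed" and "failed_training" appear in this article; matching any status
# that starts with "failed" is an assumption.
def training_failed(payload: dict) -> bool:
    statuses = payload.get("metrics", {}).get("end_train_status", [])
    return any(status.startswith("failed") for status in statuses)

failed_payload = {
    "metrics": {"end_train_status": ["failed_training"], "loss": []},
    "status": "success",
}
print(training_failed(failed_payload))  # True
```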