Monitor Fine-Tuning Jobs
This article explains how to monitor fine-tuning jobs in Hyperstack AI Studio using both the API and the user interface. It covers how to check job status, track training and validation loss, and interpret visualizations to evaluate model performance throughout the training process.
Monitoring Using the API
This endpoint retrieves training details for any fine-tuned model, such as training and validation loss, training status, and performance history. It is useful for monitoring model performance and debugging training runs.
Replace the following variables before running the command:
- API_KEY – Your AI Studio API key.
- {MODEL_NAME} – The name of the model to retrieve training details for, included in the path of the request.
curl -X GET "https://api.genai.hyperstack.cloud/tailor/v1/named-training-info-log/{MODEL_NAME}" \
-H "X-API-Key: API_KEY" \
-H "Content-Type: application/json"
Required Parameters
- model_name (string) – Name of the model to retrieve training logs for.
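The same request can be issued from Python. Below is a minimal sketch using only the standard library; the helper names (`training_info_url`, `get_training_info`) are illustrative, not part of the API:

```python
import json
import os
import urllib.request

API_BASE = "https://api.genai.hyperstack.cloud/tailor/v1"

def training_info_url(model_name: str) -> str:
    # Build the endpoint path with the model name substituted in.
    return f"{API_BASE}/named-training-info-log/{model_name}"

def get_training_info(model_name: str, api_key: str) -> dict:
    # GET the training details; raises urllib.error.HTTPError on failure.
    req = urllib.request.Request(
        training_info_url(model_name),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)

# Example usage (requires a valid key exported as an environment variable):
# info = get_training_info("Legal-1.0", os.environ["API_KEY"])
# print(info["status"])
```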
Response: Success
{
"metrics": {
"end_train_message": [
"Training job ended"
],
"end_train_status": [
"dormant"
],
"eval_loss": [
3.7874512672424316,
2.4864907264709473
],
"eval_perplexity": [],
"loss": [
4.3348,
4.3847,
4.6717,
3.1481,
2.3838
],
"perplexity": []
},
"status": "success"
}
Response Field Descriptions
metrics object
Contains the training and evaluation metrics recorded during the fine-tuning process.
end_train_message array
A message indicating how the training job concluded. Typically includes phrases like "Training job ended" or error descriptions if training was interrupted.
end_train_status array
The final status of the training pod. Common values include:
- "dormant" – Training completed and resources were released.
- "failed" – Training encountered an error and was terminated.
eval_loss array
An array showing validation loss values. Typically includes:
- First value: loss before fine-tuning.
- Second value: loss after fine-tuning.
Lower values generally indicate better generalization performance.
loss array
An array of training loss values recorded during different steps of the fine-tuning process. This sequence shows how the model's performance improved over time. The last value is the final training loss.
eval_perplexity array
(optional) Perplexity values on the validation set before and after fine-tuning. A lower value indicates more confident and accurate predictions. This field may be empty if perplexity is not computed.
perplexity array
(optional) Training perplexity recorded over steps. Not populated in all training runs.
status string
Indicates the result of the API call. "success" confirms that the training information was retrieved correctly.
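Putting these fields together, here is a short sketch that extracts the headline numbers from the sample success response shown above. The `summarize` helper is illustrative, not part of the API:

```python
# Sample success response, abbreviated from the example above.
sample = {
    "metrics": {
        "end_train_message": ["Training job ended"],
        "end_train_status": ["dormant"],
        "eval_loss": [3.7874512672424316, 2.4864907264709473],
        "loss": [4.3348, 4.3847, 4.6717, 3.1481, 2.3838],
    },
    "status": "success",
}

def summarize(resp: dict) -> dict:
    # Pull out the final status, final training loss, and the
    # validation-loss improvement (before minus after: positive is better).
    m = resp["metrics"]
    loss = m.get("loss", [])
    eval_loss = m.get("eval_loss", [])
    return {
        "final_status": (m.get("end_train_status") or ["unknown"])[-1],
        "final_train_loss": loss[-1] if loss else None,
        "eval_loss_reduction": round(eval_loss[0] - eval_loss[-1], 4)
        if len(eval_loss) >= 2 else None,
    }

print(summarize(sample))
# {'final_status': 'dormant', 'final_train_loss': 2.3838, 'eval_loss_reduction': 1.301}
```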
Response: Failure
The example below shows a run that failed during training. The top-level "status" is still "success" because it reflects the API call itself; the training outcome is reported in "end_train_status".
{
"metrics": {
"end_train_status": ["failed_training"],
"loss": []
},
"status": "success"
}
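A failed run can therefore be detected by inspecting "end_train_status" rather than the top-level "status". A minimal sketch (the function name is illustrative):

```python
def training_failed(resp: dict) -> bool:
    # The top-level "status" reports whether the API call succeeded;
    # the outcome of the training job itself is in end_train_status.
    statuses = resp.get("metrics", {}).get("end_train_status", [])
    return any(s.startswith("failed") for s in statuses)

failed = {"metrics": {"end_train_status": ["failed_training"], "loss": []}, "status": "success"}
healthy = {"metrics": {"end_train_status": ["dormant"], "loss": [2.3838]}, "status": "success"}

print(training_failed(failed), training_failed(healthy))
# True False
```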
Monitoring Using the UI
To track the progress and performance of your fine-tuned models, follow these steps in Hyperstack AI Studio:
1. Open the Model Details Page
   Go to the My Models page and click on the fine-tuned model you want to monitor. This takes you to the model's training details view.

2. Check Job Status
   The status panel shows:
   - Training Jobs – Jobs currently in progress.
   - Completed Jobs – Finished jobs with full metrics available.
   - Failed Jobs – Jobs that encountered errors during training.

3. Review Metrics
   You'll see metrics such as:
   - Training Loss – How well the model fits your training data.
   - Validation Loss – How well the model performs on unseen data.

4. Analyze Visualizations
   The following charts help evaluate model performance:
   - Performance Comparison Chart – Compares pre- and post-fine-tuning loss values.
   - Model Performance Over Steps – Displays training loss reduction over time.
Example Metrics Display
Training Details
- Model Name: Legal-1.0
- Tags: legal, tax
- Base Model: Mistral-7B
- Training Status: Completed
- Training Duration: 1 hour 30 minutes

Current Metrics
- Training Loss: 5.6282
- Validation Loss: 10.8603

Hyperparameters Used
- Learning Rate: auto
- Batch Size: auto
- Epochs: auto
- Eval Split: 5% (auto)

Performance Comparison
- Training Loss Reduction: 5.2321
- Validation Loss Reduction: 0.0000
Example Charts
Performance Comparison (Start vs. End)
| Metric | Before Fine-Tuning | After Fine-Tuning |
|---|---|---|
| Training Loss | 10.860 | 5.628 |
| Validation Loss | 10.861 | 5.860 |
Tracking both training and validation loss helps you understand if the model is improving. A steady decrease in these values generally reflects successful training.
Key Benefits
- Monitor regularly – Keep an eye on training progress and outcomes.
- Compare fine-tuning runs – Analyze trends across different model runs.
- Tune intelligently – Use the insights to improve data selection and hyperparameter settings.