Model Evaluations & Training Metrics
Hyperstack AI Studio provides several tools for evaluating fine-tuned models at different stages of the development lifecycle. Use Custom Evaluations to define your own evaluation criteria and data, which is ideal for assessing model behavior on the specific tasks that matter to your use case. After training, run Benchmark Evaluations to test your model against standardized tasks in reasoning, mathematics, and knowledge recall, giving you a performance baseline across domains. To track training progress and catch issues early, use Training Metrics to monitor training and validation loss in real time or after training completes, as illustrated in the sketch below.
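To make the loss-tracking workflow concrete, here is a minimal sketch of polling per-step metrics for a fine-tuning job and plotting training versus validation loss. The base URL, environment variable, job ID, route, and response fields (`step`, `train_loss`, `val_loss`) are illustrative assumptions, not the documented Hyperstack AI Studio API; consult the API reference for the actual endpoints and payloads.

```python
"""Hypothetical sketch: fetching fine-tuning loss curves and plotting them.

The endpoint, auth header, and JSON fields below are illustrative
assumptions, not the documented Hyperstack AI Studio API.
"""

import os

import matplotlib.pyplot as plt
import requests

API_BASE = "https://api.example.com/v1"    # placeholder base URL
API_KEY = os.environ["AI_STUDIO_API_KEY"]  # hypothetical env var
JOB_ID = "my-finetune-job"                 # hypothetical job ID

# Fetch per-step metrics for a fine-tuning job (assumed response shape:
# a list of {"step": int, "train_loss": float, "val_loss": float}).
resp = requests.get(
    f"{API_BASE}/fine-tuning/{JOB_ID}/metrics",
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
)
resp.raise_for_status()
metrics = resp.json()

steps = [m["step"] for m in metrics]
train_loss = [m["train_loss"] for m in metrics]
val_loss = [m["val_loss"] for m in metrics]

# Plot training vs. validation loss; a widening gap between the two
# curves is the usual early sign of overfitting.
plt.plot(steps, train_loss, label="training loss")
plt.plot(steps, val_loss, label="validation loss")
plt.xlabel("step")
plt.ylabel("loss")
plt.legend()
plt.tight_layout()
plt.savefig("loss_curves.png")
```

The same polling pattern applies whether you check metrics while a job is still running or after it completes; only the amount of data returned differs.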