Model Evaluations & Training Metrics
Hyperstack AI Studio provides several tools for evaluating fine-tuned models at different stages of the development lifecycle. Use Custom Evaluations to define your own evaluation criteria and data, which is ideal for assessing model behavior on the specific tasks that matter to your use case. After training, run Benchmark Evaluations to test your model against standardized tasks in reasoning, mathematics, and knowledge recall, giving you a performance baseline across domains. To track training progress and catch issues early, use Training Metrics to monitor training and validation loss in real time or after training completes, as illustrated in the sketch below.
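To make the loss-tracking workflow concrete, here is a minimal sketch of polling per-step metrics for a fine-tuning job and plotting training versus validation loss. The base URL, environment variable, job ID, route, and response fields (`step`, `train_loss`, `val_loss`) are illustrative assumptions, not the documented Hyperstack AI Studio API; consult the API reference for the actual endpoints and payloads.

```python
"""Hypothetical sketch: fetching fine-tuning loss curves and plotting them.

The endpoint, auth header, and JSON fields below are illustrative
assumptions, not the documented Hyperstack AI Studio API.
"""

import os

import matplotlib.pyplot as plt
import requests

API_BASE = "https://api.example.com/v1"    # placeholder base URL
API_KEY = os.environ["AI_STUDIO_API_KEY"]  # hypothetical env var
JOB_ID = "my-finetune-job"                 # hypothetical job ID

# Fetch per-step metrics for a fine-tuning job (assumed response shape:
# a list of {"step": int, "train_loss": float, "val_loss": float}).
resp = requests.get(
    f"{API_BASE}/fine-tuning/{JOB_ID}/metrics",
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
)
resp.raise_for_status()
metrics = resp.json()

steps = [m["step"] for m in metrics]
train_loss = [m["train_loss"] for m in metrics]
val_loss = [m["val_loss"] for m in metrics]

# Plot training vs. validation loss; a widening gap between the two
# curves is the usual early sign of overfitting.
plt.plot(steps, train_loss, label="training loss")
plt.plot(steps, val_loss, label="validation loss")
plt.xlabel("step")
plt.ylabel("loss")
plt.legend()
plt.tight_layout()
plt.savefig("loss_curves.png")
```

The same polling pattern applies whether you check metrics while a job is still running or after it completes; only the amount of data returned differs.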