
AI Studio Inference & Playground

This article explains how to run model inference using Hyperstack AI Studio. Inference is the process of using trained models to generate human-like text based on input data. AI Studio offers inference as a service via a flexible API and an interactive Playground UI, enabling you to test base and fine-tuned models, configure generation parameters, and deploy custom adapters.

Model Inference

Whether you're building a chatbot, generating text, or simply exploring the capabilities of Hyperstack AI Studio, this section covers the requirements for making API requests to your models.

Model Inference Using the API

To make requests to a model, include your AI Studio API key in the X-API-KEY request header, as shown in the example below.

Replace the following variables before running the command:

  • API_KEY: Your AI Studio API key.
  • model: The name of the model you want to use.
  • stream: Set to true to return the response as a stream of data chunks as they are generated, or false to receive a single complete message once generation is finished.
  • messages: The prompt or user input for inference. For expected format, see here.
  • To control model behavior, see the Optional Parameters section below.
curl -X POST "https://api.genai.hyperstack.cloud/api/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "X-API-KEY: API_KEY" \
  -d '{
    "model": "your-model-name",
    "messages": [
      {"role": "user", "content": "YOUR TEXT HERE"}
    ],
    "stream": true,
    "max_tokens": 100,
    "temperature": 0.5,
    "top_p": 0.5,
    "top_k": 40,
    "presence_penalty": 0,
    "repetition_penalty": 0.5
  }'
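
The same request can be assembled in Python. This is a minimal sketch, not an official client: the endpoint URL and X-API-KEY header are taken from the curl example above, while the build_request helper and its parameter names are illustrative. The actual network call (e.g. via the requests library) is left out so the sketch stays self-contained.

```python
import json

# Endpoint from the curl example above.
API_URL = "https://api.genai.hyperstack.cloud/api/v1/chat/completions"

def build_request(api_key, model, user_text, stream=True, **options):
    """Assemble the headers and JSON body for a chat-completions call.

    `options` carries any optional generation parameters,
    e.g. max_tokens, temperature, top_p, top_k.
    """
    headers = {
        "Content-Type": "application/json",
        "X-API-KEY": api_key,  # AI Studio API key goes in this header
    }
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "stream": stream,
        **options,
    }
    return headers, body

# Mirror the curl command: same model, message, and parameters.
headers, body = build_request(
    "API_KEY", "your-model-name", "YOUR TEXT HERE",
    stream=True, max_tokens=100, temperature=0.5,
)
print(json.dumps(body, indent=2))
```

To send the request, pass `headers` and `body` to your HTTP client of choice, for example `requests.post(API_URL, headers=headers, json=body)`.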

Playground - Model Inference Using the UI

Hyperstack AI Studio's Playground lets you converse with supported chat models, as well as models you have trained and deployed, in a chat-style interface. You can customize a model's behavior with a system prompt, and the Playground provides a user-friendly environment for testing and experimenting with your models.

To compare one model with another, select Compare Side-by-Side, choose the two models, and evaluate the differences in their answers.

Parameters

The playground allows you to configure several parameters that control the model's behavior:

  • System Prompt: Sets the initial context and instructions for the model, defining its role, behavior, and specific guidelines to follow.
  • Max Tokens: Controls the maximum length of the model's response. Higher values allow for longer responses, while lower values keep responses concise.
  • Temperature: A value between 0 and 1 that controls response randomness. Lower values (near 0) produce more focused, deterministic responses, while higher values (near 1) produce more creative, diverse outputs.
  • Presence Penalty: Influences the model's tendency to introduce new topics. Higher values encourage the model to cover new concepts rather than dwelling on previously mentioned information.
  • Repetition Penalty: Controls how strongly the model avoids repeating words or phrases. Higher values reduce the likelihood of repeated language patterns.
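
To make the Temperature parameter concrete, the sketch below shows the standard temperature-scaled softmax used by most samplers (the service's exact sampling implementation is not documented here, so treat this as an illustration of the general technique, with made-up logit values).

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then normalize to probabilities.

    Low temperature sharpens the distribution (more deterministic);
    temperature 1.0 leaves the relative scores unchanged (more diverse).
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical raw scores for three candidate tokens
cool = softmax_with_temperature(logits, 0.2)  # near-deterministic sampling
warm = softmax_with_temperature(logits, 1.0)  # more creative sampling
print(cool, warm)
```

At temperature 0.2 the top-scoring token receives almost all of the probability mass, while at 1.0 the other candidates remain plausible choices, which is why lower values yield more focused responses.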

These parameters can be adjusted in real time to tune the model's behavior for your specific use case.