
AI Studio Inference & Playground

This article explains how to run model inference using Hyperstack AI Studio. Inference is the process of using trained models to generate human-like text based on input data. AI Studio offers inference as a service via a flexible API and an interactive Playground UI, enabling you to test base and fine-tuned models, configure generation parameters, and deploy custom adapters.

Model Inference

Whether you're building a chatbot, generating text, or simply exploring the capabilities of Hyperstack AI Studio, this section covers the requirements for making API requests to your models.

Model Inference Using the API

To make requests to a model, include your AI Studio API key in the X-API-KEY request header, as shown in the example below.

Replace the following variables before running the command:

  • API_KEY: Your AI Studio API key.
  • model: The name of the model you want to use.
  • stream: Set to true to return the response as a stream of data chunks as they are generated, or false to receive a single complete message once generation is finished.
  • messages: The prompt or user input for inference. For expected format, see here.
  • To control model behavior, see the Optional Parameters section below.
curl -X POST "https://api.genai.hyperstack.cloud/api/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "X-API-KEY: API_KEY" \
  -d '{
    "model": "your-model-name",
    "messages": [
      {"role": "user", "content": "YOUR TEXT HERE"}
    ],
    "stream": true,
    "max_tokens": 100,
    "temperature": 0.5,
    "top_p": 0.5,
    "top_k": 40,
    "presence_penalty": 0,
    "repetition_penalty": 0.5
  }'
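
The same request can be assembled in Python. This is a minimal sketch, not an official client: the endpoint URL and X-API-KEY header are taken from the curl example above, while the build_request helper and its parameter names are illustrative. The actual network call (e.g. via the requests library) is left out so the sketch stays self-contained.

```python
import json

# Endpoint from the curl example above.
API_URL = "https://api.genai.hyperstack.cloud/api/v1/chat/completions"

def build_request(api_key, model, user_text, stream=True, **options):
    """Assemble the headers and JSON body for a chat-completions call.

    `options` carries any optional generation parameters,
    e.g. max_tokens, temperature, top_p, top_k.
    """
    headers = {
        "Content-Type": "application/json",
        "X-API-KEY": api_key,  # AI Studio API key goes in this header
    }
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "stream": stream,
        **options,
    }
    return headers, body

# Mirror the curl command: same model, message, and parameters.
headers, body = build_request(
    "API_KEY", "your-model-name", "YOUR TEXT HERE",
    stream=True, max_tokens=100, temperature=0.5,
)
print(json.dumps(body, indent=2))
```

To send the request, pass `headers` and `body` to your HTTP client of choice, for example `requests.post(API_URL, headers=headers, json=body)`.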

Playground - Model Inference Using the UI

Hyperstack AI Studio's Playground lets you converse with supported chat models, as well as models you have trained and deployed, in a chat-style interface. You can customize a model's behavior with a system prompt, and the Playground provides a user-friendly environment for testing and experimenting with your models.

To compare one model with another, select Compare Side-by-Side, choose the two models, and evaluate the differences in their answers.

Parameters

The playground allows you to configure several parameters that control the model's behavior:

  • System Prompt: Sets the initial context and instructions for the model, defining its role, behavior, and specific guidelines to follow.
  • Max Tokens: Controls the maximum length of the model's response. Higher values allow for longer responses, while lower values keep responses concise.
  • Temperature: A value between 0 and 1 that controls response randomness. Lower values (near 0) produce more focused, deterministic responses, while higher values (near 1) produce more creative, diverse outputs.
  • Presence Penalty: Influences the model's tendency to introduce new topics. Higher values encourage the model to cover new concepts rather than dwelling on previously mentioned information.
  • Repetition Penalty: Controls how strongly the model avoids repeating words or phrases. Higher values reduce the likelihood of repeated language patterns.
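
To make the Temperature parameter concrete, the sketch below shows the standard temperature-scaled softmax used by most samplers (the service's exact sampling implementation is not documented here, so treat this as an illustration of the general technique, with made-up logit values).

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then normalize to probabilities.

    Low temperature sharpens the distribution (more deterministic);
    temperature 1.0 leaves the relative scores unchanged (more diverse).
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical raw scores for three candidate tokens
cool = softmax_with_temperature(logits, 0.2)  # near-deterministic sampling
warm = softmax_with_temperature(logits, 1.0)  # more creative sampling
print(cool, warm)
```

At temperature 0.2 the top-scoring token receives almost all of the probability mass, while at 1.0 the other candidates remain plausible choices, which is why lower values yield more focused responses.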

These parameters can be adjusted in real time to tune the model's behavior for your specific use case.