TensorLoop
API reference

Chat completions

POST /v1/chat/completions — the main inference endpoint, OpenAI-compatible.

POST          /v1/chat/completions
Base URL      https://litellm.tensorloop.tech
Auth          Authorization: Bearer YOUR_KEY
Content-Type  application/json
Pricing       Per upstream model; see /v1/models

The body matches OpenAI's chat completions schema. Minimum:

curl https://litellm.tensorloop.tech/v1/chat/completions \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{ "role": "user", "content": "Hello" }]
  }'

The same request with the OpenAI Python SDK:

from openai import OpenAI

client = OpenAI(
    base_url="https://litellm.tensorloop.tech/v1",
    api_key="YOUR_KEY",
)

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)

The same request with the OpenAI Node.js SDK:

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://litellm.tensorloop.tech/v1",
  apiKey: process.env.TENSORLOOP_KEY,
});

const resp = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Hello" }],
});
console.log(resp.choices[0].message.content);

Request parameters

model (string, required)
Model ID. Must be in the bearer key's allowlist; see /v1/models.
messages (Message[], required)
Conversation array in OpenAI role/content format. See Message format below.
temperature (number, default: model-dependent)
Sampling temperature. 0 makes output close to deterministic; higher values increase randomness.
max_tokens (integer)
Hard cap on the completion length, in tokens. Generated tokens count against your key budget regardless of finish reason.
top_p (number)
Nucleus sampling threshold. It can be combined with temperature, but in practice change one, not both.
stream (boolean, default: false)
If true, the response is a Server-Sent Events stream. See Streaming.
stop (string | string[])
Up to 4 stop sequences. The model stops generating as soon as any one of them is produced.
presence_penalty (number)
Range −2 to 2. Positive values discourage tokens already present in the conversation.
frequency_penalty (number)
Range −2 to 2. Positive values discourage tokens in proportion to how often they have already been used.
response_format (object)
Set to {"type": "json_object"} for JSON mode. Model-dependent: only some models support it.
seed (integer)
Sampling seed. The same seed with the same prompt and parameters gives reproducible output (best-effort).
logprobs (boolean)
Return per-token log probabilities. Useful for evaluation and ranking.
top_logprobs (integer)
When logprobs is true, return this many alternative tokens per position. Max 20.
tools (Tool[])
Tool / function definitions the model may invoke. See Tool calling.
tool_choice (string | object, default: "auto")
How the model picks a tool. One of "auto", "none", "required", or { "type": "function", "function": { "name": "<tool name>" } } to force a specific tool.
user (string)
Stable user identifier for abuse monitoring. Passed through to the upstream provider.
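
The limits listed above can be checked client-side before a request is sent. The following is a minimal sketch, not part of any TensorLoop SDK: build_chat_request is a hypothetical helper, and the checks only mirror the limits stated in this table (at most 4 stop sequences, penalties between −2 and 2, top_logprobs at most 20 and only with logprobs).

```python
from typing import Any


def build_chat_request(model: str, messages: list, **options: Any) -> dict:
    """Assemble a request body, checking the documented limits first.

    Hypothetical helper for illustration; the server remains the authority.
    """
    stop = options.get("stop")
    if isinstance(stop, list) and len(stop) > 4:
        raise ValueError("stop accepts at most 4 sequences")

    for name in ("presence_penalty", "frequency_penalty"):
        value = options.get(name)
        if value is not None and not -2 <= value <= 2:
            raise ValueError(f"{name} must be between -2 and 2")

    top_logprobs = options.get("top_logprobs")
    if top_logprobs is not None:
        if not options.get("logprobs"):
            raise ValueError("top_logprobs requires logprobs to be true")
        if top_logprobs > 20:
            raise ValueError("top_logprobs must be at most 20")

    body = {"model": model, "messages": messages}
    # Leave unset options out of the body so upstream defaults apply.
    body.update({k: v for k, v in options.items() if v is not None})
    return body


request = build_chat_request(
    "gpt-4o-mini",
    [{"role": "user", "content": "Hello"}],
    temperature=0.2,
    max_tokens=64,
    stop=["\n\n"],
    seed=42,
    top_p=None,  # unset: omitted from the body
)
```

The resulting dict can be sent as the JSON body of the POST request, or unpacked into client.chat.completions.create(**request) with the OpenAI SDK.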

Message format

A Message is one of:

type Message =
  | { role: "system" | "user" | "assistant"; content: string | ContentPart[] }
  | { role: "assistant"; content: string | null; tool_calls?: ToolCall[] }
  | { role: "tool"; tool_call_id: string; content: string };

system messages set the assistant's behavior. Send a single system message, as the first entry in the conversation.
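
One way to keep the system message unique and first is to normalize the history before each request. This is a sketch only; with_system_prompt is a hypothetical helper name, not something the API provides.

```python
def with_system_prompt(system: str, history: list) -> list:
    """Return a new message list with exactly one system message, placed first."""
    # Drop system messages already in the history so the new one is the only one.
    rest = [m for m in history if m.get("role") != "system"]
    return [{"role": "system", "content": system}] + rest


history = [
    {"role": "system", "content": "Old instructions"},
    {"role": "user", "content": "Hello"},
]
messages = with_system_prompt("You are a terse assistant.", history)
```

The input list is left unmodified, so the same history can be reused with a different system prompt.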

Multimodal content parts

For vision-capable models, content can be an array of typed parts instead of a string:

{
  "role": "user",
  "content": [
    { "type": "text", "text": "What's in this image?" },
    { "type": "image_url", "image_url": { "url": "https://example.com/cat.png" } }
  ]
}

The url field inside image_url can be a public URL or a data: URI. Image size and count limits depend on the upstream model.
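
For local images, the data: URI form can be built with the standard library. A minimal sketch, assuming base64-encoded image bytes are accepted by the upstream model; image_part_from_bytes and vision_message are hypothetical helper names.

```python
import base64


def image_part_from_bytes(data: bytes, mime_type: str = "image/png") -> dict:
    """Wrap raw image bytes as an image_url content part using a data: URI."""
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


def vision_message(question: str, image: bytes, mime_type: str = "image/png") -> dict:
    """Build a user message with one text part followed by one image part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            image_part_from_bytes(image, mime_type),
        ],
    }


# Placeholder bytes (the PNG file signature); in practice read a real file,
# for example open("cat.png", "rb").read().
message = vision_message("What's in this image?", b"\x89PNG\r\n\x1a\n")
```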

Tool result messages

After the model emits a tool_calls block, send the tool's output back as a role: "tool" message:

{ "role": "tool", "tool_call_id": "call_abc", "content": "{\"temp\": 22}" }

tool_call_id must match the id from the model's tool_calls. content is always a string — JSON-encode any structured payload.
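
The round trip can be sketched as below. This assumes each entry in tool_calls follows OpenAI's shape (an id plus function.name and a JSON-encoded function.arguments string); run_tool_calls and get_weather are hypothetical names used for illustration.

```python
import json
from typing import Callable, Dict, List


def run_tool_calls(assistant_message: dict, handlers: Dict[str, Callable]) -> List[dict]:
    """Execute each requested tool and return the role: "tool" messages to send back."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        name = call["function"]["name"]
        # Arguments arrive as a JSON string (assumed OpenAI-compatible shape).
        arguments = json.loads(call["function"]["arguments"] or "{}")
        output = handlers[name](**arguments)
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],  # must echo the id from tool_calls
            # content is always a string: JSON-encode structured output.
            "content": output if isinstance(output, str) else json.dumps(output),
        })
    return results


def get_weather(city: str) -> dict:
    """Hypothetical tool; a real one would call a weather service."""
    return {"temp": 22}


assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_abc",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"city\": \"Oslo\"}"},
    }],
}
tool_messages = run_tool_calls(assistant_message, {"get_weather": get_weather})
```

Append assistant_message and then tool_messages to the conversation before the next request, so the model sees both its call and the result.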

Response

Response: 200 (non-streaming)
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "..." },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 8,
    "total_tokens": 20
  }
}

finish_reason is one of stop, length, tool_calls, or content_filter. usage always reflects what you'll be billed for: total_tokens × the per-model rate.
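
As a worked example of the billing formula, the sketch below reads finish_reason and usage from the sample response above. The rate of 0.50 per million tokens is an invented placeholder, not a real TensorLoop price, and summarize_completion is a hypothetical helper. Decimal is used to avoid floating-point rounding in the cost.

```python
from decimal import Decimal


def summarize_completion(response: dict, rate_per_million_tokens: Decimal) -> dict:
    """Extract the text, a truncation flag, and an estimated cost from a response."""
    choice = response["choices"][0]
    usage = response["usage"]
    return {
        "text": choice["message"].get("content"),
        # "length" means the max_tokens cap cut the completion short.
        "truncated": choice["finish_reason"] == "length",
        # total_tokens x per-model rate, as described above.
        "estimated_cost": usage["total_tokens"] * rate_per_million_tokens / 1_000_000,
    }


response = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "..."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 12, "completion_tokens": 8, "total_tokens": 20},
}

# Placeholder rate for illustration only; look up the real one via /v1/models.
summary = summarize_completion(response, Decimal("0.50"))
```

With 20 total tokens at the placeholder rate, the estimate is 20 × 0.50 / 1,000,000 = 0.00001.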

Streaming response

Set "stream": true and the response becomes SSE chunks. See Streaming for the wire format and reconnection guidance.
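
The wire format is documented on the Streaming page. As a rough sketch, assuming chunks follow OpenAI's streaming shape (lines of the form data: {json} with the text in choices[].delta.content, terminated by data: [DONE]), the text can be reassembled like this. iter_stream_text and the sample lines are illustrative, not captured output.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from SSE lines in the assumed OpenAI-style format."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel (assumed)
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Hand-written sample lines for illustration.
sample_lines = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant", "content": ""}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "lo"}}]}',
    "",
    "data: [DONE]",
]
text = "".join(iter_stream_text(sample_lines))
```

With the OpenAI SDKs this parsing is handled for you: passing stream=True returns an iterator of chunk objects.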

Errors

See Errors for the full table. Common ones on this endpoint:

  • 401 — bad or revoked key.
  • 403 — model not in the key's allowlist.
  • 429 — RPM cap hit or budget exhausted. See Troubleshooting to distinguish.
  • 5xx — upstream model down; retry with backoff.
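
The status codes above suggest a simple retry policy: do not retry 401 or 403, and back off on 429 and 5xx. A minimal sketch with exponential backoff; retry_delay is a hypothetical helper, and the base and cap values are arbitrary choices. A 429 caused by an exhausted budget will not clear by waiting, so cap the number of attempts.

```python
from typing import Optional


def retry_delay(status: int, attempt: int, base: float = 1.0, cap: float = 30.0) -> Optional[float]:
    """Return seconds to wait before retry number `attempt` (0-based), or None to give up."""
    if status in (401, 403):
        return None  # key or allowlist problem: retrying cannot help
    if status == 429 or 500 <= status <= 599:
        # Exponential backoff: base, 2*base, 4*base, ... limited to cap.
        return min(cap, base * 2 ** attempt)
    return None  # other statuses are not treated as retryable here


delays = [retry_delay(503, attempt) for attempt in range(4)]
```

In production, adding random jitter to each delay helps avoid many clients retrying at the same moment.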
