Chat completions

POST /v1/chat/completions


Base URL	`https://api.tensorloop.tech`
Auth	`Authorization: Bearer YOUR_KEY`
Content-Type	`application/json`
Pricing	Per upstream model — see /v1/models

The request body follows OpenAI's chat completions schema. A minimal request, shown with curl, the Python SDK, and the Node SDK:

curl https://api.tensorloop.tech/v1/chat/completions \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{ "role": "user", "content": "Hello" }]
  }'

from openai import OpenAI

client = OpenAI(
    base_url="https://api.tensorloop.tech/v1",
    api_key="YOUR_KEY",
)

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.tensorloop.tech/v1",
  apiKey: process.env.TENSORLOOP_KEY,
});

const resp = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Hello" }],
});
console.log(resp.choices[0].message.content);

Request parameters

model (string, required)

Model ID. Must be in the bearer key's allowlist — see /v1/models.

messages (Message[], required)

Conversation array in OpenAI role/content format. See Message format below.

temperature (number, default: model-dependent)

Sampling temperature. At 0 the model almost always picks the most likely token, so output is close to deterministic; higher values make output more random.

max_tokens (integer)

Hard cap on the completion length. Counts against your key budget regardless of finish reason.

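top_p (number)

Nucleus sampling: the model samples only from the smallest set of tokens whose cumulative probability reaches top_p. It can be sent together with temperature, but in practice adjust one and leave the other at its default.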

stream (boolean, default: false)

If true, the response is a Server-Sent Events stream. See Streaming.

stop (string | string[])

Up to 4 stop sequences. The model stops generating when any are produced.

presence_penalty (number)

Range −2 to 2. Positive values discourage any token that has already appeared in the text so far, regardless of how often.

frequency_penalty (number)

Range −2 to 2. Positive values discourage tokens in proportion to how many times they have already been used.

response_format (object)

Set to {"type": "json_object"} for JSON mode. Model-dependent — only some models support it.
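A minimal sketch of using JSON mode, assuming the upstream model supports it. The helper names and the sample response are illustrative, not part of the API. The system prompt mentions JSON explicitly because OpenAI's own models expect that in JSON mode; whether other upstream models need it is not stated here.

```python
import json

def json_mode_body(prompt: str) -> dict:
    """Build a request body that asks for a JSON object reply."""
    return {
        "model": "gpt-4o-mini",
        "messages": [
            {"role": "system", "content": "Reply with a single JSON object."},
            {"role": "user", "content": prompt},
        ],
        "response_format": {"type": "json_object"},
    }

def parse_json_reply(response: dict) -> dict:
    """Decode the assistant's content, refusing replies cut off by max_tokens."""
    choice = response["choices"][0]
    if choice["finish_reason"] == "length":
        raise ValueError("reply was truncated; the JSON is probably incomplete")
    return json.loads(choice["message"]["content"])

# Made-up response for illustration; not real model output.
sample_response = {
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": "{\"city\": \"Paris\"}"},
        "finish_reason": "stop",
    }]
}

body = json_mode_body("Which city is the Eiffel Tower in?")
parsed = parse_json_reply(sample_response)
```

Checking finish_reason before decoding avoids a confusing JSON parse error when the reply was simply cut short.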

seed (integer)

Deterministic sampling seed. Same seed + same prompt → reproducible output (best-effort).

logprobs (boolean)

Return per-token log probabilities. Useful for evaluation and ranking.

top_logprobs (integer)

When logprobs is true, return this many alternative tokens per position. Max 20.
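As a rough illustration of reading log probabilities, the sketch below converts each token's logprob into a plain probability. It assumes the response carries them under choices[0].logprobs.content, as in OpenAI's schema; the helper name and the sample choice are made up for illustration.

```python
import math

def token_probabilities(choice: dict) -> list[tuple[str, float]]:
    """Return (token, probability) pairs for one choice of a chat completion."""
    entries = (choice.get("logprobs") or {}).get("content") or []
    # exp() turns a natural-log probability back into a probability in [0, 1].
    return [(e["token"], math.exp(e["logprob"])) for e in entries]

# Made-up sample shaped like an OpenAI-style choice; not real model output.
sample_choice = {
    "index": 0,
    "message": {"role": "assistant", "content": "Hi!"},
    "logprobs": {
        "content": [
            {"token": "Hi", "logprob": 0.0, "top_logprobs": []},
            {"token": "!", "logprob": math.log(0.5), "top_logprobs": []},
        ]
    },
}

probs = token_probabilities(sample_choice)
```

A choice without logprobs yields an empty list, so the helper is safe to call when the parameter was not set.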

tools (Tool[])

Tool / function definitions the model may invoke. See Tool calling.

tool_choice (string | object, default: "auto")

How the model picks a tool: "auto", "none", "required", or {"type": "function", "function": {"name": "..."}} to force a specific function.
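A sketch of a request body that offers one tool, assuming OpenAI's function-tool shape. The get_weather tool, its schema, and the helper name are hypothetical and exist only for this example.

```python
import json

# Hypothetical tool: the name, description, and parameters are illustrative only.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current temperature for a city.",
        "parameters": {  # JSON Schema describing the arguments
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

def build_tool_request(prompt: str, force_tool: bool = False) -> dict:
    """Build a request body that offers one tool to the model."""
    body = {
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": prompt}],
        "tools": [get_weather_tool],
        "tool_choice": "auto",  # let the model decide whether to call the tool
    }
    if force_tool:
        # Name a specific function instead of letting the model decide.
        body["tool_choice"] = {"type": "function", "function": {"name": "get_weather"}}
    return body

body = build_tool_request("Weather in Paris?", force_tool=True)
print(json.dumps(body, indent=2))
```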

user (string)

Stable user identifier for abuse monitoring. Passed through to the upstream provider.

Message format

A Message is one of:

type Message =
  | { role: "system" | "user" | "assistant"; content: string | ContentPart[] }
  | { role: "assistant"; content: string | null; tool_calls?: ToolCall[] }
  | { role: "tool"; tool_call_id: string; content: string };

system messages set the assistant's behavior. They should appear once at the start of the conversation.
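The placement guidance for system messages can be checked before sending a request. This is only a client-side sketch with a hypothetical helper name; nothing here says the API itself rejects other layouts.

```python
def system_message_problems(messages: list[dict]) -> list[str]:
    """Return human-readable problems with system message placement."""
    positions = [i for i, m in enumerate(messages) if m.get("role") == "system"]
    problems = []
    if len(positions) > 1:
        problems.append(f"expected one system message, found {len(positions)}")
    if positions and positions[0] != 0:
        problems.append("system message is not the first message")
    return problems

good = [
    {"role": "system", "content": "You are terse."},
    {"role": "user", "content": "Hello"},
]
bad = [
    {"role": "user", "content": "Hello"},
    {"role": "system", "content": "You are terse."},
]

good_problems = system_message_problems(good)
bad_problems = system_message_problems(bad)
```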

Multimodal content parts

For vision-capable models, content can be an array of typed parts instead of a string:

{
  "role": "user",
  "content": [
    { "type": "text", "text": "What's in this image?" },
    { "type": "image_url", "image_url": { "url": "https://example.com/cat.png" } }
  ]
}

The url inside image_url can be a publicly reachable URL or a base64-encoded data: URI. Image size and count limits depend on the upstream model.
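For local images, the data: URI form can be built with the standard library. The sketch below uses a hypothetical helper name and placeholder bytes; in practice the bytes would come from a real image file.

```python
import base64

def image_part_from_bytes(data: bytes, mime_type: str = "image/png") -> dict:
    """Wrap raw image bytes as an image_url content part using a data: URI."""
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }

# Placeholder bytes stand in for a real file, e.g. open("cat.png", "rb").read().
part = image_part_from_bytes(b"\x89PNG")

message = {
    "role": "user",
    "content": [{"type": "text", "text": "What's in this image?"}, part],
}
```

Base64 inflates the payload by roughly a third, so large images may be better served from a URL where the upstream model allows it.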

Tool result messages

After the model emits a tool_calls block, send the tool's output back as a role: "tool" message:

{ "role": "tool", "tool_call_id": "call_abc", "content": "{\"temp\": 22}" }

tool_call_id must match the id from the model's tool_calls. content is always a string — JSON-encode any structured payload.
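The round trip can be sketched as follows, assuming the assistant message uses OpenAI's tool_calls shape, where arguments arrive as a JSON-encoded string. The get_weather function, the LOCAL_TOOLS registry, and the sample message are hypothetical.

```python
import json

def get_weather(city: str) -> dict:
    """Hypothetical local tool; returns canned data for illustration."""
    return {"city": city, "temp": 22}

LOCAL_TOOLS = {"get_weather": get_weather}

def tool_result_messages(assistant_message: dict) -> list[dict]:
    """Run each requested tool and build the matching role "tool" replies."""
    replies = []
    for call in assistant_message.get("tool_calls") or []:
        fn = LOCAL_TOOLS[call["function"]["name"]]
        # Arguments arrive as a JSON-encoded string, not a dict.
        args = json.loads(call["function"]["arguments"])
        replies.append({
            "role": "tool",
            "tool_call_id": call["id"],         # must echo the model's id
            "content": json.dumps(fn(**args)),  # content is always a string
        })
    return replies

# Made-up assistant message shaped like OpenAI's tool_calls output.
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_abc",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"city\": \"Paris\"}"},
    }],
}

replies = tool_result_messages(assistant_message)
```

To continue the conversation, append assistant_message and then every entry of replies to messages, and send the request again.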

Response

200 (non-streaming)

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "..." },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 8,
    "total_tokens": 20
  }
}

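finish_reason is one of stop (the model finished or hit a stop sequence), length (max_tokens was reached), tool_calls (the model requested a tool), or content_filter (output was withheld by a content filter). usage always reflects what you'll be billed for: total_tokens multiplied by the per-model rate.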
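A small sketch of branching on finish_reason and estimating cost from usage. The helper name and the rate are placeholders; real rates come from /v1/models.

```python
def summarize_completion(response: dict, rate_per_token: float) -> dict:
    """Pull out the fields most callers branch on."""
    choice = response["choices"][0]
    reason = choice["finish_reason"]
    return {
        "text": choice["message"].get("content"),
        "truncated": reason == "length",          # hit max_tokens
        "wants_tools": reason == "tool_calls",    # run tools, then call again
        "filtered": reason == "content_filter",
        # total_tokens multiplied by the per-model rate, as described above
        "estimated_cost": response["usage"]["total_tokens"] * rate_per_token,
    }

# The sample response from this page, trimmed to the fields used here.
sample = {
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": "..."},
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 12, "completion_tokens": 8, "total_tokens": 20},
}

# 0.000002 per token is a made-up rate for illustration only.
summary = summarize_completion(sample, rate_per_token=0.000002)
```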

Streaming response

Set "stream": true and the response becomes SSE chunks. See Streaming for the wire format and reconnection guidance.
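For orientation, here is a sketch of reassembling text from streamed chunks. It assumes the OpenAI-compatible wire format (lines of the form data: {json}, content under choices[0].delta, and a data: [DONE] sentinel); the Streaming page is the authority on what this API actually sends. The sample lines are made up.

```python
import json
from typing import Iterable

def collect_stream_text(lines: Iterable[str]) -> str:
    """Concatenate the content deltas of the first choice from SSE lines."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # OpenAI-style end-of-stream sentinel
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue  # some chunks carry no choices
        content = choices[0].get("delta", {}).get("content")
        if content:
            parts.append(content)
    return "".join(parts)

# Made-up stream for illustration; not captured from the API.
sample_lines = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "lo"}, "finish_reason": null}]}',
    "",
    "data: [DONE]",
]

text = collect_stream_text(sample_lines)
```

With the OpenAI SDKs shown earlier, passing stream=True returns an iterator of already-parsed chunks, so manual parsing like this is only needed with a raw HTTP client.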

Errors

See Errors for the full table. Common ones on this endpoint:

401 — bad or revoked key.
403 — model not in the key's allowlist.
429 — RPM cap hit or budget exhausted. See Troubleshooting to distinguish.
5xx — upstream model down; retry with backoff.
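One way to act on these codes is a retry wrapper with exponential backoff. The sketch below is transport-agnostic: send is a hypothetical callable that returns (status, body), and the attempt count and delays are arbitrary defaults, not values recommended by this API.

```python
import random
import time
from typing import Callable, Tuple

def is_retryable(status: int) -> bool:
    """429 and 5xx may succeed later; 401 and 403 will not without a change."""
    return status == 429 or 500 <= status <= 599

def call_with_backoff(
    send: Callable[[], Tuple[int, dict]],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, dict]:
    """Call send() until it returns a non-retryable status or attempts run out."""
    status, body = send()
    attempt = 1
    while is_retryable(status) and attempt < max_attempts:
        # Exponential backoff with jitter: about 0.5 s, 1 s, 2 s, and so on.
        sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.1))
        status, body = send()
        attempt += 1
    return status, body

# Fake transport for illustration: fails twice, then succeeds.
responses = iter([(503, {}), (429, {}), (200, {"ok": True})])
delays = []  # records the requested sleeps instead of actually waiting
status, body = call_with_backoff(lambda: next(responses), sleep=delays.append)
```

Retrying a 429 only helps when the RPM cap was hit; if the budget is exhausted, every retry fails the same way, so keep max_attempts small.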