Chat completions
POST /v1/chat/completions — the main inference endpoint, OpenAI-compatible.
| Base URL | https://litellm.tensorloop.tech |
| Auth | Authorization: Bearer YOUR_KEY |
| Content-Type | application/json |
| Pricing | Per upstream model — see /v1/models |
The body matches OpenAI's chat completions schema. Minimum:
curl https://litellm.tensorloop.tech/v1/chat/completions \
-H "Authorization: Bearer YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o-mini",
"messages": [{ "role": "user", "content": "Hello" }]
}'

from openai import OpenAI
client = OpenAI(
base_url="https://litellm.tensorloop.tech/v1",
api_key="YOUR_KEY",
)
resp = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)

import OpenAI from "openai";
const client = new OpenAI({
baseURL: "https://litellm.tensorloop.tech/v1",
apiKey: process.env.TENSORLOOP_KEY,
});
const resp = await client.chat.completions.create({
model: "gpt-4o-mini",
messages: [{ role: "user", content: "Hello" }],
});
console.log(resp.choices[0].message.content);

Request parameters
| Parameter | Type | Required / default | Notes |
| model | string | required | |
| messages | Message[] | required | |
| temperature | number | default model-dependent | |
| max_tokens | integer | | |
| top_p | number | | |
| stream | boolean | default false | |
| stop | string or string[] | | |
| presence_penalty | number | | |
| frequency_penalty | number | | |
| response_format | object | | {"type": "json_object"} for JSON mode. Model-dependent: only some models support it. |
| seed | integer | | |
| logprobs | boolean | | |
| top_logprobs | integer | | |
| tools | Tool[] | | |
| tool_choice | string or object | default "auto" | "auto", "none", "required", or { type: 'function', function: { name } }. |
| user | string | | |
Message format
A Message is one of:
type Message =
| { role: "system" | "user" | "assistant"; content: string | ContentPart[] }
| { role: "assistant"; content: string | null; tool_calls?: ToolCall[] }
| { role: "tool"; tool_call_id: string; content: string };

system messages set the assistant's behavior. They should appear once at the start of the conversation.
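The placement guidance for system messages can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: `check_system_placement` is a hypothetical helper name, and the endpoint itself may accept other arrangements, so treat this as a lint rather than a validator.

```python
from typing import Any


def check_system_placement(messages: list[dict[str, Any]]) -> list[str]:
    """Return a list of placement problems; an empty list means none were found."""
    problems = []
    # Indices of every message whose role is "system".
    positions = [i for i, m in enumerate(messages) if m.get("role") == "system"]
    if len(positions) > 1:
        problems.append(f"expected at most one system message, found {len(positions)}")
    if positions and positions[0] != 0:
        problems.append("system message should be the first message")
    return problems


messages = [
    {"role": "system", "content": "You are a terse assistant."},
    {"role": "user", "content": "Hello"},
]
problems = check_system_placement(messages)
```

Here `problems` is empty because the single system message is at index 0.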
Multimodal content parts
For vision-capable models, content can be an array of typed parts instead of a string:
{
"role": "user",
"content": [
{ "type": "text", "text": "What's in this image?" },
{ "type": "image_url", "image_url": { "url": "https://example.com/cat.png" } }
]
}

The image_url can be a public URL or a data: URI. Image size and count limits depend on the upstream model.
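When the image is a local file rather than a public URL, it can be embedded as a data: URI. The sketch below shows one way to build such a content part with the standard library. The helper names `image_part_from_bytes` and `user_message_with_image` are illustrative, and the placeholder bytes stand in for a real file read.

```python
import base64


def image_part_from_bytes(data: bytes, mime_type: str = "image/png") -> dict:
    """Wrap raw image bytes as an image_url content part using a data: URI."""
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


def user_message_with_image(text: str, image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Build a user message whose content is a text part followed by an image part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            image_part_from_bytes(image_bytes, mime_type),
        ],
    }


# Placeholder bytes (the PNG signature only); a real call would read a whole file,
# for example with pathlib.Path("cat.png").read_bytes().
message = user_message_with_image("What's in this image?", b"\x89PNG\r\n\x1a\n")
```

Base64 inflates the payload by roughly a third, so large images may run into the upstream model's size limits sooner than the same image served by URL.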
Tool result messages
After the model emits a tool_calls block, send the tool's output back as a role: "tool" message:
{ "role": "tool", "tool_call_id": "call_abc", "content": "{\"temp\": 22}" }

tool_call_id must match the id from the model's tool_calls. content is always a string, so JSON-encode any structured payload.
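The round trip from a tool_calls block to tool messages can be sketched as follows. This assumes each entry in tool_calls has the OpenAI shape (an `id` plus a `function` object whose `arguments` field is a JSON-encoded string). `tool_result_messages` and `get_weather` are hypothetical names used only for illustration.

```python
import json
from typing import Any, Callable


def tool_result_messages(
    assistant_message: dict[str, Any],
    handlers: dict[str, Callable[..., Any]],
) -> list[dict[str, str]]:
    """Run each requested tool locally and build the matching role: "tool" messages."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        name = call["function"]["name"]
        # Assumption: arguments arrive as a JSON-encoded string, as in OpenAI's schema.
        args = json.loads(call["function"]["arguments"] or "{}")
        output = handlers[name](**args)
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],     # echo the id the model produced
            "content": json.dumps(output),  # content must be a string
        })
    return results


def get_weather(city: str) -> dict:
    """Hypothetical local tool; a real one would look the value up."""
    return {"temp": 22}


assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_abc",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"city\": \"Oslo\"}"},
    }],
}
tool_messages = tool_result_messages(assistant_message, {"get_weather": get_weather})
```

The next request would then carry the earlier messages, the assistant message, and `tool_messages`, in that order.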
Response
{
"id": "chatcmpl-...",
"object": "chat.completion",
"created": 1700000000,
"model": "gpt-4o-mini",
"choices": [
{
"index": 0,
"message": { "role": "assistant", "content": "..." },
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 12,
"completion_tokens": 8,
"total_tokens": 20
}
}

finish_reason is one of stop, length, tool_calls, or content_filter. usage always reflects what you'll be billed for: total_tokens × per-model rate.
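A caller usually needs only a few fields from this object. The sketch below reads them from a parsed response dict and applies the billing formula above. `summarize_completion` is a hypothetical helper, and the rate passed in is a made-up figure; real rates come from /v1/models.

```python
from typing import Any


def summarize_completion(resp: dict[str, Any], rate_per_token: float) -> dict[str, Any]:
    """Pull out the fields most callers need from a chat.completion object."""
    choice = resp["choices"][0]
    finish_reason = choice["finish_reason"]
    return {
        "text": choice["message"].get("content"),
        "finish_reason": finish_reason,
        # "length" means generation stopped because a token limit was reached.
        "truncated": finish_reason == "length",
        # Billing formula from the text: total_tokens x per-model rate.
        "estimated_cost": resp["usage"]["total_tokens"] * rate_per_token,
    }


sample = {
    "id": "chatcmpl-...",
    "object": "chat.completion",
    "created": 1700000000,
    "model": "gpt-4o-mini",
    "choices": [
        {"index": 0, "message": {"role": "assistant", "content": "..."}, "finish_reason": "stop"}
    ],
    "usage": {"prompt_tokens": 12, "completion_tokens": 8, "total_tokens": 20},
}
# 0.000001 per token is an illustrative rate, not a real price.
summary = summarize_completion(sample, rate_per_token=0.000001)
```

Checking `truncated` before using `text` helps catch cut-off answers, which otherwise look like ordinary completions.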
Streaming response
Set "stream": true and the response becomes SSE chunks. See Streaming for the wire format and reconnection guidance.
Errors
See Errors for the full table. Common ones on this endpoint:
401: bad or revoked key.
403: model not in the key's allowlist.
429: RPM cap hit or budget exhausted. See Troubleshooting to distinguish.
5xx: upstream model down; retry with backoff.
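The retry-with-backoff advice for 5xx responses can be sketched as below. This is one possible policy under stated assumptions, not a prescribed one. `call_with_backoff` is a hypothetical helper, `send` stands for any callable that performs the request and returns a (status, body) pair, and the attempt count and delays are arbitrary. 429 is deliberately not retried here, because whether a retry helps depends on which of its two causes applies.

```python
import time
from typing import Any, Callable


def call_with_backoff(
    send: Callable[[], tuple[int, Any]],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[int, Any]:
    """Call send() and retry on 5xx with exponentially growing delays."""
    for attempt in range(max_attempts):
        status, body = send()
        if status < 500:
            # Success, or a client error such as 401/403/429 that a blind retry won't fix.
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    return status, body  # still failing after the last attempt


# Demo with a fake sender and a recording "sleep", so nothing actually waits.
responses = iter([(503, "upstream down"), (502, "bad gateway"), (200, "ok")])
delays: list[float] = []
result = call_with_backoff(lambda: next(responses), sleep=delays.append)
```

In the demo, `result` is `(200, "ok")` after two retries, with recorded delays of 0.5 and 1.0 seconds. Production code would typically add random jitter to each delay so that many clients do not retry in lockstep.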