Gateway API
The Arkonova Gateway provides a unified OpenAI-compatible endpoint for accessing multiple AI providers through a single API key. Route requests to GPT-4o, Claude, Gemini, or any custom model without changing your client code.
Introduction
The Gateway is a reverse proxy that translates your requests into provider-specific calls, applying authentication, quota enforcement, routing policy, and telemetry collection transparently. From your client's perspective it looks like a standard OpenAI API.
Any library that supports a configurable base_url — the OpenAI Python/Node SDKs,
LangChain, LiteLLM, LlamaIndex, and others — works with the Gateway without modification.
The Gateway mirrors the OpenAI path structure (e.g. POST /v1/chat/completions), so existing
OpenAI integrations only need a changed base_url and a new API key.
Authentication
All Gateway requests must carry an Arkonova API key in the standard
HTTP Authorization header. Keys are prefixed with ark-.
You can issue and manage API keys from your account dashboard. Each key can be scoped to specific providers or models, have token quotas, and optionally be restricted by IP range.
Never embed an ark- key in client-side code. Keep it server-side or in environment variables.
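As a sketch, reading the key from an environment variable and building the auth header (the variable name ARKONOVA_API_KEY is illustrative, not prescribed by the Gateway):

```python
import os

# Read the ark- key from the environment rather than hard-coding it.
# ARKONOVA_API_KEY is an illustrative variable name; any name works.
api_key = os.environ.get("ARKONOVA_API_KEY", "ark-example")

# Every Gateway request carries the key as a standard Bearer token.
headers = {"Authorization": f"Bearer {api_key}"}
```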
Base URL
Set the Gateway base URL as base_url in your SDK configuration. The gateway exposes
the same path structure as the OpenAI API, so /chat/completions,
/models, and /embeddings all work as expected.
Chat Completions
POST /gateway/v1/chat/completions
The primary endpoint. Accepts the standard OpenAI Chat Completions request body.
The model field determines which provider and model the request is routed to.
Request Body
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Required | Model ID, e.g. gpt-4o, claude-sonnet-4-6, gemini-2.0-flash |
| messages | array | Required | Array of message objects with role (system, user, assistant) and content |
| stream | boolean | Optional | If true, enables SSE token-by-token streaming. Default: false |
| temperature | number | Optional | Sampling temperature 0–2. Default: 1 |
| max_tokens | integer | Optional | Maximum tokens to generate |
| top_p | number | Optional | Nucleus sampling probability 0–1 |
| stop | string / array | Optional | Stop sequence(s) |
| n | integer | Optional | Number of completions to generate. Default: 1 |
| x-fallback | array | Optional | Gateway extension: fallback model IDs tried in order if primary fails, e.g. ["gpt-4o-mini", "claude-haiku-4-5"] |
Minimal Example
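A minimal request using only the Python standard library (the base URL and key below are placeholders; substitute the values from your dashboard — any OpenAI-compatible SDK works the same way):

```python
import json
import urllib.request

# Placeholder values; use your real Gateway base URL and ark- key.
BASE_URL = "https://gateway.arkonova.example/gateway/v1"
API_KEY = "ark-your-key"

# Standard OpenAI Chat Completions request body.
payload = {
    "model": "gpt-4o",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
}

req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
)
# resp = urllib.request.urlopen(req)  # uncomment to actually send the request
```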
Response
On success the gateway returns the provider's response wrapped in the standard OpenAI response format:
An additional x-gateway field in the response contains routing metadata:
which provider served the request and the measured round-trip latency.
This field is always present and does not affect OpenAI SDK compatibility.
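A sketch of a successful response body — all values are illustrative, and the exact x-gateway subfield names (provider, latency_ms) are assumptions based on the description above:

```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Hello! How can I help?" },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 12, "completion_tokens": 9, "total_tokens": 21 },
  "x-gateway": { "provider": "openai", "latency_ms": 412 }
}
```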
Streaming
Setting "stream": true switches the response to Server-Sent Events (SSE).
The gateway normalizes the event format across all providers — clients receive the same
data: {...} chunks regardless of which provider handles the request.
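As a sketch of how a client consumes the normalized stream (the SSE lines below are illustrative samples, not captured Gateway output; a real client would read them incrementally from the response body):

```python
import json

# Illustrative SSE lines in the normalized OpenAI chunk format.
sse_lines = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo!"}}]}',
    "data: [DONE]",
]

text = ""
for line in sse_lines:
    if not line.startswith("data: "):
        continue  # skip comments and blank keep-alive lines
    payload = line[len("data: "):]
    if payload == "[DONE]":  # OpenAI-style end-of-stream sentinel
        break
    chunk = json.loads(payload)
    # Each chunk carries an incremental token in choices[0].delta.content.
    text += chunk["choices"][0]["delta"].get("content", "")

print(text)
```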
Supported Models
Pass the model ID in the model field. The gateway resolves the provider automatically.
Aliases (e.g. gpt-4o) are forwarded as-is; the gateway infers the provider from the model name prefix.
For custom/self-hosted endpoints, prefix the model ID with custom: and configure the
target URL in your dashboard. Any OpenAI-compatible endpoint (Together AI, Fireworks, local Ollama)
can be registered this way.
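For example, a request body targeting a registered custom endpoint might look like this ("custom:local-llama" is a hypothetical ID; the part after custom: is whatever name you registered in the dashboard):

```python
# "custom:local-llama" is a hypothetical registered model ID; the gateway
# resolves the custom: prefix to the target URL configured in the dashboard.
payload = {
    "model": "custom:local-llama",
    "messages": [{"role": "user", "content": "ping"}],
}
```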
Routing & Fallback
The gateway supports two routing strategies:
- Direct — the request goes to the provider that matches the model field.
- Fallback chain — if the primary model fails (provider error, rate limit, timeout), the gateway retries the next model in the x-fallback list.
Fallback Chain
Include an x-fallback field in the request body with an ordered list of backup model IDs.
If the primary model returns a 429, 500, or times out, the gateway automatically retries
with the next fallback — transparently, with no change to the response format.
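A request body with a fallback chain, as a sketch (the model IDs are the examples from the parameter table above):

```python
import json

# x-fallback is a Gateway extension: backup models tried in order when the
# primary model returns a 429, a 500, or times out.
payload = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Summarize this ticket."}],
    "x-fallback": ["gpt-4o-mini", "claude-haiku-4-5"],
}
body = json.dumps(payload)
```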
Policy-Based Routing (Dashboard)
In the API key settings you can define a routing policy that applies to all requests from that key:
- Cost-optimized — route to the cheapest model that meets your quality threshold.
- Latency-optimized — always pick the fastest provider at the time of the request.
- Provider-pinned — lock a key to a specific provider regardless of model ID.
API Keys & Quotas
Each ark- key has independently configurable limits:
| Setting | Description |
|---|---|
| model_allowlist | Allowlist of model IDs this key may request. Requests for unlisted models are rejected with 403. |
| token_quota | Monthly token budget (prompt + completion). Requests exceeding the quota return 429. |
| rpm_limit | Max requests per minute. Excess requests return 429 with a Retry-After header. |
| ip_allowlist | Optional list of allowed source IPs in CIDR notation. Requests from other IPs are rejected with 403. |
| provider_lock | Force all requests through a single provider regardless of the model field. |
Error Codes
The gateway returns standard HTTP status codes. Error bodies follow the OpenAI error format:
| Status | Code | Description |
|---|---|---|
| 400 | invalid_request_error | Malformed request body — missing required fields or invalid types. |
| 401 | authentication_error | Missing or invalid Authorization header. |
| 403 | model_not_allowed | The requested model is not in the key's model allowlist, or the source IP is not in the IP allowlist. |
| 429 | quota_exceeded / rate_limited | Token quota exhausted or RPM limit hit. Check Retry-After header. |
| 502 | provider_error | Upstream provider returned an error and all fallbacks were exhausted. |
| 504 | provider_timeout | Upstream provider did not respond within the timeout window (30 s default). |
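A client-side retry sketch that honors the Retry-After header on 429 responses (the helper name and the injectable opener parameter are illustrative, not part of the Gateway):

```python
import time
import urllib.error
import urllib.request

def post_with_retry(req, opener=urllib.request.urlopen, max_attempts=3):
    """Send a request, sleeping for Retry-After seconds on 429 (sketch).

    The opener parameter is an injection point for testing; by default it
    performs a real HTTP request via urllib.
    """
    for attempt in range(max_attempts):
        try:
            return opener(req)
        except urllib.error.HTTPError as err:
            if err.code == 429 and attempt < max_attempts - 1:
                # The gateway sets Retry-After on quota/RPM rejections.
                delay = int(err.headers.get("Retry-After", "1"))
                time.sleep(delay)
                continue
            raise  # other errors (400/401/403/502/504) surface to the caller
```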