Gateway API
The Arkonova Gateway provides a unified OpenAI-compatible endpoint for accessing multiple AI providers through a single API key. Route requests to GPT-4o, Claude, Gemini, or any custom model without changing your client code.
Introduction
The Gateway is a reverse proxy that translates your requests into provider-specific calls, applying authentication, quota enforcement, routing policy, and telemetry collection transparently. From your client's perspective it looks like a standard OpenAI API.
Any library that supports a configurable base_url — the OpenAI Python/Node SDKs,
LangChain, LiteLLM, LlamaIndex, and others — works with the Gateway without modification.
The Gateway mirrors the OpenAI path structure (e.g. POST /v1/chat/completions), so existing
OpenAI integrations only need a changed base_url and a new API key.
Authentication
All Gateway requests must carry an Arkonova API key in the standard
HTTP Authorization header. Keys are prefixed with ark-.
You can issue and manage API keys from your account dashboard. Each key can be scoped to specific providers or models, have token quotas, and optionally be restricted by IP range.
Never embed an ark- key in client-side code. Keep it server-side or in environment variables.
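As a sketch, reading the key from an environment variable and building the auth header (the variable name ARKONOVA_API_KEY is illustrative, not prescribed by the Gateway):

```python
import os

# Read the ark- key from the environment rather than hard-coding it.
# ARKONOVA_API_KEY is an illustrative variable name; any name works.
api_key = os.environ.get("ARKONOVA_API_KEY", "ark-example")

# Every Gateway request carries the key as a standard Bearer token.
headers = {"Authorization": f"Bearer {api_key}"}
```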
Base URL
Set the Gateway base URL as base_url in your SDK configuration. The gateway exposes
the same path structure as the OpenAI API, so /chat/completions,
/models, and /embeddings all work as expected.
Chat Completions
POST /gateway/v1/chat/completions
The primary endpoint. Accepts the standard OpenAI Chat Completions request body.
The model field determines which provider and model the request is routed to.
Request Body
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Required | Model ID, e.g. gpt-4o, claude-sonnet-4-6, gemini-2.0-flash |
| messages | array | Required | Array of message objects with role (system, user, assistant) and content |
| stream | boolean | Optional | If true, enables SSE token-by-token streaming. Default: false |
| temperature | number | Optional | Sampling temperature 0–2. Default: 1 |
| max_tokens | integer | Optional | Maximum tokens to generate |
| top_p | number | Optional | Nucleus sampling probability 0–1 |
| stop | string / array | Optional | Stop sequence(s) |
| n | integer | Optional | Number of completions to generate. Default: 1 |
| x-fallback | array | Optional | Gateway extension: fallback model IDs tried in order if primary fails, e.g. ["gpt-4o-mini", "claude-haiku-4-5"] |
Minimal Example
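A minimal request using only the Python standard library (the base URL and key below are placeholders; substitute the values from your dashboard — any OpenAI-compatible SDK works the same way):

```python
import json
import urllib.request

# Placeholder values; use your real Gateway base URL and ark- key.
BASE_URL = "https://gateway.arkonova.example/gateway/v1"
API_KEY = "ark-your-key"

# Standard OpenAI Chat Completions request body.
payload = {
    "model": "gpt-4o",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
}

req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
)
# resp = urllib.request.urlopen(req)  # uncomment to actually send the request
```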
Response
On success the gateway returns the provider's response wrapped in the standard OpenAI response format:
An additional x-gateway field in the response contains routing metadata:
which provider served the request and the measured round-trip latency.
This field is always present and does not affect OpenAI SDK compatibility.
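A sketch of a successful response body — all values are illustrative, and the exact x-gateway subfield names (provider, latency_ms) are assumptions based on the description above:

```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Hello! How can I help?" },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 12, "completion_tokens": 9, "total_tokens": 21 },
  "x-gateway": { "provider": "openai", "latency_ms": 412 }
}
```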
Streaming
Setting "stream": true switches the response to Server-Sent Events (SSE).
The gateway normalizes the event format across all providers — clients receive the same
data: {...} chunks regardless of which provider handles the request.
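As a sketch of how a client consumes the normalized stream (the SSE lines below are illustrative samples, not captured Gateway output; a real client would read them incrementally from the response body):

```python
import json

# Illustrative SSE lines in the normalized OpenAI chunk format.
sse_lines = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo!"}}]}',
    "data: [DONE]",
]

text = ""
for line in sse_lines:
    if not line.startswith("data: "):
        continue  # skip comments and blank keep-alive lines
    payload = line[len("data: "):]
    if payload == "[DONE]":  # OpenAI-style end-of-stream sentinel
        break
    chunk = json.loads(payload)
    # Each chunk carries an incremental token in choices[0].delta.content.
    text += chunk["choices"][0]["delta"].get("content", "")

print(text)
```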
Supported Models
Pass the model ID in the model field. The gateway resolves the provider automatically.
Aliases (e.g. gpt-4o) are forwarded as-is; the gateway infers the provider from the model name prefix.
For custom/self-hosted endpoints, prefix the model ID with custom: and configure the
target URL in your dashboard. Any OpenAI-compatible endpoint (Together AI, Fireworks, local Ollama)
can be registered this way.
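For example, a request body targeting a registered custom endpoint might look like this ("custom:local-llama" is a hypothetical ID; the part after custom: is whatever name you registered in the dashboard):

```python
# "custom:local-llama" is a hypothetical registered model ID; the gateway
# resolves the custom: prefix to the target URL configured in the dashboard.
payload = {
    "model": "custom:local-llama",
    "messages": [{"role": "user", "content": "ping"}],
}
```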
Routing & Fallback
The gateway supports two routing strategies:
- Direct — the request goes to the provider that matches the model field.
- Fallback chain — if the primary model fails (provider error, rate limit, timeout), the gateway retries the next model in the x-fallback list.
Fallback Chain
Include an x-fallback field in the request body with an ordered list of backup model IDs.
If the primary model returns a 429, 500, or times out, the gateway automatically retries
with the next fallback — transparently, with no change to the response format.
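A request body with a fallback chain, as a sketch (the model IDs are the examples from the parameter table above):

```python
import json

# x-fallback is a Gateway extension: backup models tried in order when the
# primary model returns a 429, a 500, or times out.
payload = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Summarize this ticket."}],
    "x-fallback": ["gpt-4o-mini", "claude-haiku-4-5"],
}
body = json.dumps(payload)
```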
Policy-Based Routing (Dashboard)
In the API key settings you can define a routing policy that applies to all requests from that key:
- Cost-optimized — route to the cheapest model that meets your quality threshold.
- Latency-optimized — always pick the fastest provider at the time of the request.
- Provider-pinned — lock a key to a specific provider regardless of model ID.
API Keys & Quotas
Each ark- key has independently configurable limits:
| Setting | Description |
|---|---|
| model_allowlist | Allowlist of model IDs this key may request. Requests for unlisted models are rejected with 403. |
| token_quota | Monthly token budget (prompt + completion). Requests exceeding the quota return 429. |
| rpm_limit | Max requests per minute. Excess requests return 429 with a Retry-After header. |
| ip_allowlist | Optional list of allowed source IPs in CIDR notation. Requests from other IPs are rejected with 403. |
| provider_lock | Force all requests through a single provider regardless of the model field. |
Error Codes
The gateway returns standard HTTP status codes. Error bodies follow the OpenAI error format:
| Status | Code | Description |
|---|---|---|
| 400 | invalid_request_error | Malformed request body — missing required fields or invalid types. |
| 401 | authentication_error | Missing or invalid Authorization header. |
| 403 | model_not_allowed | The requested model is not in the key's model allowlist, or the source IP is not in the IP allowlist. |
| 429 | quota_exceeded / rate_limited | Token quota exhausted or RPM limit hit. Check Retry-After header. |
| 502 | provider_error | Upstream provider returned an error and all fallbacks were exhausted. |
| 504 | provider_timeout | Upstream provider did not respond within the timeout window (30 s default). |
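A client-side retry sketch that honors the Retry-After header on 429 responses (the helper name and the injectable opener parameter are illustrative, not part of the Gateway):

```python
import time
import urllib.error
import urllib.request

def post_with_retry(req, opener=urllib.request.urlopen, max_attempts=3):
    """Send a request, sleeping for Retry-After seconds on 429 (sketch).

    The opener parameter is an injection point for testing; by default it
    performs a real HTTP request via urllib.
    """
    for attempt in range(max_attempts):
        try:
            return opener(req)
        except urllib.error.HTTPError as err:
            if err.code == 429 and attempt < max_attempts - 1:
                # The gateway sets Retry-After on quota/RPM rejections.
                delay = int(err.headers.get("Retry-After", "1"))
                time.sleep(delay)
                continue
            raise  # other errors (400/401/403/502/504) surface to the caller
```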