Relay by Ai5labs
v0.1.0 live on PyPI

Every LLM,
one interface.

Production-grade Python library for routing across every major LLM provider. Define your model catalog in YAML; chat, stream, call tools, and get structured output against OpenAI, Anthropic, Bedrock, Vertex, Gemini, Groq, and 12 more, all without rewriting your code.

pip install ai5labs-relay
View on GitHub →
  • 13 providers in v0.1 (12 OpenAI-compat + native Anthropic)
  • 434 models in the catalog
  • 5–19× faster cold start vs LiteLLM
  • Apache-2.0 OSS, with patent grant

Built for production from day one

The features OSS gateways treat as "v2" — observability, audit logging, PII redaction, governance — ship in v0.1 of Relay.

MCP universal tool layer

Connect any Model Context Protocol server (GitHub, Slack, Postgres, Playwright) and use its tools against any provider — including providers without native MCP support.

Cross-provider tool compiler

Write the JSON Schema once; Relay compiles it to OpenAI strict and Anthropic shapes today, with Gemini, Bedrock, and Cohere shapes ready for v0.2 native adapters. Unsupported schema keywords are handled with Mastra-style instruction injection.
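
A minimal sketch of the idea, assuming tools are declared as plain JSON Schema dicts (the exact dict shape is an assumption; the tools= parameter appears in the MCP example below):

Python
# One JSON Schema definition; Relay compiles it into each provider's
# native tool shape. The dict layout here is illustrative, not confirmed API.
weather_tool = {
    "name": "get_weather",
    "description": "Current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

# The same definition works whether "smart" routes to Anthropic or OpenAI.
resp = await hub.chat(
    "smart",
    messages=[{"role": "user", "content": "Weather in Paris?"}],
    tools=[weather_tool],
)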

Pydantic structured output

Pass a Pydantic model; get a validated instance back. Works against every provider. Auto-retry on validation failure.
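
A minimal sketch; response_model and resp.parsed are hypothetical names, but the behavior (validate, auto-retry) is the one described above:

Python
from pydantic import BaseModel

class Invoice(BaseModel):
    vendor: str
    total_usd: float

# response_model / resp.parsed are assumed names, not confirmed API;
# Relay validates the reply against the model and retries on failure.
resp = await hub.chat(
    "smart",
    messages=[{"role": "user", "content": "Extract: ACME owes $1,200."}],
    response_model=Invoice,
)
assert isinstance(resp.parsed, Invoice)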

Cost tracking with provenance

Every response carries source, confidence, and fetched_at. Prices come live from AWS Pricing, Azure Retail, and OpenRouter, falling back to a maintained snapshot.
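
A sketch of reading those fields; cost_usd appears in the quickstart below, while the nested provenance paths are an assumption:

Python
resp = await hub.chat("smart", messages=[{"role": "user", "content": "hi"}])
print(resp.cost_usd)                                   # e.g. 0.0031
# Attribute paths below are assumed; the fields themselves are as documented.
print(resp.cost.source, resp.cost.confidence, resp.cost.fetched_at)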

Circuit breakers + classified retries

Distinguish rate-limit, context-window, content-policy, and auth errors. Each gets the right behavior: retry, fall back, or fail fast.
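
A sketch of what classification buys you; the exception names and module path are assumptions, not confirmed Relay API:

Python
from relay.errors import (   # hypothetical module path
    AuthError, ContentPolicyError, ContextWindowError, RateLimitError,
)

try:
    resp = await hub.chat("smart", messages=[{"role": "user", "content": "hi"}])
except RateLimitError:
    ...   # Relay has already retried with backoff; consider queueing
except ContextWindowError:
    ...   # e.g. fall back to a larger-context alias in the group
except (ContentPolicyError, AuthError):
    ...   # fail fast: retrying cannot help here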

OpenTelemetry GenAI

Native gen_ai.* spans, token-usage histograms, cost histograms. Works with Datadog, Honeycomb, Langfuse, Arize, Phoenix out of the box.
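
Standard OpenTelemetry SDK wiring; the assumption is that Relay emits its gen_ai.* spans through the globally registered tracer provider:

Python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Register a global tracer provider that exports to any OTLP backend
# (Datadog, Honeycomb, Langfuse, ...). Relay's spans ride along with it.
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
trace.set_tracer_provider(provider)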

PII redaction + audit logs

Regex / Presidio redaction before the prompt leaves your process. Structured audit events to pluggable sinks (file, S3, Splunk, callback).
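
A sketch of the wiring; add_redactor and add_audit_sink are hypothetical hook names standing in for whatever v0.1 exposes:

Python
from relay import Hub

hub = Hub.from_yaml("models.yaml")
# Hook names are assumed; the guarantees are the ones stated above:
# redaction happens in-process before the prompt leaves, and audit
# events stream to a pluggable sink.
hub.add_redactor("regex", patterns=[r"\b\d{3}-\d{2}-\d{4}\b"])  # e.g. US SSNs
hub.add_audit_sink("file", path="audit.jsonl")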

Caching done right

Hub-level exact-match cache plus Anthropic prompt-cache passthrough via CacheHint markers. Compose them; users decide.
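
A sketch of composing the two layers; CacheHint is the marker named above, but the cache= option and the attachment point are assumptions:

Python
from relay import Hub, CacheHint   # import path assumed

system_prompt = "You are a support triage assistant."  # long, stable prefix

# Hub-level exact-match cache (constructor option assumed) composed with
# an Anthropic prompt-cache marker on the stable system prompt.
async with Hub.from_yaml("models.yaml", cache="memory") as hub:
    resp = await hub.chat(
        "smart",
        messages=[
            {"role": "system", "content": system_prompt, "cache": CacheHint()},
            {"role": "user", "content": "Summarize today's tickets."},
        ],
    )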

400+ models, ranked

relay models compare sonnet 4o flash. relay models recommend --task code --budget cheap. Pick the right model with public benchmarks side by side.

Define once, route anywhere

Your model catalog lives in version-controlled YAML. You use aliases like smart and fast in code; the actual provider, model ID, credentials, and routing strategy live in one file your team can review.

YAML
# models.yaml
version: 1

models:
  fast:   { target: groq/llama-3.3-70b-versatile,
            credential: $env.GROQ_API_KEY }
  smart:  { target: anthropic/claude-sonnet-4-5,
            credential: $env.ANTHROPIC_API_KEY }
  vision: { target: openai/gpt-4o-mini,
            credential: $env.OPENAI_API_KEY }

groups:
  default:
    strategy: fallback
    members: [smart, fast]
Python
from relay import Hub

async with Hub.from_yaml("models.yaml") as hub:
    resp = await hub.chat(
        "smart",
        messages=[{"role": "user", "content": "What is 2+2?"}],
    )
    print(resp.text, resp.cost_usd)

From pip install to first call in 60 seconds

Pydantic-typed responses, async-native, with cost provenance attached. Strict typing under mypy --strict. Tested on Python 3.10–3.13.

Streaming that doesn't lose tool-call deltas

Tool-call argument fragments are merged by index, not id, fixing the LiteLLM #20711 bug out of the gate. Hypothesis property tests verify the invariant.

Python
async for ev in hub.stream("smart", messages=[...]):
    if ev.type == "text_delta":
        print(ev.text, end="", flush=True)
    elif ev.type == "thinking_delta":      # Anthropic extended thinking
        ...
    elif ev.type == "end":
        print(f"\n[{ev.response.latency_ms:.0f}ms, "
              f"${ev.response.cost_usd:.4f}]")
Python
from relay import Hub
from relay.mcp import MCPManager

mcp = MCPManager()
await mcp.add_stdio(
    "github",
    command="npx",
    args=["-y", "@modelcontextprotocol/server-github"],
)

hub = Hub.from_yaml("models.yaml")
hub.attach_mcp(mcp)

# Tools from any MCP server work against ANY provider
tools = await hub.mcp_tools()
resp = await hub.chat("smart", messages=[...], tools=tools)

MCP servers + every model = your differentiator

Other gateways force you to pick MCP-aware providers. Relay translates MCP tool schemas into each provider's native shape, so a GitHub MCP server works against Bedrock Claude as easily as against OpenAI.

Pick the right model

The catalog ships with the library — 434 models, with public benchmark scores where the provider has published them. Rankings below are the 10 frontier models with full scores, sorted by composite quality index.

Model | Quality | In / Out per 1M
openai/o1 | 85 | $15.00 / $60.00
anthropic/claude-opus-4-5 | 80 | $15.00 / $75.00
google/gemini-2.5-pro | 80 | $1.25 / $10.00
openai/o3-mini | 78 | $1.10 / $4.40
deepseek/deepseek-reasoner | 76 | $0.55 / $2.19
anthropic/claude-sonnet-4-5 | 73 | $3.00 / $15.00
deepseek/deepseek-chat | 72 | $0.32 / $0.89
xai/grok-3 | 72 | $3.00 / $15.00
openai/gpt-4o | 71 | $2.50 / $10.00
anthropic/claude-3-5-sonnet-20241022 | 70 | $3.00 / $15.00

Sourced from each provider's published numbers; verify before quoting. Browse all 434 models →

Two ways to route

Static, rule-based recommender ships free in the library. Per-query semantic routing — looking at the prompt and choosing the model automatically — is a hosted-gateway feature.

OSS · free · shipping in v0.1

Static recommender

Filter the catalog by task, budget, and required capabilities. Deterministic, offline, no LLM in the loop — same answer every time for the same constraints.

Shell
# Top 5 cheap code models, JSON for an agent
relay models recommend \
    --task code --budget cheap \
    --limit 5 --json

Python
from relay.catalog import get_catalog
top = sorted(
    (r for r in get_catalog().values()
     if r.benchmarks),
    key=lambda r: r.benchmarks.quality_index or 0,
    reverse=True,
)[:5]

Free forever. Apache-2.0. Runs offline against the catalog snapshot.

Hosted · coming soon · design-partner waitlist

Per-query semantic router

Looks at the actual prompt and picks the model from its content — intent, length, structured-output requirements, tool-use patterns, language. Re-evaluated as new models ship.

  • Classify task per request, not per config
  • Cost / quality knob per route
  • A/B test routing rules
  • Eval results refreshed with each model release

Zero gateway overhead

Whichever model you pick, Relay shouldn't slow it down. Three runs against an identical mock backend, vs raw httpx and LiteLLM:

Metric | Relay | LiteLLM | Verdict
Cold start (import) | 110–152 ms | 1,304–2,078 ms | 5–19× faster
Streaming TTFT p50 | 13.4–14.6 ms | 15.4–18.6 ms | ~13–27% faster
Chat overhead p50 | 2.2–3.1 ms | 3.3–13.4 ms | Tied / occasionally faster
Chat overhead p99 stability | 19–23 ms range | 23–41 ms range | Consistent tail

Single machine, single Python version, 1,000 chat requests at concurrency 20 against a 50 ms mock backend. Run it yourself: full methodology and raw numbers in BENCHMARKS.md.

Pricing

The library is free forever. The hosted gateway is in design-partner mode — join the waitlist for early access.

OSS Library
Free

Apache-2.0. The full feature set. Run it in your own infrastructure with your own provider keys.

  • All 18 providers
  • MCP universal tool layer
  • Cost tracking + provenance
  • OpenTelemetry instrumentation
  • PII redaction + audit + guardrails
  • Community support via GitHub
Hosted Gateway
Coming soon

BYOK proxy with multi-tenant ops, plus a per-query semantic router that picks the model from the prompt.

  • Everything in OSS, plus:
  • Per-query semantic routing (auto-pick model)
  • Multi-tenant proxy (FastAPI)
  • Web dashboard
  • Virtual keys + per-team budgets
  • Distributed rate limiting
  • First customers free during private beta
Enterprise
Talk to us

VPC deployment, SOC 2 attestation, BAA / DPA paperwork, 24/7 SLA, custom features.

  • Everything in Hosted, plus:
  • Self-hosted in your VPC
  • SOC 2, BAA, DPA
  • 24/7 on-call
  • Roadmap influence
  • Sigstore-attested releases

Get early access to the hosted gateway

We're onboarding 10 design partners for the first cohort. Free during the program; influence the roadmap.

We'll only email you about Relay updates. Unsubscribe with one click.