For finance, procurement, and engineering leadership

What would your LLM bill look like on India-hosted infrastructure?

Enter your current monthly burn and we'll estimate the spend-repatriation saving from routing OSS-eligible workloads to India-hosted Llama / DeepSeek / Qwen, with the remainder still served by Claude / Gemini via in-region endpoints. The math updates live; nothing leaves your browser.

Current monthly LLM spend (USD)

Roughly what you spend across all your LLM usage per month today.

% of that on frontier models (GPT-4, Claude Sonnet, Gemini Pro)

30%

The rest is assumed to be mini-class (GPT-4o-mini, Haiku, Flash).

Industry

DPDP + sector residency rules push routing to in-region OSS regardless of cost.

Calculations are illustrative. Assumes India-hosted Llama 3.3 70B at ₹30/M tokens (Relay list price) for OSS-routable workloads. Frontier workloads stay on your current vendor. INR/USD: 83.

Estimated monthly saving

₹0

That's 0% off your current bill. Annualised: ₹0.

Current monthly spend₹1.66 L

With Relay routing₹4.32 L

Implied token volume3578M / mo

Routed to India OSS (85%)3042M tokens

Stays on frontier537M tokens

Book a 30-minute call →

Or email us directly for a custom estimate with your real workload mix.

How we compute it

Your USD spend is reconstructed into implied tokens using the published rates for the model classes you indicate. We assume Relay routes a workload-appropriate share to India-hosted Llama 3.3 70B at our list price (₹30/M tokens). The frontier slice stays on your existing vendor; the saving is the difference.

Where the OSS-share number comes from

Industry presets reflect what our design-partner pilots are actually routing: BFSI 85% (DPDP + sector rules force the decision), healthcare 90% (patient data residency), AI-native 60% (more demanding mix). You can override by changing the industry to match your reality.

What the calculator does NOT include

Productivity gains from faster latency in-region. The cost of compliance / DPDP / RBI scrutiny if you stay on US-hosted endpoints. Engineering effort to migrate (Relay is drop-in for OpenAI / Anthropic SDK callers; budget ~30 min per service). Routing decisions made per-request can shift OSS-share higher than the preset.

Want a custom estimate with your real workload mix?

Send us your top three workload types and approximate per-month volume. We'll come back within 48 hours with a per-workload routing recommendation and a tighter saving estimate, under NDA if needed.

Request a tailored estimate →See enterprise tier →