Pricing

Three plans. GPU auto-detected. Unlimited queries on every plan.

Free

2M token context. Wikipedia knowledge base. Unlimited queries. No credit card.

What Your GPU Can Run

AlphaLlama runs models that normally need $45K+ in datacenter GPUs on your single consumer GPU.

Your GPUVRAMWhat runsExamples
RTX 3090 / 409024 GB~40B dense / 397B MoEQwen3.5 35B MoE (116 tok/s), Qwen3.5 27B, Gemma 27B
RTX 509032 GB~50B dense / 397B MoEQwen3.5 35B MoE, Mixtral 8x7B, Llama 4 Scout
A100 / H100 80GB80 GB~130B denseQwen3.5 122B MoE (54 tok/s), Llama 3.3 70B, Qwen3.5 72B
H200 141GB141 GB~230B denseLlama 4 Maverick, DeepSeek V4 Flash (236B)
B200 / MI300X 192GB192 GB~300B+ denseMost open-weight models fit natively

Business Tier

Professional/datacenter GPU detected? You're on Business tier — unlimited context at $0.30/MTok.

What's included

  • Unlimited context
  • API access (unlimited qps)
  • Multi-seat + admin panel
  • SSO + SLA (99.9%)
  • Custom private knowledge base

Research lab: $60/mo. Dev team (10): $750/mo. Enterprise (500): $30K/mo.

Get Started with Business