Pricing
Three plans. GPU auto-detected. Unlimited queries on every plan.
Free
2M token context. Wikipedia knowledge base. Unlimited queries. No credit card.
What Your GPU Can Run
AlphaLlama runs models that normally need $45K+ in datacenter GPUs on your single consumer GPU.
| Your GPU | VRAM | What runs | Examples |
|---|---|---|---|
| RTX 3090 / 4090 | 24 GB | ~40B dense / 397B MoE | Qwen3.5 35B MoE (116 tok/s), Qwen3.5 27B, Gemma 27B |
| RTX 5090 | 32 GB | ~50B dense / 397B MoE | Qwen3.5 35B MoE, Mixtral 8x7B, Llama 4 Scout |
| A100 / H100 80GB | 80 GB | ~130B dense | Qwen3.5 122B MoE (54 tok/s), Llama 3.3 70B, Qwen3.5 72B |
| H200 141GB | 141 GB | ~230B dense | Llama 4 Maverick, DeepSeek V4 Flash (236B) |
| B200 / MI300X 192GB | 192 GB | ~300B+ dense | Most open-weight models fit natively |
Business Tier
Professional/datacenter GPU detected? You're on Business tier — unlimited context at $0.30/MTok.
What's included
- ✓ Unlimited context
- ✓ API access (unlimited qps)
- ✓ Multi-seat + admin panel
- ✓ SSO + SLA (99.9%)
- ✓ Custom private knowledge base
Research lab: $60/mo. Dev team (10): $750/mo. Enterprise (500): $30K/mo.
Get Started with Business