Quick Start
1. Install AlphaChat
# Download from alphachat.com (coming soon)
# Or build from source (see GitHub)
Installs to ~/.alphachat/bin/. No root required. Supports Linux (x64/arm64), macOS (Intel/Apple Silicon), Windows (x64).
2. Run Your First Model
# Run a small model (4B, works on any GPU with 4GB+ VRAM)
alphachat run qwen3.5:4b "hello"
# Run a MoE model (35B MoE, 116 tok/s on RTX 3090)
alphachat run qwen3.5:35b "explain quantum computing"
# Run a 397B model on a single RTX 3090 (40 tok/s)
alphachat run qwen3.5:397b "write a poem"
# Interactive chat
alphachat chat
AlphaChat automatically downloads the model and optimizes it for your GPU. Models that don't fit in VRAM are handled automatically, with no configuration needed.
3. Browse Available Models
# List all available models
alphachat list
# Pull a specific model
alphachat pull qwen3.5:35b
See the full list on the Supported Models page.
4. Start a Local API Server
# Start an Ollama + OpenAI compatible API server
alphachat serve
# OpenAI-compatible endpoint (port 11434)
curl http://localhost:11434/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"qwen3.5:35b","messages":[{"role":"user","content":"hello"}]}'
# Ollama-compatible endpoints also work:
# /api/chat, /api/tags, /api/ps, /api/show
Your local API server is OpenAI-compatible. Point any app that uses the OpenAI SDK to the endpoint above.
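For apps that do not use the OpenAI SDK, the same request as the curl example can be sketched with Python's standard library. This is a minimal sketch, not an official client: the function names `build_chat_request` and `chat` are hypothetical, the base URL is copied from the curl example, and the response shape is assumed to follow the usual OpenAI chat-completions format (`choices[0].message.content`).

```python
import json
import urllib.request

# Assumed base URL, taken from the curl example (port 11434).
BASE_URL = "http://localhost:11434/v1"


def build_chat_request(model, prompt, base_url=BASE_URL):
    """Build (but do not send) a POST request for /chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(model, prompt, base_url=BASE_URL):
    """Send the request and return the assistant's reply text.

    Assumes the server answers in the standard OpenAI chat-completions
    shape; requires `alphachat serve` to be running locally.
    """
    request = build_chat_request(model, prompt, base_url)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("qwen3.5:35b", "hello")` should return the model's reply as a string. Building the request separately from sending it makes the payload easy to inspect without a live server.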
5. Tools & Updates
# Update to the latest engine version
alphachat update
# Show system info
alphachat info
# Benchmark a model
alphachat bench qwen3.5:4b
6. Pricing
Free if the model fits in your GPU's VRAM. Paid if it exceeds it.
| Your GPU | VRAM | Free up to |
|---|---|---|
| RTX 3090 / 4090 | 24 GB | ~40B dense / 397B MoE |
| A100 / H100 | 80 GB | ~130B dense |
| Bigger models | 1/6 of cloud cost per hour | |
470x cheaper than H100 farms. See the full pricing details.
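As a rough sanity check on the dense-model figures in the table, weight memory can be estimated from parameter count and bits per weight. The sketch below is an illustration only: the 4-bit quantization and the 2 GB overhead allowance are assumptions, not documented AlphaChat behavior, and the function names are hypothetical. It also does not explain the 397B MoE figure, which relies on the automatic handling of models that exceed VRAM.

```python
def weight_memory_gb(params_billion, bits_per_weight=4.0):
    """Approximate memory for the weights alone, in GB (10^9 bytes).

    bits_per_weight=4.0 is an assumed quantization level.
    """
    return params_billion * bits_per_weight / 8


def fits_in_vram(params_billion, vram_gb, bits_per_weight=4.0, overhead_gb=2.0):
    """True if weights plus an assumed fixed overhead fit in VRAM."""
    needed = weight_memory_gb(params_billion, bits_per_weight) + overhead_gb
    return needed <= vram_gb


# Dense rows from the table, under the assumptions above.
print(fits_in_vram(40, 24))   # 20 GB weights + 2 GB overhead vs 24 GB
print(fits_in_vram(130, 80))  # 65 GB weights + 2 GB overhead vs 80 GB
```

Under these assumptions both dense rows fit, which is at least consistent with the table. Real limits also depend on context length and KV-cache size, which this estimate ignores.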
Fully private. When running locally, your data never leaves your device. No cloud, no network, no third parties. Just your GPU.