AlphaChat — Run the Impossible. Privately.

1. Install AlphaChat

# Download from alphachat.com (coming soon)
# Or build from source — see GitHub

Installs to ~/.alphachat/bin/. No root required. Supports Linux (x64/arm64), macOS (Intel/Apple Silicon), Windows (x64).
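
If the alphachat command is not found after installing, the install directory may need to be on your PATH. The following is a minimal sketch for a POSIX-compatible shell on Linux or macOS. It assumes the installer has not already updated PATH, which this page does not state either way; the ALPHACHAT_BIN variable name is only illustrative.

```shell
# Assumption: the installer has not already added this directory to PATH.
ALPHACHAT_BIN="$HOME/.alphachat/bin"

# Prepend the directory only if it is not already on PATH.
case ":$PATH:" in
  *":$ALPHACHAT_BIN:"*) ;;
  *) PATH="$ALPHACHAT_BIN:$PATH" ;;
esac
export PATH
```

To keep the change across sessions, the same lines can go in your shell startup file (for example ~/.profile).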

2. Run Your First Model

# Run a small model (4B — works on any GPU with 4GB+ VRAM)
alphachat run qwen3.5:4b "hello"

# Run a MoE model (35B MoE — 116 tok/s on RTX 3090)
alphachat run qwen3.5:35b "explain quantum computing"

# Run a 397B MoE model on a single RTX 3090 (40 tok/s)
alphachat run qwen3.5:397b "write a poem"

# Interactive chat
alphachat chat

AlphaChat automatically downloads the model and optimizes it for your GPU. Models that don't fit in VRAM are handled automatically, with no configuration needed.

3. Browse Available Models

# List all available models
alphachat list

# Pull a specific model
alphachat pull qwen3.5:35b

See the full list on the Supported Models page.
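
To fetch several models in one go, the pull command can be wrapped in a small shell function. This is a sketch only: pull_models is a hypothetical helper name, not part of AlphaChat, and it relies solely on the alphachat pull command shown above.

```shell
# Hypothetical helper (not part of AlphaChat): pull each model named on the
# command line, stopping at the first failure.
pull_models() {
  for model in "$@"; do
    alphachat pull "$model" || {
      echo "failed to pull $model" >&2
      return 1
    }
  done
}
```

For example, pull_models qwen3.5:4b qwen3.5:35b would pull the two models in order.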

4. Start a Local API Server

# Start an Ollama- and OpenAI-compatible API server
alphachat serve

# OpenAI-compatible endpoint (port 11434)
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen3.5:35b","messages":[{"role":"user","content":"hello"}]}'

# Ollama-compatible endpoints also work:
# /api/chat, /api/tags, /api/ps, /api/show
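
The read-only Ollama-style endpoints can be queried with small helpers like the ones below. This is a sketch under assumptions: alphachat_url and alphachat_get are hypothetical names, ALPHACHAT_URL is an illustrative variable, and the use of GET for /api/tags and /api/ps follows Ollama's own API rather than anything stated on this page.

```shell
# Base URL of the local server; port 11434 is taken from the curl example above.
# ALPHACHAT_URL is an illustrative variable name, not an AlphaChat setting.
ALPHACHAT_URL="${ALPHACHAT_URL:-http://localhost:11434}"

# Build a full URL for an API path, dropping any trailing slash on the base.
alphachat_url() {
  printf '%s%s\n' "${ALPHACHAT_URL%/}" "$1"
}

# GET an Ollama-style endpoint such as /api/tags or /api/ps.
# -f makes curl exit non-zero on HTTP errors; -sS hides progress but keeps errors.
alphachat_get() {
  curl -fsS "$(alphachat_url "$1")"
}
```

With the server running, alphachat_get /api/tags would request the model list and alphachat_get /api/ps the running models.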

Your local API server is OpenAI-compatible. Point any app that uses the OpenAI SDK to the local endpoint used in the curl example above.
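
One way to redirect an existing app without changing its code is through environment variables. The sketch below assumes the app uses an official OpenAI SDK that reads OPENAI_BASE_URL and OPENAI_API_KEY, and that the base URL is the /v1 prefix from the curl example. Whether AlphaChat checks the API key is not stated here, so the key is a placeholder.

```shell
# Assumption: the app uses an official OpenAI SDK that reads these variables.
# The base URL matches the curl example above (port 11434, /v1 prefix).
export OPENAI_BASE_URL="http://localhost:11434/v1"

# The SDKs expect a non-empty key; this placeholder value is an assumption,
# since the page does not say whether the local server validates it.
export OPENAI_API_KEY="local-placeholder"
```

Apps started from the same shell session would then send their requests to the local server instead of the hosted API.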

5. Tools & Updates

# Update to the latest engine version
alphachat update

# Show system info
alphachat info

# Benchmark a model
alphachat bench qwen3.5:4b

6. Pricing

Free if the model fits in your GPU's VRAM. Paid if it exceeds it.

Your GPU	VRAM	Free up to
RTX 3090 / 4090	24 GB	~40B dense / 397B MoE
A100 / H100	80 GB	~130B dense
Bigger models	1/6 of cloud cost per hour

470x cheaper than H100 farms. Full pricing details.