How to run Claude Code with a local MiniMax-M2.1 model via vLLM
This article explains how to run Claude Code against a local MiniMax-M2.1 model served through vLLM's Anthropic-compatible API endpoint. It covers the required hardware, the installation of vLLM and the model, and the Claude Code configuration needed to connect to the local server. Following the guide lets developers use a high-performance local model for AI-assisted coding without relying on external services.
You can run Claude Code with your own local MiniMax-M2.1 model using vLLM's native Anthropic API endpoint support.
Hardware Used
| Component | Specification |
|---|---|
| CPU | AMD Ryzen 9 7950X3D 16-Core Processor |
| Motherboard | ROG CROSSHAIR X670E HERO |
| GPU | Dual NVIDIA RTX Pro 6000 (96 GB VRAM each) |
| RAM | 192 GB DDR5-5200 (the model fits entirely in VRAM, so system RAM is not used for inference) |
Install vLLM Nightly
Prerequisite: Ubuntu 24.04 and the proper NVIDIA drivers
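As an optional sanity check before installing, confirm the driver is loaded and both GPUs are visible:
nvidia-smi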
mkdir vllm-nightly
cd vllm-nightly
uv venv --python 3.12 --seed
source .venv/bin/activate
uv pip install -U vllm \
--torch-backend=auto \
--extra-index-url https://wheels.vllm.ai/nightly
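To confirm the nightly build installed correctly, you can print the version from inside the activated environment:
python -c "import vllm; print(vllm.__version__)"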
Download MiniMax-M2.1
Set up a separate environment for downloading models:
mkdir /models
cd /models
uv venv --python 3.12 --seed
source .venv/bin/activate
pip install huggingface_hub
Download the AWQ-quantized MiniMax-M2.1 model:
mkdir /models/awq
huggingface-cli download cyankiwi/MiniMax-M2.1-AWQ-4bit \
--local-dir /models/awq/cyankiwi-MiniMax-M2.1-AWQ-4bit
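Once the download finishes, a quick listing confirms the weight shards and config files landed in the target directory:
ls -lh /models/awq/cyankiwi-MiniMax-M2.1-AWQ-4bit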
Start vLLM Server
From your vLLM environment, launch the server with the Anthropic-compatible endpoint:
cd ~/vllm-nightly
source .venv/bin/activate
vllm serve \
/models/awq/cyankiwi-MiniMax-M2.1-AWQ-4bit \
--served-model-name MiniMax-M2.1-AWQ \
--max-num-seqs 10 \
--max-model-len 128000 \
--gpu-memory-utilization 0.95 \
--tensor-parallel-size 2 \
--pipeline-parallel-size 1 \
--enable-auto-tool-choice \
--tool-call-parser minimax_m2 \
--reasoning-parser minimax_m2_append_think \
--trust-remote-code \
--host 0.0.0.0 \
--port 8000
Once the model has loaded, the server exposes an Anthropic-compatible /v1/messages endpoint at http://localhost:8000.
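Before pointing Claude Code at it, you can smoke-test the endpoint with curl. This is a minimal sketch of an Anthropic-style request; the x-api-key and anthropic-version headers follow Anthropic's API convention, and the token value is arbitrary since the local server is not configured to check it:
curl http://localhost:8000/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: dummy" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "MiniMax-M2.1-AWQ",
    "max_tokens": 128,
    "messages": [{"role": "user", "content": "Say hello"}]
  }'
A JSON response containing a content block indicates the server is up and the model is responding.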
Install Claude Code
Install Claude Code on macOS, Linux, or WSL:
curl -fsSL https://claude.ai/install.sh | bash
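You can verify the install by checking the version:
claude --version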
See the official Claude Code documentation for more details.
Configure Claude Code
Create settings.json
Create or edit ~/.claude/settings.json:
{
"env": {
"ANTHROPIC_BASE_URL": "http://localhost:8000",
"ANTHROPIC_AUTH_TOKEN": "dummy",
"API_TIMEOUT_MS": "3000000",
"CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1",
"ANTHROPIC_MODEL": "MiniMax-M2.1-AWQ",
"ANTHROPIC_SMALL_FAST_MODEL": "MiniMax-M2.1-AWQ",
"ANTHROPIC_DEFAULT_SONNET_MODEL": "MiniMax-M2.1-AWQ",
"ANTHROPIC_DEFAULT_OPUS_MODEL": "MiniMax-M2.1-AWQ",
"ANTHROPIC_DEFAULT_HAIKU_MODEL": "MiniMax-M2.1-AWQ"
}
}
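A malformed settings.json can be silently ignored, so it's worth validating the file after editing, for example with jq (also used in the next step):
jq . ~/.claude/settings.json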
Skip Onboarding (Workaround for Bug)
Due to a known bug in Claude Code 2.0.65+, fresh installs may ignore settings.json during onboarding. Add hasCompletedOnboarding to ~/.claude.json:
# If ~/.claude.json doesn't exist, create it:
echo '{"hasCompletedOnboarding": true}' > ~/.claude.json
# If it exists, add the field manually or use jq:
jq '. + {"hasCompletedOnboarding": true}' ~/.claude.json > tmp.json && mv tmp.json ~/.claude.json
Run Claude Code
With vLLM running in one terminal, open another and run:
claude
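As a quick end-to-end test, you can also run a one-shot prompt in print mode; if the vLLM terminal logs an incoming request, traffic is reaching the local server:
claude -p "Write a hello world function in Python"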
Claude Code will now use your local MiniMax-M2.1 model! If you also want to configure the Claude Code VSCode extension, see here.