Docker

Option A: Standalone ContextPilot Server

Run ContextPilot as its own container and install the hook into your existing engine container separately.

Build & Run

docker build -t contextpilot -f docker/Dockerfile .
docker run -p 8765:8765 contextpilot --infer-api-url http://<engine-host>:30000

Install the hook in your engine container

One-liner (no ContextPilot clone needed):

# Inside your SGLang/vLLM container:
curl -sL https://raw.githubusercontent.com/EfficientContext/ContextPilot/main/contextpilot/install_standalone.py | python3 -

Then launch the engine with CONTEXTPILOT_INDEX_URL pointing at the ContextPilot server:

CONTEXTPILOT_INDEX_URL=http://<contextpilot-host>:8765 python3 -m sglang.launch_server --model-path Qwen/Qwen2.5-7B-Instruct --port 30000

Or add it to your engine Dockerfile:

RUN curl -sL https://raw.githubusercontent.com/EfficientContext/ContextPilot/main/contextpilot/install_standalone.py | python3 -
ENV CONTEXTPILOT_INDEX_URL=http://<contextpilot-host>:8765

Option B: All-in-One (Engine + ContextPilot)

Single container with both the engine and ContextPilot server.

Build

docker build -t contextpilot-sglang -f docker/Dockerfile.sglang .
docker build -t contextpilot-vllm -f docker/Dockerfile.vllm .

Pin a specific engine version:

docker build -t contextpilot-sglang -f docker/Dockerfile.sglang --build-arg SGLANG_VERSION=v0.5.0 .
docker build -t contextpilot-vllm -f docker/Dockerfile.vllm --build-arg VLLM_VERSION=v0.8.5 .

Run

SGLang:

docker run --gpus all --shm-size 32g --ipc=host \
-p 30000:30000 -p 8765:8765 \
-e HF_TOKEN=$HF_TOKEN \
contextpilot-sglang \
--model-path meta-llama/Llama-3.1-8B-Instruct --schedule-policy lpm

vLLM:

docker run --gpus all --ipc=host \
-p 8000:8000 -p 8765:8765 \
-e HUGGING_FACE_HUB_TOKEN=$HF_TOKEN \
contextpilot-vllm \
Qwen/Qwen2.5-7B-Instruct --enable-prefix-caching

Everything after the image name is passed to the engine. Both images default to the Qwen/Qwen2.5-7B-Instruct model.

GPU Selection

docker run --gpus '"device=2,3"' ...
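
The nested quoting in that command matters: when several device IDs are listed, Docker expects the double quotes to arrive as part of the value, so the outer single quotes protect them from the shell. A minimal sketch of a helper that builds the value from a variable follows; `gpus_arg` is a hypothetical name used only for illustration and is not part of Docker or ContextPilot.

```shell
# gpus_arg is a hypothetical helper, not part of Docker or ContextPilot.
# It wraps a comma-separated device list in the literal double quotes
# that `docker run --gpus` expects for multi-device selections.
gpus_arg() {
    printf '"device=%s"' "$1"
}

# Usage (image name and engine arguments elided, as in the command above):
#   docker run --gpus "$(gpus_arg 2,3)" ...
```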

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| CONTEXTPILOT_PORT | 8765 | ContextPilot HTTP server port |
| SGLANG_PORT | 30000 | SGLang serving port (all-in-one only) |
| VLLM_PORT | 8000 | vLLM serving port (all-in-one only) |
| HF_TOKEN | -- | Hugging Face token (SGLang) |
| HUGGING_FACE_HUB_TOKEN | -- | Hugging Face token (vLLM) |

Verify

curl http://localhost:8765/health    # ContextPilot
curl http://localhost:30000/health   # SGLang
curl http://localhost:8000/health    # vLLM
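
Large models can take a while to load, so the endpoints may not answer right after `docker run` returns. One way to handle that is to poll until the health check succeeds. The function below is a sketch under assumptions: `wait_for_health`, its retry count, and its delay are hypothetical choices, and it treats any non-error HTTP response from the `/health` endpoints listed above as "ready".

```shell
# wait_for_health URL [TRIES] [DELAY_SECONDS]
# Hypothetical helper: polls URL with curl until the request succeeds
# or TRIES attempts have been made.
wait_for_health() {
    url=$1
    tries=${2:-30}
    delay=${3:-2}
    i=0
    while [ "$i" -lt "$tries" ]; do
        # -f: treat HTTP errors as failures; -s: silent; -o /dev/null: drop the body
        if curl -fs -o /dev/null "$url"; then
            echo "ready: $url"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "timed out: $url" >&2
    return 1
}

# Usage:
#   wait_for_health http://localhost:8765/health     # ContextPilot
#   wait_for_health http://localhost:30000/health    # SGLang
```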

Architecture

All-in-one images: The entrypoint starts the ContextPilot HTTP server in the background, then execs the engine so that the engine runs as PID 1. Because the engine is PID 1, docker stop delivers SIGTERM directly to it, which allows a graceful shutdown. The .pth hook auto-activates monkey-patching because CONTEXTPILOT_INDEX_URL is set in the image.
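
The startup order described for the all-in-one images can be sketched as a shell function. This is an illustration under assumptions, not the project's actual entrypoint: `entrypoint_sketch` and its arguments are hypothetical, and the real script may start the server differently.

```shell
# entrypoint_sketch SERVER_CMD ENGINE_CMD [ENGINE_ARGS...]
# Hypothetical sketch of the "background server, then exec engine" pattern.
entrypoint_sketch() {
    server_cmd=$1
    shift
    # Start the sidecar server in the background so it does not block startup.
    sh -c "$server_cmd" &
    # exec replaces the current shell with the engine process. In a container,
    # the shell is PID 1, so the engine inherits PID 1 and receives SIGTERM
    # from `docker stop` directly.
    exec "$@"
}
```

Without `exec`, the shell would remain PID 1 and the engine would run as its child, so the stop signal would reach the shell rather than the engine.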

Standalone image: Runs only the ContextPilot server as PID 1. The hook is installed separately in the engine environment via the one-liner above.