Benchmarks

Performance benchmarks for ContextPilot.

GPU vs CPU Performance

ContextPilot supports both GPU and CPU for distance computation in context index construction.
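The computation being benchmarked is the full pairwise distance matrix over context embeddings. A minimal sketch of that step is below, written in NumPy for portability; the function name, embedding dimensions, and the use of the squared-norm expansion are illustrative assumptions, not ContextPilot's actual code. On GPU, the same computation can be expressed with a CUDA array library (e.g. CuPy, or `torch.cdist` in PyTorch).

```python
# Sketch of the pairwise-distance step in context index construction.
# NumPy shown for portability; swapping in a CUDA array library (CuPy,
# torch) gives the GPU path. Illustrative, not ContextPilot's API.
import numpy as np

def pairwise_distances(embeddings: np.ndarray) -> np.ndarray:
    """Full (n, n) Euclidean distance matrix via the ||a - b||^2 expansion."""
    sq = (embeddings ** 2).sum(axis=1)                   # per-row squared norms
    d2 = sq[:, None] + sq[None, :] - 2.0 * embeddings @ embeddings.T
    return np.sqrt(np.maximum(d2, 0.0))                  # clamp tiny negatives

emb = np.random.default_rng(0).normal(size=(64, 32))     # 64 contexts, 32-dim
D = pairwise_distances(emb)
```

The O(n²) memory and compute of this matrix is what makes GPU acceleration pay off at larger context counts, as the results below show.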

Test Configuration

  • GPU: NVIDIA A6000
  • CPU: AMD EPYC 7313P 16-Core
  • Metric: Time to compute pairwise distances for context clustering

Results

| Contexts | GPU Time (s) | CPU Time (s) | Speedup |
|---|---|---|---|
| 64 | 0.22 ± 0.30 | 0.20 ± 0.00 | 0.89x |
| 128 | 0.02 ± 0.00 | 0.28 ± 0.00 | 17.40x |
| 512 | 0.05 ± 0.00 | 1.02 ± 0.01 | 20.51x |
| 4,000 | 0.89 ± 0.05 | 52.02 ± 0.06 | 58.43x |
| 8,000 | 3.19 ± 0.22 | 211.45 ± 1.12 | 66.27x |
| 12,000 | 6.77 ± 0.45 | 490.91 ± 7.98 | 72.48x |
| 100,000 | 687.64 ± 0.02 | N/A | N/A |

Key Findings

  • GPU performance advantage scales with problem size
  • At 64 contexts, CPU is slightly faster (0.89x GPU speedup) because kernel-launch and transfer overhead dominates
  • Crossover point: ~100-128 contexts
  • At 12k contexts: 72x speedup with GPU

Deployment Recommendations

| Scenario | Recommended | Rationale |
|---|---|---|
| < 128 contexts | CPU | GPU overhead exceeds computation benefit |
| ≥ 128 contexts | GPU | 17-72x speedup for batch processing |
| Production workloads | GPU | Critical for high-throughput requirements |
| Development/testing | CPU | Simpler setup, no GPU dependency |
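These recommendations reduce to a simple threshold rule. The sketch below encodes it; the `GPU_CROSSOVER` constant and the function itself are assumptions drawn from the ~128-context crossover above, not part of ContextPilot's API.

```python
# Illustrative device-selection heuristic based on the ~128-context
# crossover observed in the benchmarks. Not ContextPilot's actual API.
GPU_CROSSOVER = 128

def pick_device(num_contexts: int, gpu_available: bool) -> str:
    if gpu_available and num_contexts >= GPU_CROSSOVER:
        return "cuda"   # overhead amortized: 17-72x speedups at this scale
    return "cpu"        # small batches: GPU launch/transfer cost dominates

assert pick_device(64, True) == "cpu"     # below crossover, prefer CPU
```

In practice the exact crossover depends on the hardware pair, so a threshold like this is best treated as a tunable default rather than a constant.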

End-to-End Performance

When integrated with SGLang or vLLM:

| Metric | Improvement |
|---|---|
| Cache hit rate | 4-13x |
| Prefill latency | 1.5-3.5x reduction |
| Accuracy | Maintained or improved |

See the main README for accuracy benchmarks on MT-RAG.


Running Your Own Benchmarks

```bash
# GPU vs CPU distance computation
python tests/test_gpu_distance_performance.py

# Full benchmark suite (scaling, clustering, scheduling)
python scripts/benchmark.py

# Quick benchmark with smaller sizes
python scripts/benchmark.py --quick

# Include GPU benchmarks
python scripts/benchmark.py --gpu

# Custom context sizes
python scripts/benchmark.py --sizes 100 500 1000 2000
```