mem0 + ContextPilot LoCoMo Benchmark

This example measures TTFT and answer accuracy (token-F1, LLM judge) with and without ContextPilot context reordering, using mem0 as the memory backend and an OpenAI-compatible inference engine (SGLang or vLLM).

Mem0 is an intelligent memory layer that facilitates memory storage and retrieval for agents.
Locomo is a long conversation benchmark used to test memory retrieval.

mem0_locomo_diagram

Setup

pip install mem0ai openai tqdm

# Install your inference engine:
# SGLang:
pip install "sglang>=0.5"
# or vLLM:
pip install vllm

Start servers

python -m contextpilot.server.http_server --port 8765

In a separate terminal, start your inference engine:

export CONTEXTPILOT_INDEX_URL=http://localhost:8765

# SGLang:
python -m sglang.launch_server --model <model> --port 30000
# or vLLM:
python -m vllm.entrypoints.openai.api_server --model <model> --port 30000 --enable-prefix-caching

Run

export OPENAI_API_KEY=<your API key>
python examples/mem0_locomo_example.py

Environment variables

Variable	Default	Description
`INFERENCE_URL`	`http://localhost:30000`	Inference engine endpoint (also accepts `SGLANG_URL` for backwards compatibility)
`CONTEXTPILOT_URL`	`http://localhost:8765`	ContextPilot server endpoint
`JUDGE_MODEL`	`gpt-4.1-2025-04-14`	OpenAI model for LLM judge
`LOCOMO_CONV_INDEX`	`0`	Which LoCoMo conversation to use
`LOCOMO_MAX_QA`	`150`	Max QA pairs to evaluate
`LOCOMO_MAX_TOKENS`	`32`	Max generation tokens
`LOCOMO_NUM_TURNS`	`150`	Multi-turn conversation length
`LOCOMO_TOP_K_LIST`	`20,100`	Comma-separated top-k values to benchmark

Results

LoCoMo conv 0, 102 memories, 150 turns:

k	mode	ttft	judge
20	baseline	0.0377s	0.440
20	reorder	0.0315s	0.460
100	baseline	0.1012s	0.437
100	reorder	0.0554s	0.420

General usage

Store and retrieve memories

from contextpilot.retriever import Mem0Retriever

retriever = Mem0Retriever(config={
    "llm": {"provider": "openai", "config": {"model": "gpt-4.1-mini-2025-04-14"}},
    "embedder": {"provider": "openai", "config": {"model": "text-embedding-3-small"}},
})

retriever.add_memory(
    [{"role": "user", "content": "I'm allergic to peanuts"},
     {"role": "assistant", "content": "Noted."}],
    user_id="user123",
)

results = retriever.search_queries(
    query_data=[{"text": "dietary restrictions?"}],
    user_id="user123", top_k=20,
)
corpus_map = retriever.get_corpus_map()

Reorder with the library

import contextpilot as cp

contexts = [r["top_k_doc_id"] for r in results]
engine = cp.ContextPilot(use_gpu=False)
reordered, order = engine.reorder(contexts)

Reorder via the server (enables KV-cache tracking)

import requests

requests.post("http://localhost:8765/reset")
resp = requests.post("http://localhost:8765/reorder", json={
    "contexts": contexts,
    "use_gpu": False,
    "linkage_method": "average",
    "alpha": 0.001,
}).json()

reordered = resp["reordered_contexts"]  # reordered doc ID lists

Multi-turn

Just call /reorder each turn — ContextPilot auto-detects whether the index exists and extends it incrementally:

for turn, query in enumerate(queries):
    results = retriever.search_queries(
        query_data=[{"text": query}], user_id="user123", top_k=20)
    resp = requests.post("http://localhost:8765/reorder", json={
        "contexts": [results[0]["top_k_doc_id"]],
        "use_gpu": False,
        "linkage_method": "average",
        "alpha": 0.0005,
    }).json()
    reordered_ids = resp["reordered_contexts"][0]

Setup​

Start servers​

Run​

Environment variables​

Results​

General usage​

Store and retrieve memories​

Reorder with the library​

Reorder via the server (enables KV-cache tracking)​

Multi-turn​