Multi-Turn Conversations

ContextPilot supports efficient multi-turn conversations with automatic context deduplication.

How It Works

Turn 1: Retrieve documents, reorder, and register in conversation history
Turn 2+: Identify overlapping documents, replace with location hints
Result: 30-60% reduction in redundant document processing

Turn 1: "What is ML?"
  └─ Retrieves: [doc_1, doc_2, doc_3, doc_4, doc_5]
  └─ engine.reorder(docs, conversation_id="user_42")
  └─ Full context sent to LLM

Turn 2: "How does it differ from DL?"
  └─ Retrieves: [doc_2, doc_3, doc_6, doc_7, doc_8]
  └─ engine.deduplicate(docs, conversation_id="user_42")
  └─ Overlapping: [doc_2, doc_3] → replaced with location hints
  └─ Only [doc_6, doc_7, doc_8] sent as full text

Safety: `conversation_id` is Required

.deduplicate() requires an explicit conversation_id. This prevents accidental cross-contamination between concurrent users sharing the same ContextPilot instance.

# ✅ Correct — each user has their own conversation_id
engine.deduplicate(docs, conversation_id="user_42")

# ❌ Error — missing conversation_id
engine.deduplicate(docs)  # TypeError: missing required argument

# ❌ Error — no prior .reorder() for this conversation
engine.deduplicate(docs, conversation_id="unknown")  # ValueError

Python API (Recommended)

Basic Multi-Turn Flow

import contextpilot as cp

engine = cp.ContextPilot(use_gpu=False)

# ═══════════════════════════════════════════════════════════
# Turn 1: Reorder and register docs under a conversation
# ═══════════════════════════════════════════════════════════
turn1_docs = [[4, 3, 1]]
reordered, indices = engine.reorder(turn1_docs, conversation_id="user_42")

# Send reordered docs to LLM...

# ═══════════════════════════════════════════════════════════
# Turn 2: Deduplicate — lightweight, no index operations
# ═══════════════════════════════════════════════════════════
turn2_docs = [[4, 3, 2]]
results = engine.deduplicate(turn2_docs, conversation_id="user_42")

r = results[0]
print(f"Overlapping: {r['overlapping_docs']}")  # [4, 3] — already sent
print(f"New docs:    {r['new_docs']}")          # [2]    — send this
print(f"Hints:       {r['reference_hints']}")   # hints for the LLM

# Build prompt with only new docs + reference hints for overlapping ones

# ═══════════════════════════════════════════════════════════
# Turn 3+: Continue the chain (history accumulates)
# ═══════════════════════════════════════════════════════════
turn3_docs = [[4, 2, 5]]
results3 = engine.deduplicate(turn3_docs, conversation_id="user_42")
r3 = results3[0]
print(f"Overlap: {r3['overlapping_docs']}")  # [4, 2] — from turns 1 & 2
print(f"New:     {r3['new_docs']}")          # [5]

Multiple Concurrent Users

Each user's history is isolated by conversation_id:

engine = cp.ContextPilot(use_gpu=False)

# User A Turn 1
engine.reorder([[1, 2, 3]], conversation_id="user_a")

# User B Turn 1 (completely independent)
engine.reorder([[10, 20, 30]], conversation_id="user_b")

# User A Turn 2 — only sees user A's history
results_a = engine.deduplicate([[1, 5]], conversation_id="user_a")
assert results_a[0]["overlapping_docs"] == [1]   # from user A's turn 1
assert results_a[0]["new_docs"] == [5]

# User B Turn 2 — only sees user B's history
results_b = engine.deduplicate([[10, 40]], conversation_id="user_b")
assert results_b[0]["overlapping_docs"] == [10]  # from user B's turn 1
assert results_b[0]["new_docs"] == [40]

Batch Deduplication

Deduplicate multiple contexts in a single call:

engine.reorder([[1, 2, 3, 4]], conversation_id="user_42")

# Multiple contexts in one deduplicate call
results = engine.deduplicate(
    [[1, 5], [3, 6]],
    conversation_id="user_42",
)

# results[0]: overlapping=[1], new=[5]
# results[1]: overlapping=[3], new=[6]

Custom Hint Templates

Customize reference hints to match your prompt format:

results = engine.deduplicate(
    [[1, 5]],
    conversation_id="user_42",
    hint_template="[See Doc {doc_id} from previous context]",
)
# hints: ["[See Doc 1 from previous context]"]

String Contexts

Works with string documents too:

engine.reorder(
    [["Photosynthesis converts sunlight...", "Chlorophyll absorbs light..."]],
    conversation_id="session_1",
)

results = engine.deduplicate(
    [["Photosynthesis converts sunlight...", "Mitosis is cell division..."]],
    conversation_id="session_1",
)
r = results[0]
print(r["overlapping_docs"])  # ["Photosynthesis converts sunlight..."]
print(r["new_docs"])          # ["Mitosis is cell division..."]

Pipeline API

For applications using RAGPipeline, deduplication is built in:

from contextpilot.pipeline import RAGPipeline, InferenceConfig

pipeline = RAGPipeline(
    retriever="bm25",
    corpus_path="corpus.jsonl",
    use_contextpilot=True,
    inference=InferenceConfig(
        model_name="Qwen/Qwen2.5-7B-Instruct",
        base_url="http://localhost:30000"
    )
)

conversation_id = "user_session_123"

# Turn 1
result1 = pipeline.process_conversation_turn(
    conversation_id=conversation_id,
    query="What is machine learning?",
    top_k=10,
    enable_deduplication=True,
    generate_response=True
)

# Turn 2 — overlapping docs are deduplicated
result2 = pipeline.process_conversation_turn(
    conversation_id=conversation_id,
    query="How does it differ from deep learning?",
    top_k=10,
    enable_deduplication=True,
    generate_response=True
)

# Reset conversation when done
pipeline.reset_conversation(conversation_id)

HTTP Server API

For production systems using the HTTP server, the /deduplicate endpoint provides the same functionality with request-ID-based conversation tracking.

Workflow

┌─────────────────────────────────────────────────────────────────────┐
│                         Multi-Turn Flow                             │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  Turn 1:  POST /reorder                                             │
│           ├─ Build index                                            │
│           ├─ Register in conversation tracker                       │
│           └─ Return request_id (for linking turns)                  │
│                                                                     │
│  Turn 2+: POST /deduplicate  ← Lightweight! No index ops           │
│           ├─ Look up conversation history by parent_request_id      │
│           ├─ Find overlapping documents                             │
│           ├─ Generate reference hints                               │
│           └─ Return deduplicated context                            │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

Example

import requests

INDEX_SERVER = "http://localhost:8765"

# Turn 1: POST /reorder
turn1_response = requests.post(
    f"{INDEX_SERVER}/reorder",
    json={"contexts": [[4, 3, 1]]},
).json()

turn1_request_id = turn1_response["request_ids"][0]

# Turn 2: POST /deduplicate (lightweight)
turn2_response = requests.post(
    f"{INDEX_SERVER}/deduplicate",
    json={
        "contexts": [[4, 3, 2]],
        "parent_request_ids": [turn1_request_id],
    }
).json()

result = turn2_response["results"][0]
print(f"Overlapping: {result['overlapping_docs']}")  # [4, 3]
print(f"New docs:    {result['new_docs']}")           # [2]

Why Use `/deduplicate` for Turn 2+?

Operation	`/reorder`	`/deduplicate`
Index build	✓	✗
Clustering	✓	✗
Search	✓	✗
Deduplication	✓	✓
Latency	~50-200ms	~1-5ms

For multi-turn conversations, Turn 2+ typically doesn't need index operations — just deduplication against conversation history. The /deduplicate endpoint is 10-100x faster.

Managing Conversations

Python API

Conversation state is tracked internally per conversation_id. No explicit reset needed — just use different IDs for different sessions.

Pipeline API

# Reset a specific conversation (frees memory)
pipeline.reset_conversation(conversation_id)

# Or reset all conversations
pipeline.reset_all_conversations()

HTTP Server API

# Reset all conversations and index
requests.post(f"{INDEX_SERVER}/reset")

Next Steps

API Reference - Full API documentation including HTTP endpoints

How It Works​

Safety: conversation_id is Required​

Python API (Recommended)​

Basic Multi-Turn Flow​

Multiple Concurrent Users​

Batch Deduplication​

Custom Hint Templates​

String Contexts​

Pipeline API​

HTTP Server API​

Workflow​

Example​

Why Use /deduplicate for Turn 2+?​

Managing Conversations​

Python API​

Pipeline API​

HTTP Server API​

Next Steps​

How It Works

Safety: `conversation_id` is Required

Python API (Recommended)

Basic Multi-Turn Flow

Multiple Concurrent Users

Batch Deduplication

Custom Hint Templates

String Contexts

Pipeline API

HTTP Server API

Workflow

Example

Why Use `/deduplicate` for Turn 2+?

Managing Conversations

Python API

Pipeline API

HTTP Server API

Next Steps