calibrate

Auto-tune search parameters by probing your git history. Samples recent commits, builds queries from their commit messages, and grid-sweeps search parameters to find the combination that best retrieves the files each commit changed.

Usage

bobbin calibrate [OPTIONS] [PATH]

Examples

bobbin calibrate                          # Quick calibration (20 samples)
bobbin calibrate --apply                  # Apply best config to .bobbin/calibration.json
bobbin calibrate --full                   # Extended sweep (recency + coupling params)
bobbin calibrate --full --resume          # Resume interrupted full sweep
bobbin calibrate --bridge-sweep           # Sweep bridge params using calibrated core
bobbin calibrate -n 50 --since "1 year"   # More samples, wider time range
bobbin calibrate --repo myproject         # Calibrate a specific repo in multi-repo setup

Options

Flag	Short	Description
`--samples <N>`	`-n`	Number of commits to sample (default: 20)
`--since <RANGE>`		Time range to sample from, git format (default: “6 months ago”)
`--search-limit <N>`		Max results per probe. Omit to sweep [10, 20, 30, 40]
`--budget <N>`		Budget lines per probe. Omit to sweep [150, 300, 500]
`--apply`		Write best config to `.bobbin/calibration.json`
`--full`		Extended sweep: also tunes recency, coupling, and bridge parameters
`--resume`		Resume an interrupted `--full` sweep from cache
`--bridge-sweep`		Sweep bridge_mode + bridge_boost_factor only
`--repo <NAME>`		Repo to calibrate (for multi-repo setups)
`--source <DIR>`		Override source path for git sampling
`--verbose`		Show detailed per-commit results

How It Works

1. Sample N recent commits from git history (stratified across the time range).
2. Build queries from commit messages: each message becomes a search probe.
3. Grid-sweep parameter combinations across all configured dimensions.
4. Score each combination by precision, recall, and F1 against ground truth (the files modified in the commit).
5. Rank by F1 and report the best configuration.

With --apply, writes the best params to .bobbin/calibration.json, which takes precedence over config.toml values at search time.

Sample selection

Commits are filtered before sampling:

Included: Commits with 2–30 changed files within the --since window
Excluded: Merge commits, reverts, and noise commits (prefixes: chore:, ci:, docs:, style:, build:, release:, bump, auto-merge, update dependency)
Sampling: Evenly-spaced picks across filtered candidates (stratified, not random)
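
The filter and the evenly-spaced pick described above can be sketched as follows. This is a minimal illustration, not bobbin's implementation: the `Commit` record, the function names, the case-insensitive prefix match, and detecting reverts by a message starting with "revert" are all assumptions made for the example.

```python
from dataclasses import dataclass

# Noise prefixes copied from the exclusion list above.
# Case-insensitive matching is an assumption of this sketch.
NOISE_PREFIXES = (
    "chore:", "ci:", "docs:", "style:", "build:", "release:",
    "bump", "auto-merge", "update dependency",
)


@dataclass
class Commit:
    """Hypothetical commit record, not a bobbin type."""
    message: str
    changed_files: int
    is_merge: bool = False


def is_candidate(commit: Commit) -> bool:
    """Apply the include/exclude rules to one commit."""
    msg = commit.message.strip().lower()
    if commit.is_merge or msg.startswith("revert"):
        return False
    if msg.startswith(NOISE_PREFIXES):
        return False
    return 2 <= commit.changed_files <= 30


def pick_evenly(candidates: list, n: int) -> list:
    """Take n evenly-spaced items; deterministic, unlike random sampling."""
    if n <= 0:
        return []
    if len(candidates) <= n:
        return list(candidates)
    step = len(candidates) / n
    return [candidates[int(i * step)] for i in range(n)]
```

Because the picks are deterministic, running this twice over the same candidate list yields the same sample, which keeps sweep results comparable between runs.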

If >50% of sampled commits have very short messages (<20 chars) or generic text (“fix”, “wip”, “temp”), calibration warns that accuracy may be reduced.

Scoring

Each probe scores the context bundle returned for a commit message query against the files actually modified in that commit:

precision = |injected ∩ truth| / |injected|
recall    = |injected ∩ truth| / |truth|
f1        = 2 × precision × recall / (precision + recall)

Configs are ranked by average F1 across all sampled commits.
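
As a worked illustration of the formulas above, here is a small sketch that scores one probe and ranks configs by average F1. The function names and the choice to return 0.0 when a set is empty are assumptions; the source does not say how bobbin handles those edge cases.

```python
from statistics import mean


def score(injected: set, truth: set) -> tuple:
    """Return (precision, recall, f1) for one probe."""
    hits = len(injected & truth)
    # Returning 0.0 for empty sets is an assumption of this sketch.
    precision = hits / len(injected) if injected else 0.0
    recall = hits / len(truth) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def rank_configs(f1_by_config: dict) -> list:
    """Sort config labels by their average F1 across commits, best first."""
    averaged = {name: mean(scores) for name, scores in f1_by_config.items()}
    return sorted(averaged.items(), key=lambda item: item[1], reverse=True)


# Four files injected, three actually modified, two in common.
p, r, f1 = score({"a.rs", "b.rs", "c.rs", "d.rs"}, {"a.rs", "b.rs", "e.rs"})
# precision = 2/4, recall = 2/3, f1 = 4/7
```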

Sweep Modes

Core sweep (default)

Sweeps 5 core parameter dimensions (4 varied, `rrf_k` held fixed):

Parameter	Values
`semantic_weight`	0.0, 0.3, 0.5, 0.7, 0.9
`doc_demotion`	0.1, 0.3, 0.5
`search_limit`	10, 20, 30, 40 (or CLI override)
`budget_lines`	150, 300, 500 (or CLI override)
`rrf_k`	60.0 (fixed)

Total: 5 × 3 × 4 × 3 × 1 = 180 configs × N commits = 3,600 probes at the default 20 samples. Takes a few minutes.
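
The config count follows from the Cartesian product of the value lists in the table. A minimal sketch, assuming a plain dictionary for the grid (the `CORE_GRID` and `expand` names are illustrative, not bobbin's):

```python
from itertools import product

# Values copied from the core sweep table above.
CORE_GRID = {
    "semantic_weight": [0.0, 0.3, 0.5, 0.7, 0.9],
    "doc_demotion": [0.1, 0.3, 0.5],
    "search_limit": [10, 20, 30, 40],
    "budget_lines": [150, 300, 500],
    "rrf_k": [60.0],  # fixed, so it contributes a factor of 1
}


def expand(grid: dict) -> list:
    """Expand a grid into one dict per parameter combination."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*grid.values())]


configs = expand(CORE_GRID)
print(len(configs))       # 180
print(len(configs) * 20)  # 3600 probes at 20 samples

# Pinning one dimension, as --search-limit 20 would, shrinks the grid.
pinned = expand({**CORE_GRID, "search_limit": [20]})
print(len(pinned))        # 45
```

Pinning `--search-limit` or `--budget` on the command line is therefore the cheapest way to shorten a sweep.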

Full sweep (`--full`)

Extends the core sweep with recency, coupling depth, and bridge parameters:

Additional parameter	Values
`recency_half_life_days`	7, 14, 30, 90
`recency_weight`	0.0, 0.15, 0.30, 0.50
`coupling_depth`	500, 2000, 5000, 20000
`bridge_mode`	Off, Inject, Boost, BoostInject
`bridge_boost_factor`	0.15, 0.3, 0.5

Total: ~960 configs × 4 coupling depths × N commits. This takes significantly longer (~15-30 min), because coupling data is re-indexed for each depth and each depth is a separate probe run.

Bridge sweep (`--bridge-sweep`)

Requires an existing calibration.json from a prior core or full sweep. Uses the calibrated core params and only sweeps bridge mode + boost factor (7 configs). Very fast.

calibration.json

The output file contains:

{
  "calibrated_at": "2026-03-22T12:00:00Z",
  "snapshot": {
    "chunk_count": 5103,
    "file_count": 312,
    "primary_language": "rust",
    "repo_age_days": 180,
    "recent_commit_rate": 2.5
  },
  "best_config": {
    "semantic_weight": 0.3,
    "doc_demotion": 0.1,
    "rrf_k": 60.0,
    "budget_lines": 300,
    "search_limit": 40,
    "bridge_mode": "inject"
  },
  "top_results": [ ... ],
  "sample_count": 20,
  "probe_count": 3600
}

Precedence: calibration.json > config.toml > compiled defaults. All search and context operations read calibration.json if present.
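
The precedence chain can be pictured as a layered merge in which later layers win. This sketch assumes the override is applied per key and that only the `best_config` object is read from `calibration.json`; neither detail is confirmed by the text above, and the function names are hypothetical.

```python
import json
from pathlib import Path


def load_best_config(path: Path) -> dict:
    """Read best_config from calibration.json, or {} if the file is absent."""
    if not path.exists():
        return {}
    return json.loads(path.read_text()).get("best_config", {})


def resolve(defaults: dict, config_toml: dict, calibrated: dict) -> dict:
    """Merge layers so that calibration > config.toml > compiled defaults."""
    merged = dict(defaults)
    merged.update(config_toml)  # config.toml overrides compiled defaults
    merged.update(calibrated)   # calibration.json overrides both
    return merged


# Illustrative values only; the real compiled defaults are not stated here.
effective = resolve(
    defaults={"semantic_weight": 0.7, "search_limit": 10, "rrf_k": 60.0},
    config_toml={"semantic_weight": 0.5},
    calibrated={"semantic_weight": 0.3, "search_limit": 40},
)
```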

Auto-recalibration

The CalibrationGuard triggers automatic recalibration during indexing when:

First run: No prior calibration exists
Chunk count changed >20%: Significant codebase growth or shrinkage
Primary language changed: Project shifted languages
>30 days since last calibration: Stale calibration
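
The four triggers can be expressed as a single check against the `snapshot` stored in `calibration.json`. This is a sketch of the rules as listed, not the CalibrationGuard source: the function name is hypothetical, and measuring the 20% change relative to the previously recorded chunk count is an assumption.

```python
from datetime import datetime, timedelta, timezone


def recalibration_reason(prev, chunk_count, primary_language, now):
    """Return why recalibration is due, or None if the calibration is fresh.

    prev is the parsed calibration.json (or None when no file exists).
    """
    if prev is None:
        return "first run"
    snapshot = prev["snapshot"]
    old = snapshot["chunk_count"]
    # Assumption: the change is measured relative to the old chunk count.
    if old and abs(chunk_count - old) / old > 0.20:
        return "chunk count changed"
    if primary_language != snapshot["primary_language"]:
        return "primary language changed"
    calibrated_at = datetime.fromisoformat(
        prev["calibrated_at"].replace("Z", "+00:00")
    )
    if now - calibrated_at > timedelta(days=30):
        return "stale calibration"
    return None


prev = {
    "calibrated_at": "2026-03-22T12:00:00Z",
    "snapshot": {"chunk_count": 5103, "primary_language": "rust"},
}
now = datetime(2026, 4, 1, tzinfo=timezone.utc)
```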

Cache and `--resume`

Full sweeps can be interrupted. The cache is saved to .bobbin/calibration_cache.json after each coupling depth completes. Use --resume to pick up where you left off: previously completed depths are restored from cache, and only the remaining depths are re-run. The cache is cleared on successful completion.
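
The checkpointing behaviour can be sketched as a loop that persists results after each depth and skips depths already in the cache. The cache layout (a JSON object keyed by depth) and the `run_full_sweep` and `run_depth` names are assumptions for illustration; the actual cache format is not documented here.

```python
import json
from pathlib import Path

# Coupling depths from the full sweep table.
DEPTHS = [500, 2000, 5000, 20000]


def run_full_sweep(cache_path: Path, run_depth, resume: bool) -> dict:
    """Run run_depth(depth) for each depth, checkpointing after each one."""
    done = {}
    if resume and cache_path.exists():
        done = json.loads(cache_path.read_text())
    for depth in DEPTHS:
        key = str(depth)  # JSON object keys are strings
        if key in done:
            continue  # restored from cache, not re-run
        done[key] = run_depth(depth)
        cache_path.write_text(json.dumps(done))  # checkpoint
    cache_path.unlink(missing_ok=True)  # clear cache on success
    return done
```

If `run_depth` raises partway through, the cache file still holds every depth that finished, so a later call with `resume=True` only runs the rest.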

Output

Shows a table of parameter combinations ranked by average F1, with their scores. The top result is the recommended configuration. Use --json for machine-readable output.
