Core Concepts

Bobbin’s design is built around a few key ideas. Understanding them will help you get the most out of the tool.

Chunks

A chunk is a semantic unit of code extracted from a source file. Rather than indexing entire files or arbitrary line ranges, Bobbin uses tree-sitter to parse source code into meaningful structural units:

| Chunk Type | Languages | Example |
|------------|-----------|---------|
| function | Rust, TypeScript, Python, Go, Java, C++ | fn parse_config(...) |
| method | TypeScript, Java, C++ | class.handleRequest() |
| class | TypeScript, Python, Java, C++ | class AuthService |
| struct | Rust, Go, C++ | struct Config |
| enum | Rust, Java, C++ | enum Status |
| interface | TypeScript, Java | interface Handler |
| trait | Rust | trait Serialize |
| impl | Rust | impl Config |
| module | Rust | mod auth |
| section | Markdown | ## Architecture |
| table | Markdown | Markdown tables |
| code_block | Markdown | Fenced code blocks |

Files that don’t match a supported language fall back to line-based chunking (50 lines per chunk with 10-line overlap).
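The line-based fallback can be pictured as a sliding window over the file. The sketch below is illustrative only: the function name `chunk_lines` and the handling of the final, shorter window are assumptions, not Bobbin's actual implementation.

```python
def chunk_lines(lines, size=50, overlap=10):
    """Split `lines` into windows of `size` lines that share `overlap` lines.

    Returns (start, end, lines) tuples with 1-based, inclusive line numbers.
    Hypothetical helper: only the 50/10 defaults come from the docs.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap  # each new window starts 40 lines after the last
    chunks = []
    start = 0
    while start < len(lines):
        end = min(start + size, len(lines))
        chunks.append((start + 1, end, lines[start:end]))
        if end == len(lines):
            break  # the last window reached the end of the file
        start += step
    return chunks
```

With these defaults, a 120-line file would yield three chunks covering lines 1-50, 41-90, and 81-120.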

See Chunk Types Reference for the complete list.

Embeddings

Each chunk is converted into a 384-dimensional vector using the all-MiniLM-L6-v2 model, run locally via ONNX Runtime. These vectors capture semantic meaning — similar code produces similar vectors, even when the wording differs.
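To make "similar vectors" concrete, here is a minimal sketch of cosine similarity, a common way to compare embedding vectors. This page does not state which distance metric Bobbin uses, so treat the choice of cosine as an assumption; the three-dimensional vectors are toy stand-ins for real 384-dimensional embeddings.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # define similarity with a zero vector as 0
    return dot / (norm_a * norm_b)


# Toy vectors (not real model output): the first two point in nearly the
# same direction, the third is orthogonal to the first.
login_handler = [0.9, 0.1, 0.0]
auth_check = [0.8, 0.2, 0.0]
csv_parser = [0.0, 0.0, 1.0]
```

Here `cosine_similarity(login_handler, auth_check)` is close to 1, while `cosine_similarity(login_handler, csv_parser)` is 0.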

Bobbin supports contextual embedding enrichment: before computing a chunk’s vector, it can prepend surrounding lines for additional context. This is configurable per language in [embedding.context].

Search Modes

Bobbin offers three search modes:

| Mode | How It Works | Best For |
|------|--------------|----------|
| Hybrid (default) | Combines semantic + keyword via RRF | General-purpose queries |
| Semantic | Vector similarity (ANN) only | Conceptual queries (“authentication logic”) |
| Keyword | Full-text search (FTS) only | Exact identifiers (“handleRequest”) |

Reciprocal Rank Fusion (RRF) merges the ranked results from both semantic and keyword search. The semantic_weight config (default: 0.7) controls the balance.
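One way to picture weighted RRF is the sketch below. It assumes that `semantic_weight` scales the semantic list's contribution and that the keyword list gets the remainder; the smoothing constant `k = 60` is the value conventionally used with RRF, not one confirmed for Bobbin. The function name `rrf_fuse` is hypothetical.

```python
def rrf_fuse(semantic, keyword, semantic_weight=0.7, k=60):
    """Merge two ranked lists of result IDs with weighted Reciprocal Rank Fusion.

    Each list contributes weight / (k + rank) per item, with rank starting at 1.
    Assumed weighting scheme; Bobbin's exact formula may differ.
    """
    scores = {}
    for rank, item in enumerate(semantic, start=1):
        scores[item] = scores.get(item, 0.0) + semantic_weight / (k + rank)
    for rank, item in enumerate(keyword, start=1):
        scores[item] = scores.get(item, 0.0) + (1.0 - semantic_weight) / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda item: (-scores[item], item))


fused = rrf_fuse(["a", "b", "c"], ["c", "a", "d"])
```

Items that appear in both lists ("a" and "c" here) accumulate score from each, so they tend to rise above items found by only one search.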

See Search Modes Reference for details.

Temporal Coupling

Temporal coupling measures how often two files change together in git history. If auth.rs and middleware.rs frequently appear in the same commits, they have high coupling — modifying one likely means you should look at the other.

Bobbin analyzes git history (configurable depth, default: 1000 commits) and stores coupling scores in SQLite. This data powers:

  • bobbin related <file> — list files coupled to a given file
  • bobbin context <query> — automatically expand search results with coupled files
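The idea behind coupling scores can be sketched by counting how often file pairs share a commit. The scoring formula below (shared commits divided by commits touching either file) is an assumption chosen for illustration; this page does not specify how Bobbin computes its scores, and `coupling_scores` is a hypothetical name.

```python
from collections import Counter
from itertools import combinations


def coupling_scores(commits):
    """Score file pairs by how often they change in the same commit.

    `commits` is an iterable of file-path lists, one list per commit.
    Score = commits touching both files / commits touching either file.
    """
    file_counts = Counter()
    pair_counts = Counter()
    for files in commits:
        unique = sorted(set(files))  # sorted so each pair has one key order
        file_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return {
        (a, b): together / (file_counts[a] + file_counts[b] - together)
        for (a, b), together in pair_counts.items()
    }


history = [
    ["auth.rs", "middleware.rs"],
    ["auth.rs", "middleware.rs", "main.rs"],
    ["main.rs"],
    ["auth.rs"],
]
scores = coupling_scores(history)
```

In this toy history, `auth.rs` and `middleware.rs` share two of the three commits that touch either file, so they score higher than any pair involving `main.rs`. Per-commit file lists like these could be collected from `git log --name-only`.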

Context Assembly

The context command combines search and coupling into a single context bundle:

  1. Search: Find chunks matching your query
  2. Expand: Add temporally coupled files for each match
  3. Deduplicate: Remove redundant chunks across files
  4. Budget: Trim to fit a line budget (default: 500 lines)

The result is a focused set of code that’s relevant to a task — ideal for feeding to an AI agent or understanding a change’s scope.
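The four steps can be sketched as a small pipeline. Everything named here (`Chunk`, `assemble_context`, the shape of the inputs) is hypothetical, and the budget policy of skipping chunks that do not fit is an assumption rather than Bobbin's documented behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    file: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive

    @property
    def line_count(self):
        return self.end - self.start + 1


def assemble_context(matches, coupled_files, chunks_by_file, budget=500):
    """Build a context bundle from search matches (step 1 is done by the caller)."""
    # 2. Expand: append chunks from files coupled to each match.
    candidates = list(matches)
    for match in matches:
        for related in coupled_files.get(match.file, []):
            candidates.extend(chunks_by_file.get(related, []))
    # 3. Deduplicate: keep the first occurrence of each chunk.
    seen = set()
    unique = []
    for chunk in candidates:
        if chunk not in seen:
            seen.add(chunk)
            unique.append(chunk)
    # 4. Budget: take chunks in order, skipping any that would exceed it.
    bundle, used = [], 0
    for chunk in unique:
        if used + chunk.line_count <= budget:
            bundle.append(chunk)
            used += chunk.line_count
    return bundle
```

Because search matches come first in the candidate list, they are the last to be trimmed when the budget is tight.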

Hotspots

A hotspot is a file that is both frequently changed (high churn) and complex (high AST complexity). The hotspot score is the geometric mean of normalized churn and normalized complexity, so a file scores high only when both are high. These files represent the riskiest parts of a codebase: they change often and are hard to change safely.
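As a rough illustration of the geometric-mean score, the sketch below normalizes each metric by its maximum across files. That normalization, the function name `hotspot_scores`, and the sample numbers are assumptions; the source only states that the inputs are normalized.

```python
import math


def hotspot_scores(churn, complexity):
    """Return sqrt(normalized churn * normalized complexity) per file.

    `churn` and `complexity` map file paths to raw non-negative values.
    Each metric is divided by its maximum (assumed normalization).
    """
    max_churn = max(churn.values(), default=0)
    max_complexity = max(complexity.values(), default=0)
    scores = {}
    for path in churn.keys() & complexity.keys():
        c = churn[path] / max_churn if max_churn else 0.0
        x = complexity[path] / max_complexity if max_complexity else 0.0
        scores[path] = math.sqrt(c * x)
    return scores


# Made-up sample data for three files.
churn = {"a.rs": 40, "b.rs": 10, "c.rs": 40}
complexity = {"a.rs": 25, "b.rs": 100, "c.rs": 100}
scores = hotspot_scores(churn, complexity)
```

In this sample, `c.rs` is highest on both metrics and gets the top score, while `a.rs` (high churn, low complexity) and `b.rs` (low churn, high complexity) end up with the same lower score.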

Storage

Bobbin uses two storage backends:

| Store | Technology | Contents |
|-------|------------|----------|
| Primary | LanceDB | Chunks, vector embeddings, full-text search index |
| Metadata | SQLite | Temporal coupling data, file metadata |

All data lives in .bobbin/ within your repository. Nothing is sent externally.

Next Steps