Core Concepts
Bobbin’s design is built around a few key ideas. Understanding them will help you get the most out of the tool.
Chunks
A chunk is a semantic unit of code extracted from a source file. Rather than indexing entire files or arbitrary line ranges, bobbin uses tree-sitter to parse source code into meaningful structural units:
| Chunk Type | Languages | Example |
|---|---|---|
function | Rust, TypeScript, Python, Go, Java, C++ | fn parse_config(...) |
method | TypeScript, Java, C++ | class.handleRequest() |
class | TypeScript, Python, Java, C++ | class AuthService |
struct | Rust, Go, C++ | struct Config |
enum | Rust, Java, C++ | enum Status |
interface | TypeScript, Java | interface Handler |
trait | Rust | trait Serialize |
impl | Rust | impl Config |
module | Rust | mod auth |
section | Markdown | ## Architecture |
table | Markdown | Markdown tables |
code_block | Markdown | Fenced code blocks |
Files that don’t match a supported language fall back to line-based chunking (50 lines per chunk with 10-line overlap).
See Chunk Types Reference for the complete list.
Embeddings
Each chunk is converted into a 384-dimensional vector using the all-MiniLM-L6-v2 model, run locally via ONNX Runtime. These vectors capture semantic meaning — similar code produces similar vectors, even when the wording differs.
Bobbin supports contextual embedding enrichment: before computing a chunk’s vector, it can prepend surrounding lines for additional context. This is configurable per language in [embedding.context].
Search Modes
Bobbin offers three search modes:
| Mode | How It Works | Best For |
|---|---|---|
| Hybrid (default) | Combines semantic + keyword via RRF | General-purpose queries |
| Semantic | Vector similarity (ANN) only | Conceptual queries (“authentication logic”) |
| Keyword | Full-text search (FTS) only | Exact identifiers (“handleRequest”) |
Reciprocal Rank Fusion (RRF) merges the ranked results from both semantic and keyword search. The semantic_weight config (default: 0.7) controls the balance.
See Search Modes Reference for details.
Temporal Coupling
Temporal coupling measures how often two files change together in git history. If auth.rs and middleware.rs frequently appear in the same commits, they have high coupling — modifying one likely means you should look at the other.
Bobbin analyzes git history (configurable depth, default: 1000 commits) and stores coupling scores in SQLite. This data powers:
bobbin related <file>— list files coupled to a given filebobbin context <query>— automatically expand search results with coupled files
Context Assembly
The context command combines search and coupling into a single context bundle:
- Search: Find chunks matching your query
- Expand: Add temporally coupled files for each match
- Deduplicate: Remove redundant chunks across files
- Budget: Trim to fit a line budget (default: 500 lines)
The result is a focused set of code that’s relevant to a task — ideal for feeding to an AI agent or understanding a change’s scope.
Hotspots
A hotspot is a file that is both frequently changed (high churn) and complex (high AST complexity). Hotspot score is the geometric mean of normalized churn and complexity. These files represent the riskiest parts of a codebase — they change often and are hard to change safely.
Storage
Bobbin uses two storage backends:
| Store | Technology | Contents |
|---|---|---|
| Primary | LanceDB | Chunks, vector embeddings, full-text search index |
| Metadata | SQLite | Temporal coupling data, file metadata |
All data lives in .bobbin/ within your repository. Nothing is sent externally.
Next Steps
- Agent Setup — connect bobbin to AI coding tools
- Searching Guide — advanced search techniques
- Architecture Overview — deeper dive into internals