Storage & Data Flow

Bobbin uses a dual-storage architecture: LanceDB as the primary store for chunks, embeddings, and full-text search, with SQLite handling temporal coupling data and global metadata.

LanceDB (Primary Storage)

All chunk data, embeddings, and full-text search live in LanceDB:

chunks table:
  - id: string            # SHA256-based unique chunk ID
  - vector: float[384]    # MiniLM embedding
  - repo: string          # Repository name (for multi-repo support)
  - file_path: string     # Relative file path
  - file_hash: string     # Content hash for incremental indexing
  - language: string      # Programming language
  - chunk_type: string    # function, method, class, section, etc.
  - chunk_name: string?   # Function/class/section name (nullable)
  - start_line: uint32    # Starting line number
  - end_line: uint32      # Ending line number
  - content: string       # Original chunk content
  - full_context: string? # Context-enriched text used for embedding (nullable)
  - indexed_at: string    # Timestamp

LanceDB also maintains an FTS index on the content field for keyword search.
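
The file_hash column is described as supporting incremental indexing. A minimal sketch of how such a check might look, assuming the indexer first loads a file_path to file_hash map from the chunks table and hashes each file on disk; the names IndexAction and plan_file are hypothetical and not taken from Bobbin's code:

```rust
use std::collections::HashMap;

/// What the indexer should do with one file on this run (hypothetical type).
#[derive(Debug, PartialEq)]
enum IndexAction {
    Skip,    // stored file_hash matches: existing chunks are still valid
    Reindex, // hash differs: the file's chunks need to be replaced
    Insert,  // file has no stored hash: index it for the first time
}

/// `stored` maps file_path -> file_hash as previously written to the chunks table.
/// `current_hash` is the content hash of the file as it exists now.
/// How the hash itself is computed is left to the caller (assumption).
fn plan_file(
    stored: &HashMap<String, String>,
    file_path: &str,
    current_hash: &str,
) -> IndexAction {
    match stored.get(file_path) {
        Some(h) if h.as_str() == current_hash => IndexAction::Skip,
        Some(_) => IndexAction::Reindex,
        None => IndexAction::Insert,
    }
}
```

Under this assumption, unchanged files never reach the chunker or the embedding model, which is usually where most indexing time goes.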

SQLite (Auxiliary)

SQLite stores only temporal coupling data and global metadata:

-- Temporal coupling (git co-change relationships)
CREATE TABLE coupling (
    file_a TEXT NOT NULL,
    file_b TEXT NOT NULL,
    score REAL,
    co_changes INTEGER,
    last_co_change INTEGER,
    PRIMARY KEY (file_a, file_b)
);

-- Global metadata
CREATE TABLE meta (
    key TEXT PRIMARY KEY,
    value TEXT
);
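
To illustrate what a coupling row represents, the sketch below counts how often two files change in the same commit. It is an assumption-laden illustration, not Bobbin's implementation: the Commit and CoChange types and the count_co_changes function are hypothetical, last_co_change is assumed to be a Unix timestamp, pairs are assumed to be stored with file_a sorted before file_b, and the score column is left out because the source does not say how it is computed.

```rust
use std::collections::HashMap;

/// One commit: a timestamp (assumed Unix seconds) and the files it touched.
struct Commit<'a> {
    timestamp: i64,
    files: Vec<&'a str>,
}

/// Mirrors the co_changes and last_co_change columns of a coupling row.
#[derive(Debug, Default, PartialEq)]
struct CoChange {
    co_changes: u32,
    last_co_change: i64,
}

/// Count how often each pair of files changed in the same commit.
/// Each pair is keyed as (file_a, file_b) with file_a < file_b, so
/// (a, b) and (b, a) land on the same entry.
fn count_co_changes(commits: &[Commit<'_>]) -> HashMap<(String, String), CoChange> {
    let mut pairs = HashMap::new();
    for commit in commits {
        // Sort and dedup so every unordered pair is visited exactly once.
        let mut files = commit.files.clone();
        files.sort();
        files.dedup();
        for (i, a) in files.iter().enumerate() {
            for b in &files[i + 1..] {
                let entry: &mut CoChange =
                    pairs.entry((a.to_string(), b.to_string())).or_default();
                entry.co_changes += 1;
                entry.last_co_change = entry.last_co_change.max(commit.timestamp);
            }
        }
    }
    pairs
}
```

Each map entry corresponds to one row keyed by the composite primary key (file_a, file_b).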

Key Types

Chunk

A semantic unit extracted from source code:

struct Chunk {
    id: String,            // SHA256-based unique ID
    file_path: String,     // Source file path
    chunk_type: ChunkType, // function, class, struct, etc.
    name: Option<String>,  // Function/class name
    start_line: u32,       // Starting line number
    end_line: u32,         // Ending line number
    content: String,       // Actual code content
    language: String,      // Programming language
}

SearchResult

struct SearchResult {
    chunk: Chunk,                  // The matched chunk
    score: f32,                    // Relevance score
    match_type: Option<MatchType>, // How it was matched
}

Hybrid Search (RRF)

Hybrid search combines semantic (vector) and keyword (FTS) results using Reciprocal Rank Fusion:

RRF_score = semantic_weight / (k + semantic_rank) + keyword_weight / (k + keyword_rank)

Where:

  • k = 60 (standard RRF constant)
  • semantic_weight from config (default 0.7)
  • keyword_weight = 1 - semantic_weight

Results that appear in both searches accumulate both terms of the sum, which boosts their score, and are marked as [hybrid] matches.
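
The formula can be sketched in a few lines of Rust. This is an illustration under stated assumptions rather than Bobbin's actual code: ranks are taken to be 1-based, a result found by only one search contributes only that search's term, each input list contains a chunk ID at most once, and the function name rrf_fuse and the MatchType variant names are hypothetical.

```rust
use std::collections::HashMap;

/// How a result was found. Variant names are hypothetical; the source
/// only mentions the [hybrid] label.
#[derive(Debug, Clone, Copy, PartialEq)]
enum MatchType {
    Semantic,
    Keyword,
    Hybrid,
}

/// Standard RRF constant from the formula above.
const RRF_K: f32 = 60.0;

/// Fuse two ranked lists of chunk IDs (best match first) with weighted RRF.
/// Returns (chunk_id, score, match_type) sorted by descending score.
fn rrf_fuse(
    semantic: &[&str],
    keyword: &[&str],
    semantic_weight: f32,
) -> Vec<(String, f32, MatchType)> {
    let keyword_weight = 1.0 - semantic_weight;
    let mut fused: HashMap<String, (f32, MatchType)> = HashMap::new();

    // Semantic term: semantic_weight / (k + semantic_rank), rank starting at 1.
    for (i, id) in semantic.iter().enumerate() {
        let score = semantic_weight / (RRF_K + (i + 1) as f32);
        fused.insert(id.to_string(), (score, MatchType::Semantic));
    }

    // Keyword term: added on top if the chunk was already found semantically.
    for (i, id) in keyword.iter().enumerate() {
        let score = keyword_weight / (RRF_K + (i + 1) as f32);
        fused
            .entry(id.to_string())
            .and_modify(|(s, m)| {
                *s += score;
                *m = MatchType::Hybrid;
            })
            .or_insert((score, MatchType::Keyword));
    }

    let mut out: Vec<_> = fused
        .into_iter()
        .map(|(id, (score, kind))| (id, score, kind))
        .collect();
    out.sort_by(|a, b| b.1.total_cmp(&a.1));
    out
}
```

For example, with semantic results ["a", "b", "c"], keyword results ["b", "d"], and the default weight of 0.7, chunk "b" scores 0.7/62 + 0.3/61 and ranks first as a hybrid match, ahead of "a" at 0.7/61, even though "a" was the top semantic hit.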