Indexing Documentation
Bobbin isn’t just for source code. Its markdown parser understands headings, tables, code blocks, and YAML frontmatter, so it can also index documentation repos, wikis, and knowledge bases built with tools like mdBook, Sphinx, MkDocs, or Docusaurus.
Why index your docs?
Documentation scattered across repos, wikis, and knowledge bases is hard to search. A keyword search for “authentication” can return hundreds of hits with no indication of which one answers your question. Bobbin’s semantic search lets you ask a question like “how does the OAuth flow work” and get back the specific documentation sections that answer it.
Getting started
Step 1: Initialize bobbin
cd ~/docs-repo
bobbin init
Step 2: Configure for documentation
Markdown files (**/*.md) are included by default. For a pure docs repo, you can tighten the include patterns and skip irrelevant directories:
[index]
include = [
"**/*.md",
]
exclude = [
"**/node_modules/**",
"**/build/**",
"**/dist/**",
"**/.git/**",
"**/site/**", # MkDocs build output
"**/book/**", # mdBook build output
]
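As a rough illustration of how include and exclude patterns combine, the Python sketch below applies the patterns from the config above to a few sample paths. The `glob_to_regex` and `is_indexed` helpers and the sample paths are hypothetical. The matching rules (`**/` spans zero or more directories, `*` stays within one path segment) are an assumption about typical glob behavior, not a description of bobbin's own matcher.

```python
import re

INCLUDE = ["**/*.md"]
EXCLUDE = ["**/node_modules/**", "**/build/**", "**/dist/**",
           "**/.git/**", "**/site/**", "**/book/**"]


def glob_to_regex(pattern):
    """Translate a simple glob into a regex (assumed semantics, not bobbin's)."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append(r"(?:[^/]+/)*")   # zero or more directories
            i += 3
        elif pattern.startswith("**", i):
            parts.append(r".*")            # anything, including slashes
            i += 2
        elif pattern[i] == "*":
            parts.append(r"[^/]*")         # anything within one path segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


def is_indexed(path, include=INCLUDE, exclude=EXCLUDE):
    """A path is kept if it matches any include pattern and no exclude pattern."""
    included = any(glob_to_regex(p).match(path) for p in include)
    excluded = any(glob_to_regex(p).match(path) for p in exclude)
    return included and not excluded


for path in ["README.md", "guides/deploy.md", "site/guides/deploy.md",
             "book/index.html", "guides/diagram.png"]:
    print(path, is_indexed(path))
```

Under these assumptions, only `README.md` and `guides/deploy.md` are kept: the copy under `site/` is excluded as build output, and the other two paths never match an include pattern.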
Step 3: Index
bobbin index
Bobbin parses each markdown file into structural chunks:
- Sections: content under each heading, with the full heading hierarchy preserved
- Tables: standalone table chunks for easy lookup
- Code blocks: fenced code blocks extracted as individual chunks
- Frontmatter: YAML frontmatter metadata captured as doc chunks
Step 4: Search
bobbin search "how does deployment work"
How markdown parsing works
Bobbin uses pulldown-cmark to parse markdown into structural chunks. Understanding the chunk types helps you write more effective queries.
Sections
Every heading creates a section chunk containing the heading and all content up to the next heading of the same or higher level. Section names include the full heading hierarchy:
# API Reference → "API Reference"
## Authentication → "API Reference > Authentication"
### OAuth Flow → "API Reference > Authentication > OAuth Flow"
## Rate Limiting → "API Reference > Rate Limiting"
Because the hierarchy is part of the chunk name, a query such as “Authentication > OAuth” can match the intended section even in a large document with many headings.
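The naming scheme shown above can be reproduced with a small stack of open headings. The following Python sketch is only an illustration: `section_names` is a hypothetical helper, it recognizes ATX (`#`) headings with a regular expression rather than a real markdown parser, and it does not skip `#` lines inside code blocks, so it is not how bobbin's pulldown-cmark based parser is implemented.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*?)\s*$")


def section_names(markdown):
    """Return one 'A > B > C' style name per heading, in document order."""
    stack = []   # (level, title) for each currently open heading
    names = []
    for line in markdown.splitlines():
        m = HEADING.match(line)
        if not m:
            continue
        level, title = len(m.group(1)), m.group(2)
        # A new heading closes every open heading at the same or a deeper level.
        while stack and stack[-1][0] >= level:
            stack.pop()
        stack.append((level, title))
        names.append(" > ".join(t for _, t in stack))
    return names


DOC = """\
# API Reference
## Authentication
### OAuth Flow
## Rate Limiting
"""

for name in section_names(DOC):
    print(name)
```

Popping every heading at the same or a deeper level is what makes “Rate Limiting” attach to “API Reference” rather than to “Authentication”.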
Frontmatter
YAML frontmatter at the top of a markdown file is extracted as a doc chunk:
---
title: Deployment Guide
tags: [ops, deployment, kubernetes]
status: published
---
# Deployment Guide
...
The frontmatter chunk captures the full YAML block, so you can search for metadata like tags and titles.
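To make the idea concrete, here is a minimal Python sketch that separates a leading YAML block from the rest of a file. `split_frontmatter` is a hypothetical helper written for this guide, not part of bobbin, and it treats the YAML as opaque text rather than parsing it.

```python
from typing import Optional, Tuple


def split_frontmatter(text: str) -> Tuple[Optional[str], str]:
    """Return (frontmatter, body); frontmatter is None if the file has none."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None, text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    return None, text  # opening delimiter without a closing one


DOC = """\
---
title: Deployment Guide
tags: [ops, deployment, kubernetes]
status: published
---
# Deployment Guide
"""

frontmatter, body = split_frontmatter(DOC)
print(frontmatter)
```

Keeping the block as raw text is enough for the searches described here, since a query such as "status: published" only needs the text to be present in the chunk.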
Tables
Markdown tables are extracted as standalone table chunks named after their parent section:
## HTTP Methods
| Method | Description | Idempotent |
|--------|---------------|------------|
| GET | Retrieve | Yes |
| POST | Create | No |
| PUT | Replace | Yes |
| DELETE | Remove | Yes |
This table becomes a chunk named “HTTP Methods (table)”, searchable independently from the surrounding text.
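A simplified way to picture table extraction is to collect runs of consecutive pipe-prefixed lines and label each run with the nearest heading above it. The Python sketch below does exactly that. `table_chunks` is a hypothetical helper, the line-based detection is a simplification of real markdown table parsing, and using only the nearest heading title in the name is an assumption based on the example above.

```python
import re

HEADING = re.compile(r"^#{1,6}\s+(.*?)\s*$")


def table_chunks(markdown):
    """Return (name, text) pairs for each run of consecutive '|' lines."""
    chunks = []
    section = None   # title of the nearest heading seen so far
    rows = []
    for line in markdown.splitlines() + [""]:   # sentinel flushes a trailing table
        if line.lstrip().startswith("|"):
            rows.append(line)
            continue
        if rows:
            name = f"{section} (table)" if section else "(table)"
            chunks.append((name, "\n".join(rows)))
            rows = []
        m = HEADING.match(line)
        if m:
            section = m.group(1)
    return chunks


DOC = """\
## HTTP Methods
| Method | Description | Idempotent |
|--------|-------------|------------|
| GET    | Retrieve    | Yes        |
| POST   | Create      | No         |
| PUT    | Replace     | Yes        |
| DELETE | Remove      | Yes        |
"""

for name, text in table_chunks(DOC):
    print(name)
    print(text)
```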
Code blocks
Fenced code blocks are extracted as code_block chunks, named by their language tag:
## Installation
```bash
pip install mypackage
```
This produces a chunk named “code: bash” linked to the “Installation” section.
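The same idea can be sketched for fenced code blocks: track the current heading, and emit one chunk per fence pair, named after the language tag. `code_block_chunks` is a hypothetical helper, and the `text` fallback for untagged fences is an assumption made for this sketch, since the guide does not say how bobbin names blocks without a language tag.

```python
import re

FENCE = "`" * 3  # built at runtime so this example can itself sit inside a fence
HEADING = re.compile(r"^#{1,6}\s+(.*?)\s*$")


def code_block_chunks(markdown):
    """Return (name, parent_section, code) for each fenced code block."""
    chunks = []
    section = None   # title of the nearest heading seen so far
    lang = None      # language tag of the block currently being read
    body = []
    inside = False
    for line in markdown.splitlines():
        if line.startswith(FENCE):
            if inside:
                chunks.append((f"code: {lang}", section, "\n".join(body)))
                inside, body = False, []
            else:
                inside = True
                lang = line[len(FENCE):].strip() or "text"  # assumed fallback
            continue
        if inside:
            body.append(line)   # lines inside a fence are never headings
            continue
        m = HEADING.match(line)
        if m:
            section = m.group(1)
    return chunks


DOC = "\n".join([
    "## Installation",
    FENCE + "bash",
    "pip install mypackage",
    FENCE,
])

print(code_block_chunks(DOC))
```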
Searching documentation
By section content
Natural-language queries work well for finding documentation sections:
bobbin search "how to configure the database connection"
bobbin search "troubleshooting SSL certificate errors"
By chunk type
Filter to specific markdown chunk types with --type:
# Find documentation sections about authentication
bobbin search "authentication" --type section
# Find tables (API references, config options, comparison charts)
bobbin search "configuration options" --type table
# Find code examples
bobbin search "kubernetes deployment" --type code_block
# Find frontmatter (metadata, tags, document properties)
bobbin search "status: draft" --type doc
Grep for exact content
Use bobbin grep when you know the exact text:
# Find all documents tagged with "deprecated"
bobbin grep "deprecated" --type doc
# Find sections mentioning a specific API endpoint
bobbin grep "/api/v2/users" --type section
# Find code examples that contain a bash shebang line
bobbin grep "#!/bin/bash" --type code_block
Documentation repo patterns
mdBook
mdBook projects have a src/ directory with markdown files and a SUMMARY.md:
[index]
include = ["**/*.md"]
exclude = ["**/book/**"] # Skip build output
bobbin index --source ./src
MkDocs
MkDocs uses docs/ as the source directory:
[index]
include = ["**/*.md"]
exclude = ["**/site/**"] # Skip build output
bobbin index --source ./docs
Sphinx (with MyST)
Sphinx projects using MyST markdown have .md files alongside .rst:
[index]
include = ["**/*.md"]
exclude = ["**/_build/**"]
Wiki repositories
Git-based wikis (GitHub Wiki, GitLab Wiki) are plain directories of markdown files:
git clone https://github.com/org/repo.wiki.git
cd repo.wiki
bobbin init && bobbin index
Tuning for documentation
Contextual embeddings
For documentation-heavy projects, contextual embedding enrichment improves search quality by including surrounding context when computing each chunk’s vector:
[embedding.context]
context_lines = 5
enabled_languages = ["markdown"]
Contextual embedding is enabled for markdown by default. Increase context_lines if your documents have short sections that need more surrounding context to be meaningful.
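To see what a setting like context_lines changes, the Python sketch below widens a chunk by a fixed number of lines on each side before it would be embedded. `with_context` is a hypothetical helper, and taking the same number of lines before and after the chunk is an assumption; the guide does not specify how bobbin selects the surrounding context.

```python
def with_context(lines, start, end, context_lines=5):
    """Return lines[start:end] plus up to `context_lines` lines on each side."""
    lo = max(0, start - context_lines)          # clamp at the top of the file
    hi = min(len(lines), end + context_lines)   # clamp at the bottom of the file
    return "\n".join(lines[lo:hi])


doc = [f"line {n}" for n in range(1, 21)]   # a 20-line stand-in document
chunk_start, chunk_end = 9, 11              # the chunk covers "line 10" and "line 11"

print(with_context(doc, chunk_start, chunk_end, context_lines=2))
```

With context_lines=2 the two-line chunk grows to six lines ("line 8" through "line 13"), which is why a larger value helps most when sections are very short.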
Semantic weight
Documentation queries tend to be natural language rather than exact identifiers. Consider raising the semantic weight:
[search]
semantic_weight = 0.8
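The effect of this weight is easiest to see with numbers. The Python sketch below assumes a simple linear blend of a semantic score and a keyword score, both already scaled to the range 0 to 1. That formula, the `blended_score` and `rank` helpers, and the sample scores are all illustrative assumptions; bobbin's actual fusion method may differ (for example, it could combine ranks rather than raw scores).

```python
def blended_score(semantic, keyword, semantic_weight=0.8):
    """Linear blend of two scores in [0, 1] (assumed formula, not bobbin's)."""
    return semantic_weight * semantic + (1.0 - semantic_weight) * keyword


# Hypothetical candidates: (semantic score, keyword score)
candidates = {
    "prose section about login flows": (0.9, 0.1),
    "changelog line with the exact keyword": (0.3, 1.0),
}


def rank(semantic_weight):
    """Order candidate names from best to worst for a given weight."""
    return sorted(
        candidates,
        key=lambda name: blended_score(*candidates[name], semantic_weight),
        reverse=True,
    )


print(rank(0.8))  # the prose section comes first
print(rank(0.2))  # the exact keyword hit comes first
```

Under this assumed blend, a high weight favors chunks that are about the topic even when they do not contain the query's exact words, which is usually what natural-language documentation queries need.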
Practical workflows
Searching a knowledge base
You maintain an internal knowledge base with hundreds of documents. A new team member asks “how do we handle incident response?”
bobbin search "incident response process"
bobbin search "on-call rotation and escalation" --type section
bobbin search "severity levels" --type table
Finding outdated documentation
Grep for markers that indicate staleness:
bobbin grep "TODO" --type section
bobbin grep "FIXME" --type section
bobbin grep "deprecated" --type doc
bobbin grep "status: draft" --type doc
Cross-referencing code and docs
Index both your code and documentation into the same bobbin store:
bobbin index --repo api --source ~/projects/api
bobbin index --repo docs --source ~/projects/docs
Now search across both:
# Find the code for something mentioned in docs
bobbin search "rate limiting implementation" --repo api
# Find the docs for something you see in code
bobbin search "RateLimiter configuration" --repo docs
Next steps
- Searching — general search techniques
- Multi-Repo — indexing docs alongside code repos
- Chunk Types — full reference for markdown chunk types
- Configuration Reference — tuning settings for doc-heavy projects