Archive Integration
Bobbin can index structured markdown records alongside code — agent memories, communication logs, HLA records, or any collection of markdown files with YAML frontmatter. Archive records are searchable via the same search and context APIs as code.
Configuration
Archive sources are configured in .bobbin/config.toml:
[archive]
enabled = true
webhook_secret = "" # Optional: Forgejo webhook auth token
[[archive.sources]]
name = "pensieve"
path = "/var/lib/bobbin/archives/pensieve"
schema = "agent-memory"
name_field = "agent"
[[archive.sources]]
name = "hla"
path = "/var/lib/bobbin/archives/hla"
schema = "human-intent"
name_field = "channel"
Source fields
| Field | Type | Description |
|---|---|---|
| name | string | Source label, used as language tag in chunks and as a search filter |
| path | string | Filesystem path to the directory of markdown records |
| schema | string | YAML frontmatter value to match (e.g., "agent-memory"); files without this in frontmatter are skipped |
| name_field | string | Optional frontmatter field used to prefix chunk names (e.g., "channel" → "telegram/{record_id}") |
Record Format
Archive records are markdown files with YAML frontmatter:
---
schema: agent-memory
id: mem-2026-0322-abc
timestamp: 2026-03-22T12:00:00Z
agent: stryder
tags: [bobbin, search-quality]
---
## Context
Discovered that tag effects only apply via /context endpoint, not /search.
The CLI returns raw LanceDB scores without boosts.
The schema value in the frontmatter must match the schema configured for the source; files without a matching value are skipped. Other fields (id, timestamp, etc.) are extracted as metadata.
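The matching rule can be illustrated with a small standalone sketch. It is not bobbin's parser: `split_frontmatter` and `matches_source` are hypothetical helpers, and the parsing is deliberately simplified to flat `key: value` lines (a real implementation would use a YAML library).

```python
RECORD = """---
schema: agent-memory
id: mem-2026-0322-abc
timestamp: 2026-03-22T12:00:00Z
agent: stryder
tags: [bobbin, search-quality]
---
## Context
Discovered that tag effects only apply via /context endpoint, not /search.
"""


def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a record into (frontmatter fields, markdown body).

    Simplified: only flat `key: value` lines are understood, and every
    value is kept as a raw string.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    # Find the closing delimiter after the opening one.
    end = next((i for i, l in enumerate(lines[1:], 1) if l.strip() == "---"), None)
    if end is None:
        return {}, text  # unterminated frontmatter: treat as plain markdown
    fields = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields, "\n".join(lines[end + 1:])


def matches_source(text: str, schema: str) -> bool:
    """True when the record's schema equals the source's configured schema."""
    fields, _ = split_frontmatter(text)
    return fields.get("schema") == schema


fields, body = split_frontmatter(RECORD)
print(matches_source(RECORD, "agent-memory"), fields["id"])
```

Splitting at the first colon only is what keeps a timestamp such as 2026-03-22T12:00:00Z intact as a single value.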
Field handling
- id: Record identifier (used in chunk ID generation)
- timestamp: Parsed for date-based file path grouping (YYYY/MM/DD/)
- source block: Nested keys are flattened (e.g., source:\n channel: telegram becomes field channel)
- Chunk IDs: Generated via SHA256(source:id:timestamp) for deduplication
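The field handling described above can be sketched in a few lines of Python. The helper names are hypothetical, and several details are assumptions rather than documented behavior: that the hash input is the UTF-8 string of the three values joined by colons, that the digest is hex-encoded, that the first component is the source name, and that flattened keys overwrite top-level keys of the same name.

```python
import hashlib
from datetime import datetime


def flatten_source(fields: dict) -> dict:
    """Lift keys from a nested `source` mapping to the top level."""
    nested = fields.get("source")
    if not isinstance(nested, dict):
        return dict(fields)
    flat = {k: v for k, v in fields.items() if k != "source"}
    flat.update(nested)  # assumption: nested keys win on a name clash
    return flat


def date_path(timestamp: str) -> str:
    """Map an ISO 8601 timestamp to a YYYY/MM/DD/ grouping path."""
    # Replace a trailing "Z" so older Python versions can parse it too.
    parsed = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    return parsed.strftime("%Y/%m/%d/")


def chunk_id(source: str, record_id: str, timestamp: str) -> str:
    """Stable ID: identical inputs always hash to the same value."""
    key = f"{source}:{record_id}:{timestamp}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


record = {
    "id": "mem-2026-0322-abc",
    "timestamp": "2026-03-22T12:00:00Z",
    "source": {"channel": "telegram"},
}
flat = flatten_source(record)
print(flat["channel"], date_path(flat["timestamp"]))
print(chunk_id("pensieve", flat["id"], flat["timestamp"]))
```

Because the ID depends only on the source, record ID, and timestamp, re-indexing an unchanged record yields the same ID, which is what makes deduplication possible.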
Searching Archives
Archive records appear in regular search results. Filter by source name:
bobbin search "agent memory about search quality" --repo pensieve
HTTP API
| Endpoint | Description |
|---|---|
| GET /archive/search?q=<query>&source=<name>&limit=10 | Search archive records |
| GET /archive/entry/{id} | Fetch a single record by ID |
| GET /archive/recent?days=30&source=<name> | Recent records with optional date range |
Web UI
Toggle “Include archive” in the Search tab to merge archive results into code search.
Webhook Integration
To re-index automatically when archive sources are updated via git push, set a webhook secret:
[archive]
webhook_secret = "your-secret-token"
Configure a Forgejo/Gitea push webhook pointing to POST /webhook/push. When a push event matches a configured repo, bobbin triggers an incremental re-index of that source.
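This page does not say how bobbin checks webhook_secret. Forgejo and Gitea can sign each delivery with HMAC-SHA256 over the raw request body and send the hex digest in the `X-Gitea-Signature` header, so one plausible check on the receiving side is sketched below. Treat it as an illustration of that signing scheme under those assumptions, not as bobbin's implementation; `sign` and `is_valid` are hypothetical helpers.

```python
import hashlib
import hmac


def sign(secret: str, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw request body, keyed with the shared secret."""
    return hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()


def is_valid(secret: str, body: bytes, signature: str) -> bool:
    """Compare the expected and received signatures in constant time."""
    expected = sign(secret, body)
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))


# Simulate a delivery: the sender signs the body, the receiver verifies it.
body = b'{"ref": "refs/heads/main"}'
received = sign("your-secret-token", body)
print(is_valid("your-secret-token", body, received))
```

The signature must be computed over the exact bytes received, before any JSON parsing, and `hmac.compare_digest` is used instead of `==` to avoid leaking timing information.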
Use Cases
- Agent memories (pensieve): Index agent context snapshots for cross-agent search
- Communication logs (HLA): Index human-agent interaction records
- Knowledge bases: Index structured documentation collections
- Incident records: Index postmortem and investigation reports
Indexing
Archive sources are indexed alongside code during bobbin index. The --force flag re-indexes all records:
bobbin index /var/lib/bobbin --force
Records are chunked like markdown files — headings create chunk boundaries, with frontmatter metadata preserved as chunk attributes.
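Heading-based chunking can be pictured with the following minimal sketch. It is not bobbin's chunker: `chunk_markdown` is a hypothetical helper, the chunk layout (title, content, metadata) is an assumption, and the sketch ignores edge cases such as `#` lines inside fenced code blocks.

```python
import re

HEADING = re.compile(r"^#{1,6}\s+(.*)$")  # ATX headings: "# Title" to "###### Title"


def chunk_markdown(body: str, metadata: dict) -> list[dict]:
    """Split a markdown body at headings; attach the metadata to every chunk."""
    chunks = []
    title, lines = None, []

    def flush():
        content = "\n".join(lines).strip()
        # Skip an empty preamble, but keep headings that have no body text.
        if title is not None or content:
            chunks.append({"title": title, "content": content, "metadata": dict(metadata)})

    for line in body.splitlines():
        match = HEADING.match(line)
        if match:
            flush()  # a heading closes the previous chunk
            title, lines = match.group(1).strip(), []
        else:
            lines.append(line)
    flush()  # emit the final chunk
    return chunks


body = "Intro text\n## Context\nLine A\nLine B\n## Decision\nLine C\n"
meta = {"id": "mem-2026-0322-abc", "agent": "stryder"}
for chunk in chunk_markdown(body, meta):
    print(chunk["title"], "|", chunk["content"].replace("\n", " "))
```

Copying the frontmatter metadata onto each chunk means a search hit on any section can still be traced back to its record ID and agent.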