# Memory System
Zubo has a persistent semantic memory system that combines vector embeddings with full-text search. It automatically remembers conversations, ingested documents, and facts you teach it — and retrieves relevant context for every message. This means your agent gets smarter over time, building up a knowledge base that is always available.
## Copy-Paste Task Cards

### Secure API Access

Enable API auth and create a key before exposing ports beyond localhost.

```bash
zubo config set auth.enabled true
zubo auth create-key my-app
```
### Set Local Model Fallback

Keep responses available during provider outages or API quota issues.

```bash
zubo config set failover '["openai","ollama"]'
zubo config set providers.ollama.model llama3.3
```
### Common Errors

If `auth.enabled` is true, all `/api/*` calls require an `Authorization: Bearer <key>` header. Create a new key if needed.

```bash
curl -H "Authorization: Bearer YOUR_KEY" http://localhost:3000/api/dashboard/status
```

If responses are slow or stalled, use failover and switch temporarily to a responsive provider or smaller model. Check the logs for repeated timeout patterns.

```bash
zubo model openai/gpt-4o-mini
zubo logs --follow
```

If local providers fail, ensure the runtime is running and a model is installed.

```bash
ollama serve
ollama pull llama3.3
```
### Memory Quick Checks

- If recall feels weak, tune `memoryRetrieval.contextTopK` and `memoryRetrieval.minConfidence`.
- Use the dashboard Memory search to inspect match type, confidence, and reason tags for each result.
- Keep long-lived identity/preferences in `MEMORY.md`; use normal conversation memory for evolving details.
- When importing files, verify chunk count and source metadata so you can prune or re-index cleanly later.
## How Memory Works

Every piece of content that enters the memory system follows the same pipeline:

- Content arrives — This can be a conversation message, an uploaded document, or an explicit memory write via the `memory_write` tool.
- Text is chunked — The content is split into segments of approximately 400 tokens (~1600 characters) with an overlap of approximately 80 tokens (~320 characters) between consecutive chunks.
- Chunks are embedded — Each chunk is converted into a 384-dimensional vector using the all-MiniLM-L6-v2 ONNX model. This captures the semantic meaning of the text.
- Storage — Chunks, their embeddings, and metadata are stored in SQLite. A full-text search index is updated via triggers.
- Retrieval — On every incoming message, Zubo automatically searches memory for relevant context using hybrid search.
- Context injection — The top matching results are injected into the LLM context alongside the user's message, giving the agent access to relevant knowledge.
Here is a simplified view of the data flow:

```
Content --> Chunker --> Embedder --> SQLite (chunks + embeddings)
                                         |
Query --> Hybrid Search <----------------+
          (60% Vector + 40% FTS)
              |
              v
         Top results --> LLM Context
```
## Memory Storage

Zubo stores memory in two complementary layers:

### 1. File-Based Storage

Memory files live at `~/.zubo/workspace/memory/` and come in two forms:
- `MEMORY.md` — This is the always-loaded memory file. Its contents are included in the system prompt on every message. Use it for core facts that should always be available: your name, preferences, project context, and important rules. You can edit it directly in a text editor or via the dashboard's Memory panel.
- Dated files — Files named with the pattern `YYYY-MM-DD.md` (for example, `2024-01-15.md`) are created automatically when the agent writes new memories during conversations. These files are indexed into the database for search but are not loaded into the system prompt by default.
### 2. Database Storage

The `memory_chunks` table in SQLite stores all chunked content with their vector embeddings, source file references, timestamps, and full-text search index entries. This is the primary storage layer that powers memory search. It is fully managed by Zubo — you do not need to interact with it directly.
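The trigger-synchronized storage layer described above can be sketched with SQLite's external-content FTS5 tables. The schema below is illustrative only — table and column names are assumptions, not Zubo's actual schema:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE memory_chunks (
    id        INTEGER PRIMARY KEY,
    source    TEXT,   -- originating file, e.g. a dated memory file
    content   TEXT,   -- the chunk text
    embedding BLOB,   -- serialized embedding vector
    created   TEXT DEFAULT (datetime('now'))
);

-- Full-text index over chunk text, kept in sync by triggers.
CREATE VIRTUAL TABLE chunks_fts USING fts5(
    content, content='memory_chunks', content_rowid='id');

CREATE TRIGGER chunks_ai AFTER INSERT ON memory_chunks BEGIN
    INSERT INTO chunks_fts(rowid, content) VALUES (new.id, new.content);
END;
CREATE TRIGGER chunks_ad AFTER DELETE ON memory_chunks BEGIN
    INSERT INTO chunks_fts(chunks_fts, rowid, content)
        VALUES ('delete', old.id, old.content);
END;
""")

# Inserting a chunk updates the FTS index automatically via the trigger.
db.execute("INSERT INTO memory_chunks (source, content) VALUES (?, ?)",
           ("2024-01-15.md", "Favorite programming language is Rust"))
rows = db.execute(
    "SELECT content FROM chunks_fts WHERE chunks_fts MATCH 'rust'").fetchall()
print(rows)  # [('Favorite programming language is Rust',)]
```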
## Search

Zubo supports three search modes, each suited to different scenarios:

### Full-Text Search (FTS)
- Uses SQLite FTS5 with BM25 ranking for relevance scoring.
- Fast keyword matching — ideal for exact terms, names, and specific phrases.
- The FTS index is synchronized automatically via database triggers whenever chunks are inserted or deleted.
- Used for synchronous lookups during message handling where speed is critical.
### Vector Search
- Uses the all-MiniLM-L6-v2 ONNX model to generate 384-dimensional embeddings.
- Performs cosine similarity matching to find semantically related content.
- Better for conceptual and semantic queries — for example, searching for "my programming preferences" will find chunks about Rust even if the word "preferences" does not appear in the stored text.
- The embedding model (~23MB) is automatically downloaded on first startup and cached at `~/.zubo/models/all-MiniLM-L6-v2`.
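The cosine similarity matching described above can be illustrated with toy vectors — short stand-ins for the real 384-dimensional MiniLM embeddings:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (illustrative sketch)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Toy 3-dim vectors standing in for real embeddings.
query               = [0.2, 0.9, 0.1]
chunk_about_rust    = [0.25, 0.85, 0.05]  # semantically close to the query
chunk_about_cooking = [0.9, 0.1, 0.4]     # unrelated content

# The semantically related chunk scores higher even with no shared keywords.
print(cosine_similarity(query, chunk_about_rust) >
      cosine_similarity(query, chunk_about_cooking))  # True
```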
### Hybrid Search
- Combines both methods: 60% vector score + 40% FTS score.
- Provides the best of both worlds — keyword precision for exact matches plus semantic understanding for conceptual queries.
- Each match includes explainability metadata: match type (`fts`, `vector`, or `hybrid`), a confidence score, and reason tags.
- Used for asynchronous operations such as dedicated memory search requests.
- Falls back to FTS-only mode if the embedder is unavailable (for example, if the model has not been downloaded yet).
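The 60/40 blend can be sketched as a weighted sum. This assumes both scores are normalized to a common [0, 1] range before blending — an assumption about the internals, and the candidate data here is made up:

```python
def hybrid_score(vector_score, fts_score, vector_weight=0.6, fts_weight=0.4):
    """Blend a normalized vector-similarity score with a normalized FTS score."""
    return vector_weight * vector_score + fts_weight * fts_score

# Hypothetical candidates with scores already normalized to [0, 1].
candidates = [
    {"chunk": "Favorite language is Rust", "vector": 0.92, "fts": 0.30},
    {"chunk": "Meeting notes from Monday", "vector": 0.40, "fts": 0.85},
]
ranked = sorted(candidates,
                key=lambda c: hybrid_score(c["vector"], c["fts"]),
                reverse=True)
print(ranked[0]["chunk"])  # the semantic match wins: 0.672 vs 0.580
```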
You can tune retrieval behavior in `~/.zubo/config.json` via `memoryRetrieval.contextTopK` and `memoryRetrieval.minConfidence`, or from the dashboard under Settings → General → Memory Retrieval.
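As a sketch, the corresponding config fragment might look like this — the key names come from above, but the values shown are invented for illustration, not confirmed defaults:

```json
{
  "memoryRetrieval": {
    "contextTopK": 5,
    "minConfidence": 0.5
  }
}
```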
## Document Ingestion
You can upload documents to populate Zubo's memory with external knowledge. The following file formats are supported:
| Format | Extension | Notes |
|---|---|---|
| Plain text | .txt | Direct indexing, no preprocessing needed. |
| Markdown | .md | Direct indexing, preserves structure. |
| CSV | .csv | Parsed as text with rows preserved. |
| PDF | .pdf | Requires pdf-parse (auto-installed on first PDF upload). |
| Word | .docx | Requires mammoth (auto-installed on first DOCX upload). |
| Excel | .xlsx | Requires xlsx (SheetJS). Each sheet is converted to CSV. |
| PowerPoint | .pptx | Requires jszip. Text extracted from each slide. |
| JSON | .json | Pretty-printed before indexing. |
| XML | .xml | Tags stripped, text content extracted. |
| YAML | .yaml, .yml | Direct indexing. |
| Code | .ts, .js, .py, .sh | Direct indexing with syntax preserved. |
There are three ways to upload documents:

- Dashboard UI — Drag and drop files onto the Memory panel or use the file picker. This is the easiest method.
- API — Send a `POST` request to `/api/upload` with a multipart form body. Maximum file size is 50MB.
- memory_write tool — For text content that is not in a file, the agent can use the `memory_write` tool to save it directly to memory.
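An API upload might look like the following sketch. The endpoint and size limit come from above; the form field name `file` and the port are assumptions, so adjust them to your setup:

```bash
# Hypothetical multipart upload — requires a running Zubo instance and a valid key.
curl -X POST http://localhost:3000/api/upload \
  -H "Authorization: Bearer YOUR_KEY" \
  -F "file=@./project-notes.md"
```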
## Chunking Strategy
The chunker is responsible for splitting content into segments that are small enough to embed meaningfully but large enough to preserve context. Here is how it works:
- Target chunk size: ~400 tokens (~1600 characters).
- Overlap: ~80 tokens (~320 characters) between consecutive chunks. This ensures that information at chunk boundaries is not lost.
- Smart boundary detection: The chunker tries to split at natural boundaries in this priority order: paragraph breaks, newlines, sentence endings, and finally arbitrary character positions as a last resort.
- Source tracking: Each chunk records which source file it came from, enabling provenance tracking and targeted deletion.
This strategy ensures that each chunk is a coherent unit of information that can be meaningfully compared via vector similarity, while the overlap prevents important context from falling between the cracks.
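The strategy above can be sketched in a few lines. This is a simplified illustration of the described behavior (character-based sizes, boundary priority, overlap), not Zubo's actual chunker:

```python
def chunk_text(text, target=1600, overlap=320):
    """Split text into ~1600-char chunks with ~320-char overlap, preferring
    natural boundaries: paragraph break > newline > sentence end > hard cut."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + target, len(text))
        if end < len(text):
            window = text[start:end]
            # Try boundaries in priority order within the window.
            for sep in ("\n\n", "\n", ". "):
                cut = window.rfind(sep)
                if cut > target // 2:          # avoid degenerate tiny chunks
                    end = start + cut + len(sep)
                    break
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = max(end - overlap, start + 1)  # step back so chunks overlap
    return chunks

doc = "Paragraph one about Rust.\n\n" + "More details. " * 200
parts = chunk_text(doc)
print(len(parts) > 1 and all(len(p) <= 1600 for p in parts))  # True
```

Because each chunk starts `overlap` characters before the previous one ended, a fact that straddles a boundary still appears whole in at least one chunk.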
## Memory Pruning
To keep the database fast and the storage footprint reasonable, Zubo automatically prunes old memory chunks when the total count exceeds a configurable limit:
- Default limit: 10,000 chunks.
- Pruning behavior: The oldest chunks (by insertion timestamp) are deleted first when the limit is exceeded.
- Trigger: Pruning runs automatically after each memory write operation.
- Configuration: The limit is adjustable via the `pruneOldChunks(db, maxChunks)` function in the codebase.
In practice, 10,000 chunks represents a substantial amount of knowledge — roughly equivalent to several hundred pages of text. For most personal assistant use cases, you will never hit this limit.
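The oldest-first pruning behavior described above amounts to a single bounded delete. A sketch against a toy table (names illustrative, not Zubo's schema):

```python
import sqlite3

def prune_old_chunks(db, max_chunks=10_000):
    """Delete the oldest chunks (by insertion timestamp, then id) once the
    total exceeds max_chunks. Illustrative sketch of the behavior above."""
    (count,) = db.execute("SELECT COUNT(*) FROM memory_chunks").fetchone()
    excess = count - max_chunks
    if excess > 0:
        db.execute(
            "DELETE FROM memory_chunks WHERE id IN ("
            "  SELECT id FROM memory_chunks ORDER BY created, id LIMIT ?)",
            (excess,))
        db.commit()

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE memory_chunks (id INTEGER PRIMARY KEY, content TEXT, created TEXT)")
db.executemany("INSERT INTO memory_chunks (content, created) VALUES (?, ?)",
               [(f"chunk {i}", f"2024-01-{i:02d}") for i in range(1, 13)])

prune_old_chunks(db, max_chunks=10)   # 12 chunks, limit 10 -> oldest 2 deleted
print(db.execute("SELECT MIN(id), COUNT(*) FROM memory_chunks").fetchone())  # (3, 10)
```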
## Using Memory

Memory works automatically in the background, but you can also interact with it directly.

### Teaching Your Agent
Tell Zubo facts and it will remember them for future conversations:
```
You:  "Remember that my favorite programming language is Rust"
Zubo: "Got it — I'll remember that your favorite language is Rust."
```
The agent uses the `memory_write` tool to save this fact. It will be retrievable in future sessions via semantic search.
### Searching Memory

You can ask Zubo to recall information it has stored:

```
You:  "What do you remember about my preferences?"
Zubo: "Based on my memory, I know that your favorite programming
       language is Rust. You prefer metric units and Markdown
       formatting. Your timezone is America/New_York."
```
Memory search also happens automatically on every message. You do not need to explicitly ask the agent to check its memory — it does so as part of normal message processing.
### Via the Dashboard

- The Memory panel shows recent memory chunks with source information and, for search results, confidence and match rationale.
- Use the search bar to search memory by keyword or phrase.
- Edit `MEMORY.md` directly in the dashboard to update always-loaded context.
## Memory Tools

Zubo provides three built-in tools for memory operations. These are available to the main agent and to any sub-agent that lists them in its `## Tools` section:

| Tool | Description |
|---|---|
| `memory_write` | Save a fact, note, or piece of content to persistent memory. The content is chunked, embedded, and indexed automatically. |
| `memory_search` | Search memory for relevant information using hybrid search. Returns the top matching chunks with their source and relevance score. |
| `memory_prune` | Manage memory hygiene: delete memories by ID, keyword, or age; remove duplicates; view memory stats (total chunks, date range, storage size). Requires confirmation before deleting. |
## Knowledge Graph
In addition to vector and full-text search, Zubo maintains a knowledge graph that stores structured entities and the relationships between them. While the chunk-based memory system excels at free-form text retrieval, the knowledge graph captures discrete facts in a queryable entity → relation → entity format. Entities have a name, a type (such as person, project, org, or concept), and optional key-value properties. Relationships link two entities with a labeled edge (for example, works_at, manages, or uses). When you mention a known entity in a message, Zubo automatically looks up its graph context and injects relevant relationships into the LLM prompt — giving the agent structured awareness alongside its semantic memory.
Two built-in tools give the agent full read/write access to the graph:
| Tool | Description |
|---|---|
| `kg_query` | Search for entities by name, retrieve full entity details and relations, or export a subgraph. Supports actions: `search`, `get`, `relations`, `graph`. |
| `kg_update` | Add or remove entities and relationships. Supports actions: `add_entity`, `add_relation`, `remove_entity`, `remove_relation`. Entities are upserted by name+type. |
Example interaction:

```
You: "I just started working at Acme Corp on the Atlas project."
Zubo: "Noted! I've added that to your knowledge graph."
# Under the hood, Zubo calls kg_update twice:
#   add_relation: You --works_at--> Acme Corp (org)
#   add_relation: You --works_on--> Atlas (project)

You: "What do you know about my work?"
# Zubo calls kg_query { action: "relations", name: "You" }
Zubo: "You work at Acme Corp and are on the Atlas project."
```
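The entity/relationship scheme above (upsert by name+type, labeled edges) can be modeled with a toy in-memory store. This is purely illustrative — Zubo's actual graph storage is its own:

```python
class KnowledgeGraph:
    """Toy in-memory entity/relationship store mirroring the scheme above."""

    def __init__(self):
        self.entities = {}   # (name, type) -> properties dict
        self.relations = []  # (source_name, label, target_name)

    def add_entity(self, name, type_, **props):
        # Upsert by (name, type): merge properties if the entity exists.
        self.entities.setdefault((name, type_), {}).update(props)

    def add_relation(self, src, label, dst):
        if (src, label, dst) not in self.relations:
            self.relations.append((src, label, dst))

    def relations_for(self, name):
        """All labeled edges touching the named entity."""
        return [r for r in self.relations if name in (r[0], r[2])]

kg = KnowledgeGraph()
kg.add_entity("Acme Corp", "org")
kg.add_entity("Atlas", "project")
kg.add_relation("You", "works_at", "Acme Corp")
kg.add_relation("You", "works_on", "Atlas")
print(kg.relations_for("You"))
# [('You', 'works_at', 'Acme Corp'), ('You', 'works_on', 'Atlas')]
```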
## How Zubo Learns Over Time

Zubo's memory is built on three complementary layers, each serving a different purpose. Together they give the agent short-term recall, long-term semantic understanding, and structured knowledge about the people, projects, and concepts in your world.

### 1. Session Memory

Every conversation is persisted as JSONL in `~/.zubo/sessions/`. The last 50 messages are loaded as context for each conversation turn. This gives Zubo short-term memory within a conversation — it remembers what you said earlier in the chat without needing to search the database.
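Loading the last 50 messages from a JSONL session file is a simple tail read. A sketch — the message field names here are assumptions, not Zubo's actual schema:

```python
import json
from pathlib import Path

def load_recent_messages(session_path, limit=50):
    """Load the last `limit` messages from a JSONL session file."""
    lines = Path(session_path).read_text().splitlines()
    return [json.loads(line) for line in lines[-limit:] if line.strip()]

# Build a throwaway session file with 60 messages to demonstrate the cutoff.
p = Path("demo-session.jsonl")
p.write_text("\n".join(
    json.dumps({"role": "user", "content": f"message {i}"}) for i in range(60)))

recent = load_recent_messages(p, limit=50)
print(len(recent), recent[0]["content"])  # 50 message 10
p.unlink()
```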
### 2. Semantic Memory

When the agent calls `memory_write`, facts are:

- Written to dated markdown files in `~/.zubo/workspace/memory/`
- Chunked and embedded using all-MiniLM-L6-v2 (384-dimension vectors)
- Indexed in FTS5 for keyword search
- Searchable via hybrid search (60% vector + 40% FTS)
The agent is instructed to save facts proactively — when you mention your name, job, preferences, or projects, the agent stores it immediately. Over time this builds a rich knowledge base that makes every future conversation more informed.
### 3. Knowledge Graph

Structured entity-relationship data provides a third dimension of recall:

- Entities have a name, type (`person`, `project`, `org`, `concept`), and key-value properties.
- Relationships link entities with labeled edges (`works_on`, `knows`, `manages`, `part_of`).
- When you mention a known entity in conversation, its graph context is automatically injected into the system prompt.
- This gives the agent structured recall beyond text similarity — it understands how things relate to each other, not just what was said about them.
### Cross-Channel Memory
All three layers — session memory, semantic memory, and the knowledge graph — are shared across all channels. A fact learned on Telegram is instantly available on Discord, Slack, WhatsApp, Signal, Email, and WebChat. You never have to repeat yourself when switching between channels.
## Best Practices

- Use MEMORY.md for core facts — Put information that should always be available in `MEMORY.md`: your name, key preferences, project context, and important rules. This file is loaded into the system prompt on every message.
- Let dated memory files manage themselves — The agent creates and populates dated memory files (e.g., `2024-01-15.md`) automatically. Avoid editing them manually unless you need to correct something specific.
- Upload important documents early — The sooner your agent has context about your projects, preferences, and domain knowledge, the more useful it will be. Upload key documents during initial setup.
- Memory search is automatic — You do not need to say "check your memory" before every question. Zubo searches memory on every message as part of its standard processing pipeline.
- Use document upload for large knowledge bases — For substantial amounts of information (documentation, reference materials, project specs), use the file upload feature rather than trying to teach the agent through conversation.
- The 10k chunk limit keeps things fast — Old, less-relevant memories are pruned automatically. If you find important information being lost, consider adding it to `MEMORY.md`, where it will always be available.
- CPU inference works well — If you are running Zubo without a GPU, the embedding model still works via CPU inference. The all-MiniLM-L6-v2 model is small and fast enough that embedding latency is not noticeable in practice.