Memory System
Zubo has a persistent semantic memory system that combines vector embeddings with full-text search. It automatically remembers conversations, ingested documents, and facts you teach it — and retrieves relevant context for every message. This means your agent gets smarter over time, building up a knowledge base that is always available.
How Memory Works
Every piece of content that enters the memory system follows the same pipeline:
- Content arrives — This can be a conversation message, an uploaded document, or an explicit memory write via the `memory_write` tool.
- Text is chunked — The content is split into segments of approximately 400 tokens (~1600 characters) with an overlap of approximately 80 tokens (~320 characters) between consecutive chunks.
- Chunks are embedded — Each chunk is converted into a 384-dimensional vector using the all-MiniLM-L6-v2 ONNX model. This captures the semantic meaning of the text.
- Storage — Chunks, their embeddings, and metadata are stored in SQLite. A full-text search index is updated via triggers.
- Retrieval — On every incoming message, Zubo automatically searches memory for relevant context using hybrid search.
- Context injection — The top matching results are injected into the LLM context alongside the user's message, giving the agent access to relevant knowledge.
Here is a simplified view of the data flow:
Content --> Chunker --> Embedder --> SQLite (chunks + embeddings)
                                          |
Query --> Hybrid Search <-----------------+
    (60% Vector + 40% FTS)
          |
Top results --> LLM Context
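The pipeline above can be sketched end to end in a few lines of TypeScript. Everything here is an illustrative in-memory stand-in, not Zubo's actual API: the real embedder is all-MiniLM-L6-v2 producing 384-dimensional vectors, while this sketch uses a tiny hash-based embedder just to make the flow runnable.

```typescript
// Illustrative in-memory sketch of the memory pipeline.
// remember/recall/embed are hypothetical stand-ins, not Zubo's real API.
type Chunk = { text: string; embedding: number[] };

const store: Chunk[] = [];

// Stand-in embedder: a real deployment calls all-MiniLM-L6-v2 here
// and gets 384 dimensions; this toy version uses 8.
function embed(text: string): number[] {
  const v = new Array(8).fill(0);
  for (let i = 0; i < text.length; i++) v[i % 8] += text.charCodeAt(i);
  const norm = Math.hypot(...v);
  return v.map((x) => x / norm); // normalize so dot product = cosine
}

function remember(text: string): void {
  // The real pipeline chunks to ~400 tokens before embedding.
  store.push({ text, embedding: embed(text) });
}

function recall(query: string, topK = 3): string[] {
  const q = embed(query);
  return [...store]
    .sort((a, b) => dot(b.embedding, q) - dot(a.embedding, q))
    .slice(0, topK)
    .map((c) => c.text);
}

function dot(a: number[], b: number[]): number {
  return a.reduce((s, x, i) => s + x * b[i], 0);
}
```

The real system adds the FTS index and hybrid scoring on top of this vector path, as described in the Search section.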
Memory Storage
Zubo stores memory in two complementary layers:
1. File-Based Storage
Memory files live at ~/.zubo/workspace/memory/ and come in two forms:
- MEMORY.md — This is the always-loaded memory file. Its contents are included in the system prompt on every single message. Use it for core facts that should always be available: your name, preferences, project context, and important rules. You can edit it directly in a text editor or via the dashboard's Memory panel.
- Dated files — Files named with the pattern `YYYY-MM-DD.md` (for example, `2024-01-15.md`) are created automatically when the agent writes new memories during conversations. These files are indexed into the database for search but are not loaded into the system prompt by default.
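For reference, the dated naming pattern is easy to reproduce. This helper is hypothetical (not necessarily Zubo's internal code) and assumes UTC dates:

```typescript
// Hypothetical helper producing the YYYY-MM-DD.md naming pattern
// used for automatically written memory files. Assumes UTC.
function datedMemoryFilename(d: Date): string {
  const yyyy = d.getUTCFullYear();
  const mm = String(d.getUTCMonth() + 1).padStart(2, "0");
  const dd = String(d.getUTCDate()).padStart(2, "0");
  return `${yyyy}-${mm}-${dd}.md`;
}
```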
2. Database Storage
The memory_chunks table in SQLite stores all chunked content with their vector embeddings, source file references, timestamps, and full-text search index entries. This is the primary storage layer that powers memory search. It is fully managed by Zubo — you do not need to interact with it directly.
Search
Zubo supports three search modes, each suited to different scenarios:
Full-Text Search (FTS)
- Uses SQLite FTS5 with BM25 ranking for relevance scoring.
- Fast keyword matching — ideal for exact terms, names, and specific phrases.
- The FTS index is synchronized automatically via database triggers whenever chunks are inserted or deleted.
- Used for synchronous lookups during message handling where speed is critical.
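The actual ranking is done by SQLite FTS5's BM25 implementation. As a conceptual stand-in only, here is a toy term-frequency scorer; unlike real BM25, it ignores term rarity and document length:

```typescript
// Toy keyword scorer: counts query-term occurrences in a document.
// This is NOT BM25 and not Zubo's code; real FTS ranking happens
// inside SQLite FTS5, which also weighs term rarity and doc length.
function keywordScore(query: string, doc: string): number {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  const words = doc.toLowerCase().split(/\s+/);
  let score = 0;
  for (const t of terms) score += words.filter((w) => w === t).length;
  return score;
}
```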
Vector Search
- Uses the all-MiniLM-L6-v2 ONNX model to generate 384-dimensional embeddings.
- Performs cosine similarity matching to find semantically related content.
- Better for conceptual and semantic queries — for example, searching for "my programming preferences" will find chunks about Rust even if the word "preferences" does not appear in the stored text.
- The embedding model (~23MB) is automatically downloaded on first startup and cached at `~/.zubo/models/all-MiniLM-L6-v2`.
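Cosine similarity, the comparison used here, measures the angle between two vectors rather than their magnitude. A straightforward implementation:

```typescript
// Cosine similarity between two embedding vectors, as used to compare
// 384-dimensional chunk embeddings against a query embedding.
// Returns 1 for identical direction, 0 for orthogonal vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}
```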
Hybrid Search
- Combines both methods: 60% vector score + 40% FTS score.
- Provides the best of both worlds — keyword precision for exact matches plus semantic understanding for conceptual queries.
- Used for asynchronous operations such as dedicated memory search requests.
- Falls back to FTS-only mode if the embedder is unavailable (for example, if the model has not been downloaded yet).
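The 60/40 blend and the FTS-only fallback can be expressed directly. This sketch assumes both scores are already normalized to [0, 1], which the document does not state explicitly:

```typescript
// Blend vector and FTS scores 60/40 as described above; if the embedder
// is unavailable (vectorScore is null), fall back to FTS alone.
// Assumption: both scores are pre-normalized to the [0, 1] range.
function hybridScore(vectorScore: number | null, ftsScore: number): number {
  if (vectorScore === null) return ftsScore; // FTS-only fallback
  return 0.6 * vectorScore + 0.4 * ftsScore;
}
```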
Document Ingestion
You can upload documents to populate Zubo's memory with external knowledge. The following file formats are supported:
| Format | Extension | Notes |
|---|---|---|
| Plain text | .txt | Direct indexing, no preprocessing needed. |
| Markdown | .md | Direct indexing, preserves structure. |
| CSV | .csv | Parsed as text with rows preserved. |
| PDF | .pdf | Requires pdf-parse (auto-installed on first PDF upload). |
| Word | .docx | Requires mammoth (auto-installed on first DOCX upload). |
| JSON | .json | Pretty-printed before indexing. |
| XML | .xml | Tags stripped, text content extracted. |
| YAML | .yaml, .yml | Direct indexing. |
| Code | .ts, .js, .py, .sh | Direct indexing with syntax preserved. |
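A dispatcher mirroring the table above might look like the following. This is a hypothetical sketch: only the JSON and XML branches do real work here, and the PDF/DOCX branches are omitted since they would require pdf-parse and mammoth:

```typescript
// Hypothetical preprocessing dispatch mirroring the format table above.
// PDF/DOCX handling omitted (requires pdf-parse / mammoth).
function preprocessForIndexing(filename: string, raw: string): string {
  const ext = filename.slice(filename.lastIndexOf(".")).toLowerCase();
  switch (ext) {
    case ".json":
      return JSON.stringify(JSON.parse(raw), null, 2); // pretty-print
    case ".xml":
      return raw.replace(/<[^>]+>/g, " ").trim(); // naive tag stripping
    default:
      return raw; // .txt, .md, .csv, .yaml, code files: direct indexing
  }
}
```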
There are three ways to upload documents:
- Dashboard UI — Drag and drop files onto the Memory panel or use the file picker. This is the easiest method.
- API — Send a `POST` request to `/api/upload` with a multipart form body. Maximum file size is 50MB.
- memory_write tool — For text content that is not in a file, the agent can use the `memory_write` tool to save it directly to memory.
Chunking Strategy
The chunker is responsible for splitting content into segments that are small enough to embed meaningfully but large enough to preserve context. Here is how it works:
- Target chunk size: ~400 tokens (~1600 characters).
- Overlap: ~80 tokens (~320 characters) between consecutive chunks. This ensures that information at chunk boundaries is not lost.
- Smart boundary detection: The chunker tries to split at natural boundaries in this priority order: paragraph breaks, newlines, sentence endings, and finally arbitrary character positions as a last resort.
- Source tracking: Each chunk records which source file it came from, enabling provenance tracking and targeted deletion.
This strategy ensures that each chunk is a coherent unit of information that can be meaningfully compared via vector similarity, while the overlap prevents important context from falling between the cracks.
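A minimal sketch of this strategy, using the character-based sizes from the text (~1600 characters with ~320 of overlap). The boundary-detection details here are an illustration of the stated priority order, not Zubo's exact implementation:

```typescript
// Sketch of the chunking strategy: ~1600-character chunks with ~320-character
// overlap, preferring paragraph breaks, then newlines, then sentence
// endings, before falling back to a hard character cut.
function chunkText(text: string, size = 1600, overlap = 320): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    let end = Math.min(start + size, text.length);
    if (end < text.length) {
      const window = text.slice(start, end);
      // Prefer natural boundaries, in priority order.
      const cut =
        window.lastIndexOf("\n\n") > 0 ? window.lastIndexOf("\n\n") :
        window.lastIndexOf("\n") > 0 ? window.lastIndexOf("\n") :
        window.lastIndexOf(". ") > 0 ? window.lastIndexOf(". ") + 1 :
        window.length; // last resort: hard cut
      end = start + cut;
    }
    chunks.push(text.slice(start, end));
    if (end >= text.length) break;
    start = Math.max(end - overlap, start + 1); // step back to create overlap
  }
  return chunks;
}
```

The overlap means the tail of each chunk reappears at the head of the next, so a fact straddling a boundary is embedded in full at least once.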
Memory Pruning
To keep the database fast and the storage footprint reasonable, Zubo automatically prunes old memory chunks when the total count exceeds a configurable limit:
- Default limit: 10,000 chunks.
- Pruning behavior: The oldest chunks (by insertion timestamp) are deleted first when the limit is exceeded.
- Trigger: Pruning runs automatically after each memory write operation.
- Configuration: The limit is adjustable via the `pruneOldChunks(db, maxChunks)` function in the codebase.
In practice, 10,000 chunks represents a substantial amount of knowledge — roughly equivalent to several hundred pages of text. For most personal assistant use cases, you will never hit this limit.
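The oldest-first policy can be sketched over an in-memory array. The real `pruneOldChunks(db, maxChunks)` deletes rows in SQLite by insertion timestamp; this stand-in shows the same keep-the-newest-N behavior:

```typescript
// Sketch of oldest-first pruning: keep only the newest maxChunks entries.
// The real implementation operates on SQLite rows, not an array.
type StoredChunk = { text: string; insertedAt: number };

function pruneOldest(chunks: StoredChunk[], maxChunks = 10_000): StoredChunk[] {
  if (chunks.length <= maxChunks) return chunks;
  return [...chunks]
    .sort((a, b) => b.insertedAt - a.insertedAt) // newest first
    .slice(0, maxChunks);
}
```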
Using Memory
Memory works automatically in the background, but you can also interact with it directly.
Teaching Your Agent
Tell Zubo facts and it will remember them for future conversations:
You: "Remember that my favorite programming language is Rust"
Zubo: "Got it — I'll remember that your favorite language is Rust."
The agent uses the memory_write tool to save this fact. It will be retrievable in future sessions via semantic search.
Searching Memory
You can ask Zubo to recall information it has stored:
You: "What do you remember about my preferences?"
Zubo: "Based on my memory, I know that your favorite programming
language is Rust. You prefer metric units and Markdown
formatting. Your timezone is America/New_York."
Memory search also happens automatically on every message. You do not need to explicitly ask the agent to check its memory — it does so as part of normal message processing.
Via the Dashboard
- The Memory panel shows recent memory chunks with timestamps and source information.
- Use the search bar to search memory by keyword or phrase.
- Edit MEMORY.md directly in the dashboard to update always-loaded context.
Memory Tools
Zubo provides two built-in tools for memory operations. These are available to the main agent and to any sub-agent that lists them in its `## Tools` section:
| Tool | Description |
|---|---|
| memory_write | Save a fact, note, or piece of content to persistent memory. The content is chunked, embedded, and indexed automatically. |
| memory_search | Search memory for relevant information using hybrid search. Returns the top matching chunks with their source and relevance score. |
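As a rough illustration, calls to these tools might carry arguments shaped like the following. The field names (`content`, `query`, `topK`) are assumptions for illustration; this document does not specify the exact argument schema:

```typescript
// Hypothetical tool-call shapes; field names are illustrative only.
const writeCall = {
  tool: "memory_write",
  args: { content: "User's favorite programming language is Rust" },
};

const searchCall = {
  tool: "memory_search",
  args: { query: "programming preferences", topK: 5 },
};
```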
Best Practices
- Use MEMORY.md for core facts — Put information that should always be available in `MEMORY.md`: your name, key preferences, project context, and important rules. This file is loaded into the system prompt on every message.
- Let dated memory files manage themselves — The agent creates and populates dated memory files (e.g., `2024-01-15.md`) automatically. Avoid editing them manually unless you need to correct something specific.
- Upload important documents early — The sooner your agent has context about your projects, preferences, and domain knowledge, the more useful it will be. Upload key documents during initial setup.
- Memory search is automatic — You do not need to say "check your memory" before every question. Zubo searches memory on every message as part of its standard processing pipeline.
- Use document upload for large knowledge bases — For substantial amounts of information (documentation, reference materials, project specs), use the file upload feature rather than trying to teach the agent through conversation.
- The 10k chunk limit keeps things fast — Old, less-relevant memories are pruned automatically. If you find important information being lost, consider adding it to
MEMORY.mdwhere it will always be available. - CPU inference works well — If you are running Zubo without a GPU, the embedding model still works via CPU inference. The all-MiniLM-L6-v2 model is small and fast enough that embedding latency is not noticeable in practice.