A fast, small code intelligence engine for AI coding agents: a high-performance Zig rewrite of the original C implementation.
It indexes codebases into a persistent property-graph knowledge graph stored in SQLite, using tree-sitter AST parsing across 30+ languages with a regex fallback for languages that have no grammar installed. It ships as a single 2.2 MB static binary with zero runtime dependencies.
The original C implementation bundles 158 tree-sitter grammars into the binary (257 MB). Our Zig rewrite loads grammars at runtime and uses a leaner single-pass indexing pipeline.
| Metric | C Original | Zig Rewrite | Improvement |
|---|---|---|---|
| Binary size | 257 MB* | 2.2 MB | 117× smaller |
| Memory (idle) | ~120 MB | ~8 MB | 15× smaller |
| Memory (indexing) | ~500 MB | ~50 MB | 10× smaller |
| Compiled languages | baseline | 2.5× faster | avg across 14 repos |
| Startup time | ~200ms | ~10ms | 20× faster |
| Grammar loading | Compiled-in | Runtime (dlopen) | Modular |
*The C binary embeds 158 compiled tree-sitter grammars; ours loads them on demand.
- 14 MCP tools: full protocol parity with the C original
- Tree-sitter AST parsing: 30+ languages with compiled grammar `.dylib`/`.so` files
- Regex fallback: works without grammars for Zig, Go, Rust, TypeScript, Python, C/C++
- SQLite property graph: FTS5 full-text search, WAL mode, ACID-safe
- BM25 ranking: structural boosting (Functions +10, Routes +8, Classes +5)
- Cypher-like queries: `MATCH (n:Function)-[r:CALLS]->(m) RETURN n, r, m`
- Leiden community detection: de-facto module boundaries from call graphs
- Call chain tracing: BFS traversal with risk-classified hop distances
- Git change analysis: co-change coupling scores, impacted symbols
- Architecture Decision Records: full CRUD for ADRs
- Graph-augmented grep: text search enriched with structural context
- Single-pass AST walker: `walkAll()` checks all node types in one tree traversal
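
The single-pass walker in the last bullet can be pictured with a small sketch. This is a Python illustration of the idea only, not the project's Zig `walkAll()`. The `Node` class, the node-type names, and the `KIND_BY_TYPE` table are hypothetical and exist just for this example.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Toy AST node (hypothetical; stands in for a tree-sitter node)."""
    type: str
    name: str = ""
    children: list["Node"] = field(default_factory=list)

# Hypothetical mapping from AST node type to graph label.
KIND_BY_TYPE = {
    "function_definition": "Function",
    "class_definition": "Class",
    "method_definition": "Method",
}

def walk_all(root: Node) -> list[tuple[str, str]]:
    """Visit every node once and collect (label, name) for all kinds at once."""
    found = []
    stack = [root]
    while stack:
        node = stack.pop()
        label = KIND_BY_TYPE.get(node.type)
        if label is not None:
            found.append((label, node.name))
        # Push children in reverse so they are visited in source order.
        stack.extend(reversed(node.children))
    return found

tree = Node("module", children=[
    Node("class_definition", "Server", [Node("method_definition", "start")]),
    Node("function_definition", "main"),
])
symbols = walk_all(tree)
```

The point of the design is that one traversal checks each node against every symbol kind, instead of running one traversal per kind.
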
Requires Zig 0.16.0 or later.

    git clone https://github.com/dajneem23/z-codebase-mcp.git
    cd z-codebase-mcp
    # Debug build
    zig build                          # → zig-out/bin/cbm (~10 MB)
    # Release build (recommended)
    zig build -Doptimize=ReleaseFast   # → zig-out/bin/cbm (~2.2 MB)

Add to your agent's MCP configuration:

    {
      "mcpServers": {
        "codebase-memory-mcp": {
          "command": "/path/to/zig-out/bin/cbm"
        }
      }
    }

For Claude Code, add to `.mcp.json` in your project root or `~/.claude/claude_desktop_config.json`.
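
Once registered, the agent talks to the binary with JSON-RPC 2.0 messages over stdio. The sketch below builds one `tools/call` request the way an MCP client would. It assumes the newline-delimited framing of the standard MCP stdio transport, and the helper name `tools_call_request` is hypothetical. A real session would also begin with an `initialize` handshake, which is omitted here.

```python
import json

def tools_call_request(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize one MCP `tools/call` request as a single JSON-RPC 2.0 line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # Compact separators keep the message on one line for a stdio transport.
    return json.dumps(message, separators=(",", ":")) + "\n"

line = tools_call_request(
    1, "search_graph", {"project": "myproject", "query": "handleRequest"}
)
```
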
All 14 tools can be invoked directly from the command line:

    # Index a repository
    cbm cli index_repository '{"repo_path":"/path/to/repo"}'
    # Search by name or keyword
    cbm cli search_graph '{"project":"myproject","query":"handleRequest"}'
    # Trace call chains
    cbm cli trace_path '{"project":"myproject","function_name":"main","mode":"calls"}'
    # Architecture overview
    cbm cli get_architecture '{"project":"myproject"}'
    # Cypher queries
    cbm cli query_graph '{"project":"myproject","query":"MATCH (n:Function)-[r:CALLS]->(m) RETURN n.name, m.name LIMIT 10"}'
    # Git change impact analysis
    cbm cli detect_changes '{"project":"myproject","depth":20}'
    # Architecture Decision Records
    cbm cli manage_adr '{"project":"myproject","mode":"update","title":"Use SQLite for storage","content":"..."}'

| Tool | Description |
|---|---|
| `index_repository` | Index a codebase into the knowledge graph. Supports full, moderate, fast, and cross-repo-intelligence modes. |
| `index_status` | Check indexing progress and stats for a project. |
| `list_projects` | List all indexed projects with metadata. |
| `delete_project` | Remove a project and all its nodes, edges, and metadata. |

| Tool | Description |
|---|---|
| `search_graph` | BM25 full-text search with structural boosting. Three modes: natural language `query`, regex `name_pattern`, and `semantic_query` vector search. Supports pagination via `offset`/`limit`. |
| `search_code` | Graph-augmented pattern search (grep enriched with call-graph context). Returns deduplicated results ranked by structural importance. |
| `query_graph` | Cypher-like queries against the property graph. Supports MATCH, WHERE, RETURN, ORDER BY, LIMIT, and aggregation functions. |
| `get_code_snippet` | Read source code for a specific symbol by qualified name. Returns precise file + line ranges. |
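
To make "structural boosting" concrete, here is a minimal sketch of re-ranking search hits. The boost values (Functions +10, Routes +8, Classes +5) come from the feature list. Treating them as additive, the hit layout, and the convention that a larger `bm25` value means a better match are all assumptions. SQLite's own `bm25()` function uses the opposite sign, so a real implementation would have to account for that.

```python
# Boost values come from the feature list; everything else here is assumed.
STRUCTURAL_BOOST = {"Function": 10.0, "Route": 8.0, "Class": 5.0}

def rank(hits: list[dict]) -> list[dict]:
    """Sort hits by text relevance plus a per-label structural boost.

    Each hit is {"name", "label", "bm25"}; a larger bm25 is a better match.
    """
    def score(hit: dict) -> float:
        return hit["bm25"] + STRUCTURAL_BOOST.get(hit["label"], 0.0)
    return sorted(hits, key=score, reverse=True)

hits = [
    {"name": "handler_notes", "label": "Variable", "bm25": 6.0},
    {"name": "handleRequest", "label": "Function", "bm25": 2.5},
    {"name": "RequestHandler", "label": "Class", "bm25": 3.0},
]
ranked = [hit["name"] for hit in rank(hits)]
```

In this toy data the variable has the best raw text score, but the function and the class overtake it once the boosts are applied.
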

| Tool | Description |
|---|---|
| `trace_path` | BFS call-chain traversal with inbound/outbound/both directions. Modes: `calls`, `data_flow`, `cross_service`. Optional risk-classified hop distances (CRITICAL/HIGH/MEDIUM/LOW). |
| `get_architecture` | High-level architecture overview: packages, services, dependencies, hotspots, and Leiden community detection clusters. |
| `detect_changes` | Git history analysis: maps changed files to impacted symbols and computes co-change coupling scores. |
| `get_graph_schema` | Returns all node labels and edge types in the knowledge graph. |
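
The BFS traversal behind `trace_path` can be sketched as follows. The README names the four risk labels but not the hop thresholds, so the mapping in `risk_for` (1 hop = CRITICAL, 2 = HIGH, 3 = MEDIUM, more = LOW) is an assumption, as are the function names and the adjacency-dict input.

```python
from collections import deque

def risk_for(hops: int) -> str:
    """Map hop distance to a risk label (thresholds are assumptions)."""
    if hops <= 1:
        return "CRITICAL"
    if hops == 2:
        return "HIGH"
    if hops == 3:
        return "MEDIUM"
    return "LOW"

def trace(calls: dict[str, list[str]], start: str, max_depth: int = 4) -> dict[str, tuple[int, str]]:
    """Breadth-first walk over outbound CALLS edges from `start`."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        fn = queue.popleft()
        depth = seen[fn]
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for callee in calls.get(fn, []):
            if callee not in seen:  # the first visit is the shortest path in BFS
                seen[callee] = depth + 1
                queue.append(callee)
    return {fn: (d, risk_for(d)) for fn, d in seen.items() if fn != start}

calls = {
    "main": ["serve", "init"],
    "serve": ["handle"],
    "handle": ["parse", "serve"],  # the cycle back to serve is ignored
    "parse": ["log"],
}
result = trace(calls, "main")
```

Because BFS reaches each function first along a shortest path, the recorded hop count is the minimum distance from the start symbol even when the call graph contains cycles.
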

| Tool | Description |
|---|---|
| `manage_adr` | CRUD for Architecture Decision Records. Supports `get`, `update`, and `sections` modes. |
| `ingest_traces` | Ingest runtime traces (HTTP requests, async events, data flows) to enrich the knowledge graph. |

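
Every tool above shares the CLI shape `cbm cli <tool> '<json>'`, so scripting them is mostly a matter of building that JSON argument safely. The wrapper below is a sketch: the helper names are hypothetical, and it makes no assumption about the format of the tool's output, returning stdout as plain text.

```python
import json
import subprocess

def cbm_argv(tool: str, args: dict, binary: str = "cbm") -> list[str]:
    """Build the argv for `cbm cli <tool> '<json>'` without shell quoting."""
    return [binary, "cli", tool, json.dumps(args)]

def run_tool(tool: str, args: dict, binary: str = "cbm") -> str:
    """Run one tool and return its stdout as text."""
    done = subprocess.run(
        cbm_argv(tool, args, binary), capture_output=True, text=True, check=True
    )
    return done.stdout

argv = cbm_argv(
    "trace_path", {"project": "myproject", "function_name": "main", "mode": "calls"}
)
```

Passing the arguments as a list avoids the nested quoting that the shell form needs, which matters once a Cypher query with its own quotes is embedded in the JSON.
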
Project, Package, Folder, File, Module, Class, Function, Method, Interface, Enum, Type, Route, Variable

| Edge | Description |
|---|---|
| `CALLS` | Function/method invocation |
| `HTTP_CALLS` | HTTP request/response between services |
| `ASYNC_CALLS` | Message queue / event-driven communication |
| `IMPLEMENTS` | Class implements an interface |
| `DEFINES` | File/module defines a symbol |
| `DEFINES_METHOD` | Class defines a method |
| `OVERRIDE` | Method overrides a parent method |
| `IMPORTS` | Module import relationship |
| `USAGE` | General usage (type references, variable access) |
| `FILE_CHANGES_WITH` | Co-change coupling from git history |
| `CONTAINS_FILE` | Package/folder contains a file |
| `CONTAINS_FOLDER` | Folder contains a sub-folder |
| `CONTAINS_PACKAGE` | Project contains a package |
| `HANDLES` | Route handler binding |

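
The `FILE_CHANGES_WITH` edge carries a co-change coupling score. The README does not state the formula, so the sketch below uses Jaccard similarity over commits as a stand-in: commits that touch both files divided by commits that touch either. The function name and the input shape (one set of file paths per commit) are assumptions.

```python
from collections import Counter
from itertools import combinations

def co_change_scores(commits: list[set[str]]) -> dict[tuple[str, str], float]:
    """Jaccard coupling: commits touching both files / commits touching either."""
    touched = Counter()   # file -> number of commits that changed it
    together = Counter()  # (file_a, file_b) -> number of commits that changed both
    for files in commits:
        touched.update(files)
        together.update(combinations(sorted(files), 2))
    return {
        pair: n / (touched[pair[0]] + touched[pair[1]] - n)
        for pair, n in together.items()
    }

commits = [
    {"api.zig", "db.zig"},
    {"api.zig", "db.zig", "log.zig"},
    {"api.zig"},
    {"log.zig"},
]
scores = co_change_scores(commits)
```

Here `api.zig` and `db.zig` change together in two of the three commits that touch either file, so they score highest.
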
Run these via `query_graph`:

    -- All HTTP calls between services
    MATCH (a)-[r:HTTP_CALLS]->(b)
    RETURN a.name, b.name, r.url_path, r.method
    LIMIT 20

    -- Functions with "Handler" in their name
    MATCH (f:Function)
    WHERE f.name =~ '.*Handler.*'
    RETURN f.name, f.file_path, f.start_line

    -- What does main() call (up to 3 hops)?
    MATCH (n:Function {name: 'main'})-[r:CALLS*1..3]->(m)
    RETURN n.name, m.name, length(r) AS hops

    -- Most-called functions (high fan-in)
    MATCH (a)-[r:CALLS]->(b:Function)
    RETURN b.name, count(r) AS callers
    ORDER BY callers DESC
    LIMIT 20

    -- Dead code: functions with no callers and no call targets
    MATCH (f:Function)
    WHERE NOT (()-[:CALLS]->(f)) AND NOT (f)-[:CALLS]->()
    RETURN f.name, f.file_path

| Tier | Languages |
|---|---|
| Compiled | Go, Python, Rust, Zig, C, C++, JavaScript, TypeScript, TSX |

Grammar `.dylib` files are compiled from source via `tree-sitter generate` and loaded at runtime via `dlopen`. To compile grammars:

    cd grammars
    for lang in go python rust zig c cpp javascript typescript tsx; do
      git clone --depth 1 "https://github.com/tree-sitter/tree-sitter-${lang}" "$lang" 2>/dev/null || true
      (cd "$lang" && tree-sitter generate 2>/dev/null && cp *.dylib "../${lang}.dylib") &
    done
    wait

For languages without compiled grammars, the regex-based `fallback_extractor` provides broad coverage:
| Quality | Languages |
|---|---|
| Excellent | Zig, Go, Rust, C, C++ — compiled languages parse cleanly |
| Good | Python, TypeScript, JavaScript, Java, C#, Kotlin |
| Functional | Ruby, PHP, Swift, Lua, Scala, Haskell, OCaml |

The regex fallback detects functions, methods, classes, interfaces, enums, and intra-file call edges. For speed, it pre-filters lines with `isInterestingLine()`, which skips blanks, comments, and imports before any pattern is tried.
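
The pre-filter plus regex approach can be sketched like this. It is an illustration in Python, not the project's Zig `fallback_extractor`. The `FUNC_DECL` pattern, the `SKIP_PREFIXES` list, and both function names are hypothetical, and the pattern covers only simple `fn`/`func`/`def` declarations.

```python
import re

# Hypothetical pattern for `fn name(` / `func name(` / `def name(` declarations.
FUNC_DECL = re.compile(r"^\s*(?:pub\s+)?(?:fn|func|def)\s+([A-Za-z_]\w*)\s*\(")
SKIP_PREFIXES = ("//", "#", "import ", "from ", "use ")

def is_interesting_line(line: str) -> bool:
    """Cheap pre-filter: drop blanks, comments, and imports before any regex runs."""
    stripped = line.strip()
    return bool(stripped) and not stripped.startswith(SKIP_PREFIXES)

def extract_functions(source: str) -> list[tuple[str, int]]:
    """Return (function name, 1-based line number) pairs."""
    found = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if not is_interesting_line(line):
            continue
        match = FUNC_DECL.match(line)
        if match:
            found.append((match.group(1), lineno))
    return found

sample = """\
// fn commented_out() void {}
const std = @import("std");

pub fn main() void {
    helper();
}
fn helper() void {}
"""
functions = extract_functions(sample)
```

The commented-out declaration on the first line is rejected by the pre-filter, so the regex never sees it.
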
| Repository | Language | Files | Symbols | Edges | Time | Files/s | Syms/s |
|---|---|---|---|---|---|---|---|
| z-codebase-mcp | Zig | 43 | 817 | 2,052 | 0.24s | 179 | 3,404 |
| arrow-rs (93 MB) | Rust | 559 | 13,830 | 114,425 | 2.93s | 191 | 4,720 |
| caddy | Go | 326 | 1,761 | 26,514 | 0.69s | 472 | 2,552 |
| traefik | Go | 937 | 10,648 | 86,073 | 2.57s | 364 | 4,143 |
| ollama | Go | 874 | 12,762 | 297,172 | 2.80s | 312 | 4,558 |
| aseprite (194 MB) | C/C++ | 5,156 | 12,761 | — | ~8s | 644 | 1,595 |
| react | JS | 4,584 | 72K | 155K | 23s | 199 | 3,130 |
| bun (280 MB) | Zig | 12,087 | 841,526 | — | ~140s | 86 | 6,011 |
| v8 (247 MB) | C++ | 16,170 | 577,075 | 872,711 | 85s | 190 | 6,789 |
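
The two throughput columns are derived from the others: files (or symbols) divided by wall-clock time. A quick check against the first row, assuming values are rounded to whole numbers (some rows in the table differ by one, which suggests rounding was not applied uniformly):

```python
def throughput(count: int, seconds: float) -> int:
    """Items processed per second, rounded to a whole number."""
    return round(count / seconds)

# z-codebase-mcp row: 43 files and 817 symbols in 0.24 s
files_per_s = throughput(43, 0.24)
syms_per_s = throughput(817, 0.24)
```
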
| Repository | Lang | Files | Zig | C | Speedup | Zig Syms | C Syms | C Edges |
|---|---|---|---|---|---|---|---|---|
| bitcoin | C++ | 1,993 | 3.66s | 41.87s | 11.4× | 6,900 | 36,293 | 175,127 |
| node | C++/JS | 34,650 | 252s | 1,154s | 4.5× | 1,195,988 | 1,064,070 | 2,386,368 |
| ollama | Go | 874 | 2.80s | 8.33s | 3.0× | 12,762 | 149,083 | 297,172 |
| claude-code | TS | 26 | 0.20s | 0.44s | 2.2× | 656 | 3,024 | 3,822 |
| reth | Rust | 1,266 | 4.18s | 8.83s | 2.1× | 17,748 | 54,778 | 187,578 |
| caddy | Go | 326 | 0.69s | 1.37s | 2.0× | 1,761 | 4,767 | 26,514 |
| rustdesk | Rust | 322 | 1.20s | 2.43s | 2.0× | 5,074 | 14,827 | 53,786 |
| golang/go | Go | 10,648 | 25s | 48s | 1.9× | 81,880 | 247,574 | 1,474,135 |
| traefik | Go | 937 | 2.57s | 4.96s | 1.9× | 10,648 | 26,693 | 86,073 |
| ziglang/zig | Zig | 16,987 | 51s | 86s | 1.7× | 174,073 | 886,039 | 1,358,190 |
| arrow-rs | Rust | 559 | 3.08s | 4.94s | 1.6× | 13,830 | 20,538 | 114,425 |
| rust-lang/rust | Rust | 36,422 | 97s | 111s | 1.1× | 239,973 | 385,572 | 1,573,773 |
| v8 | C++ | 16,170 | 115s | 130s | 1.1× | 577,075 | 251,175 | 872,711 |
| bun | Zig | 12,087 | 279s | — | — | 841,526 | — | — |
| aseprite | C/C++ | 5,156 | ~8s | — | — | 12,761 | — | — |

| Repository | Lang | Files | Zig | C | Ratio | Zig Syms | C Syms | C Edges |
|---|---|---|---|---|---|---|---|---|
| react | JS | 4,584 | ~~81s~~ **23s** | 9s | 0.4× | | 51,722 | 155,390 |
| vue | JS | 425 | ~~7.6s~~ **1.9s** | 0.92s | 0.5× | | 3,769 | 11,288 |
| TypeScript | TS | 39,283 | | 83s | 0.7× | | 294,716 | 794,793 |
| three.js | JS | 1,623 | ~~49s~~ **21s** | 10s | 0.5× | | 55,850 | 188,605 |
| electron | C++/JS | 1,509 | 12s | 3.6s | 0.3× | 73,266 | 23,012 | 54,691 |

🎉 Tree-sitter JS grammar added! React: 81s → 23s (3.4×), vue: 7.6s → 1.9s (4×), three.js: 49s → 21s (2.3×). Strikethrough = regex fallback; bold = with tree-sitter grammar.

| Repository | Lang | Files | Zig | C | Ratio | Zig Syms | C Syms | C Edges |
|---|---|---|---|---|---|---|---|---|
| union | Rust/Zig | 2,023 | 10.2s | 9.3s | 0.9× | 53,872 | 64,533 | 196,186 |
| Category | Zig Wins | C Wins | Avg Zig Speedup |
|---|---|---|---|
| Compiled (C++, Rust, Go, Zig) | 14/14 | 0/14 | 2.5× |
| Interpreted (JS, TS) | 1/6 | 5/6 | 0.3×* |
| Mixed (C++/JS) | 1/1 | 0/1 | 4.5× |
| Overall | 16/22 | 5/22 | 1.8× |
*The JS/TS gap closes when tree-sitter grammars are installed (React: 81s → 23s with grammar).
| Metric | Zig (ReleaseFast) | C Original | Ratio |
|---|---|---|---|
| Binary size | 2.2 MB | 257 MB | 117× smaller |
| Memory (idle) | ~8 MB | ~120 MB | 15× smaller |
| Memory (indexing) | ~50 MB | ~500 MB | 10× smaller |
Key insight: Zig excels at compiled languages (2.5× avg speedup on Rust, Go, C++, Zig repos). For JS/TS, tree-sitter grammars close the gap — the C original bundles 158 grammars internally while Zig loads them at runtime.
Full benchmarks: BENCH.md
src/
├── main.zig # Entry point — MCP server + CLI dispatcher
├── mcp/ # JSON-RPC 2.0 framework
│ ├── protocol.zig # MCP message types (initialize, tools/list, tools/call)
│ ├── tool.zig # ToolMeta struct + handler dispatch
│ └── transport.zig # Raw POSIX I/O stdio transport (C read/write)
├── db/ # SQLite persistence layer
│ ├── connection.zig # Connection + prepared statement wrappers
│ ├── pool.zig # Thread-safe connection pool (semaphore-guarded)
│ ├── schema.zig # 13 tables + FTS5 + triggers (C-original-compatible)
│ └── cypher.zig # Cypher→SQL translator
├── c/ # C library bindings
│ ├── sqlite.zig # SQLite3 (amalgamation, WAL, 64MB cache)
│ └── treesitter.zig # Tree-sitter v0.26.9 (grammar loading via dlopen)
├── indexer/ # 9-stage indexing pipeline
│ ├── pipeline.zig # Orchestrator: scan → parse → extract → write
│ ├── scanner.zig # File walker (C opendir/readdir + stat)
│ ├── parser.zig # Tree-sitter grammar loading + AST parsing
│ ├── extractor.zig # Single-pass AST walker (walkAll)
│ ├── fallback_extractor.zig # Regex extractor (always available)
│ └── symbols.zig # IndexSymbol, IndexEdge, FileInfo types
├── search/ # Search engine
│ ├── bm25.zig # FTS5 BM25 with structural boosting
│ └── grep.zig # Graph-augmented pattern search
├── analysis/ # Graph analytics
│ └── leiden.zig # Leiden community detection (modularity optimization)
├── tools/ # 14 MCP tool implementations
│ └── registry.zig # Comptime dispatch table
├── model/ # Domain types
│ └── graph.zig # NodeLabel, EdgeType enums, Node, Edge, Project
└── util/ # Utilities
├── log.zig # Structured logging
├── alloc.zig # Arena + tracking allocator helpers
└── json.zig # JSON parse/serialize helpers
See DEV.md for detailed module documentation and PLAN.md for the implementation roadmap.

| Variable | Description | Default |
|---|---|---|
| `CBM_CACHE_DIR` | Override cache directory | `~/.cache/codebase-memory-mcp/` |
| `CBM_LOG_LEVEL` | Log verbosity: `debug`, `info`, `warn`, `error`, `none` | `warn` |

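
For scripts that need to locate the same cache directory the binary uses, the two variables can be resolved as sketched below. The variable names and defaults come from the table. The function name is hypothetical, and falling back to `warn` for an unrecognised level is an assumption about behaviour the README does not specify.

```python
from collections.abc import Mapping
from pathlib import Path

DEFAULT_CACHE_DIR = "~/.cache/codebase-memory-mcp/"
LOG_LEVELS = ("debug", "info", "warn", "error", "none")

def load_settings(env: Mapping[str, str]) -> tuple[Path, str]:
    """Resolve cache directory and log level, using the documented defaults."""
    cache_dir = Path(env.get("CBM_CACHE_DIR", DEFAULT_CACHE_DIR)).expanduser()
    level = env.get("CBM_LOG_LEVEL", "warn")
    if level not in LOG_LEVELS:
        level = "warn"  # assumption: unknown values fall back to the default
    return cache_dir, level

custom = load_settings({"CBM_CACHE_DIR": "/tmp/cbm-cache", "CBM_LOG_LEVEL": "debug"})
defaults = load_settings({})
```

In real use, pass `os.environ` instead of a literal dict.
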
All data is stored at `~/.cache/codebase-memory-mcp/`:

- `_config.db`: project registry (project names, root paths, file/edge counts)
- `{project-hash}.db`: per-project knowledge graphs (nodes, edges, FTS5 index)

SQLite databases use WAL journal mode for concurrent reads during indexing. The config database is ACID-safe across restarts.
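
The effect of WAL mode can be seen with Python's built-in `sqlite3` module. This sketch is independent of the project's own database code. The `nodes` table and the `open_graph_db` helper are made up for the demonstration. A reader opens the file while a writer has an uncommitted insert and still sees the last committed state.

```python
import os
import sqlite3
import tempfile

def open_graph_db(path: str) -> sqlite3.Connection:
    """Open a database file and switch it to write-ahead logging."""
    conn = sqlite3.connect(path)
    # The pragma returns the journal mode actually in effect.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode.lower() != "wal":
        raise RuntimeError(f"WAL not available, got {mode!r}")
    return conn

with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "demo.db")
    writer = open_graph_db(db_path)
    writer.execute("CREATE TABLE nodes (name TEXT)")
    writer.execute("INSERT INTO nodes VALUES ('main')")
    writer.commit()
    writer.execute("INSERT INTO nodes VALUES ('helper')")  # open, uncommitted write
    reader = sqlite3.connect(db_path)
    visible = reader.execute("SELECT count(*) FROM nodes").fetchone()[0]
    reader.close()
    writer.rollback()
    writer.close()
```
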
After indexing a large repo, save the graph for your team:

    # Index the repo
    cbm cli index_repository '{"repo_path":"/path/to/large-repo"}'
    # The DB is at ~/.cache/codebase-memory-mcp/{repo}.db
    # Compress and commit to the repo
    mkdir -p .codebase-memory
    cp ~/.cache/codebase-memory-mcp/large-repo.db .codebase-memory/graph.db
    zstd -9 .codebase-memory/graph.db -o .codebase-memory/graph.db.zst
    git add .codebase-memory/graph.db.zst
    git commit -m "chore: shared codebase graph"
    # Teammates: decompress and query the shared artifact
    zstd -d .codebase-memory/graph.db.zst -o ~/.cache/codebase-memory-mcp/large-repo.db
    cbm cli search_graph '{"project":"large-repo","query":"handleRequest"}'
    # → instant results, no re-indexing needed

Add to `.gitattributes` to prevent merge conflicts:

    .codebase-memory/graph.db.zst merge=ours

Set `CBM_LOG_LEVEL=debug` for verbose output during indexing and querying. This is useful for profiling slow operations or debugging extraction issues.

    CBM_LOG_LEVEL=debug cbm cli index_repository '{"repo_path":"/path/to/repo"}' 2> index.log

| Library | Version | Notes |
|---|---|---|
| SQLite | 3.49.1 | Amalgamation, compiled with -DSQLITE_ENABLE_FTS5 -DSQLITE_ENABLE_JSON1 |
| Tree-sitter | 0.26.9 | Core lib only (lib/src/*.c); grammars loaded at runtime |

Both libraries are compiled statically via `addCSourceFile` in `build.zig`. The binary has zero runtime dependencies.

    zig build test   # 16 tests, 0 leaks (DebugAllocator)

- Read DEV.md for architecture docs and the Zig 0.16 API cheat sheet
- Ensure `zig build test` passes with 0 leaks
- Follow existing module patterns (all tools in `src/tools/`, DB ops in `src/db/`)
- Update PLAN.md if adding new capabilities

- 100% local processing — your code never leaves your machine
- Zero network calls — no telemetry, no analytics, no phoning home
- Static binary — all dependencies vendored and compiled at build time
- No runtime interpreters — no Node.js, Python, or other runtime needed
- DeusData — creator of the original codebase-memory-mcp C implementation. This Zig rewrite builds on their excellent architecture, MCP protocol design, and property graph schema. The original's 14-tool design, Cypher query engine, and Leiden clustering approach are faithfully reproduced here.
MIT — see LICENSE for full text.