
dajneem23/z-codebase-memory-mcp

codebase-memory-mcp (Zig)

A fast, compact code intelligence engine for AI coding agents: a high-performance Zig rewrite of the original C implementation.

Indexes codebases into a persistent property graph stored in SQLite, using tree-sitter AST parsing across 30+ languages with a regex fallback for universal coverage. Ships as a single 2.2 MB static binary with zero runtime dependencies.


Why Zig?

The original C implementation bundles 158 tree-sitter grammars into the binary (257 MB). Our Zig rewrite loads grammars at runtime and uses a leaner single-pass indexing pipeline.

| Metric | C Original | Zig Rewrite | Improvement |
| --- | --- | --- | --- |
| Binary size | 257 MB* | 2.2 MB | 117× smaller |
| Memory (idle) | ~120 MB | ~8 MB | 15× smaller |
| Memory (indexing) | ~500 MB | ~50 MB | 10× smaller |
| Compiled languages | baseline | 2.5× faster | avg across 14 repos |
| Startup time | ~200ms | ~10ms | 20× faster |
| Grammar loading | Compiled-in | Runtime (dlopen) | Modular |

*The C binary embeds 158 compiled tree-sitter grammars; ours loads them on demand.

Key Capabilities

  • 14 MCP tools — full protocol parity with the C original
  • Tree-sitter AST parsing — 30+ languages with compiled grammar .dylib/.so files
  • Regex fallback — works without grammars: Zig, Go, Rust, TypeScript, Python, C/C++
  • SQLite property graph — FTS5 full-text search, WAL mode, ACID-safe
  • BM25 ranking — structural boosting (Functions +10, Routes +8, Classes +5)
  • Cypher-like queries: MATCH (n:Function)-[r:CALLS]->(m) RETURN n, r, m
  • Leiden community detection — de-facto module boundaries from call graphs
  • Call chain tracing — BFS traversal with risk-classified hop distances
  • Git change analysis — co-change coupling scores, impacted symbols
  • Architecture Decision Records — full CRUD for ADRs
  • Graph-augmented grep — text search enriched with structural context
  • Single-pass AST walker: walkAll() checks all node types in one tree traversal
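The single-pass idea in the last bullet can be illustrated with Python's built-in ast module standing in for tree-sitter. This is only a sketch: walk_all and its return layout are hypothetical and are not the project's walkAll() in extractor.zig. The point it shows is that one traversal can classify every node type, rather than running one traversal per node type.

```python
import ast


def walk_all(source: str):
    """Collect functions, classes, and direct call names in a single traversal."""
    functions, classes, calls = [], [], []
    for node in ast.walk(ast.parse(source)):  # every node is visited exactly once
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            functions.append((node.name, node.lineno))
        elif isinstance(node, ast.ClassDef):
            classes.append((node.name, node.lineno))
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            calls.append(node.func.id)
    return functions, classes, calls


sample = "class A:\n    def run(self):\n        helper()\n\ndef helper():\n    pass\n"
functions, classes, calls = walk_all(sample)
print(sorted(functions), classes, calls)
```

ast.walk visits nodes breadth-first, so the order of the collected functions follows tree depth rather than line number; sort by line if source order matters.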

Quick Start

Requires Zig 0.16.0 or later.

git clone https://github.com/dajneem23/z-codebase-mcp.git
cd z-codebase-mcp

# Debug build
zig build                          # → zig-out/bin/cbm (~10 MB)

# Release build (recommended)
zig build -Doptimize=ReleaseFast   # → zig-out/bin/cbm (~2.2 MB)

MCP Server

Add to your agent's MCP configuration:

{
  "mcpServers": {
    "codebase-memory-mcp": {
      "command": "/path/to/zig-out/bin/cbm"
    }
  }
}

For Claude Code, add to .mcp.json in your project root or ~/.claude/claude_desktop_config.json.
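Once configured, the agent talks to the server with JSON-RPC 2.0 messages over stdio. MCP clients build these messages for you, so the sketch below is only for orientation. It assumes the standard MCP tools/call request shape and newline-delimited framing; tools_call_request is a hypothetical helper, and the argument names are borrowed from the CLI examples in this README.

```python
import json


def tools_call_request(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a single line of JSON-RPC 2.0."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # Compact separators keep the message on one line for stdio framing (assumed).
    return json.dumps(message, separators=(",", ":"))


line = tools_call_request(
    1, "search_graph", {"project": "myproject", "query": "handleRequest"}
)
print(line)
```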

CLI Mode

All 14 tools can be invoked directly from the command line:

# Index a repository
cbm cli index_repository '{"repo_path":"/path/to/repo"}'

# Search by name or keyword
cbm cli search_graph '{"project":"myproject","query":"handleRequest"}'

# Trace call chains
cbm cli trace_path '{"project":"myproject","function_name":"main","mode":"calls"}'

# Architecture overview
cbm cli get_architecture '{"project":"myproject"}'

# Cypher queries
cbm cli query_graph '{"project":"myproject","query":"MATCH (n:Function)-[r:CALLS]->(m) RETURN n.name, m.name LIMIT 10"}'

# Git change impact analysis
cbm cli detect_changes '{"project":"myproject","depth":20}'

# Architecture Decision Records
cbm cli manage_adr '{"project":"myproject","mode":"update","title":"Use SQLite for storage","content":"..."}'
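For scripting, the same CLI calls can be wrapped so the JSON argument is generated rather than hand-quoted. A minimal sketch, assuming cbm is on PATH; cbm_argv and run_tool are hypothetical helpers, and the output is returned as raw text because its format is not specified here.

```python
import json
import subprocess


def cbm_argv(tool: str, params: dict, binary: str = "cbm") -> list:
    """Build the argv for `cbm cli <tool> '<json>'` without shell quoting issues."""
    return [binary, "cli", tool, json.dumps(params)]


def run_tool(tool: str, params: dict, binary: str = "cbm") -> str:
    """Run one tool and return its stdout; raises CalledProcessError on failure."""
    result = subprocess.run(
        cbm_argv(tool, params, binary), capture_output=True, text=True, check=True
    )
    return result.stdout


argv = cbm_argv(
    "trace_path", {"project": "myproject", "function_name": "main", "mode": "calls"}
)
print(argv)
```

Passing the arguments as a list avoids the shell entirely, so quotes inside a Cypher query need no extra escaping.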

All 14 MCP Tools

Indexing

| Tool | Description |
| --- | --- |
| index_repository | Index a codebase into the knowledge graph. Supports full, moderate, fast, and cross-repo-intelligence modes. |
| index_status | Check indexing progress and stats for a project. |
| list_projects | List all indexed projects with metadata. |
| delete_project | Remove a project and all its nodes, edges, and metadata. |

Search & Query

| Tool | Description |
| --- | --- |
| search_graph | BM25 full-text search with structural boosting. Three modes: natural language query, regex name_pattern, and semantic_query vector search. Supports pagination via offset/limit. |
| search_code | Graph-augmented pattern search (grep enriched with call-graph context). Returns deduplicated results ranked by structural importance. |
| query_graph | Cypher-like queries against the property graph. Supports MATCH, WHERE, RETURN, ORDER BY, LIMIT, and aggregation functions. |
| get_code_snippet | Read source code for a specific symbol by qualified name. Returns precise file + line ranges. |
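The structural boosting used by search_graph can be pictured as a per-label bonus added to the text-relevance score. The boost values below are the ones quoted under Key Capabilities; how they are combined with BM25 is an assumption, and Hit and rank are hypothetical names. The sketch treats higher scores as better, whereas SQLite's FTS5 bm25() function returns values where lower is better, so a real implementation would have to account for the sign.

```python
from dataclasses import dataclass

# Boost values quoted in this README; the additive combination is assumed.
LABEL_BOOST = {"Function": 10.0, "Route": 8.0, "Class": 5.0}


@dataclass
class Hit:
    name: str
    label: str
    bm25: float  # text relevance; higher is better in this sketch


def rank(hits):
    """Order hits by text relevance plus a per-label structural boost."""
    return sorted(
        hits, key=lambda h: h.bm25 + LABEL_BOOST.get(h.label, 0.0), reverse=True
    )


hits = [
    Hit("request_timeout", "Variable", 12.0),
    Hit("handleRequest", "Function", 4.5),
    Hit("RequestRouter", "Class", 6.0),
]
ranked = [h.name for h in rank(hits)]
print(ranked)
```

Here the function outranks a variable with a much stronger text match, which is the intended effect of boosting: definitions an agent is likely to want surface first.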

Analysis

| Tool | Description |
| --- | --- |
| trace_path | BFS call-chain traversal with inbound/outbound/both directions. Modes: calls, data_flow, cross_service. Optional risk-classified hop distances (CRITICAL/HIGH/MEDIUM/LOW). |
| get_architecture | High-level architecture overview: packages, services, dependencies, hotspots, and Leiden community detection clusters. |
| detect_changes | Git history analysis: maps changed files to impacted symbols and computes co-change coupling scores. |
| get_graph_schema | Returns all node labels and edge types in the knowledge graph. |
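The traversal behind trace_path can be sketched as a breadth-first search that records the hop distance of each reachable function. The risk levels are named in this README, but the hop thresholds in risk_for are hypothetical, as are the function names and the dictionary-based graph.

```python
from collections import deque


def risk_for(hops: int) -> str:
    """Map hop distance to a risk level (thresholds are assumed, not documented)."""
    if hops <= 1:
        return "CRITICAL"
    if hops == 2:
        return "HIGH"
    if hops == 3:
        return "MEDIUM"
    return "LOW"


def trace_path(calls: dict, start: str, max_depth: int = 3) -> dict:
    """BFS over outbound CALLS edges; returns {function: (hops, risk)}."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        depth = seen[current]
        if depth == max_depth:
            continue  # do not expand beyond the requested depth
        for callee in calls.get(current, ()):
            if callee not in seen:  # first visit is the shortest hop count in BFS
                seen[callee] = depth + 1
                queue.append(callee)
    return {n: (h, risk_for(h)) for n, h in seen.items() if n != start}


calls = {
    "main": ["parse", "serve"],
    "serve": ["handle"],
    "handle": ["log", "parse"],
    "log": ["flush"],
}
result = trace_path(calls, "main")
print(result)
```

An inbound trace is the same algorithm run over the reversed edge map.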

Metadata

| Tool | Description |
| --- | --- |
| manage_adr | CRUD for Architecture Decision Records. Supports get, update, and sections modes. |
| ingest_traces | Ingest runtime traces (HTTP requests, async events, data flows) to enrich the knowledge graph. |

Graph Data Model

Node Labels

Project, Package, Folder, File, Module, Class, Function, Method, Interface, Enum, Type, Route, Variable

Edge Types

| Edge | Description |
| --- | --- |
| CALLS | Function/method invocation |
| HTTP_CALLS | HTTP request/response between services |
| ASYNC_CALLS | Message queue / event-driven communication |
| IMPLEMENTS | Class implements an interface |
| DEFINES | File/module defines a symbol |
| DEFINES_METHOD | Class defines a method |
| OVERRIDE | Method overrides a parent method |
| IMPORTS | Module import relationship |
| USAGE | General usage (type references, variable access) |
| FILE_CHANGES_WITH | Co-change coupling from git history |
| CONTAINS_FILE | Package/folder contains a file |
| CONTAINS_FOLDER | Folder contains a sub-folder |
| CONTAINS_PACKAGE | Project contains a package |
| HANDLES | Route handler binding |
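To make the FILE_CHANGES_WITH edge concrete, here is one way a co-change coupling score could be computed from commit history. The README does not define the scoring formula, so this sketch assumes a Jaccard ratio (commits touching both files divided by commits touching either); co_change_scores and the sample file names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_change_scores(commits):
    """commits: iterable of sets of file paths that changed together."""
    touched = Counter()   # commits touching each file
    together = Counter()  # commits touching each pair of files
    for files in commits:
        touched.update(files)
        together.update(combinations(sorted(files), 2))
    # Jaccard ratio (assumed formula): both / (a + b - both)
    return {
        pair: n / (touched[pair[0]] + touched[pair[1]] - n)
        for pair, n in together.items()
    }


commits = [
    {"api.zig", "db.zig"},
    {"api.zig", "db.zig", "log.zig"},
    {"api.zig"},
    {"log.zig"},
]
scores = co_change_scores(commits)
print(scores)
```

The per-commit file sets would typically come from something like git log --name-only; parsing that output is left out to keep the sketch short.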

Cypher Query Examples

Run these via query_graph:

-- All HTTP calls between services
MATCH (a)-[r:HTTP_CALLS]->(b)
RETURN a.name, b.name, r.url_path, r.method
LIMIT 20

-- Functions with "Handler" in their name
MATCH (f:Function)
WHERE f.name =~ '.*Handler.*'
RETURN f.name, f.file_path, f.start_line

-- What does main() call (up to 3 hops)?
MATCH (n:Function {name: 'main'})-[r:CALLS*1..3]->(m)
RETURN n.name, m.name, length(r) AS hops

-- Most-called functions (high fan-in)
MATCH (a)-[r:CALLS]->(b:Function)
RETURN b.name, count(r) AS callers
ORDER BY callers DESC
LIMIT 20

-- Dead code: functions with no callers and no call targets
MATCH (f:Function)
WHERE NOT (()-[:CALLS]->(f)) AND NOT (f)-[:CALLS]->()
RETURN f.name, f.file_path
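Since the graph lives in SQLite, each Cypher-like query ultimately has to run as SQL. The sketch below shows roughly what the fan-in and dead-code queries above could look like in SQL. The two-table schema is a deliberately simplified, hypothetical stand-in for the project's real schema, and the SQL is hand-written, not the output of cypher.zig.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    -- Hypothetical minimal schema: one table of nodes, one of edges.
    CREATE TABLE nodes (id INTEGER PRIMARY KEY, label TEXT, name TEXT);
    CREATE TABLE edges (src INTEGER, dst INTEGER, type TEXT);
    INSERT INTO nodes VALUES
        (1, 'Function', 'main'), (2, 'Function', 'parse'),
        (3, 'Function', 'serve'), (4, 'Function', 'unused');
    INSERT INTO edges VALUES (1, 2, 'CALLS'), (1, 3, 'CALLS'), (3, 2, 'CALLS');
    """
)

# MATCH (a)-[r:CALLS]->(b:Function) RETURN b.name, count(r) AS callers ORDER BY callers DESC
fan_in = db.execute(
    """
    SELECT b.name, COUNT(*) AS callers
    FROM edges e JOIN nodes b ON b.id = e.dst
    WHERE e.type = 'CALLS' AND b.label = 'Function'
    GROUP BY b.id
    ORDER BY callers DESC, b.name
    """
).fetchall()

# MATCH (f:Function) WHERE f has no inbound and no outbound CALLS RETURN f.name
dead = db.execute(
    """
    SELECT f.name FROM nodes f
    WHERE f.label = 'Function'
      AND NOT EXISTS (
          SELECT 1 FROM edges e
          WHERE e.type = 'CALLS' AND (e.src = f.id OR e.dst = f.id)
      )
    """
).fetchall()

print(fan_in, dead)
```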

Language Support

Tree-sitter Grammars (compiled, loaded at runtime)

| Tier | Languages |
| --- | --- |
| Compiled | Go, Python, Rust, Zig, C, C++, JavaScript, TypeScript, TSX |

Grammar .dylib files are compiled from source via tree-sitter generate and loaded at runtime via dlopen. To compile grammars:

cd grammars
for lang in go python rust zig c cpp javascript typescript tsx; do
  git clone --depth 1 "https://github.com/tree-sitter/tree-sitter-${lang}" "$lang" 2>/dev/null || true
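  # tree-sitter generate only emits parser sources; tree-sitter build compiles the shared library
  (cd "$lang" && tree-sitter generate 2>/dev/null && tree-sitter build -o "../${lang}.dylib") &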
done
wait

Regex Fallback (always available)

For languages without compiled grammars, the regex-based fallback_extractor provides broad coverage:

| Quality | Languages |
| --- | --- |
| Excellent | Zig, Go, Rust, C, C++ (compiled languages parse cleanly) |
| Good | Python, TypeScript, JavaScript, Java, C#, Kotlin |
| Functional | Ruby, PHP, Swift, Lua, Scala, Haskell, OCaml |

The regex fallback detects functions, methods, classes, interfaces, enums, and intra-file call edges. It pre-filters lines with isInterestingLine() (skips blanks, comments, imports) for speed.
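The pre-filter-then-match structure described above can be sketched as follows. The patterns here are intentionally crude and hypothetical; they are not the ones used by fallback_extractor, and is_interesting_line only mirrors the behaviour the README attributes to isInterestingLine().

```python
import re

# Hypothetical patterns covering a few languages' function keywords.
FUNC_RE = re.compile(r"^\s*(?:pub\s+)?(?:fn|func|def)\s+([A-Za-z_]\w*)")
CALL_RE = re.compile(r"\b([A-Za-z_]\w*)\s*\(")


def is_interesting_line(line: str) -> bool:
    """Cheap pre-filter: skip blanks, comments, and imports before any regex runs."""
    stripped = line.strip()
    if not stripped:
        return False
    return not stripped.startswith(("//", "#", "import ", "use ", "from "))


def extract(source: str):
    """Return (functions, intra-file call edges) found line by line."""
    functions, calls = [], []
    current = None  # name of the function whose body we are inside
    for lineno, line in enumerate(source.splitlines(), start=1):
        if not is_interesting_line(line):
            continue
        match = FUNC_RE.match(line)
        if match:
            current = match.group(1)
            functions.append((current, lineno))
        elif current:
            calls.extend((current, callee) for callee in CALL_RE.findall(line))
    return functions, calls


source = """// entry point
use std::io;

fn main() {
    helper(1);
}

fn helper(x: i32) {}
"""
functions, calls = extract(source)
print(functions, calls)
```

A pattern this simple also matches keywords followed by a parenthesis, such as if (, so a real extractor needs a keyword exclusion list or stricter patterns.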

Performance

Standalone Benchmarks (M1 MacBook Pro, ReleaseFast)

| Repository | Language | Files | Symbols | Edges | Time | Files/s | Syms/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| z-codebase-mcp | Zig | 43 | 817 | 2,052 | 0.24s | 179 | 3,404 |
| arrow-rs (93 MB) | Rust | 559 | 13,830 | 114,425 | 2.93s | 191 | 4,720 |
| caddy | Go | 326 | 1,761 | 26,514 | 0.69s | 472 | 2,552 |
| traefik | Go | 937 | 10,648 | 86,073 | 2.57s | 364 | 4,143 |
| ollama | Go | 874 | 12,762 | 297,172 | 2.80s | 312 | 4,558 |
| aseprite (194 MB) | C/C++ | 5,156 | 12,761 | | ~8s | 644 | 1,595 |
| react | JS | 4,584 | 72K | 155K | 23s | 199 | 3,130 |
| bun (280 MB) | Zig | 12,087 | 841,526 | | ~140s | 86 | 6,011 |
| v8 (247 MB) | C++ | 16,170 | 577,075 | 872,711 | 85s | 190 | 6,789 |

Zig vs C — Head-to-Head (22 repos, ReleaseFast)

Zig Wins (16 of 22 repos)

| Repository | Lang | Files | Zig | C | Speedup | Zig Syms | C Syms | C Edges |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| bitcoin | C++ | 1,993 | 3.66s | 41.87s | 11.4× | 6,900 | 36,293 | 175,127 |
| node | C++/JS | 34,650 | 252s | 1,154s | 4.5× | 1,195,988 | 1,064,070 | 2,386,368 |
| ollama | Go | 874 | 2.80s | 8.33s | 3.0× | 12,762 | 149,083 | 297,172 |
| claude-code | TS | 26 | 0.20s | 0.44s | 2.2× | 656 | 3,024 | 3,822 |
| reth | Rust | 1,266 | 4.18s | 8.83s | 2.1× | 17,748 | 54,778 | 187,578 |
| caddy | Go | 326 | 0.69s | 1.37s | 2.0× | 1,761 | 4,767 | 26,514 |
| rustdesk | Rust | 322 | 1.20s | 2.43s | 2.0× | 5,074 | 14,827 | 53,786 |
| golang/go | Go | 10,648 | 25s | 48s | 1.9× | 81,880 | 247,574 | 1,474,135 |
| traefik | Go | 937 | 2.57s | 4.96s | 1.9× | 10,648 | 26,693 | 86,073 |
| ziglang/zig | Zig | 16,987 | 51s | 86s | 1.7× | 174,073 | 886,039 | 1,358,190 |
| arrow-rs | Rust | 559 | 3.08s | 4.94s | 1.6× | 13,830 | 20,538 | 114,425 |
| rust-lang/rust | Rust | 36,422 | 97s | 111s | 1.1× | 239,973 | 385,572 | 1,573,773 |
| v8 | C++ | 16,170 | 115s | 130s | 1.1× | 577,075 | 251,175 | 872,711 |
| bun | Zig | 12,087 | 279s | | | 841,526 | | |
| aseprite | C/C++ | 5,156 | ~8s | | | 12,761 | | |

C Wins (5 of 22 repos — all JS/TS-heavy)

| Repository | Lang | Files | Zig | C | Ratio | Zig Syms | C Syms | C Edges |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| react | JS | 4,584 | 81s → 23s | 9s | 0.4× | 416K → 72K | 51,722 | 155,390 |
| vue | JS | 425 | 7.6s → 1.9s | 0.92s | 0.5× | 48K → 8.5K | 3,769 | 11,288 |
| TypeScript | TS | 39,283 | 348s → ~120s | 83s | 0.7× | 1.4M → ~400K | 294,716 | 794,793 |
| three.js | JS | 1,623 | 49s → 21s | 10s | 0.5× | 307K → 94K | 55,850 | 188,605 |
| electron | C++/JS | 1,509 | 12s | 3.6s | 0.3× | 73,266 | 23,012 | 54,691 |

🎉 Tree-sitter JS grammar added! React: 81s → 23s (3.4×), vue: 7.6s → 1.9s (4×), three.js: 49s → 21s (2.3×). Where a cell shows two values, the first is the regex fallback and the second is with the tree-sitter grammar.

Tie

| Repository | Lang | Files | Zig | C | Ratio | Zig Syms | C Syms | C Edges |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| union | Rust/Zig | 2,023 | 10.2s | 9.3s | 0.9× | 53,872 | 64,533 | 196,186 |

Summary

| Category | Zig Wins | C Wins | Avg Zig Speedup |
| --- | --- | --- | --- |
| Compiled (C++, Rust, Go, Zig) | 14/14 | 0/14 | 2.5× |
| Interpreted (JS, TS) | 1/6 | 5/6 | 0.3×* |
| Mixed (C++/JS) | 1/1 | 0/1 | 4.5× |
| Overall | 16/22 | 5/22 | 1.8× |

*The JS/TS gap closes when tree-sitter grammars are installed (React: 81s → 23s with grammar).

Binary Size

| Metric | Zig (ReleaseFast) | C Original | Ratio |
| --- | --- | --- | --- |
| Binary size | 2.2 MB | 257 MB | 117× smaller |
| Memory (idle) | ~8 MB | ~120 MB | 15× smaller |
| Memory (indexing) | ~50 MB | ~500 MB | 10× smaller |

Key insight: Zig excels at compiled languages (2.5× avg speedup on Rust, Go, C++, Zig repos). For JS/TS, tree-sitter grammars close the gap — the C original bundles 158 grammars internally while Zig loads them at runtime.

Full benchmarks: BENCH.md

Architecture

src/
├── main.zig              # Entry point — MCP server + CLI dispatcher
├── mcp/                  # JSON-RPC 2.0 framework
│   ├── protocol.zig      #   MCP message types (initialize, tools/list, tools/call)
│   ├── tool.zig          #   ToolMeta struct + handler dispatch
│   └── transport.zig     #   Raw POSIX I/O stdio transport (C read/write)
├── db/                   # SQLite persistence layer
│   ├── connection.zig    #   Connection + prepared statement wrappers
│   ├── pool.zig          #   Thread-safe connection pool (semaphore-guarded)
│   ├── schema.zig        #   13 tables + FTS5 + triggers (C-original-compatible)
│   └── cypher.zig        #   Cypher→SQL translator
├── c/                    # C library bindings
│   ├── sqlite.zig        #   SQLite3 (amalgamation, WAL, 64MB cache)
│   └── treesitter.zig    #   Tree-sitter v0.26.9 (grammar loading via dlopen)
├── indexer/              # 9-stage indexing pipeline
│   ├── pipeline.zig      #   Orchestrator: scan → parse → extract → write
│   ├── scanner.zig       #   File walker (C opendir/readdir + stat)
│   ├── parser.zig        #   Tree-sitter grammar loading + AST parsing
│   ├── extractor.zig     #   Single-pass AST walker (walkAll)
│   ├── fallback_extractor.zig  #  Regex extractor (always available)
│   └── symbols.zig       #   IndexSymbol, IndexEdge, FileInfo types
├── search/               # Search engine
│   ├── bm25.zig          #   FTS5 BM25 with structural boosting
│   └── grep.zig          #   Graph-augmented pattern search
├── analysis/             # Graph analytics
│   └── leiden.zig        #   Leiden community detection (modularity optimization)
├── tools/                # 14 MCP tool implementations
│   └── registry.zig      #   Comptime dispatch table
├── model/                # Domain types
│   └── graph.zig         #   NodeLabel, EdgeType enums, Node, Edge, Project
└── util/                 # Utilities
    ├── log.zig           #   Structured logging
    ├── alloc.zig         #   Arena + tracking allocator helpers
    └── json.zig          #   JSON parse/serialize helpers

See DEV.md for detailed module documentation and PLAN.md for the implementation roadmap.

Configuration

Environment Variables

| Variable | Description | Default |
| --- | --- | --- |
| CBM_CACHE_DIR | Override cache directory | ~/.cache/codebase-memory-mcp/ |
| CBM_LOG_LEVEL | Log verbosity: debug, info, warn, error, none | warn |

Persistence

All data is stored under ~/.cache/codebase-memory-mcp/ (or under CBM_CACHE_DIR if it is set):

  • _config.db — project registry (project names, root paths, file/edge counts)
  • {project-hash}.db — per-project knowledge graphs (nodes, edges, FTS5 index)

SQLite databases use WAL journal mode for concurrent reads during indexing. The config database is ACID-safe across restarts.
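The cache-directory lookup and WAL setting can be reproduced from any SQLite client, which is handy when inspecting a project database by hand. A minimal sketch: cache_dir and open_wal are hypothetical helpers, demo.db is a throwaway file name, and the override logic simply follows the CBM_CACHE_DIR description above.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def cache_dir() -> Path:
    """Return CBM_CACHE_DIR if set, else the documented default location."""
    override = os.environ.get("CBM_CACHE_DIR")
    return Path(override) if override else Path.home() / ".cache" / "codebase-memory-mcp"


def open_wal(path: Path) -> sqlite3.Connection:
    """Open a database and switch it to write-ahead logging."""
    conn = sqlite3.connect(path)
    # The pragma returns the journal mode actually in effect.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode != "wal":
        raise RuntimeError(f"WAL not enabled, journal mode is {mode}")
    return conn


with tempfile.TemporaryDirectory() as tmp:
    os.environ["CBM_CACHE_DIR"] = tmp  # demo only: point the cache at a temp dir
    conn = open_wal(cache_dir() / "demo.db")
    journal_mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
    conn.close()

print(journal_mode)
```

In WAL mode readers do not block the writer and the writer does not block readers, which is what allows queries to run while indexing is in progress.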

Team-Shared Graph Artifact

After indexing a large repo, save the graph for your team:

# Index the repo
cbm cli index_repository '{"repo_path":"/path/to/large-repo"}'

# The DB is at ~/.cache/codebase-memory-mcp/{repo}.db
# Compress and commit to the repo
mkdir -p .codebase-memory
cp ~/.cache/codebase-memory-mcp/large-repo.db .codebase-memory/graph.db
zstd -9 .codebase-memory/graph.db -o .codebase-memory/graph.db.zst
git add .codebase-memory/graph.db.zst
git commit -m "chore: shared codebase graph"

# Teammates: decompress the artifact into the local cache, then query it
mkdir -p ~/.cache/codebase-memory-mcp
zstd -d .codebase-memory/graph.db.zst -o ~/.cache/codebase-memory-mcp/large-repo.db
cbm cli search_graph '{"project":"large-repo","query":"handleRequest"}'
# → instant results, no re-indexing needed

Add to .gitattributes to prevent merge conflicts (the ours merge driver must also be defined, for example with git config merge.ours.driver true):

.codebase-memory/graph.db.zst merge=ours

Diagnostics

Set CBM_LOG_LEVEL=debug for verbose output during indexing and querying. Useful for profiling slow operations or debugging extraction issues.

CBM_LOG_LEVEL=debug cbm cli index_repository '{"repo_path":"/path/to/repo"}' 2> index.log

Vendored Dependencies

| Library | Version | Notes |
| --- | --- | --- |
| SQLite | 3.49.1 | Amalgamation, compiled with -DSQLITE_ENABLE_FTS5 -DSQLITE_ENABLE_JSON1 |
| Tree-sitter | 0.26.9 | Core lib only (lib/src/*.c); grammars loaded at runtime |

Both are compiled statically via addCSourceFile in build.zig. The binary has zero runtime dependencies.

Tests

zig build test    # 16 tests, 0 leaks (DebugAllocator)

Contributing

  1. Read DEV.md for architecture docs and the Zig 0.16 API cheat sheet
  2. Ensure zig build test passes with 0 leaks
  3. Follow existing module patterns (all tools in src/tools/, DB ops in src/db/)
  4. Update PLAN.md if adding new capabilities

Security

  • 100% local processing — your code never leaves your machine
  • Zero network calls — no telemetry, no analytics, no phoning home
  • Static binary — all dependencies vendored and compiled at build time
  • No runtime interpreters — no Node.js, Python, or other runtime needed

Acknowledgments

  • DeusData — creator of the original codebase-memory-mcp C implementation. This Zig rewrite builds on their excellent architecture, MCP protocol design, and property graph schema. The original's 14-tool design, Cypher query engine, and Leiden clustering approach are faithfully reproduced here.

License

MIT — see LICENSE for full text.
