Skip to content

feat: replace LSP enrichment with SCIP integration #39

Merged
mehrdadpfg merged 2 commits into main from feat/scip-integration on Mar 13, 2026

@mehrdadpfg mehrdadpfg commented Mar 13, 2026

Summary

Replace the entire LSP-based enrichment pipeline with compiler-grade SCIP (SCIP Code Intelligence Protocol) indexing for precise, language-agnostic cross-reference edges.

Why SCIP over LSP?

LSP enrichment had fundamental limitations:

  • Required running language servers (pyright, tsserver) as child processes — fragile, slow, version-dependent
  • Could only resolve references one-at-a-time via textDocument/definition
  • No batch mode: O(n) server roundtrips for n references
  • Type information was lossy and incomplete

SCIP indexes are produced by compiler-frontend indexers (rust-analyzer, scip-typescript, scip-python, scip-java, etc.) and contain every definition and reference in a single protobuf file — the same data Sourcegraph uses for code navigation.

What changed

New: SCIP pipeline (codemem-engine/src/index/scip/)

| Module | Purpose |
|---|---|
| mod.rs (reader) | Parse .scip protobuf → ScipDefinition, ScipReference, ScipExternal structs. Handles symbol string parsing, kind inference, range extraction, definition extent inference |
| graph_builder.rs | Build graph nodes + edges from parsed SCIP data. Multi-layer edge fusion with ast-grep, intra-class edge collapsing, Tier 3 node folding, fan-out limiting |
| orchestrate.rs | Auto-detect installed SCIP indexers, run them, cache results at ~/.codemem/scip-cache/{namespace}/ |
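
For orientation, the cache location named for orchestrate.rs can be sketched as a small path helper. This is an illustrative sketch only: `scip_cache_dir` is a hypothetical name, and the home directory is passed in as a parameter so the snippet needs nothing beyond the standard library (the PR adds the `dirs` crate, presumably for that lookup).

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper (not taken from the PR): builds
/// `<home>/.codemem/scip-cache/<namespace>`, the layout given in the
/// module table above.
fn scip_cache_dir(home: &Path, namespace: &str) -> PathBuf {
    home.join(".codemem").join("scip-cache").join(namespace)
}
```

Keying the cache by namespace would keep indexes from different repositories apart, which is likely why the path includes it.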

Multi-layer edge fusion

ast-grep (pattern-based, confidence 0.10) and SCIP (compiler-grade, confidence 0.15) edges are fused:

  • When both sources agree on the same (src, dst, relationship) triple, confidences sum (0.25) and the ast-grep edge is superseded
  • SCIP-only edges get 0.15, ast-grep-only edges keep 0.10
  • fuse_edges() returns superseded ast-grep edge IDs for removal from storage
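
The fusion rule above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: the `Edge` struct, its field names, and the `fuse_edges_sketch` signature are assumptions; only the confidence values and the "return superseded ast-grep IDs" behaviour come from the description.

```rust
use std::collections::HashMap;

/// Assumed edge shape for this sketch; the real struct is likely richer.
#[derive(Debug, Clone, PartialEq)]
struct Edge {
    id: String,
    src: String,
    dst: String,
    relationship: String,
    confidence: f64,
}

/// Base confidence of a SCIP-derived edge (value from the PR description).
const SCIP_CONFIDENCE: f64 = 0.15;

/// Returns the SCIP edges with fused confidences, plus the IDs of ast-grep
/// edges that were superseded and should be removed from storage.
/// ast-grep-only edges are left untouched and keep their own confidence.
fn fuse_edges_sketch(ast_grep: &[Edge], scip: &[Edge]) -> (Vec<Edge>, Vec<String>) {
    // Index the ast-grep edges by their (src, dst, relationship) triple.
    let mut by_triple: HashMap<(&str, &str, &str), &Edge> = HashMap::new();
    for e in ast_grep {
        by_triple.insert((e.src.as_str(), e.dst.as_str(), e.relationship.as_str()), e);
    }

    let mut fused = Vec::with_capacity(scip.len());
    let mut superseded = Vec::new();
    for e in scip {
        let mut out = e.clone();
        out.confidence = SCIP_CONFIDENCE;
        let key = (e.src.as_str(), e.dst.as_str(), e.relationship.as_str());
        if let Some(old) = by_triple.get(&key) {
            // Both sources agree: confidences add and the ast-grep edge is superseded.
            out.confidence += old.confidence;
            superseded.push(old.id.clone());
        }
        fused.push(out);
    }
    (fused, superseded)
}
```

With one ast-grep edge at 0.10 and a matching SCIP edge, the fused edge ends up at 0.25 and the ast-grep ID appears in the second return value; a SCIP edge with no match stays at 0.15.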

Node tiering & collapsing

  • Tier 3 nodes (fields, type parameters, enum variants) are folded into parent metadata instead of creating separate graph nodes — reduces graph noise significantly
  • Intra-class edges between methods of the same class/trait/interface/enum are collapsed into parent metadata (e.g., method_a CALLS method_b within the same impl block becomes metadata on the class node)
  • Module-level children are not collapsed — only class-like parents trigger collapsing
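
As a rough illustration of the collapsing rule, the sketch below splits call edges into those kept as graph edges and those folded into the parent's metadata. All type and function names here (`ParentKind`, `Symbol`, `Call`, `collapse_intra_class`) are hypothetical, and representing the folded metadata as a string is an assumption made for brevity.

```rust
/// Kinds of parent a symbol can live in. Not every variant is
/// constructed in this sketch, hence the allow.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ParentKind {
    Class,
    Trait,
    Interface,
    Enum,
    Module,
}

impl ParentKind {
    /// Only class-like parents trigger collapsing; modules do not.
    fn is_class_like(self) -> bool {
        !matches!(self, ParentKind::Module)
    }
}

#[derive(Debug)]
struct Symbol {
    name: String,
    parent: String,
    parent_kind: ParentKind,
}

/// A call edge; `src` and `dst` index into the symbol slice.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Call {
    src: usize,
    dst: usize,
}

/// Returns (edges kept in the graph, (parent, note) pairs folded into metadata).
fn collapse_intra_class(symbols: &[Symbol], calls: &[Call]) -> (Vec<Call>, Vec<(String, String)>) {
    let mut kept = Vec::new();
    let mut folded = Vec::new();
    for call in calls {
        let (src, dst) = (&symbols[call.src], &symbols[call.dst]);
        if src.parent == dst.parent && src.parent_kind.is_class_like() {
            // Same class-like parent: record the call on the parent instead of as an edge.
            folded.push((src.parent.clone(), format!("{} CALLS {}", src.name, dst.name)));
        } else {
            kept.push(*call);
        }
    }
    (kept, folded)
}
```

Two methods of the same class produce one metadata entry on that class, while two functions in the same module, or a call that crosses parents, remain ordinary edges.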

New relationship types

Added TypeDefinition, Reads, Writes, and Overrides to RelationshipType with configurable edge weights in GraphConfig.
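
One way to picture "configurable edge weights" is a per-relationship lookup with a fallback. The sketch below is a stand-in, not the real `RelationshipType` or `GraphConfig`: the type names, the `default_weight` field, and any numeric weights a caller supplies are assumptions, since the PR description does not list the actual values.

```rust
use std::collections::HashMap;

/// Stand-in for the four relationship types added in this PR.
/// Not every variant is constructed in this sketch, hence the allow.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum NewRelationship {
    TypeDefinition,
    Reads,
    Writes,
    Overrides,
}

/// Hypothetical config shape: explicit overrides plus a fallback weight.
struct GraphConfigSketch {
    weights: HashMap<NewRelationship, f64>,
    default_weight: f64,
}

impl GraphConfigSketch {
    /// Weight for a relationship, falling back to the default when unset.
    fn weight(&self, rel: NewRelationship) -> f64 {
        self.weights.get(&rel).copied().unwrap_or(self.default_weight)
    }
}
```

A caller would populate `weights` only for the relationships it wants to tune and rely on `default_weight` for the rest.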

Stale node cleanup

On re-index, SCIP nodes from previously-seen files that no longer appear in the new SCIP index are removed, along with their orphan scip-doc:* memories.
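
At its core, the cleanup amounts to a set difference between the node IDs stored from the previous run and those present in the fresh index. The sketch below assumes that reading; the function names are hypothetical, and the `scip-doc:<node id>` key format is an assumption (the description only gives the `scip-doc:*` prefix).

```rust
use std::collections::BTreeSet;

/// IDs of SCIP nodes that were stored for previously indexed files but are
/// absent from the new index. BTreeSet keeps the output order deterministic.
fn stale_scip_nodes(previous: &BTreeSet<String>, current: &BTreeSet<String>) -> Vec<String> {
    previous.difference(current).cloned().collect()
}

/// Memory IDs to delete alongside the stale nodes.
/// Assumption: each doc memory is keyed as `scip-doc:<node id>`.
fn orphan_doc_memory_ids(stale: &[String]) -> Vec<String> {
    stale.iter().map(|id| format!("scip-doc:{id}")).collect()
}
```

Nodes that appear only in the new index are simply inserted; only IDs that disappeared are scheduled for deletion together with their doc memories.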

persist_graph_only fast path

New engine method for --skip-embed workflows: persists graph nodes and edges without embedding or cross-repo linking. Useful for rapid iteration during development.

Removed: LSP pipeline

  • Deleted index/lsp/ (mod.rs, pyright.rs, tsserver.rs) — 898 lines
  • Deleted persistence/lsp.rs — 445 lines
  • Deleted LSP tests (lsp_tests, pyright_tests, tsserver_tests) — 183 lines

Removed: 6 heavy MCP tools

Removed tools that performed long-running analysis inside the MCP server (blocking the assistant):

  • enrich_codebase, analyze_codebase, enrich_git_history
  • enrich_security, enrich_performance
  • index_codebase (moved to CLI-only via codemem analyze)

Tool count: 32 → 26. These operations now run exclusively through the CLI where progress can be displayed and timeouts aren't a concern.

New: 8 backfilled ADRs (docs/adr/)

Architecture Decision Records covering major design decisions from project history:

  1. Graph + Vector hybrid architecture
  2. Workspace consolidation (6 crates)
  3. Zero-unwrap lock safety
  4. Claude Code hooks integration
  5. MCP tool consolidation (32 → 26)
  6. Incremental re-indexing
  7. Lazy engine initialization
  8. SCIP over LSP (this PR)

Housekeeping

  • Removed examples/ directory (assets/ is the canonical location for agent/skill definitions)
  • Updated README tool count (32 → 26)
  • Updated SKILL.md quick-reference to remove deleted tools
  • Added SCIP edge colors to UI graph constants
  • Added scip and dirs dependencies to codemem-engine

Test plan

  • All 138 existing tests pass
  • cargo clippy --workspace --all-targets -- -D warnings clean
  • cargo fmt --all -- --check clean
  • New tests for SCIP reader (scip_reader_tests.rs — 181 lines)
  • New tests for SCIP graph builder (scip_graph_builder_tests.rs — 850 lines)
    • Intra-class edge collapsing
    • Module children NOT collapsed
    • Tier 3 node folding
    • Fan-out limiting
    • Definition extent inference
  • New tests for SCIP orchestrator (scip_orchestrate_tests.rs — 206 lines)
    • Language detection
    • Config command parsing
    • Empty config → autodetect
  • New persistence tests (246 lines)
    • fuse_edges returns superseded ast-grep IDs
    • Stale SCIP node cleanup on re-index
    • persist_graph_only skips embeddings

🤖 Generated with Claude Code

…lity)

Replace the LSP-based enrichment pipeline with compiler-grade SCIP
(Source Code Intelligence Protocol) for precise cross-reference edges.

Key changes:
- Add SCIP reader, graph builder, and orchestrator modules
- Multi-layer edge fusion: ast-grep (0.10) + SCIP (0.15) with additive confidence
- Intra-class edge collapsing for Class/Trait/Interface/Enum parents
- Tier 3 node folding (fields, type_params, enum_variants → parent metadata)
- SCIP cache at ~/.codemem/scip-cache/{namespace}/ with save-after-success
- Stale SCIP node cleanup on re-index with orphan doc memory deletion
- Edge fusion deduplication: fuse_edges returns superseded ast-grep IDs
- persist_graph_only fast path for --skip-embed
- Remove 6 heavy MCP tools (enrich_codebase, analyze_codebase, etc.)
- Remove LSP modules and tests, add SCIP tests
- Backfill 8 ADRs from git history
- Remove examples/ directory (assets/ is canonical)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…reference

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@mehrdadpfg mehrdadpfg merged commit b50dd29 into main Mar 13, 2026
9 of 11 checks passed
@mehrdadpfg mehrdadpfg deleted the feat/scip-integration branch March 13, 2026 21:41
@github-actions github-actions bot mentioned this pull request Mar 13, 2026