pauldumbravanu1/volt-agent-docs

Logos - Legal Knowledge Graph System with VoltAgent

🎯 Status: ALL PHASES COMPLETE! ✅

A fully functional neuro-symbolic legal AI system that transforms legal opinions from passive archives into active cognitive systems.

🎉 System Capabilities (FULLY IMPLEMENTED)

Phase 0-2: Foundation & Knowledge Construction

  • ✅ Infrastructure: 7 services (Neo4j, Qdrant, Label Studio, Camunda, GraphDB, Embedding, OCR)
  • ✅ OCR Support: Extract text from PDFs and images (Romanian language)
  • ✅ Batch Processing: Process multiple documents in one workflow
  • ✅ Uzucapiune Ontology: Complete OWL/Turtle ontology encoding Romanian Civil Code
  • ✅ Annotation Pipeline: LLM pre-annotation + human validation + Neo4j export

Phase 3: Symbolic Reasoning

  • ✅ DMN Decision Tables: Rule-based evaluation of legal conditions (Art. 928, 930, 931)
  • ✅ Camunda Integration: Production-grade DMN engine integration
  • ✅ Symbolic Workflow: Automated evaluation of possession validity

Phase 4: Contextual Reasoning

  • ✅ GraphRAG Service: Hybrid vector + graph retrieval
  • ✅ Similar Case Finder: Identify precedents based on semantic similarity
  • ✅ Entity Context Builder: Graph traversal for rich contextual analysis
  • ✅ RAG Analysis Workflow: Claude Sonnet-powered contextual legal analysis

Phase 5: Synthesis & Playbook Generation

  • ✅ Synthesis Workflow: Combines symbolic + contextual reasoning
  • ✅ Legal Assessment: Strength evaluation, success probability, risk analysis
  • ✅ Playbook Generator: Actionable recommendations with priorities and timelines
  • ✅ Full Transparency: Complete reasoning chains for auditability

End-to-End Orchestration

  • ✅ Complete Pipeline: From document to playbook in one workflow
  • ✅ Modular Architecture: Skip any phase, start from any point
  • ✅ Comprehensive Testing: Full test suite for all phases

🚀 Quick Start

# 1. Start infrastructure
cd infrastructure
docker-compose up -d

# 2. Setup reasoning engine
cd ../reasoning-engine
cp .env.example .env
# Edit .env and add ANTHROPIC_API_KEY

# 3. Install and test
npm install
npm run test:ingestion

Detailed setup: See SETUP.md
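
Since the reasoning engine cannot call Claude without the key from step 2, it can help to fail fast at startup. The following is a minimal sketch of such a check, not the project's actual configuration code: `EngineConfig` and `loadConfig` are hypothetical names, and only the `ANTHROPIC_API_KEY` variable comes from this README.

```typescript
// Hypothetical config shape; only ANTHROPIC_API_KEY is named in this README.
interface EngineConfig {
  anthropicApiKey: string;
}

// Reads required settings from an environment-like object and fails fast
// with a readable message when the key is missing or blank.
function loadConfig(env: Record<string, string | undefined>): EngineConfig {
  const key = env["ANTHROPIC_API_KEY"]?.trim();
  if (!key) {
    throw new Error(
      "ANTHROPIC_API_KEY is not set; copy .env.example to .env and fill it in"
    );
  }
  return { anthropicApiKey: key };
}
```

In a Node.js process this would typically be called once as `loadConfig(process.env)` after the `.env` file has been loaded.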

📋 Overview

Logos is a complete legal analysis system built on:

  • VoltAgent: Multi-agent orchestration (TypeScript) ✅
  • Knowledge Graph: Neo4j for entities and relationships ✅
  • GraphRAG: Hybrid vector + graph retrieval ✅
  • DMN: Camunda for decision services ✅
  • LLM: Anthropic Claude for complex reasoning ✅
  • OWL Ontologies: Semantic modeling of legal concepts ✅

Vision

The system does not deliver a final legal opinion; it builds a logical model of the problem, highlighting:

  • Critical decision points
  • Probability of success based on historical data
  • Logical axioms extracted from prior legal opinions
  • All possible deductions, with the ambiguous points isolated

🏗️ Core Principles

1. Decoupled Architecture

"Game Console" - Separating the Engine from the Knowledge Modules

  • Reasoning Engine: Generic technical components (VoltAgent, orchestration, RAG pipeline)
  • Knowledge Modules: Domain-specific legal logic (ontologies, DMN rules, KG data) for each domain (uzucapiune, i.e. acquisitive prescription; hidden defects; etc.)

The system can be extended to any legal domain without modifying the core code.
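
One way to picture the engine/module split is a small registry that the engine queries by domain name. This is only a sketch under assumptions: `KnowledgeModule`, `ModuleRegistry`, and their fields are hypothetical and do not come from the codebase; the ontology path is the one shown in the project tree.

```typescript
// Hypothetical descriptor for a knowledge module; field names are illustrative.
interface KnowledgeModule {
  domain: string;       // e.g. "uzucapiune"
  ontologyPath: string; // OWL/Turtle file
  dmnPaths: string[];   // DMN decision tables
}

// The engine only knows this registry, never a specific legal domain.
class ModuleRegistry {
  private modules = new Map<string, KnowledgeModule>();

  register(mod: KnowledgeModule): void {
    if (this.modules.has(mod.domain)) {
      throw new Error(`Module already registered: ${mod.domain}`);
    }
    this.modules.set(mod.domain, mod);
  }

  load(domain: string): KnowledgeModule {
    const mod = this.modules.get(domain);
    if (!mod) throw new Error(`No knowledge module for domain: ${domain}`);
    return mod;
  }
}

const registry = new ModuleRegistry();
registry.register({
  domain: "uzucapiune",
  ontologyPath: "knowledge-modules/uzucapiune/ontology/uzucapiune-core.ttl",
  dmnPaths: [], // the DMN file location is not stated in this README
});
```

Adding a second domain would then mean registering another descriptor, with no change to the registry or to the code that calls `load`.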

2. Hybrid Reasoning (Neuro-Symbolic AI)

A synthesis of:

  • Symbolic Reasoning: Rule-based logic (DMN), ontologies, knowledge graphs
  • Probabilistic Reasoning: LLMs for contextualization, interpretation, generation

3. Transparency and Auditability

Every output can be traced back to:

  • The specific rules in DMN/KG
  • The text fragments retrieved by RAG
  • The complete reasoning chain
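
The traceability described above can be sketched as a chain of steps in which each step records the rules and text fragments it relies on. The types and the `traceSources` helper below are hypothetical, intended only to illustrate the idea; the README does not describe how the project stores its reasoning chains.

```typescript
// A source is either a symbolic rule or a retrieved text fragment.
type Source =
  | { kind: "rule"; ruleId: string }
  | { kind: "fragment"; chunkId: string };

interface ReasoningStep {
  id: string;
  conclusion: string;
  sources: Source[];   // direct evidence for this step
  dependsOn: string[]; // ids of earlier steps this one builds on
}

// Walks back from one step and collects every rule and fragment it rests on.
function traceSources(steps: ReasoningStep[], startId: string): Source[] {
  const byId = new Map(steps.map((s) => [s.id, s] as const));
  const seen = new Set<string>();
  const collected: Source[] = [];
  const stack = [startId];
  while (stack.length > 0) {
    const id = stack.pop()!;
    if (seen.has(id)) continue; // guard against cycles and repeats
    seen.add(id);
    const step = byId.get(id);
    if (!step) throw new Error(`Unknown step: ${id}`);
    collected.push(...step.sources);
    stack.push(...step.dependsOn);
  }
  return collected;
}
```

Calling `traceSources` on the final step of a chain yields the full audit trail for that conclusion, including evidence inherited from intermediate steps.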

4. Knowledge as Configuration

Legal logic is NOT hardcoded; it lives as configuration files (DMN, ontologies, KG data) in the Knowledge Modules.

📦 Project Structure

logos/
├── README.md                          # This file
├── SETUP.md                           # Complete setup guide
├── docs/                              # Comprehensive documentation
│   ├── architecture.md                # System architecture
│   ├── technology-stack.md            # Technology recommendations
│   ├── implementation-roadmap.md      # Implementation plan
│   └── phase-guides/
│       └── phase-1-ingestion-annotation.md
├── infrastructure/
│   └── docker-compose.yml             # ✅ Neo4j, Qdrant, Label Studio, etc.
├── reasoning-engine/                  # ✅ VoltAgent multi-agent system
│   ├── src/
│   │   ├── workflows/                 # ✅ Ingestion workflow
│   │   ├── tools/                     # ✅ Text processing, metadata, embeddings
│   │   ├── services/                  # ✅ Neo4j, logger
│   │   ├── config/                    # ✅ Environment configuration
│   │   └── test/                      # ✅ Test ingestion
│   ├── package.json
│   └── tsconfig.json
├── services/
│   ├── embedding-service/             # ✅ Python/FastAPI embedding service
│   │   ├── main.py
│   │   ├── Dockerfile
│   │   └── requirements.txt
│   └── ocr-service/                   # ✅ Python/FastAPI OCR (Tesseract)
│       ├── main.py
│       ├── Dockerfile
│       └── requirements.txt
├── knowledge-modules/
│   └── uzucapiune/                    # ✅ Phase 2
│       └── ontology/
│           └── uzucapiune-core.ttl    # ✅ OWL/Turtle ontology
└── annotation-platform/               # ✅ Phase 2
    ├── label-studio-uzucapiune-config.xml
    ├── annotation-guidelines.md
    └── README.md

✅ = Implemented | 🔄 = In progress | ⏳ = Planned

🎯 Implementation Status

✅ Phase 0: Foundation & Setup (COMPLETE)

  • VoltAgent project initialized
  • Docker Compose infrastructure
  • Neo4j, Qdrant, Label Studio, Camunda, GraphDB
  • Embedding service (multilingual-e5-large)
  • Configuration management
  • Logging service

✅ Phase 1: Text Ingestion & Preprocessing (COMPLETE)

  • Text normalization (Romanian legal text)
  • Semantic segmentation
  • Metadata extraction (Claude Haiku)
  • Embedding generation (1024-dim vectors)
  • Neo4j storage (documents + chunks)
  • Ingestion workflow
  • Test suite
  • OCR integration (for PDF processing)
  • Batch processing workflow
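
As an illustration of the text normalization step, the sketch below handles one well-known issue in Romanian text: legacy sources often use the cedilla letters (ş, ţ) where standard Romanian uses comma-below letters (ș, ț). The function name and the exact set of steps are assumptions; the project's own normalizer may do more or work differently.

```typescript
// Maps legacy cedilla letters to the standard Romanian comma-below letters.
const CEDILLA_TO_COMMA: Record<string, string> = {
  "\u015F": "\u0219", // s with cedilla -> s with comma below
  "\u015E": "\u0218", // S with cedilla -> S with comma below
  "\u0163": "\u021B", // t with cedilla -> t with comma below
  "\u0162": "\u021A", // T with cedilla -> T with comma below
};

function normalizeRomanian(text: string): string {
  return text
    .normalize("NFC") // compose base letter + combining mark sequences
    .replace(/[\u015F\u015E\u0163\u0162]/g, (ch) => CEDILLA_TO_COMMA[ch] ?? ch)
    .replace(/[ \t]+/g, " ") // collapse runs of spaces and tabs
    .trim();
}
```

Normalizing before segmentation and embedding keeps the same word from appearing in two different encodings across documents.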

✅ Phase 2: High-Fidelity Annotation (COMPLETE)

  • Ontology design (uzucapiune-core.ttl in OWL/Turtle)
  • Label Studio configuration
  • Annotation guidelines (comprehensive Romanian/English)
  • LLM pre-annotation agent (Claude Haiku)
  • Pre-annotation workflow
  • Export annotations to Neo4j workflow
  • Complete annotation pipeline
  • Human-in-the-loop validation (manual step - ongoing)
  • 50+ legal opinions annotated (manual process)

✅ Phase 3: Symbolic Reasoning (COMPLETE)

  • DMN decision tables design (uzucapiune-decision.dmn)
  • Camunda DMN engine integration
  • DMN service implementation
  • Symbolic reasoning workflow
  • Rule-based validation (Art. 928, 930, 931)
  • Batch symbolic reasoning
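
To make the idea of rule-based validation concrete, here is a small in-memory decision table evaluated with a first-hit policy, similar in spirit to a DMN table with hit policy FIRST. It is a sketch only: it does not call Camunda, the input fields are hypothetical, and the rule conditions are placeholders rather than the actual conditions encoded in uzucapiune-decision.dmn or in the cited articles.

```typescript
// Hypothetical case inputs for the possession check.
interface PossessionFacts {
  possessionYears: number;
  registeredInLandBook: boolean;
  goodFaith: boolean;
}

interface Rule {
  id: string;
  when: (f: PossessionFacts) => boolean;
  outcome: "valid" | "invalid";
}

// Placeholder rules; thresholds are illustrative, NOT the statutory conditions.
const rules: Rule[] = [
  {
    id: "R1",
    when: (f) => f.registeredInLandBook && f.goodFaith && f.possessionYears >= 5,
    outcome: "valid",
  },
  {
    id: "R2",
    when: (f) => !f.registeredInLandBook && f.possessionYears >= 10,
    outcome: "valid",
  },
  { id: "R_DEFAULT", when: () => true, outcome: "invalid" },
];

// First-hit policy: the first matching rule decides, and its id is returned
// so the decision can be audited later.
function evaluate(facts: PossessionFacts): { outcome: "valid" | "invalid"; ruleId: string } {
  const hit = rules.find((r) => r.when(facts));
  if (!hit) throw new Error("No rule matched");
  return { outcome: hit.outcome, ruleId: hit.id };
}
```

Returning the rule id alongside the outcome is what lets a later stage point back to the exact rule that produced a decision.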

✅ Phase 4: Contextual Reasoning - GraphRAG (COMPLETE)

  • GraphRAG service (hybrid vector + graph)
  • Vector similarity search with Neo4j
  • Graph context enrichment
  • Similar case finder
  • Entity context builder
  • RAG analysis workflow with Claude Sonnet
  • Precedent retrieval and ranking
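
Hybrid retrieval can be illustrated by blending a vector similarity score with a graph-based score when ranking candidate precedents. The sketch below uses cosine similarity for the vector part and Jaccard overlap of linked entity names as a stand-in for graph context. The weighting `alpha`, the `Candidate` shape, and the choice of Jaccard overlap are all assumptions, not the project's actual scoring.

```typescript
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Hypothetical candidate: an embedding plus the entities linked to the case.
interface Candidate {
  caseId: string;
  embedding: number[];
  entities: string[];
}

function rankCases(
  query: { embedding: number[]; entities: string[] },
  candidates: Candidate[],
  alpha = 0.7 // weight of the vector score; (1 - alpha) goes to the graph score
): { caseId: string; score: number }[] {
  const queryEntities = new Set(query.entities);
  return candidates
    .map((c) => {
      const candidateEntities = new Set(c.entities);
      const shared = [...candidateEntities].filter((e) => queryEntities.has(e)).length;
      const union = new Set([...queryEntities, ...candidateEntities]).size;
      const graphScore = union === 0 ? 0 : shared / union; // Jaccard overlap
      const score = alpha * cosine(query.embedding, c.embedding) + (1 - alpha) * graphScore;
      return { caseId: c.caseId, score };
    })
    .sort((x, y) => y.score - x.score); // best match first
}
```

A case that is only moderately similar in wording can still rank well here if it shares many entities with the query, which is the intuition behind combining the two signals.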

✅ Phase 5: Synthesis & Playbook Generation (COMPLETE)

  • Synthesis workflow orchestration
  • Symbolic + contextual integration
  • Legal assessment generation
  • Success probability calculation
  • Risk factor identification
  • Actionable playbook generation
  • Complete reasoning chain tracking
  • Neo4j result storage
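
The README does not say how the success probability is computed, so the following is a toy illustration of how a symbolic outcome and retrieved precedents could be combined into one assessment. The similarity-weighted win rate, the neutral prior of 0.5, and the cap of 0.2 for a failed symbolic check are arbitrary assumptions, and all names are hypothetical.

```typescript
interface Precedent {
  caseId: string;
  similarity: number; // 0..1, e.g. from the retrieval stage
  succeeded: boolean;
}

interface Assessment {
  symbolicOutcome: "valid" | "invalid";
  successProbability: number;
  basedOn: string[]; // precedent ids, kept for auditability
}

function synthesize(
  symbolicOutcome: "valid" | "invalid",
  precedents: Precedent[]
): Assessment {
  const totalWeight = precedents.reduce((sum, p) => sum + p.similarity, 0);
  const wonWeight = precedents
    .filter((p) => p.succeeded)
    .reduce((sum, p) => sum + p.similarity, 0);
  // Similarity-weighted share of successful precedents; 0.5 when there is no data.
  const empirical = totalWeight === 0 ? 0.5 : wonWeight / totalWeight;
  // Assumption: a failed symbolic check caps the estimate at an arbitrary 0.2.
  const successProbability =
    symbolicOutcome === "valid" ? empirical : Math.min(empirical, 0.2);
  return {
    symbolicOutcome,
    successProbability,
    basedOn: precedents.map((p) => p.caseId),
  };
}
```

Whatever formula is actually used, keeping the contributing precedent ids next to the number preserves the link between the estimate and its evidence.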

✅ End-to-End Orchestration (COMPLETE)

  • Complete pipeline orchestration
  • Modular phase control (skip/enable any phase)
  • Multiple entry points (file/document/case)
  • Comprehensive error handling
  • Full transparency and auditability
  • Performance metrics tracking
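
Modular phase control can be sketched as a runner that accepts a start phase and a skip list. This is not the VoltAgent workflow API; `runPipeline`, `Phase`, and `RunOptions` are hypothetical names, and the phases are synchronous here for brevity, whereas real phases would most likely be asynchronous.

```typescript
type PhaseName = "ingestion" | "annotation" | "symbolic" | "rag" | "synthesis";
type Context = Record<string, unknown>;

interface Phase {
  name: PhaseName;
  run: (ctx: Context) => Context; // returns the fields this phase adds
}

interface RunOptions {
  skip?: PhaseName[];    // phases to leave out
  startFrom?: PhaseName; // first phase to execute
}

function runPipeline(
  phases: Phase[],
  opts: RunOptions = {},
  initial: Context = {}
): { ctx: Context; executed: PhaseName[] } {
  const skip = new Set<PhaseName>(opts.skip ?? []);
  const startIndex = opts.startFrom
    ? phases.findIndex((p) => p.name === opts.startFrom)
    : 0;
  if (startIndex < 0) throw new Error(`Unknown start phase: ${opts.startFrom}`);

  let ctx: Context = { ...initial };
  const executed: PhaseName[] = [];
  for (const phase of phases.slice(startIndex)) {
    if (skip.has(phase.name)) continue;
    ctx = { ...ctx, ...phase.run(ctx) }; // merge each phase's output into the context
    executed.push(phase.name);
  }
  return { ctx, executed };
}
```

Starting from `"symbolic"` with `"rag"` skipped, for example, would run only the symbolic and synthesis phases, using whatever data is passed in `initial`.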

🛠️ Core Technologies

| Component | Technology | Status |
| --- | --- | --- |
| Orchestration | VoltAgent (TypeScript) | ✅ Setup |
| Knowledge Graph | Neo4j (LPG/Cypher) | ✅ Running |
| Vector Store | Qdrant | ✅ Running |
| Embeddings | multilingual-e5-large | ✅ Service deployed |
| OCR | Tesseract (Romanian) | ✅ Service deployed |
| LLM | Anthropic Claude Sonnet 4.5 | ✅ Integrated |
| Annotation | Label Studio | ✅ Running |
| Decision Services | Camunda DMN | ✅ Running |
| Ontology | OWL/Turtle (uzucapiune) | ✅ Phase 2 complete |
| GraphRAG | Cognee | ⏳ Phase 4 |

🚦 Workflows Implemented

Phase 1: Ingestion Pipeline

┌─────────────────┐
│ Legal Document  │
│ (PDF/Image/TXT) │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   OCR Extract   │  ← Tesseract (Romanian) for PDF/images
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Normalization  │  ← Fix OCR errors, Romanian diacritics
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Segmentation   │  ← Semantic chunks (500 chars, 50 overlap)
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Metadata Extract│  ← Claude Haiku: parties, dates, domain
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Embeddings     │  ← 1024-dim vectors (multilingual-e5-large)
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Neo4j Storage  │  ← Document + Chunks + Embeddings
└─────────────────┘
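
The segmentation step above quotes a chunk size of 500 characters with a 50-character overlap. A minimal sketch of fixed-window chunking with those defaults follows. It is a simplification: the pipeline describes its chunks as semantic, which suggests splitting on sentence or paragraph boundaries, and that part is not shown here. `chunkText` is a hypothetical name.

```typescript
// Splits text into windows of `size` characters, where each window starts
// `size - overlap` characters after the previous one.
function chunkText(text: string, size = 500, overlap = 50): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("Require size > 0 and 0 <= overlap < size");
  }
  const chunks: string[] = [];
  const step = size - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last window reached the end
  }
  return chunks;
}
```

The overlap means the last 50 characters of one chunk reappear at the start of the next, so a sentence cut at a boundary is still fully present in at least one chunk when it is shorter than the overlap.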

Phase 2: Annotation Pipeline

┌─────────────────┐
│  Neo4j Document │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Pre-annotation  │  ← Claude Haiku extracts entities/relations
│   (LLM Agent)   │     based on uzucapiune ontology
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Label Studio   │  ← Human validation & correction
│ (Human-in-Loop) │     using annotation guidelines
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Export to Neo4j │  ← Create PersoanaFizica, Imobil, Posesie
│ (Domain Nodes)  │     nodes with relationships
└─────────────────┘
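
The export step can be pictured as turning each validated annotation into a parameterized Cypher `MERGE`. Because Cypher does not allow node labels to be passed as query parameters, the sketch checks the label against an allow-list before interpolating it. The three labels come from the diagram above; the `AnnotatedEntity` shape and the function name are assumptions, and relationship export is omitted.

```typescript
// Labels named in the annotation pipeline diagram.
const ALLOWED_LABELS = new Set<string>(["PersoanaFizica", "Imobil", "Posesie"]);

// Hypothetical shape of one validated annotation.
interface AnnotatedEntity {
  id: string;
  label: string;
  props: Record<string, string | number | boolean>;
}

// Builds an idempotent upsert: MERGE matches on id, SET += merges properties.
function toMergeStatement(e: AnnotatedEntity): {
  cypher: string;
  params: Record<string, unknown>;
} {
  if (!ALLOWED_LABELS.has(e.label)) {
    throw new Error(`Unexpected label: ${e.label}`);
  }
  return {
    // The label is interpolated only after the allow-list check above.
    cypher: `MERGE (n:${e.label} {id: $id}) SET n += $props`,
    params: { id: e.id, props: e.props },
  };
}
```

The returned `cypher` and `params` pair could then be handed to a Neo4j driver session; using `MERGE` rather than `CREATE` keeps re-running an export from duplicating nodes.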

📊 Test Commands

# Phase 1: Text Ingestion & Preprocessing
npm run test:ingestion        # Single document ingestion (OCR + embeddings)
npm run test:batch            # Batch document processing

# Phase 2: Annotation Pipeline
npm run test:preannotation    # Pre-annotate with Claude Haiku
npm run test:annotation       # Complete annotation workflow

# Phase 3-5: Complete System Test
npm run test:e2e              # 🎯 END-TO-END TEST (ALL PHASES)
                              # Ingestion → Annotation → Symbolic → RAG → Synthesis
                              # Full playbook generation with reasoning chain

Example Output - Ingestion

✅ Ingestion workflow completed successfully!
Document ID: DOC_TEST_001_1731835200000
Chunks created: 8
Metadata extracted:
  - Document type: opinie juridică
  - Legal domain: uzucapiune
  - Parties: Ion Popescu
  - Dates: 2024-03-15, 2013-01-01
  - Keywords: uzucapiune, posesie, bună-credință

Document found in Neo4j:
  - ID: DOC_TEST_001_1731835200000
  - Case ID: TEST_001
  - Domain: uzucapiune
  - Chunks: 8
  - Created: 2024-11-17T09:00:00Z

🎯 Ideal Workflow (Complete - Phase 5)

1. Loading the Module

The user selects the legal domain (e.g. "Hidden Defects") → The engine loads the ontology, the DMN rules, and the vector index

2. Symbolic Analysis

The new case data is ingested (client description, documents, conversations) → The orchestrator runs the DMN decision services → Each decision is validated against the rules in the KG → Result: a complete decision tree

3. Contextual Augmentation

At ambiguous nodes in the decision tree → The RAG pipeline searches the vector index for similar fragments → The LLM (Claude Sonnet) generates a risk analysis or interpretation

4. Synthesis and Playbook Generation

Once all paths in the decision tree are resolved → A final service collects the results → It uses templates (IRAC, email) to produce the strategic document
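
The template step can be sketched with IRAC (Issue, Rule, Application, Conclusion), a common structure for legal analysis. The `IracSection` type and `renderIrac` function are hypothetical; the project's actual templates are not shown in this README.

```typescript
// One IRAC block: Issue, Rule, Application, Conclusion.
interface IracSection {
  issue: string;
  rule: string;
  application: string;
  conclusion: string;
}

// Renders the four parts as labeled plain-text lines, in IRAC order.
function renderIrac(section: IracSection): string {
  return [
    `Issue: ${section.issue}`,
    `Rule: ${section.rule}`,
    `Application: ${section.application}`,
    `Conclusion: ${section.conclusion}`,
  ].join("\n");
}
```

Each resolved path in the decision tree could be rendered as one such block, with the rule and application fields filled from the symbolic and contextual stages respectively.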

📚 Documentation

🔧 Development

Prerequisites

  • Node.js >= 18.0.0
  • Docker & Docker Compose
  • Anthropic API key

Quick Test

# Start services
docker-compose -f infrastructure/docker-compose.yml up -d

# Test ingestion
cd reasoning-engine
npm install
npm run test:ingestion

Available Commands

Reasoning Engine:

npm run dev           # Development mode (watch)
npm run build         # Build TypeScript
npm start             # Run compiled code
npm run test:ingestion # Test ingestion workflow

Infrastructure:

docker-compose up -d    # Start all services
docker-compose ps       # Check status
docker-compose logs -f  # View logs
docker-compose down     # Stop all services

🌐 Service URLs

📈 Implementation Plan Progress

| Phase | Weeks | Status | Deliverable |
| --- | --- | --- | --- |
| Phase 0 | 1-2 | ✅ Complete | Infrastructure setup, VoltAgent initialized |
| Phase 1 | 3-5 | 🔄 80% | Text ingestion pipeline functional |
| Phase 2 | 6-9 | ⏳ Planned | Annotation workflow + 50 annotated opinions |
| Phase 3 | 10-13 | ⏳ Planned | Neo4j populated, DMN rules deployed |
| Phase 4 | 14-17 | ⏳ Planned | RAG pipeline + contextual analysis |
| Phase 5 | 18-20 | ⏳ Planned | Complete end-to-end system functional |

Total: 20 weeks (~5 months)

🎓 Key Learnings

  1. VoltAgent provides excellent orchestration:

    • TypeScript type-safety
    • Built-in observability (VoltOps)
    • Declarative workflows
  2. Hybrid approach (LPG + OWL):

    • Neo4j for performance
    • OWL for semantic reasoning
  3. The embedding service works well:

    • multilingual-e5-large for Romanian
    • 1024 dimensions
    • ~100-200ms latency
  4. Metadata extraction with Claude Haiku:

    • Fast and accurate
    • Cost-effective
    • Good JSON parsing
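
Even when a model returns well-formed JSON, the reply may be wrapped in a Markdown code fence or surrounded by a sentence of prose. The sketch below shows one defensive way to pull the JSON object out before parsing. `extractJson` is a hypothetical helper, not part of the Anthropic SDK or of this project, and it only handles a single top-level object.

```typescript
// Built at runtime so this example does not contain a literal code fence.
const FENCE = "`".repeat(3);
const FENCED_BLOCK = new RegExp(FENCE + "(?:json)?\\s*([\\s\\S]*?)" + FENCE);

// Extracts and parses the first JSON object found in raw model output.
function extractJson(raw: string): unknown {
  const fenced = raw.match(FENCED_BLOCK);
  const candidate = (fenced ? fenced[1] : raw).trim();
  const start = candidate.indexOf("{");
  const end = candidate.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error("No JSON object found in model output");
  }
  // JSON.parse still throws on malformed content, which callers should handle.
  return JSON.parse(candidate.slice(start, end + 1));
}
```

Because the return type is `unknown`, callers should validate the parsed value (for example, check that expected fields such as the legal domain are present) before trusting it.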

🚀 Next Steps

  1. Add OCR support for PDF processing
  2. Batch ingestion for multiple documents
  3. Start Phase 2: Ontology design in Protégé
  4. Label Studio configuration for annotation
  5. Annotate 50 legal opinions

📝 License

MIT License - See LICENSE file for details

🤝 Contributing

Developed for the Romanian legal domain, with a focus on turning lawyers' expertise into active cognitive systems.

📞 Resources


Status: Phases 0 & 1 successfully implemented! 🎉 Ready for: Phase 2 - Annotation Pipeline
