2 unstable releases
| Version | Released |
|---|---|
| 0.2.0 | Aug 1, 2025 |
| 0.1.0 | Aug 1, 2025 |
260KB, 5K SLoC
Micro Swarm - Real Distributed Orchestration System
A complete swarm orchestration system for micro-neural networks with actual agent coordination, task scheduling, memory management, and fault tolerance.
Features
This implementation replaces the original's boolean placeholder flags with real, working functionality:
Agent Lifecycle Management
- Real agent spawning and lifecycle control
- Neural network agents with inference capabilities
- Quantum computation agents with optimization
- Generic agents for general processing
- Agent health monitoring and failure detection
Task Scheduling & Execution
- Priority-based task queues with dependency resolution
- Multiple scheduling strategies: RoundRobin, LeastLoaded, LoadBalanced, CapabilityBased
- Parallel task execution with resource constraints
- Task timeout handling and cancellation
- Real-time scheduling statistics
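Priority queues with dependency resolution are commonly implemented as a topological sort whose "ready" set is a priority heap. The sketch below illustrates that idea in plain Rust; `Priority` and `schedule_order` are hypothetical names, not micro_swarm's API, and breaking ties by lower task id is an arbitrary assumption.

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashMap};

// Hypothetical types for illustration; not the micro_swarm API.
/// Priority levels; derived `Ord` makes later variants compare larger.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Priority {
    Low,
    Normal,
    High,
    Critical,
}

/// Orders tasks so each runs after its dependencies and, among ready tasks,
/// higher priority goes first (ties: lower id first).
/// `deps[t]` lists the tasks `t` waits on; every id must appear in `priorities`.
/// Returns `None` if the dependencies contain a cycle.
fn schedule_order(
    priorities: &HashMap<u32, Priority>,
    deps: &HashMap<u32, Vec<u32>>,
) -> Option<Vec<u32>> {
    // Unmet-dependency counters plus reverse edges (dependency -> dependents).
    let mut pending: HashMap<u32, usize> = priorities.keys().map(|&id| (id, 0)).collect();
    let mut dependents: HashMap<u32, Vec<u32>> = HashMap::new();
    for (&task, reqs) in deps {
        for &req in reqs {
            *pending.get_mut(&task)? += 1;
            dependents.entry(req).or_default().push(task);
        }
    }

    // Max-heap on priority; Reverse(id) makes ties pop the smaller id first.
    let mut ready: BinaryHeap<(Priority, Reverse<u32>)> = pending
        .iter()
        .filter(|&(_, &n)| n == 0)
        .map(|(&id, _)| (priorities[&id], Reverse(id)))
        .collect();

    let mut order = Vec::with_capacity(priorities.len());
    while let Some((_, Reverse(id))) = ready.pop() {
        order.push(id);
        // Finishing `id` may make some of its dependents ready.
        for &next in dependents.get(&id).map(Vec::as_slice).unwrap_or(&[]) {
            let n = pending.get_mut(&next)?;
            *n -= 1;
            if *n == 0 {
                ready.push((priorities[&next], Reverse(next)));
            }
        }
    }
    // Tasks that were never released are stuck in a cycle.
    (order.len() == priorities.len()).then_some(order)
}
```

With tasks 1 (Low), 2 (High) and 3 (Critical, depends on 1), task 2 runs first even though task 3 has the highest priority, because task 3 is not ready until task 1 finishes.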
Memory Management
- Memory pooling with configurable region sizes
- Multiple eviction policies: LRU, LFU, FIFO, TTL
- Memory transfer and zero-copy operations
- Per-agent memory allocation limits
- Garbage collection and memory optimization
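To make the LRU eviction policy concrete, here is a minimal sketch of an LRU store keyed by region id. `LruStore` is a hypothetical name, not the crate's MemoryManager; it uses a logical clock and an O(n) eviction scan for brevity, whereas a production version would typically pair a map with a linked list for O(1) eviction.

```rust
use std::collections::HashMap;

// Hypothetical LRU sketch; not the micro_swarm MemoryManager.
/// Every access stamps the entry with a logical clock; inserting into a full
/// store evicts the entry with the oldest stamp.
struct LruStore<V> {
    capacity: usize,
    clock: u64,
    entries: HashMap<u64, (V, u64)>, // region id -> (data, last-used stamp)
}

impl<V> LruStore<V> {
    fn new(capacity: usize) -> Self {
        Self { capacity, clock: 0, entries: HashMap::new() }
    }

    /// Reads a region and marks it most recently used.
    fn get(&mut self, id: u64) -> Option<&V> {
        self.clock += 1;
        let now = self.clock;
        self.entries.get_mut(&id).map(|(value, stamp)| {
            *stamp = now;
            &*value
        })
    }

    /// Inserts (or replaces) a region; returns the evicted region id, if any.
    fn insert(&mut self, id: u64, value: V) -> Option<u64> {
        self.clock += 1;
        let mut evicted = None;
        if !self.entries.contains_key(&id) && self.entries.len() >= self.capacity {
            // Least recently used = smallest stamp.
            let victim = self
                .entries
                .iter()
                .min_by_key(|(_, (_, stamp))| *stamp)
                .map(|(&k, _)| k);
            if let Some(k) = victim {
                self.entries.remove(&k);
                evicted = Some(k);
            }
        }
        self.entries.insert(id, (value, self.clock));
        evicted
    }
}
```

LFU would keep a hit count instead of a stamp, FIFO would never refresh the stamp on `get`, and TTL would compare the stamp against a wall-clock deadline.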
Inter-Agent Communication
- Message channels between agents with queuing
- Broadcast channels for group communication
- Message persistence and compression options
- Communication hub for routing optimization
- Network statistics and monitoring
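Per-agent message queues and broadcast channels can be sketched with the standard library's `std::sync::mpsc`: each agent owns a receiver, direct messages go to one sender, and a broadcast clones the message into every other agent's queue. `Hub` and `Message` are hypothetical names, not the crate's CommunicationHub; persistence, compression and routing optimization are not modeled.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

// Hypothetical types for illustration; not micro_swarm's CommunicationHub.
#[derive(Clone, Debug, PartialEq)]
struct Message {
    from: u32,
    body: String,
}

/// One mpsc queue per agent, indexed by agent id.
struct Hub {
    queues: HashMap<u32, Sender<Message>>,
}

impl Hub {
    fn new() -> Self {
        Hub { queues: HashMap::new() }
    }

    /// Registers an agent and returns the receiving end of its queue.
    fn register(&mut self, agent: u32) -> Receiver<Message> {
        let (tx, rx) = channel();
        self.queues.insert(agent, tx);
        rx
    }

    /// Sends to one agent; false if the agent is unknown or its receiver is gone.
    fn send(&self, to: u32, msg: Message) -> bool {
        self.queues.get(&to).map_or(false, |tx| tx.send(msg).is_ok())
    }

    /// Broadcasts to every agent except the sender; returns the delivery count.
    fn broadcast(&self, msg: &Message) -> usize {
        let mut delivered = 0;
        for (id, tx) in &self.queues {
            if *id != msg.from && tx.send(msg.clone()).is_ok() {
                delivered += 1;
            }
        }
        delivered
    }
}
```

Because each receiver is an ordinary `Receiver`, an agent running on its own thread can block on `recv()` while the hub stays free of locks.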
Distributed Coordination
- Multiple topologies: Centralized, Mesh, Hierarchical, Ring, Star
- Consensus protocols with Byzantine fault tolerance
- Leader election and role management
- Health monitoring with heartbeat detection
- Distributed decision making with voting
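Leader election can be illustrated with a bully-style rule: the healthy agent with the highest id leads, and a new term starts whenever the leader fails. The sketch below assumes a single shared view of which agents are healthy and exchanges no election messages, so it is a simplification rather than the crate's protocol; `Election` is a hypothetical name.

```rust
use std::collections::BTreeSet;

// Hypothetical bully-style election sketch; not micro_swarm's SwarmCoordinator.
/// The highest-id healthy agent leads; each election starts a new term so
/// that a stale leader can be recognized by its older term number.
struct Election {
    term: u64,
    leader: Option<u32>,
    healthy: BTreeSet<u32>,
}

impl Election {
    fn new(agents: impl IntoIterator<Item = u32>) -> Self {
        let mut e = Election { term: 0, leader: None, healthy: agents.into_iter().collect() };
        e.elect();
        e
    }

    /// Starts a new term and picks the largest healthy id (BTreeSet is sorted).
    fn elect(&mut self) {
        self.term += 1;
        self.leader = self.healthy.iter().next_back().copied();
    }

    /// Removes a failed agent; only a leader failure triggers re-election.
    fn mark_failed(&mut self, agent: u32) {
        self.healthy.remove(&agent);
        if self.leader == Some(agent) {
            self.elect();
        }
    }
}
```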
Fault Tolerance
- Agent failure detection and recovery
- Automatic failover and load redistribution
- Health checks with configurable thresholds
- Circuit breaker patterns for stability
- Graceful degradation under load
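The circuit breaker pattern mentioned above can be sketched as a three-state machine: after a threshold of consecutive failures the circuit opens and rejects calls, after a cooldown it lets a single trial call through (half-open), and that call's result closes or re-opens it. `CircuitBreaker` and its parameters are hypothetical, not micro_swarm's API; time is passed in explicitly so the logic is deterministic to test.

```rust
use std::time::{Duration, Instant};

// Hypothetical circuit breaker sketch; not the micro_swarm API.
#[derive(Debug, Clone, Copy, PartialEq)]
enum BreakerState {
    Closed,
    Open,
    HalfOpen,
}

struct CircuitBreaker {
    threshold: u32,     // consecutive failures before opening
    cooldown: Duration, // how long to stay open before a trial call
    failures: u32,
    state: BreakerState,
    opened_at: Option<Instant>,
}

impl CircuitBreaker {
    fn new(threshold: u32, cooldown: Duration) -> Self {
        Self { threshold, cooldown, failures: 0, state: BreakerState::Closed, opened_at: None }
    }

    /// Returns true if a call may proceed at time `now`.
    fn allow(&mut self, now: Instant) -> bool {
        match self.state {
            BreakerState::Closed => true,
            BreakerState::HalfOpen => false, // a trial call is already in flight
            BreakerState::Open => {
                let opened = self.opened_at.expect("an open breaker has a timestamp");
                if now.duration_since(opened) >= self.cooldown {
                    self.state = BreakerState::HalfOpen;
                    true
                } else {
                    false
                }
            }
        }
    }

    fn record_success(&mut self) {
        self.failures = 0;
        self.state = BreakerState::Closed;
        self.opened_at = None;
    }

    fn record_failure(&mut self, now: Instant) {
        self.failures += 1;
        // A failed trial call, or too many failures in a row, opens the circuit.
        if self.state == BreakerState::HalfOpen || self.failures >= self.threshold {
            self.state = BreakerState::Open;
            self.opened_at = Some(now);
        }
    }
}
```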
Real-Time Monitoring
- Comprehensive metrics collection and reporting
- Resource utilization tracking (CPU, memory, network)
- Performance statistics with throughput analysis
- System health dashboards
- Exportable status reports
Architecture
```
SwarmOrchestrator
├── TaskScheduler       # Priority queues, dependency resolution
├── MemoryManager       # Memory pooling, garbage collection
├── SwarmCoordinator    # Distributed consensus, leader election
├── CommunicationHub    # Message routing, broadcast channels
├── Agent Registry      # Agent lifecycle, health monitoring
└── Metrics Collection  # Real-time statistics, monitoring
```
Performance Characteristics
- 256 agents maximum (matches the target chip's core count)
- 28MB memory pool with 64KB regions (448 regions)
- Sub-millisecond task scheduling latency
- Byzantine fault tolerance for up to f faulty agents out of n ≥ 3f + 1 (just under one third)
- Real-time health monitoring every 100ms
- Zero-copy memory transfers between agents
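Heartbeat-based health monitoring at a 100ms interval can be sketched as a failure detector that flags an agent once several consecutive intervals pass without a heartbeat. The 100ms interval comes from the list above; the `max_missed` threshold and the `HeartbeatMonitor` type are hypothetical assumptions, not the crate's implementation.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Hypothetical heartbeat failure detector; not the micro_swarm API.
struct HeartbeatMonitor {
    interval: Duration, // expected time between heartbeats
    max_missed: u32,    // intervals of silence tolerated before flagging
    last_seen: HashMap<u32, Instant>,
}

impl HeartbeatMonitor {
    fn new(interval: Duration, max_missed: u32) -> Self {
        Self { interval, max_missed, last_seen: HashMap::new() }
    }

    /// Records a heartbeat from `agent` at time `now`.
    fn heartbeat(&mut self, agent: u32, now: Instant) {
        self.last_seen.insert(agent, now);
    }

    /// Agents silent for longer than interval * max_missed, sorted by id.
    fn failed_agents(&self, now: Instant) -> Vec<u32> {
        let deadline = self.interval * self.max_missed;
        let mut failed: Vec<u32> = self
            .last_seen
            .iter()
            .filter(|&(_, &seen)| now.duration_since(seen) > deadline)
            .map(|(&id, _)| id)
            .collect();
        failed.sort_unstable();
        failed
    }
}
```

Tolerating a few missed intervals trades detection latency (here up to 300ms) for fewer false positives caused by a single delayed heartbeat.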
Usage
Basic Swarm Setup
```rust
use micro_swarm::*;

// Create orchestrator with mesh topology
let mut orchestrator = SwarmBuilder::new()
    .name("production_swarm".into())
    .max_agents(64)
    .topology(SwarmTopology::Mesh)
    .fault_tolerance(true)
    .build()?;

// Initialize and bootstrap agents
orchestrator.initialize()?;
let agent_ids = orchestrator.bootstrap_default_agents()?;
```
Task Submission & Execution
```rust
use std::time::Duration;

// Create a high-priority neural task; `input_data` is the payload to process
let task = TaskBuilder::new("neural_analysis".into())
    .payload(input_data)
    .priority(TaskPriority::High)
    .requires("neural_inference".into())
    .timeout(Duration::from_secs(30))
    .build();

// Submit and process
let task_id = orchestrator.submit_task(task)?;
let stats = orchestrator.process_cycle()?;

// Get results
if let Some(result) = orchestrator.get_task_result(task_id) {
    println!("Task completed: {:?}", result);
}
```
Custom Agent Creation
```rust
// Create specialized agents
let neural_agent = AgentFactory::create_neural("vision_net".into(), 2048);
let quantum_agent = AgentFactory::create_quantum("optimizer".into(), 16);
let custom_agent = AgentFactory::create_generic("preprocessor".into());

// Register with orchestrator
orchestrator.register_agent(neural_agent)?;
orchestrator.register_agent(quantum_agent)?;
orchestrator.register_agent(custom_agent)?;
```
Distributed Coordination
```rust
// Submit consensus proposal
let proposal_id = orchestrator.coordinator.submit_proposal(
    agent_id,
    ProposalType::TaskAssignment,
    proposal_data,
)?;

// Cast votes
orchestrator.coordinator.cast_vote(
    proposal_id,
    voter_agent,
    VoteDecision::Approve,
    Some("Resource allocation approved".into()),
)?;
```
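How many approving votes a Byzantine-tolerant protocol needs before deciding can be shown with the standard quorum rule: with n agents tolerating f faulty ones (n ≥ 3f + 1), a decision needs more than (n + f)/2 matching votes, which equals 2f + 1 when n = 3f + 1. This is a generic sketch of that rule, not the crate's `cast_vote` logic; `Vote`, `Outcome`, `max_faulty` and `tally` are hypothetical names.

```rust
use std::collections::HashMap;

// Hypothetical vote tally sketch; not micro_swarm's coordinator API.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Vote {
    Approve,
    Reject,
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Accepted,
    Rejected,
    Pending,
}

/// Most Byzantine agents that `n` agents can tolerate (n >= 3f + 1).
fn max_faulty(n: usize) -> usize {
    n.saturating_sub(1) / 3
}

/// Decides once more than (n + f) / 2 matching votes arrive. Any two such
/// quorums overlap in more than f agents, so they share an honest agent and
/// cannot decide both ways. For n = 3f + 1 the threshold is exactly 2f + 1.
fn tally(n: usize, votes: &HashMap<u32, Vote>) -> Outcome {
    let quorum = (n + max_faulty(n)) / 2 + 1;
    // Votes are keyed by voter id, so each agent counts at most once.
    let approvals = votes.values().filter(|&&v| v == Vote::Approve).count();
    let rejections = votes.len() - approvals;
    if approvals >= quorum {
        Outcome::Accepted
    } else if rejections >= quorum {
        Outcome::Rejected
    } else {
        Outcome::Pending
    }
}
```

For the 64-agent swarm in the setup example, f = 21 and a proposal needs 43 matching votes.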
Memory Management
```rust
// Allocate memory for agents
let region_id = orchestrator.memory_manager.allocate(agent_id, 4096)?;

// Transfer data between agents
orchestrator.memory_manager.write(region_id, &data)?;
orchestrator.memory_manager.transfer(region_id, target_agent)?;

// Garbage collection
orchestrator.memory_manager.garbage_collect()?;
```
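A fixed-region memory pool with per-agent limits can be sketched as a free list of region indices plus an ownership map. In this sketch, a transfer only reassigns ownership and copies no data, which is one way to get zero-copy hand-off. `RegionPool`, its per-agent `quota` (in regions), and the rounding of requests up to whole regions are assumptions for illustration, not how `memory_manager` is documented to work.

```rust
use std::collections::HashMap;

// Hypothetical region pool sketch; not micro_swarm's MemoryManager.
struct RegionPool {
    region_size: usize,
    free: Vec<usize>,           // indices of free regions
    owner: HashMap<usize, u32>, // region index -> owning agent
    quota: usize,               // max regions per agent
}

impl RegionPool {
    fn new(total_size: usize, region_size: usize, quota: usize) -> Self {
        let count = total_size / region_size;
        // Reversed so that `pop` hands out region 0 first.
        Self { region_size, free: (0..count).rev().collect(), owner: HashMap::new(), quota }
    }

    fn regions_held(&self, agent: u32) -> usize {
        self.owner.values().filter(|&&a| a == agent).count()
    }

    /// Allocates whole regions covering `bytes`; None if out of memory or over quota.
    fn allocate(&mut self, agent: u32, bytes: usize) -> Option<Vec<usize>> {
        let needed = bytes.div_ceil(self.region_size).max(1);
        if needed > self.free.len() || self.regions_held(agent) + needed > self.quota {
            return None;
        }
        let regions: Vec<usize> = (0..needed).filter_map(|_| self.free.pop()).collect();
        for &r in &regions {
            self.owner.insert(r, agent);
        }
        Some(regions)
    }

    /// Hands a region to another agent by changing ownership only (no copy).
    fn transfer(&mut self, region: usize, to: u32) -> bool {
        match self.owner.get_mut(&region) {
            Some(owner) => {
                *owner = to;
                true
            }
            None => false,
        }
    }

    /// Returns a region to the free list.
    fn free_region(&mut self, region: usize) {
        if self.owner.remove(&region).is_some() {
            self.free.push(region);
        }
    }
}
```

With the 28MB pool and 64KB regions listed under Performance Characteristics, `RegionPool::new(28 * 1024 * 1024, 64 * 1024, quota)` starts with 448 free regions, and a 4096-byte allocation such as the one above occupies one whole region.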
Monitoring & Metrics
```rust
// Get real-time metrics
let metrics = orchestrator.metrics();
println!("Active agents: {}", metrics.active_agents);
println!("Memory utilization: {:.1}%", metrics.memory_utilization * 100.0);
println!("Task throughput: {:.2}/sec", metrics.throughput);

// Export detailed status
let status_report = orchestrator.export_status()?;
println!("{}", status_report);

// Component-specific statistics
let scheduler_stats = orchestrator.scheduler_stats();
let coordination_stats = orchestrator.coordination_stats();
let memory_stats = orchestrator.memory_stats();
```
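The `* 100.0` in the example suggests that `memory_utilization` is a fraction and `throughput` a rate in tasks per second. The sketch below shows how such figures could be derived from raw counters over a sampling window; `MetricsSnapshot` and `snapshot` are hypothetical and only mirror the field names printed above.

```rust
use std::time::Duration;

// Hypothetical snapshot mirroring the printed fields; not the crate's metrics type.
struct MetricsSnapshot {
    active_agents: usize,
    memory_utilization: f64, // fraction of the pool in use, in [0, 1]
    throughput: f64,         // completed tasks per second
}

/// Derives a snapshot from raw counters collected over `window`.
fn snapshot(
    active_agents: usize,
    used_bytes: usize,
    pool_bytes: usize,
    completed_tasks: u64,
    window: Duration,
) -> MetricsSnapshot {
    let secs = window.as_secs_f64();
    MetricsSnapshot {
        active_agents,
        // Guard against an empty pool or a zero-length window.
        memory_utilization: if pool_bytes == 0 { 0.0 } else { used_bytes as f64 / pool_bytes as f64 },
        throughput: if secs > 0.0 { completed_tasks as f64 / secs } else { 0.0 },
    }
}
```

For example, 7MB used out of a 28MB pool prints as `Memory utilization: 25.0%`, and 500 tasks completed in a 4-second window gives a throughput of 125 tasks per second.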
Testing
Run the comprehensive test suite:
```shell
cargo test --features std
```
Run integration tests:
```shell
cargo test --test integration_tests --features std
```
Run the basic example:
```shell
cargo run --example basic_swarm --features std
```
Key Differences from Original
| Component | Original | New Implementation |
|---|---|---|
| Agents | Boolean flags | Real agents with lifecycles, capabilities, and execution |
| Scheduler | Boolean flags | Priority queues, dependency resolution, multiple strategies |
| Memory | Boolean flags | Memory pooling, eviction policies, garbage collection |
| Coordination | Boolean flags | Consensus protocols, leader election, fault tolerance |
| Communication | None | Message channels, broadcast, routing optimization |
| Monitoring | Boolean flags | Real-time metrics, resource tracking, performance analysis |
Fault Tolerance
The system implements multiple layers of fault tolerance:
- Agent Level: Health monitoring, automatic restart, failure detection
- Task Level: Timeout handling, retry mechanisms, graceful failure
- System Level: Leader election, consensus protocols, degraded operation
- Network Level: Message queuing, retry logic, circuit breakers
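Retry mechanisms at the task and network levels are commonly paired with exponential backoff, so a struggling agent is not hammered with immediate retries. The sketch below doubles the delay per attempt up to a cap; `backoff_delay` and `retry` are hypothetical helpers, and the sleep function is injected so the logic can be tested without waiting.

```rust
use std::time::Duration;

// Hypothetical retry helpers; not the micro_swarm API.
/// Delay before retry `attempt` (0-based): base * 2^attempt, capped at `max`.
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).map_or(max, |d| d.min(max))
}

/// Runs `op` until it succeeds or `max_attempts` attempts have failed,
/// calling `sleep` with the backoff delay between attempts.
fn retry<T, E>(
    max_attempts: u32,
    base: Duration,
    max: Duration,
    mut sleep: impl FnMut(Duration),
    mut op: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op(attempt) {
            Ok(v) => return Ok(v),
            // Out of attempts: surface the last error.
            Err(e) if attempt + 1 >= max_attempts => return Err(e),
            Err(_) => {
                sleep(backoff_delay(attempt, base, max));
                attempt += 1;
            }
        }
    }
}
```

In real use the `sleep` argument would be `std::thread::sleep`; adding random jitter to each delay is a common refinement that keeps many agents from retrying in lockstep.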
Topologies Supported
- Centralized: Single coordinator, hub-and-spoke communication
- Mesh: Fully connected agents, distributed coordination
- Hierarchical: Tree structure with multiple coordination levels
- Ring: Circular communication pattern with distributed consensus
- Star: Central hub with specialized edge agents
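The topologies trade link count against hop distance. The sketch below computes both for n agents using standard graph counts: a mesh is a complete graph, a ring is a cycle, and hub-and-spoke layouts and trees are spanning trees with n - 1 links. `Topology` is a hypothetical enum mirroring the list above, not the crate's `SwarmTopology`; hierarchical hop counts are omitted because they depend on the tree's shape.

```rust
// Hypothetical mirror of the listed topologies; not micro_swarm's SwarmTopology.
#[derive(Clone, Copy, Debug)]
enum Topology {
    Centralized,
    Mesh,
    Hierarchical,
    Ring,
    Star,
}

/// Point-to-point links needed to connect `n` agents.
fn link_count(t: Topology, n: usize) -> usize {
    match (t, n) {
        (_, 0 | 1) => 0,
        // Every pair of agents is connected.
        (Topology::Mesh, n) => n * (n - 1) / 2,
        // With two agents, the "ring" collapses to a single link.
        (Topology::Ring, 2) => 1,
        (Topology::Ring, n) => n,
        // Hub-and-spoke layouts and trees are spanning trees.
        (Topology::Centralized | Topology::Star | Topology::Hierarchical, n) => n - 1,
    }
}

/// Worst-case hops between two agents; None where it depends on tree shape.
fn max_hops(t: Topology, n: usize) -> Option<usize> {
    match (t, n) {
        (_, 0 | 1) => Some(0),
        (Topology::Mesh, _) => Some(1),
        // Messages can travel either way around the ring.
        (Topology::Ring, n) => Some(n / 2),
        // Hub to spoke is one hop; spoke to spoke goes through the hub.
        (Topology::Centralized | Topology::Star, 2) => Some(1),
        (Topology::Centralized | Topology::Star, _) => Some(2),
        (Topology::Hierarchical, _) => None,
    }
}
```

For the 64-agent mesh in the setup example, every agent is one hop from every other, at the cost of 2016 links; a ring needs only 64 links but up to 32 hops.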
Configuration
The system is highly configurable through builder patterns:
```rust
// `build()` returns the configured orchestrator, not a standalone config
let orchestrator = SwarmBuilder::new()
    .max_agents(128)
    .topology(SwarmTopology::Hierarchical)
    .scheduler_config(SchedulerConfig {
        selection_strategy: AgentSelectionStrategy::LoadBalanced,
        max_concurrent_tasks: 512,
        task_queue_size: 10000,
        load_balancing: true,
        dependency_resolution: true,
        ..Default::default()
    })
    .memory_config(MemoryConfig {
        total_size: 64 * 1024 * 1024, // 64MB
        region_size: 128 * 1024,      // 128KB regions
        eviction_policy: EvictionPolicy::LRU,
        compression_enabled: true,
        ..Default::default()
    })
    .fault_tolerance(true)
    .monitoring(true)
    .build()?;
```
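To illustrate what an agent selection strategy decides, here is a sketch of three of the strategies named in the feature list: RoundRobin, LeastLoaded (lowest active/capacity ratio) and CapabilityBased. `AgentLoad`, `Strategy` and `Selector` are hypothetical types, not `AgentSelectionStrategy`; LoadBalanced is not sketched because this README does not describe how it differs from LeastLoaded.

```rust
// Hypothetical selection sketch; not micro_swarm's scheduler API.
#[derive(Debug)]
struct AgentLoad {
    id: u32,
    active_tasks: usize,
    capacity: usize,
    capabilities: Vec<&'static str>,
}

enum Strategy {
    RoundRobin,
    LeastLoaded,
    CapabilityBased(&'static str),
}

struct Selector {
    next: usize, // round-robin cursor
}

impl Selector {
    /// Returns the id of the chosen agent, or None if no agent qualifies.
    fn select(&mut self, strategy: &Strategy, agents: &[AgentLoad]) -> Option<u32> {
        match strategy {
            Strategy::RoundRobin => {
                if agents.is_empty() {
                    return None;
                }
                let agent = &agents[self.next % agents.len()];
                self.next = self.next.wrapping_add(1);
                Some(agent.id)
            }
            // Lowest active/capacity ratio among agents with spare capacity,
            // compared by cross-multiplication to avoid floating point.
            Strategy::LeastLoaded => agents
                .iter()
                .filter(|a| a.active_tasks < a.capacity)
                .min_by(|a, b| (a.active_tasks * b.capacity).cmp(&(b.active_tasks * a.capacity)))
                .map(|a| a.id),
            // Least busy agent that advertises the required capability.
            Strategy::CapabilityBased(cap) => agents
                .iter()
                .filter(|a| a.capabilities.contains(cap) && a.active_tasks < a.capacity)
                .min_by_key(|a| a.active_tasks)
                .map(|a| a.id),
        }
    }
}
```

Round-robin ignores load entirely, which is cheap but can overload a slow agent; the load- and capability-aware variants cost a scan over all agents per decision.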
Next Steps
This real implementation provides:
- Production-ready distributed coordination
- Scalable task scheduling and execution
- Robust fault tolerance and recovery
- Comprehensive monitoring and metrics
- Flexible configuration and topologies
The system is designed for high-performance, low-latency neural network processing with enterprise-grade reliability and observability.
Contributing
This is a complete rewrite that replaces boolean flags with actual distributed system functionality. The implementation provides real value for production neural network orchestration.
Dependencies: ~5-8.5MB, ~161K SLoC