Pearl is a production-ready reinforcement learning and contextual bandit agent library built for real-world sequential decision making. It is organized around modular components—policy learners, replay buffers, exploration strategies, safety modules, and history summarizers—that snap together to form reliable agents with clear boundaries and strong defaults. The library implements classic and modern algorithms across two regimes: contextual bandits (e.g., LinUCB, LinTS, SquareCB, neural bandits) and fully sequential RL (e.g., DQN, PPO-style policy optimization), with attention to practical concerns like nonstationarity and dynamic action spaces. Tutorials demonstrate end-to-end workflows on OpenAI Gym tasks and contextual-bandit setups derived from tabular datasets, emphasizing reproducibility and clear baselines. Pearl’s design favors clarity and deployability: metrics, logging, and evaluation harnesses are integrated so you can monitor learning, compare agents, and catch regressions.
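To make the component composition concrete, here is a minimal sketch of how a replay buffer, an exploration module, and a policy learner can snap together into an agent with an act/observe/learn interface. All class names below are hypothetical stand-ins written for this illustration; they are not Pearl's actual classes or API.

```python
# Hypothetical stand-in classes for illustration only; not Pearl's API.
import random
from collections import defaultdict, deque


class FIFOReplayBuffer:
    """Fixed-capacity buffer of (state, action, reward, next_state, done) tuples."""
    def __init__(self, capacity: int) -> None:
        self.buffer = deque(maxlen=capacity)

    def push(self, transition) -> None:
        self.buffer.append(transition)

    def sample(self, batch_size: int):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))


class EpsilonGreedyExploration:
    """Exploration module: takes a random action with probability epsilon."""
    def __init__(self, epsilon: float) -> None:
        self.epsilon = epsilon

    def act(self, greedy_action: int, action_space: list[int]) -> int:
        if random.random() < self.epsilon:
            return random.choice(action_space)
        return greedy_action


class TabularQLearner:
    """Policy learner: tabular Q-learning updates from sampled transitions."""
    def __init__(self, action_space: list[int], lr: float = 0.1, gamma: float = 0.99) -> None:
        self.q = defaultdict(float)
        self.action_space = action_space
        self.lr, self.gamma = lr, gamma

    def greedy_action(self, state) -> int:
        return max(self.action_space, key=lambda a: self.q[(state, a)])

    def learn_batch(self, batch) -> None:
        for state, action, reward, next_state, done in batch:
            target = reward if done else reward + self.gamma * max(
                self.q[(next_state, a)] for a in self.action_space
            )
            self.q[(state, action)] += self.lr * (target - self.q[(state, action)])


class Agent:
    """Agent assembled from interchangeable modules via an act/observe/learn loop."""
    def __init__(self, policy_learner, exploration, replay_buffer) -> None:
        self.policy_learner = policy_learner
        self.exploration = exploration
        self.replay_buffer = replay_buffer

    def act(self, state) -> int:
        greedy = self.policy_learner.greedy_action(state)
        return self.exploration.act(greedy, self.policy_learner.action_space)

    def observe(self, transition) -> None:
        self.replay_buffer.push(transition)

    def learn(self, batch_size: int = 32) -> None:
        self.policy_learner.learn_batch(self.replay_buffer.sample(batch_size))
```

Swapping the policy learner, exploration strategy, or buffer changes the agent's behavior without touching the interaction loop, which is the kind of boundary the modular design above describes.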
Features
- Modular agent stack with policy learners, exploration, safety, and replay buffers
- Algorithms spanning contextual bandits and sequential RL in one codebase (see the standalone LinUCB sketch after this list)
- Support for nonstationary settings and dynamic action spaces
- Clear tutorials for Gym tasks and bandit problems using real datasets
- Built-in evaluation, logging, and benchmarking utilities
- Practical defaults aimed at production readiness and reproducibility
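As a concrete reference point for the bandit side of the feature list, below is a standalone sketch of LinUCB with disjoint linear models (Li et al., 2010) in plain NumPy. It illustrates the algorithm only; the class, its methods, and the toy reward model are written for this example and are not Pearl's implementation.

```python
# Standalone LinUCB sketch (disjoint linear models); not Pearl's implementation.
import numpy as np


class LinUCB:
    def __init__(self, num_arms: int, feature_dim: int, alpha: float = 1.0) -> None:
        self.alpha = alpha
        # Per-arm ridge-regression statistics: A = X^T X + I, b = X^T r.
        self.A = np.stack([np.eye(feature_dim) for _ in range(num_arms)])
        self.b = np.zeros((num_arms, feature_dim))

    def select_arm(self, context: np.ndarray) -> int:
        scores = []
        for A_a, b_a in zip(self.A, self.b):
            A_inv = np.linalg.inv(A_a)
            theta = A_inv @ b_a
            # Optimistic score: predicted reward plus an exploration bonus.
            bonus = self.alpha * np.sqrt(context @ A_inv @ context)
            scores.append(theta @ context + bonus)
        return int(np.argmax(scores))

    def update(self, arm: int, context: np.ndarray, reward: float) -> None:
        self.A[arm] += np.outer(context, context)
        self.b[arm] += reward * context


# Toy usage: two arms whose rewards depend linearly on a 3-dimensional context.
rng = np.random.default_rng(0)
true_weights = np.array([[1.0, 0.0, -1.0], [-1.0, 0.5, 1.0]])
bandit = LinUCB(num_arms=2, feature_dim=3)
for _ in range(500):
    context = rng.normal(size=3)
    arm = bandit.select_arm(context)
    reward = float(true_weights[arm] @ context + 0.1 * rng.normal())
    bandit.update(arm, context, reward)
```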