- We released the Nemotron-4-340B Base, Instruct, and Reward models. The Instruct and Reward variants were trained with NeMo-Aligner. Please see the HelpSteer2 paper for more details on the reward model training.
- We are excited to announce the beta release of accelerated generation support in our RLHF pipeline using TensorRT-LLM. While this feature is still a work in progress, it already provides a significant speedup to RLHF training. For more information, please refer to our RLHF documentation.
- The NeMo-Aligner paper is now out on arXiv!
NeMo-Aligner is a scalable toolkit for efficient model alignment. The toolkit supports state-of-the-art model alignment algorithms such as SteerLM, DPO, and Reinforcement Learning from Human Feedback (RLHF). These algorithms enable users to align language models to be safer, more harmless, and more helpful. Users can perform end-to-end model alignment on a wide range of model sizes and take advantage of all of the supported parallelism techniques to ensure that alignment is done in a performant and resource-efficient manner. For more technical details, please refer to our paper.
The NeMo-Aligner toolkit is built on the NeMo Toolkit, which allows training to scale to thousands of GPUs using tensor, data, and pipeline parallelism for all components of alignment. All of our checkpoints are cross-compatible with the NeMo ecosystem, allowing for inference deployment and further customization.
The toolkit is currently in its early stages. We are committed to improving it to make it easier for developers to pick and choose different alignment algorithms to build safe, helpful, and reliable models.
- SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF.
- Llama3-70B-SteerLM-Chat aligned with NeMo-Aligner.
- Corresponding reward model Llama3-70B-SteerLM-RM.
- Learn more in our SteerLM and HelpSteer2 papers.
- Supervised Fine Tuning
- Reward Model Training
- Reinforcement Learning from Human Feedback using the PPO Algorithm
- Llama3-70B-PPO-Chat aligned with NeMo-Aligner.
- Direct Preference Optimization as described in Direct Preference Optimization: Your Language Model is Secretly a Reward Model
- Llama3-70B-DPO-Chat aligned with NeMo-Aligner.
- Self-Play Fine-Tuning (SPIN) as described in Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
For the latest stable release, please see the releases page. All releases come with a pre-built container. Changes within each release are documented in CHANGELOG.
NeMo-Aligner has the same requirements as the NeMo Toolkit, with the addition of PyTriton.
Please follow the same steps as the NeMo Toolkit Installation Guide, but run the following after installing NeMo:
pip install nemo-aligner
or, if you prefer to install the latest commit, clone the repository and run the following from its root:
pip install .
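As a minimal sketch of the from-source route (assuming the public NVIDIA/NeMo-Aligner GitHub repository and an existing NeMo installation), the steps might look like:

```bash
# Sketch: install NeMo-Aligner from the latest commit (NeMo must already be installed).
git clone https://github.com/NVIDIA/NeMo-Aligner.git
cd NeMo-Aligner
pip install .        # or `pip install -e .` for an editable, development install
```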
We provide an official NeMo-Aligner Dockerfile, which is based on stable, tested versions of NeMo, Megatron-LM, and TransformerEngine. The goal of this Dockerfile is stability, so it may not track the very latest versions of those three packages. You can access our Dockerfile here.
Alternatively, you can build the NeMo Dockerfile and add `RUN pip install nemo-aligner` at the end, as sketched below.
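A minimal sketch of that alternative (assuming the NeMo Dockerfile sits at the root of the NeMo repository; the image tag below is a placeholder) might look like:

```bash
# Sketch: append the NeMo-Aligner install step to the NeMo Dockerfile and build the image.
git clone https://github.com/NVIDIA/NeMo.git
cd NeMo
echo 'RUN pip install nemo-aligner' >> Dockerfile
docker build -t nemo-aligner:dev .   # the tag is a placeholder; pick any name you like
```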
- Add Rejection Sampling support
- Continue improving the stability of the PPO learning phase
- Improve the performance of RLHF
We welcome community contributions! Please refer to CONTRIBUTING.md for guidelines.
@misc{shen2024nemoaligner,
title={NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment},
author={Gerald Shen and Zhilin Wang and Olivier Delalleau and Jiaqi Zeng and Yi Dong and Daniel Egert and Shengyang Sun and Jimmy Zhang and Sahil Jain and Ali Taghibakhshi and Markel Sanz Ausin and Ashwath Aithal and Oleksii Kuchaiev},
year={2024},
eprint={2405.01481},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
This toolkit is licensed under the Apache License, Version 2.0.