Official implementation of ReAGent, as presented in our paper (ReLM at AAAI24):
ReAGent: A Model-agnostic Feature Attribution Method for Generative Language Models
If you use this code, please cite us:
@inproceedings{zhao2024reagent,
title= "ReAGent: Towards A Model-agnostic Feature Attribution Method for Generative Language Models",
author= "Zhao, Zhixue and Shan, Boxuan",
booktitle= "Proceedings of AAAI Workshop on Responsible Language Models",
year= "2024",
url = "https://arxiv.org/pdf/2402.00794.pdf"
}
Run the demo in the Jupyter notebook src/notebook.ipynb or in the Colab demo.
You can try your own model and input; the demo produces importance distribution plots.
We run our experiments with Python 3.10. Set up the environment with either conda or venv, as described in the two lists below.
- install anaconda/miniconda
- create a conda environment
conda create -n env-reagent python=3.10
- activate environment
conda activate env-reagent
- install package dependencies
pip install -r requirements.txt
- install python 3.10
- create a virtual environment
python3 -m venv .venv
- activate environment
source .venv/bin/activate
- install package dependencies
pip install -r requirements.txt
Run the following command to set up NLTK:
python setup_nltk.py
target_pos | target_token | source_soft_ns | source_soft_nc | rand_soft_ns | rand_soft_nc |
---|---|---|---|---|---|
10 | Nintendo | 0.000 | 0.004 | 0.000 | 0.019 |
11 | , | 0.255 | 0.017 | 0.000 | 0.007 |
12 | and | 0.000 | 0.006 | 0.004 | 0.017 |
13 | was | 0.008 | 0.008 | 0.000 | 0.006 |
14 | released | 0.027 | 0.005 | 0.000 | 0.005 |
target_tokens | metric_soft_ns | metric_soft_nc |
---|---|---|
Nintendo$,$ and$ was$ released | 4.299 | -0.309 |
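The tables above do not say how the aggregate row is derived from the per-token rows. One reading that reproduces the sample numbers to within rounding is the natural log of the ratio between the summed source scores and the summed random-baseline scores. The sketch below checks that reading against the rounded values shown above; it is an inference from this sample, not the repository's actual implementation, and `log_ratio` is a hypothetical helper.

```python
import math

# Per-token scores copied from the sample table above (rounded to 3 decimals).
source_soft_ns = [0.000, 0.255, 0.000, 0.008, 0.027]
rand_soft_ns = [0.000, 0.000, 0.004, 0.000, 0.000]
source_soft_nc = [0.004, 0.017, 0.006, 0.008, 0.005]
rand_soft_nc = [0.019, 0.007, 0.017, 0.006, 0.005]


def log_ratio(source: list[float], rand: list[float]) -> float:
    """Natural log of summed source scores over summed random-baseline scores.

    Assumption: this aggregation is inferred from the sample numbers only.
    """
    return math.log(sum(source) / sum(rand))


metric_soft_ns = log_ratio(source_soft_ns, rand_soft_ns)
metric_soft_nc = log_ratio(source_soft_nc, rand_soft_nc)
print(f"metric_soft_ns ~ {metric_soft_ns:.2f}, metric_soft_nc ~ {metric_soft_nc:.2f}")
```

With the rounded inputs this gives roughly 4.28 and -0.30, close to the reported 4.299 and -0.309; the small gap is what one would expect from rounding in the per-token columns.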
This experiment has been split into multiple steps.
For the analogies experiment, we use the analogies dataset provided by Mikolov et al. Please download the text file into data/analogies.txt.
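Before running the preparation step, it can help to sanity-check the downloaded file. The sketch below assumes the file follows the layout of the widely distributed word-analogy test set (category header lines starting with `:`, followed by lines of four space-separated words); that layout, and the names `Analogy` and `parse_analogies`, are assumptions for illustration and not part of this repository.

```python
from typing import Iterator, NamedTuple


class Analogy(NamedTuple):
    category: str
    words: tuple[str, ...]


def parse_analogies(text: str) -> Iterator[Analogy]:
    """Yield one Analogy per data line, tagged with the most recent category."""
    category = ""
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(":"):
            # Assumed header format: ": category-name"
            category = line[1:].strip()
            continue
        yield Analogy(category, tuple(line.split()))


# Tiny in-memory sample in the assumed layout.
sample = ": capital-common-countries\nAthens Greece Baghdad Iraq\n"
for analogy in parse_analogies(sample):
    print(analogy.category, analogy.words)
```

For the real file, pass the contents of data/analogies.txt, for example `parse_analogies(open("data/analogies.txt", encoding="utf-8").read())`.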
Then run the following commands to prepare the dataset for attribution.
cache_dir="cache/"
# select from: gpt2-medium gpt2-xl EleutherAI/gpt-j-6b facebook/opt-350m facebook/opt-1.3b KoboldAI/OPT-6.7B-Erebus
model_name="gpt2-medium"
# select from: gpt2 gpt2_xl gpt6b OPT350M OPT1B OPT6B
model_short_name="gpt2"
mkdir -p "data/analogies/"$model_short_name
python src/data/prepare_evaluation_analogy.py \
--analogies-file data/analogies.txt \
--output-dir data/analogies/$model_short_name \
--compact-output True \
--schema-uri ../../docs/analogy.schema.json \
--device cuda \
--model $model_name \
--cache_dir $cache_dir
This generates a set of JSON files in the directory given by --output-dir.
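To check what the preparation step produced, a small script can list the generated files and their top-level keys. This is a minimal sketch: the record layout is defined by docs/analogy.schema.json and is not assumed here, and `summarize_json_dir` is a hypothetical helper name.

```python
import json
from pathlib import Path


def summarize_json_dir(output_dir: str) -> dict[str, list[str]]:
    """Map each JSON file name in output_dir to its sorted top-level keys."""
    summary = {}
    for path in sorted(Path(output_dir).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            record = json.load(f)
        # Only dict-shaped records have keys; anything else is reported as empty.
        summary[path.name] = sorted(record) if isinstance(record, dict) else []
    return summary


if __name__ == "__main__":
    # Directory from the commands above, with model_short_name="gpt2".
    for name, keys in summarize_json_dir("data/analogies/gpt2").items():
        print(name, keys)
```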
Run the following commands to start the feature attribution
config_name=top3_replace0.1_max5000_batch5
hyper="top3_replace0.1_max5000_batch5"
FA_name="ours"
# Braces are required: without them the shell looks up a variable named "model_short_name_"
importance_results="rationalization_results/analogies/${model_short_name}_${FA_name}/$hyper"
mkdir -p $importance_results
logfolder_shortname=logs/analogies/$model_short_name"_"$FA_name$hyper
mkdir -p $logfolder_shortname
python src/rationalization/run_analogies.py \
--rationalization-config config/$config_name.json \
--model $model_name \
--tokenizer $model_name \
--data-dir data/analogies/$model_short_name \
--importance_results_dir $importance_results \
--device cuda \
--logfolder $logfolder_shortname \
--input_num_ratio 1 \
--cache_dir $cache_dir
This loads the feature attribution configuration from --rationalization-config and writes the attribution results to --importance_results_dir.
Run the following commands to start the evaluation
eva_output_dir="evaluation_results/analogies/"$model_short_name"_"$FA_name$hyper
mkdir -p $eva_output_dir
python src/evaluation/evaluate_analogies.py \
--importance_results_dir $importance_results \
--eva_output_dir $eva_output_dir \
--model $model_name \
--tokenizer $model_name \
--logfolder $logfolder_shortname \
--rationale_size_ratio 1 \
--cache_dir $cache_dir
This loads the attribution results from --importance_results_dir and writes the evaluation metrics to --eva_output_dir.
A dataset is a plain-text TXT file in data/benchmark, where each line is a prompt. The datasets used in our experiments are:
- wikitext.txt
- tellmewhy2.txt
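Since a dataset is just one prompt per line, you can build your own with a few lines of Python. The sketch below is illustrative: `load_prompts` and `save_prompts` are hypothetical helpers, and skipping blank lines is an assumption about how the files should be read.

```python
from pathlib import Path


def load_prompts(dataset_path: str) -> list[str]:
    """Return one prompt per non-empty line of a plain-text dataset file."""
    text = Path(dataset_path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def save_prompts(prompts: list[str], dataset_path: str) -> None:
    """Write prompts one per line, creating parent directories if needed."""
    path = Path(dataset_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Collapse all whitespace runs (including newlines) to single spaces,
    # so each prompt stays on exactly one line.
    lines = [" ".join(p.split()) for p in prompts]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
```

For example, `save_prompts(["The capital of France is"], "data/benchmark/my_prompts.txt")` would create a one-prompt dataset; the file name `my_prompts.txt` is made up for this example.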
This experiment runs end to end. To run it, use the following commands:
cache_dir="cache/"
# select from: gpt2-medium gpt2-xl EleutherAI/gpt-j-6b facebook/opt-350m facebook/opt-1.3b KoboldAI/OPT-6.7B-Erebus
model_name="gpt2-medium"
# select from: gpt2 gpt2_xl gpt6b OPT350M OPT1B OPT6B
model_short_name="gpt2"
# Select from: norm gradient_shap integrated_gradients input_x_gradient attention attention_rollout attention_last ours
FA_name="ours"
# Select from: wikitext tellmewhy wikitext2 tellmewhy2
dataset=wikitext
python src/sequence_rationalization.py \
--model $model_name \
--model_shortname $model_short_name \
--method $FA_name \
--stride 2 \
--max_new_tokens 10 \
--cache_dir $cache_dir \
--testing_data_name $dataset
This loads the dataset from data/benchmark/<--testing_data_name>.txt
and writes both the attribution results and the metrics to evaluation_results/benchmark/<--model_shortname>_<--method>/<--testing_data_name>
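The input and output locations follow directly from the command-line arguments. As a quick way to find results after a run, the sketch below rebuilds both paths from the patterns stated above; the function names are hypothetical and nothing beyond those two patterns is assumed.

```python
from pathlib import Path


def benchmark_dataset_path(testing_data_name: str) -> Path:
    """Input file: data/benchmark/<testing_data_name>.txt"""
    return Path("data") / "benchmark" / f"{testing_data_name}.txt"


def benchmark_output_dir(model_shortname: str, method: str, testing_data_name: str) -> Path:
    """Output dir: evaluation_results/benchmark/<model_shortname>_<method>/<testing_data_name>"""
    return (
        Path("evaluation_results")
        / "benchmark"
        / f"{model_shortname}_{method}"
        / testing_data_name
    )


# Values matching the example commands above.
print(benchmark_dataset_path("wikitext").as_posix())
print(benchmark_output_dir("gpt2", "ours", "wikitext").as_posix())
```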