Fast Multi-Resolution Transformer Fine-tuning for Extreme Multi-label Text Classification, NeurIPS 2021

This folder contains code to train XR-Transformer models and reproduce experiments in "Fast Multi-Resolution Transformer Fine-tuning for Extreme Multi-label Text Classification".

Getting Started

  • Clone the repository and enter the examples/xr-transformer-neurips21 directory.
  • Create a virtual environment and then install the dependencies by running:
pip install -r requirements.txt

If you're unfamiliar with Python virtual environments, check out the user guide.
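
For example, a minimal setup using Python's built-in venv module might look like this (the environment name pecos-env is arbitrary):

# create and activate an isolated environment
python3 -m venv pecos-env
source pecos-env/bin/activate
# install the example's dependencies inside the environment
pip install -r requirements.txt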

Downloading Data

The XMC datasets can be downloaded via:

# eurlex-4k, wiki10-31k, amazoncat-13k, amazon-670k, wiki-500k, amazon-3m
DATASET="wiki10-31k"
wget https://archive.org/download/pecos-dataset/xmc-base/${DATASET}.tar.gz
tar -zxvf ./${DATASET}.tar.gz
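
The archive extracts into an xmc-base/${DATASET}/ folder containing the raw text files (e.g. X.trn.txt and X.tst.txt) used in the steps below. A quick sanity check:

# list the extracted dataset files referenced by the training and encoding steps
ls xmc-base/${DATASET}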

Training and Evaluation

To train and evaluate the XR-Transformer model, run:

bash run.sh ${DATASET}

Recommended platform for training: AWS p3.16xlarge instance or equivalent.
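
For example, to reproduce the wiki10-31k results end to end with the dataset downloaded above:

bash run.sh wiki10-31k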

Getting XR-Transformer Embeddings

We also release the fine-tuned XR-Transformer encoders, with which users can generate instance embeddings. The encoders can be downloaded via:

# eurlex-4k, wiki10-31k, amazoncat-13k, amazon-670k, wiki-500k, amazon-3m
DATASET="wiki10-31k"
wget https://archive.org/download/xr-transformer-encoders/${DATASET}.tar.gz
mkdir -p ./encoders
tar -zxvf ./${DATASET}.tar.gz -C ./encoders
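
Each archive unpacks one sub-folder per encoder under ./encoders/${DATASET}/; these per-model folders are what ${model_dir} points at below:

# list the available encoders; folder names correspond to the MODEL_NAME values below
ls ./encoders/${DATASET}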

The XR-Transformer embeddings of training and testing instances can be generated by:

# for eurlex-4k, wiki10-31k, amazoncat-13k, MODEL_NAME can be bert|roberta|xlnet
# for amazon-670k, wiki-500k, amazon-3m, MODEL_NAME can be bert1|bert2|bert3
MODEL_NAME="bert"
model_dir="./encoders/${DATASET}/${MODEL_NAME}"

python3 -m pecos.xmc.xtransformer.encode \
    --text-path xmc-base/${DATASET}/X.trn.txt \
    --model-folder ${model_dir} \
    --batch-gen-workers 16 \
    --save-emb-path ${model_dir}/X.emb.trn.npy \
    --batch-size 128

python3 -m pecos.xmc.xtransformer.encode \
    --text-path xmc-base/${DATASET}/X.tst.txt \
    --model-folder ${model_dir} \
    --batch-gen-workers 16 \
    --save-emb-path ${model_dir}/X.emb.tst.npy \
    --batch-size 128

Embeddings will be saved at ${model_dir}/X.emb.trn.npy and ${model_dir}/X.emb.tst.npy.
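
A minimal sketch for inspecting the results, assuming the embeddings are saved as dense NumPy arrays (rows are instances; the embedding dimension depends on the encoder):

# load the generated training embeddings with NumPy and print their shape and dtype
python3 -c "import numpy as np; X = np.load('${model_dir}/X.emb.trn.npy'); print(X.shape, X.dtype)"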

Citation

If you find this useful, please consider citing our paper.