
MLPerf Inference Benchmarks for Natural Language Processing

This is the reference implementation for MLPerf Inference benchmarks for Natural Language Processing.

The chosen model is BERT-Large, performing the SQuAD v1.1 question answering task.

Prerequisites

  • nvidia-docker
  • Any NVIDIA GPU supported by TensorFlow or PyTorch

Supported Models

| model | framework | accuracy | dataset | model link | model source | precision | notes |
| --- | --- | --- | --- | --- | --- | --- | --- |
| BERT-Large | TensorFlow | f1_score=90.874% | SQuAD v1.1 validation set | from zenodo, from zenodo | BERT-Large, trained with NVIDIA DeepLearningExamples | fp32 | |
| BERT-Large | PyTorch | f1_score=90.874% | SQuAD v1.1 validation set | from zenodo | BERT-Large, trained with NVIDIA DeepLearningExamples, converted with bert_tf_to_pytorch.py | fp32 | |
| BERT-Large | ONNX | f1_score=90.874% | SQuAD v1.1 validation set | from zenodo | BERT-Large, trained with NVIDIA DeepLearningExamples, converted with bert_tf_to_pytorch.py | fp32 | |
| BERT-Large | ONNX | f1_score=90.067% | SQuAD v1.1 validation set | from zenodo | Fine-tuned based on the PyTorch model and converted with bert_tf_to_pytorch.py | int8, symmetrically per-tensor quantized without bias | See [MLPerf INT8 BERT Finetuning.pdf](MLPerf INT8 BERT Finetuning.pdf) for details about the fine-tuning process |
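
The f1_score values reported for these models are SQuAD-style token-overlap F1. As a rough illustration of that metric, here is a minimal sketch for a single prediction and a single reference answer. It is not the code in accuracy-squad.py: the function names are chosen for this example, and the official SQuAD v1.1 evaluation additionally takes the maximum over all reference answers for a question and averages over the dataset.

```python
import re
import string
from collections import Counter


def normalize_answer(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    punctuation = set(string.punctuation)
    text = "".join(ch for ch in text if ch not in punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def f1_score(prediction, ground_truth):
    """Token-overlap F1 between one predicted answer and one reference answer."""
    pred_tokens = normalize_answer(prediction).split()
    truth_tokens = normalize_answer(ground_truth).split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    common = Counter(pred_tokens) & Counter(truth_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, a prediction of "in Paris France" against the reference "Paris" has precision 1/3 and recall 1, which gives an F1 of 0.5.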

Disclaimer

This benchmark app is a reference implementation that is not meant to be the fastest implementation possible.

Commands

Please run the following commands:

  • make setup: initialize submodule, download datasets, and download models.
  • make build_docker: build docker image.
  • make launch_docker: launch docker container with an interactive session.
  • python3 run.py --backend=[tf|pytorch|onnxruntime|tf_estimator] --scenario=[Offline|SingleStream|MultiStream|Server] [--accuracy] [--quantized]: run the harness inside the docker container. Performance or accuracy results are printed to the console.
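
When sweeping several backend and scenario combinations, it can help to assemble the run.py command line programmatically. The sketch below only composes the flags listed above. The helper name build_run_command is hypothetical and not part of this repository, and the sketch does not check which flag combinations a given backend actually accepts.

```python
import shlex

# Values taken from the run.py usage line above.
BACKENDS = ("tf", "pytorch", "onnxruntime", "tf_estimator")
SCENARIOS = ("Offline", "SingleStream", "MultiStream", "Server")


def build_run_command(backend, scenario, accuracy=False, quantized=False):
    """Return the run.py invocation as an argument list (hypothetical helper)."""
    if backend not in BACKENDS:
        raise ValueError(f"unknown backend: {backend!r}")
    if scenario not in SCENARIOS:
        raise ValueError(f"unknown scenario: {scenario!r}")
    cmd = ["python3", "run.py", f"--backend={backend}", f"--scenario={scenario}"]
    if accuracy:
        cmd.append("--accuracy")
    if quantized:
        cmd.append("--quantized")
    return cmd


if __name__ == "__main__":
    # Print a shell-ready command for an accuracy run in the Offline scenario.
    print(shlex.join(build_run_command("pytorch", "Offline", accuracy=True)))
```

The returned list can be passed to subprocess.run inside the docker container, or printed and pasted into the interactive session.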

Details

  • SUT implementations are in tf_SUT.py, tf_estimator_SUT.py and pytorch_SUT.py. QSL implementation is in squad_QSL.py.
  • The script accuracy-squad.py parses the LoadGen accuracy log, post-processes it, and computes the accuracy.
  • Tokenization and detokenization (post-processing) are not included in the timed path.
  • The inputs to the SUT are input_ids, input_mask, and segment_ids. The output from the SUT is start_logits and end_logits concatenated together.
  • max_seq_length is 384.
  • The script tf_freeze_bert.py freezes the TensorFlow model into a pb file.
  • The script bert_tf_to_pytorch.py converts the TensorFlow model into the PyTorch BertForQuestionAnswering module in HuggingFace Transformers and also exports the model to ONNX format.
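
To make the SUT input and output description above concrete, here is a minimal sketch of padding one tokenized example to max_seq_length and of splitting a combined logits output back into its two parts. It rests on several assumptions that the repository does not confirm here: the padding id is 0, input_mask is 1 for real tokens and 0 for padding, and the combined output holds one [start_logit, end_logit] pair per token position. The function names and the token ids in the usage example are illustrative only; the actual layout should be checked against the SUT files.

```python
MAX_SEQ_LENGTH = 384  # value stated in the Details list


def pad_features(token_ids, segment_ids, max_seq_length=MAX_SEQ_LENGTH):
    """Pad one example to a fixed length and build its attention mask.

    Assumes padding id 0 and a mask of 1 for real tokens (hypothetical helper).
    """
    if len(token_ids) != len(segment_ids):
        raise ValueError("token_ids and segment_ids must have the same length")
    if len(token_ids) > max_seq_length:
        raise ValueError("example is longer than max_seq_length")
    pad = max_seq_length - len(token_ids)
    return {
        "input_ids": list(token_ids) + [0] * pad,
        "input_mask": [1] * len(token_ids) + [0] * pad,
        "segment_ids": list(segment_ids) + [0] * pad,
    }


def split_logits(output):
    """Split per-token [start_logit, end_logit] pairs into two lists.

    The pair-per-token layout is an assumption made for this sketch.
    """
    start_logits = [pair[0] for pair in output]
    end_logits = [pair[1] for pair in output]
    return start_logits, end_logits


if __name__ == "__main__":
    # Illustrative token ids: a short question segment (0) and context segment (1).
    features = pad_features([101, 2054, 102, 2009, 102], [0, 0, 0, 1, 1])
    print(len(features["input_ids"]), sum(features["input_mask"]))
```

The answer span is then typically chosen from the start and end positions with the highest combined logits, which is part of the post-processing that stays outside the timed path.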

License

Apache License 2.0