bert

Latest commit a7512b3 by cesar-avalos3 and guschmue, Mar 1, 2021

Name	Last commit message	Last commit date
DeepLearningExamples @ b03375b	move to new repo structure	Aug 13, 2020
.gitignore	move to new repo structure	Aug 13, 2020
Dockerfile	move to new repo structure	Aug 13, 2020
MLPerf INT8 BERT Finetuning.pdf	move to new repo structure	Aug 13, 2020
Makefile	move to new repo structure	Aug 13, 2020
README.md	move to new repo structure	Aug 13, 2020
accuracy-squad.py	BERT: optional max_examples limit in accuracy mode. (#799)	Jan 11, 2021
bert_config.json	move to new repo structure	Aug 13, 2020
bert_tf_to_pytorch.py	move to new repo structure	Aug 13, 2020
create_squad_data.py	Use tf.compat.v1 for compatibility with v2.	Oct 16, 2020
evaluate-v1.1.py	BERT: optional max_examples limit in accuracy mode. (#799)	Jan 11, 2021
onnxruntime_SUT.py	BERT: optional max_examples limit in accuracy mode. (#799)	Jan 11, 2021
pytorch_SUT.py	BERT: optional max_examples limit in accuracy mode. (#799)	Jan 11, 2021
run.py	Change the way bert/run.py launches the accuracy-squad script by acco…	Mar 1, 2021
squad_QSL.py	using the local version, which has the same functionality, but fewer …	Oct 13, 2020
tf_SUT.py	BERT: optional max_examples limit in accuracy mode. (#799)	Jan 11, 2021
tf_estimator_SUT.py	move to new repo structure	Aug 13, 2020
tf_freeze_bert.py	move to new repo structure	Aug 13, 2020
user.conf	Populate user.conf with user-configurable LoadGen settings. Update be…	Sep 1, 2020

README.md

MLPerf Inference Benchmarks for Natural Language Processing

This is the reference implementation for MLPerf Inference benchmarks for Natural Language Processing.

The chosen model is BERT-Large, performing the SQuAD v1.1 question answering task.

Prerequisites

nvidia-docker
Any NVIDIA GPU supported by TensorFlow or PyTorch

Supported Models

model	framework	accuracy	dataset	model link	model source	precision	notes
BERT-Large	TensorFlow	f1_score=90.874%	SQuAD v1.1 validation set	from zenodo from zenodo	BERT-Large, trained with NVIDIA DeepLearningExamples	fp32
BERT-Large	PyTorch	f1_score=90.874%	SQuAD v1.1 validation set	from zenodo	BERT-Large, trained with NVIDIA DeepLearningExamples, converted with bert_tf_to_pytorch.py	fp32
BERT-Large	ONNX	f1_score=90.874%	SQuAD v1.1 validation set	from zenodo	BERT-Large, trained with NVIDIA DeepLearningExamples, converted with bert_tf_to_pytorch.py	fp32
BERT-Large	ONNX	f1_score=90.067%	SQuAD v1.1 validation set	from zenodo	Fine-tuned based on the PyTorch model and converted with bert_tf_to_pytorch.py	int8, symmetrically per-tensor quantized without bias	See [MLPerf INT8 BERT Finetuning.pdf](MLPerf INT8 BERT Finetuning.pdf) for details about the fine-tuning process
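
For readers unfamiliar with the f1_score column, the following is a minimal sketch of the token-overlap F1 conventionally used for SQuAD v1.1: answers are lowercased, stripped of punctuation and the articles a/an/the, and compared as bags of tokens. The function names here are illustrative, and this is not claimed to match evaluate-v1.1.py line for line. A dataset-level score would normally take the best F1 over all reference answers for each question and average over questions.

```python
import collections
import re
import string


def normalize_answer(text):
    """Lowercase, drop punctuation and articles, and collapse whitespace."""
    text = text.lower()
    punctuation = set(string.punctuation)
    text = "".join(ch for ch in text if ch not in punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def f1_score(prediction, ground_truth):
    """Token-level F1 between one predicted answer and one reference answer."""
    pred_tokens = normalize_answer(prediction).split()
    truth_tokens = normalize_answer(ground_truth).split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    common = collections.Counter(pred_tokens) & collections.Counter(truth_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(f1_score("the Eiffel Tower", "Eiffel Tower"))
    print(f1_score("Paris France", "Paris"))
```

The first call prints a perfect score because the article is removed during normalization; the second is penalized on precision for the extra token.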

Disclaimer

This benchmark app is a reference implementation that is not meant to be the fastest implementation possible.

Commands

Please run the following commands:

make setup: initialize submodule, download datasets, and download models.
make build_docker: build docker image.
make launch_docker: launch docker container with an interactive session.
python3 run.py --backend=[tf|pytorch|onnxruntime|tf_estimator] --scenario=[Offline|SingleStream|MultiStream|Server] [--accuracy] [--quantized]: run the harness inside the docker container. Performance or accuracy results will be printed to the console.
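
When scripting several runs, the run.py command line above could be assembled programmatically. The sketch below uses only the flags and values listed above; build_run_command is a hypothetical helper, not part of this repository, and it only builds the argument list without launching anything.

```python
import shlex

# Values copied from the run.py usage line above.
BACKENDS = ("tf", "pytorch", "onnxruntime", "tf_estimator")
SCENARIOS = ("Offline", "SingleStream", "MultiStream", "Server")


def build_run_command(backend, scenario, accuracy=False, quantized=False):
    """Return the argv list for one run.py invocation (hypothetical helper)."""
    if backend not in BACKENDS:
        raise ValueError(f"unknown backend: {backend!r}")
    if scenario not in SCENARIOS:
        raise ValueError(f"unknown scenario: {scenario!r}")
    cmd = ["python3", "run.py", f"--backend={backend}", f"--scenario={scenario}"]
    if accuracy:
        cmd.append("--accuracy")
    if quantized:
        cmd.append("--quantized")
    return cmd


if __name__ == "__main__":
    # Assumption: --quantized is meant for the int8 ONNX model in the table above.
    cmd = build_run_command("onnxruntime", "Offline", accuracy=True, quantized=True)
    print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run inside the container; validating the backend and scenario up front avoids starting a long run with a mistyped value.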

Details

SUT implementations are in tf_SUT.py, tf_estimator_SUT.py, pytorch_SUT.py and onnxruntime_SUT.py. QSL implementation is in squad_QSL.py.
The script accuracy-squad.py parses the LoadGen accuracy log, post-processes it, and computes the accuracy.
Tokenization and detokenization (post-processing) are not included in the timed path.
The inputs to the SUT are input_ids, input_mask, and segment_ids. The output from the SUT is start_logits and end_logits concatenated together.
max_seq_length is 384.
The script tf_freeze_bert.py freezes the TensorFlow model into a pb file.
The script bert_tf_to_pytorch.py converts the TensorFlow model into the PyTorch BertForQuestionAnswering module in HuggingFace Transformers and also exports the model to ONNX format.
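
The input and output contract described above can be illustrated with a small pure-Python sketch. Several details are assumptions made for illustration only: padding with id 0, an output laid out as one [start, end] pair per token position, a maximum answer length of 30 tokens, and a greedy best-span search that is much simpler than the post-processing in accuracy-squad.py. All function names are hypothetical.

```python
MAX_SEQ_LENGTH = 384  # value stated in the list above


def pad_features(token_ids, token_type_ids, pad_id=0):
    """Pad one tokenized example to MAX_SEQ_LENGTH.

    Returns (input_ids, input_mask, segment_ids). pad_id=0 is an assumption.
    """
    if len(token_ids) != len(token_type_ids):
        raise ValueError("token_ids and token_type_ids must have equal length")
    if len(token_ids) > MAX_SEQ_LENGTH:
        raise ValueError("example is longer than MAX_SEQ_LENGTH")
    n_pad = MAX_SEQ_LENGTH - len(token_ids)
    input_ids = list(token_ids) + [pad_id] * n_pad
    input_mask = [1] * len(token_ids) + [0] * n_pad  # 1 marks real tokens
    segment_ids = list(token_type_ids) + [0] * n_pad
    return input_ids, input_mask, segment_ids


def split_logits(output):
    """Split SUT output into (start_logits, end_logits).

    Assumes one [start, end] pair per token position; the real layout
    depends on how each SUT concatenates the two tensors.
    """
    start_logits = [pair[0] for pair in output]
    end_logits = [pair[1] for pair in output]
    return start_logits, end_logits


def best_span(start_logits, end_logits, max_answer_length=30):
    """Return the (start, end) token indices with the highest summed logit."""
    best, best_score = (0, 0), float("-inf")
    for s, s_logit in enumerate(start_logits):
        last = min(s + max_answer_length, len(end_logits))
        for e in range(s, last):
            score = s_logit + end_logits[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best


if __name__ == "__main__":
    ids, mask, segs = pad_features([101, 7592, 102], [0, 0, 0])
    print(len(ids), sum(mask))
    start, end = split_logits([[0.1, 0.0], [2.0, 0.5], [0.3, 3.0], [0.0, 0.2]])
    print(best_span(start, end))
```

Restricting the search to end positions at or after the start position, within a bounded length, rules out spans that could not correspond to a valid answer.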

License

Apache License 2.0
