#tokenize #bert #port-of-google #word-piece #original

bert_tokenizer

This crate is a Rust port of Google's BERT WordPiece tokenizer.

4 unstable releases

0.1.3 Feb 15, 2023
0.1.2 Feb 14, 2023
0.1.1 Feb 14, 2023
0.1.0 Feb 14, 2023

#9 in #word-piece

37 downloads per month
Used in bert_create_pretraining

Custom license

225KB
373 lines

The crate provides a Rust port of the original BERT tokenizer from Google's BERT repository.
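For readers unfamiliar with how this tokenizer works, here is a minimal, self-contained sketch of the two stages in Google's reference tokenizer. The first is a simplified "basic" pass that lowercases, splits on whitespace, and splits off punctuation. The second is greedy longest-match-first WordPiece. This sketch does not use or reproduce this crate's API. The function names `basic_tokenize` and `wordpiece_tokenize`, the toy vocabulary, and the 100-character word limit are hypothetical choices for illustration. Accent stripping, CJK character handling, and text cleanup from the original are omitted, and only ASCII punctuation is split.

```rust
use std::collections::HashSet;

/// Simplified "basic" tokenization (hypothetical helper, not this crate's API):
/// optionally lowercase, split on whitespace, and emit each ASCII punctuation
/// character as its own token.
fn basic_tokenize(text: &str, lowercase: bool) -> Vec<String> {
    let text = if lowercase { text.to_lowercase() } else { text.to_string() };
    let mut tokens = Vec::new();
    for word in text.split_whitespace() {
        let mut current = String::new();
        for ch in word.chars() {
            if ch.is_ascii_punctuation() {
                // Flush the pending run of non-punctuation characters first.
                if !current.is_empty() {
                    tokens.push(std::mem::take(&mut current));
                }
                tokens.push(ch.to_string());
            } else {
                current.push(ch);
            }
        }
        if !current.is_empty() {
            tokens.push(current);
        }
    }
    tokens
}

/// Greedy longest-match-first WordPiece (hypothetical helper): repeatedly take
/// the longest prefix of the remaining characters found in the vocabulary.
/// Non-initial pieces carry the "##" continuation prefix. If no piece matches
/// at some position, or the word is too long, the whole word becomes `unk`.
fn wordpiece_tokenize(word: &str, vocab: &HashSet<String>, unk: &str, max_chars: usize) -> Vec<String> {
    let chars: Vec<char> = word.chars().collect();
    if chars.len() > max_chars {
        return vec![unk.to_string()];
    }
    let mut pieces = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let mut end = chars.len();
        let mut found = None;
        // Shrink the candidate from the right until it is in the vocabulary.
        while start < end {
            let mut candidate: String = chars[start..end].iter().collect();
            if start > 0 {
                candidate.insert_str(0, "##");
            }
            if vocab.contains(&candidate) {
                found = Some(candidate);
                break;
            }
            end -= 1;
        }
        match found {
            Some(piece) => {
                pieces.push(piece);
                start = end;
            }
            None => return vec![unk.to_string()],
        }
    }
    pieces
}

fn main() {
    // Toy vocabulary for illustration only; a real BERT vocabulary is loaded from a vocab file.
    let vocab: HashSet<String> = ["un", "##aff", "##able", "hello", ",", "!"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    let mut output = Vec::new();
    for word in basic_tokenize("Unaffable, hello!", true) {
        output.extend(wordpiece_tokenize(&word, &vocab, "[UNK]", 100));
    }
    println!("{:?}", output);
}
```

Running it prints `["un", "##aff", "##able", ",", "hello", "!"]`. The `unaffable` split mirrors the example given in the docstring of Google's reference implementation. Mapping a whole word to the unknown token when any position fails to match, rather than emitting partial pieces, follows the reference algorithm's behavior.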

License

MIT license. See the LICENSE file for the full license text.

Dependencies

~2.5MB
~48K SLoC