Issues: huggingface/tokenizers

Training a model from in-memory data
#198 by loicbarrault was closed Nov 28, 2020

Issues list

Bert: Slow vs fast decoding inconsistency
#1723 opened Jan 17, 2025 by pcuenca updated Jan 17, 2025
Decode regression decoding performance
#1564 opened Jul 10, 2024 by daulet updated Jan 16, 2025
Suggestion: speed improvement - hashmap implementation
#1722 opened Jan 15, 2025 by royweiss1 updated Jan 15, 2025
batch_encode_plus doesn't work correctly
#1704 opened Dec 18, 2024 by tempdeltavalue updated Jan 11, 2025
DecodeStream raise error
#1705 opened Dec 26, 2024 by irexyc updated Jan 10, 2025
Cannot find package 'tokenizers-linux-x64-musl' - Alpine support
#1703 opened Dec 14, 2024 by PylotLight updated Jan 10, 2025
Rust Issue on Unix in 0.21.0 Version
#1694 opened Nov 28, 2024 by insculptor updated Jan 9, 2025
out of memory when training a BBPE tokenizer on a large corpus
#1681 opened Nov 14, 2024 by yucc-leon updated Dec 31, 2024
NormalizedString.clear() broken? (label: bug)
#1636 opened Sep 25, 2024 by lkurlandski updated Nov 30, 2024
Bug: is_pretokenized is not used when calling tokenizer.encode(...)
#1695 opened Nov 29, 2024 by jannessm updated Nov 29, 2024
Prebuilding tokenizers for Windows arm
#1684 opened Nov 20, 2024 by hariji814 updated Nov 28, 2024
wikitext-103-raw-v1.zip is not available on the amazonaws anymore
#1683 opened Nov 18, 2024 by gec1-dev updated Nov 18, 2024
Mismatch between slow and fast tokenizer
#1682 opened Nov 15, 2024 by KaiLv69 updated Nov 18, 2024
Reduce vocab size for BPE tokenizer (label: Feature Request)
#1668 opened Oct 29, 2024 by fzyzcjy updated Oct 30, 2024
Inconsistent behaviour of PreTrainedTokenizerFasts on diacritics marked texts (label: bug)
#1663 opened Oct 11, 2024 by sven-nm updated Oct 22, 2024
2 of 4 tasks
docs-check.yml uses node12 which is deprecated
#1658 opened Oct 17, 2024 by hamirmahal updated Oct 17, 2024