From 9ab2500a6733ce5336b493f2c00055c85de25138 Mon Sep 17 00:00:00 2001
From: Andrej Karpathy
Date: Sun, 18 Feb 2024 07:50:54 -0800
Subject: [PATCH] maintain todos

---
 README.md | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index df2f431a..ce1d9493 100644
--- a/README.md
+++ b/README.md
@@ -62,8 +62,11 @@ to run the tests.
 
 ## todos
 
-- write more optimized versions, both in Python and/or C/Rust?
-- handle special tokens? think through...
+- write a more optimized Python version that could run over large files and big vocabs
+- write an even more optimized C or Rust version (think through)
+- rename GPT4Tokenizer to GPTTokenizer and support GPT-2 as well?
+- write a LlamaTokenizer similar to GPT4Tokenizer (i.e. attempt sentencepiece equivalent)
+- handle special tokens
 - video coming soon ;)
 
 ## License