Commit

maintain todos
karpathy committed Feb 18, 2024
1 parent 7843c96 commit 9ab2500
Showing 1 changed file (README.md) with 5 additions and 2 deletions.
@@ -62,8 +62,11 @@ to run the tests.

 ## todos

-- write more optimized versions, both in Python and/or C/Rust?
-- handle special tokens? think through...
+- write a more optimized Python version that could run over large files and big vocabs
+- write an even more optimized C or Rust version (think through)
+- rename GPT4Tokenizer to GPTTokenizer and support GPT-2 as well?
+- write a LlamaTokenizer similar to GPT4Tokenizer (i.e. attempt sentencepiece equivalent)
+- handle special tokens
 - video coming soon ;)

 ## License
