add note of caution on parity
karpathy committed Feb 16, 2024
1 parent aa2bfb9 commit 1c0520f
Showing 1 changed file with 2 additions and 0 deletions.
2 changes: 2 additions & 0 deletions README.md
@@ -22,6 +22,8 @@ mergeable_ranks = enc._mergeable_ranks

And use them to construct a `RegexTokenizer` that will exactly reproduce the tokenization of GPT4. Run and step through the file [test_gpt4.py](test_gpt4.py) for details.

Note that the parity is not complete because we do not handle special tokens.
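
To illustrate why missing special-token handling breaks parity, here is a minimal, self-contained sketch. It is not the repository's `RegexTokenizer`: `encode_ordinary` is a hypothetical toy encoder (one token per UTF-8 byte, no merges), `encode_with_special` is a hypothetical helper, and the id 100257 for `<|endoftext|>` is assumed to match the `cl100k_base` vocabulary.

```python
import re

# Assumed special-token table; 100257 is taken to be the cl100k_base id
# for <|endoftext|>. Treat it as illustrative, not authoritative.
SPECIAL_TOKENS = {"<|endoftext|>": 100257}


def encode_ordinary(text):
    """Toy stand-in for a BPE encoder: one token per UTF-8 byte, no merges."""
    return list(text.encode("utf-8"))


def encode_with_special(text, special_tokens):
    """Split on special tokens first, then encode the other pieces as ordinary text."""
    if not special_tokens:
        return encode_ordinary(text)
    # The capturing group makes re.split keep the special tokens in the result.
    pattern = "(" + "|".join(re.escape(s) for s in special_tokens) + ")"
    ids = []
    for piece in re.split(pattern, text):
        if piece in special_tokens:
            ids.append(special_tokens[piece])  # a single reserved id
        elif piece:
            ids.extend(encode_ordinary(piece))
    return ids


text = "hi<|endoftext|>"
without_special = encode_ordinary(text)
with_special = encode_with_special(text, SPECIAL_TOKENS)
print(without_special)
print(with_special)
```

A tokenizer that ignores special tokens encodes the marker as ordinary text, so its output diverges from a reference encoder that maps the marker to one reserved id.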

## License

MIT
