Commit

Merge pull request karpathy#17 from eltociear/patch-1
Update regex.py
karpathy authored Feb 19, 2024
2 parents ade909f + a5ecf46 commit aac75c8
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion minbpe/regex.py
@@ -146,7 +146,7 @@ def encode(self, text, allowed_special="none_raise"):
return self.encode_ordinary(text)
# otherwise, we have to be careful with potential special tokens in text
# we handle special tokens by splitting the text
- # based on the occurence of any exact match with any of the special tokens
+ # based on the occurrence of any exact match with any of the special tokens
# we can use re.split for this. note that surrounding the pattern with ()
# makes it into a capturing group, so the special tokens will be included
special_pattern = "(" + "|".join(re.escape(k) for k in special) + ")"
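
The comment in this hunk relies on a detail of `re.split`: when the pattern is wrapped in a capturing group, the matched separators are kept in the result instead of being dropped. A minimal sketch of that behavior follows. The `special` dictionary, its token IDs, and the sample `text` are illustrative assumptions, not values taken from this commit.

```python
import re

# Hypothetical special-token table (names and IDs are assumptions for illustration).
special = {"<|endoftext|>": 100257, "<|fim_prefix|>": 100258}

# Same construction as in the diff: escape each token, join with "|",
# and wrap the alternation in () so it becomes a capturing group.
special_pattern = "(" + "|".join(re.escape(k) for k in special) + ")"

text = "hello<|endoftext|>world"

# With the capturing group, the matched special token stays in the output list.
chunks = re.split(special_pattern, text)
print(chunks)  # ['hello', '<|endoftext|>', 'world']

# Without the group, the special token is consumed as a separator and lost.
plain_pattern = "|".join(re.escape(k) for k in special)
chunks_no_group = re.split(plain_pattern, text)
print(chunks_no_group)  # ['hello', 'world']
```

Keeping the separators is what lets an encoder treat special-token chunks and ordinary-text chunks differently afterwards.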
