word.ts extendedWordChars seems to not capture everything in the comment above it #634

@hchiam

In https://github.com/kpdecker/jsdiff/blob/master/src/diff/word.ts#L5-L23 it currently is:

const extendedWordChars = 'a-zA-Z0-9_\\u{C0}-\\u{FF}\\u{D8}-\\u{F6}\\u{F8}-\\u{2C6}\\u{2C8}-\\u{2D7}\\u{2DE}-\\u{2FF}\\u{1E00}-\\u{1EFF}';

Unless I'm misunderstanding something (e.g. exceptions mentioned in the comment, or how the extendedWordChars string is actually interpreted by JS regex syntax), it seems to be missing 0080–00BF when I break down the Unicode part of the string and compare it to the comment:

\u{C0}-\u{FF}
\u{D8}-\u{F6}
\u{F8}-\u{2C6}
\u{2C8}-\u{2D7}
\u{2DE}-\u{2FF}
\u{1E00}-\u{1EFF}
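
To double-check the breakdown above, here is a minimal sketch that probes a few code points against the class. It assumes the string is interpolated into a `[...]` character class and compiled with the `u` flag (which is what turns `\u{...}` into a code point escape). `wordChar`, `isWordChar` and `probes` are names made up for this example, not part of jsdiff.

```typescript
// Copied from the issue: the character-class body defined in word.ts.
const extendedWordChars =
  'a-zA-Z0-9_\\u{C0}-\\u{FF}\\u{D8}-\\u{F6}\\u{F8}-\\u{2C6}\\u{2C8}-\\u{2D7}\\u{2DE}-\\u{2FF}\\u{1E00}-\\u{1EFF}';

// Assumption: the class is compiled with the `u` flag, so \u{C0} means U+00C0.
const wordChar = new RegExp(`^[${extendedWordChars}]$`, 'u');

// Hypothetical helper: does the class match this single code point?
function isWordChar(codePoint: number): boolean {
  return wordChar.test(String.fromCodePoint(codePoint));
}

// Code points taken from the ranges and exclusions discussed in this issue.
const probes: Array<[string, number]> = [
  ['U+00A0 (inside 0080-00BF)', 0x00a0],
  ['U+00BF (inside 0080-00BF)', 0x00bf],
  ['U+00C0 (start of first range)', 0x00c0],
  ['U+00D7 multiplication sign', 0x00d7],
  ['U+00F7 division sign', 0x00f7],
  ['U+02C7 caron', 0x02c7],
  ['U+02D8 breve', 0x02d8],
  ['U+02DE (start of fifth range)', 0x02de],
  ['U+1E00 (Latin Extended Additional)', 0x1e00],
];

for (const [label, codePoint] of probes) {
  console.log(`${label}: ${isWordChar(codePoint)}`);
}
```

Under that assumption, the two 0080-00BF probes come back `false`, while U+00D7 and U+00F7 come back `true` because `\u{C0}-\u{FF}` already spans them. U+02C7 and U+02D8 come back `false`, so those exclusions do work.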

| extendedWordChars | comment |
| --- | --- |
| SEEMS TO BE MISSING 0080-00BF? | // Latin-1 Supplement, 0080–00FF |
| \u{C0}-\u{FF} | // Latin-1 Supplement, 0080–00FF |
| SEEMS TO NOT BE EXCLUDED PROPERLY UNLESS THIS WAS INTENTIONAL? | // - U+00D7 × Multiplication sign |
| \u{D8}-\u{F6} seems redundant since covered by other range \u{C0}-\u{FF} | |
| SEEMS TO NOT BE EXCLUDED PROPERLY UNLESS THIS WAS INTENTIONAL? | // - U+00F7 ÷ Division sign |
| \u{F8}-\u{2C6} covered by other ranges 0080–00FF and 0100–017F and 0180–024F and 0250–02AF and 02B0–02FF | |
| included in other ranges | // Latin Extended-A, 0100–017F |
| included in other ranges | // Latin Extended-B, 0180–024F |
| included in other ranges | // IPA Extensions, 0250–02AF |
| \u{2C8}-\u{2D7} | // Spacing Modifier Letters, 02B0–02FF covered by other ranges and exclusions combined |
| intentionally excluded | // - U+02C7 ˇ ˇ Caron |
| intentionally excluded | // - U+02D8 ˘ ˘ Breve |
| intentionally excluded | // - U+02D9 ˙ ˙ Dot Above |
| intentionally excluded | // - U+02DA ˚ ˚ Ring Above |
| intentionally excluded | // - U+02DB ˛ ˛ Ogonek |
| intentionally excluded | // - U+02DC ˜ ˜ Small Tilde |
| intentionally excluded | // - U+02DD ˝ ˝ Double Acute Accent |
| \u{2DE}-\u{2FF} | // Spacing Modifier Letters, 02B0–02FF |
| \u{1E00}-\u{1EFF} | // Latin Extended Additional, 1E00–1EFF |
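
The overlap between ranges can also be checked mechanically. The following is a rough sketch that extracts the `\u{X}-\u{Y}` pairs from the string and merges overlapping or adjacent intervals, to show what the class effectively covers. `parseRanges`, `mergeRanges` and `formatCodePoint` are hypothetical helpers written for this comment, not jsdiff functions.

```typescript
// Copied from the issue: the character-class body defined in word.ts.
const extendedWordChars =
  'a-zA-Z0-9_\\u{C0}-\\u{FF}\\u{D8}-\\u{F6}\\u{F8}-\\u{2C6}\\u{2C8}-\\u{2D7}\\u{2DE}-\\u{2FF}\\u{1E00}-\\u{1EFF}';

type CodePointRange = [number, number];

// Hypothetical helper: pull every \u{X}-\u{Y} pair out of the class body.
function parseRanges(classBody: string): CodePointRange[] {
  const pairPattern = /\\u\{([0-9A-Fa-f]+)\}-\\u\{([0-9A-Fa-f]+)\}/g;
  const ranges: CodePointRange[] = [];
  let match: RegExpExecArray | null;
  while ((match = pairPattern.exec(classBody)) !== null) {
    ranges.push([parseInt(match[1], 16), parseInt(match[2], 16)]);
  }
  return ranges;
}

// Hypothetical helper: merge overlapping or directly adjacent ranges.
function mergeRanges(ranges: CodePointRange[]): CodePointRange[] {
  const sorted = [...ranges].sort((a, b) => a[0] - b[0]);
  const merged: CodePointRange[] = [];
  for (const [start, end] of sorted) {
    const last = merged[merged.length - 1];
    if (last !== undefined && start <= last[1] + 1) {
      last[1] = Math.max(last[1], end); // extend the previous interval
    } else {
      merged.push([start, end]); // gap found, start a new interval
    }
  }
  return merged;
}

// Format a code point as U+XXXX with at least four hex digits.
function formatCodePoint(codePoint: number): string {
  const hex = codePoint.toString(16).toUpperCase();
  return 'U+' + (hex.length < 4 ? '0000'.slice(hex.length) + hex : hex);
}

const effectiveRanges = mergeRanges(parseRanges(extendedWordChars));
console.log(
  effectiveRanges
    .map(([start, end]) => `${formatCodePoint(start)}-${formatCodePoint(end)}`)
    .join(', ')
);
// prints "U+00C0-U+02C6, U+02C8-U+02D7, U+02DE-U+02FF, U+1E00-U+1EFF"
```

The six ranges collapse to four intervals. `\u{D8}-\u{F6}` adds nothing beyond `\u{C0}-\u{FF}`, and the only gaps are U+02C7 and U+02D8 to U+02DD, so U+00D7 and U+00F7 are not excluded by the string as written.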
