Builder.fromString and Builder.singleton don't substitute surrogate code points #280

@minoki

Unlike Data.Text.pack, Data.Text.Lazy.Builder.fromString does not replace surrogate code points (U+D800 to U+DFFF) with U+FFFD. Therefore, one can create an invalid Text value with it.

Data.Text.Lazy.Builder.singleton has the same problem.

Example:

λ> import qualified Data.Text as T
λ> import qualified Data.Text.Lazy as TL
λ> import qualified Data.Text.Lazy.Builder as TB
λ> TL.pack "\xD800" -- correctly returns "\xFFFD"
"\65533"
λ> TL.fromChunks [T.pack "\xD800"] -- correctly returns "\xFFFD"
"\65533"
λ> TB.toLazyText (TB.fromString "\xD800") -- the Builder builds a Text containing [0xD800], which is invalid
"\9980"
λ> TB.toLazyText (TB.fromString "\xD800") -- garbage is read when decoding [0xD800], so a random value is shown
"\9571"
λ> TB.toLazyText (TB.singleton '\xD800') -- the Builder builds a Text containing [0xD800], which is invalid
"\10262"
λ> TB.toLazyText (TB.singleton '\xD800')
"\10269"

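Until the Builder functions perform the substitution themselves, a caller can sanitize the input first. The following is only a minimal sketch of such a workaround, not the library's own fix: the names safeChar, safeSingleton and safeFromString are hypothetical helpers introduced here, and the sketch assumes that mapping every code point in U+D800..U+DFFF to U+FFFD is the desired behaviour, as Data.Text.pack does.

```haskell
import Data.Char (ord)
import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.Builder as TB

-- Hypothetical helper: map a surrogate code point (U+D800..U+DFFF)
-- to the replacement character U+FFFD; leave every other Char as is.
safeChar :: Char -> Char
safeChar c
  | n >= 0xD800 && n <= 0xDFFF = '\xFFFD'
  | otherwise                  = c
  where
    n = ord c

-- Hypothetical wrapper around TB.singleton that sanitizes its argument.
safeSingleton :: Char -> TB.Builder
safeSingleton = TB.singleton . safeChar

-- Hypothetical wrapper around TB.fromString that sanitizes every Char.
safeFromString :: String -> TB.Builder
safeFromString = TB.fromString . map safeChar

-- Convenience: run a sanitized String through the Builder.
buildSafe :: String -> TL.Text
buildSafe = TB.toLazyText . safeFromString
```

With these wrappers, TB.toLazyText (safeSingleton '\xD800') and buildSafe "\xD800" both produce a Text holding the single character U+FFFD, matching what TL.pack returns in the session above. The extra map costs one pass over the input, so this is a stopgap for affected versions rather than a replacement for fixing the Builder itself.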