UTF-16BE encoding is detected instead of ISO-8859-1 #194

@ilyazub

ISO-8859-1 is not detected for the following input. `CharlockHolmes::EncodingDetector.detect` reports UTF-16BE instead, with a confidence of only 10:

require 'charlock_holmes'

# ISO-8859-1 bytes for "Résumé" (0xE9 is "é" in ISO-8859-1)
content = "R\xE9sum\xE9"
detection = CharlockHolmes::EncodingDetector.detect content
# => {:type=>:text, :encoding=>"UTF-16BE", :ruby_encoding=>"UTF-16BE", :confidence=>10}
# Expected – ISO-8859-1

utf8_encoded_content = CharlockHolmes::Converter.convert content, detection[:encoding], 'UTF-8'
# => "勩獵淩"

# Expected
content.encode(Encoding::UTF_8, Encoding::ISO_8859_1)
# => "Résumé"
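
For reference, the garbled output can be reproduced without the gem. The sketch below uses only Ruby's built-in `String` and `Encoding` APIs. The variable names are illustrative and not part of charlock_holmes. It shows what happens to the same six bytes under each interpretation; it does not explain why the detector prefers UTF-16BE.

```ruby
# The six raw bytes from the report: 52 E9 73 75 6D E9
bytes = "R\xE9sum\xE9".b

# Read as UTF-16BE, every pair of bytes forms one 16-bit code unit.
as_utf16be = bytes.dup.force_encoding(Encoding::UTF_16BE)
code_units = as_utf16be.codepoints.map { |cp| format('U+%04X', cp) }
# code_units => ["U+52E9", "U+7375", "U+6DE9"]
utf16_reading = as_utf16be.encode(Encoding::UTF_8)
# => "勩獵淩"

# Read as ISO-8859-1, every byte is one character.
latin1_reading = bytes.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
# => "Résumé"
```

Both readings are valid byte sequences in their respective encodings, because the input has an even length and none of the three 16-bit values falls in the surrogate range. The bytes alone therefore cannot rule out UTF-16BE.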
