[stdlib] Rewrite UTF8._isValidUTF8() #1477
@swift-ci Please test |
|
@PatrickPijnappel There's a build failure, would you mind taking a look? |
|
@gribozavr My bad! Will take a look and resolve. |
// Require 10xx xxxx 110x xxxx.
if buffer & 0xc0e0 != 0x80c0 { return false }
// Disallow xxxx xxxx xxx0 000x (<= 7 bits case).
if buffer & 0x001e == 0x0000 { return false }
Sorry, I don't understand this case. I think you meant to test against 0x1f00 instead of 0x001e.
The bytes come in reverse order. Never mind.
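To make the byte-order point concrete, here is a minimal sketch, assuming the buffer stores the first code unit in its least significant byte (which is what the exchange above implies). `packLittleEndian` and `isValidTwoByteSequence` are hypothetical names used only for illustration, not standard library API, and the code is written in current Swift syntax rather than the syntax shown in the diff.

```swift
// Hypothetical helper, not part of the standard library: packs up to four
// code units into a UInt32 with the first code unit in the lowest byte.
func packLittleEndian(_ codeUnits: [UInt8]) -> UInt32 {
    precondition(codeUnits.count <= 4, "a UInt32 holds at most four code units")
    var buffer: UInt32 = 0
    for (index, unit) in codeUnits.enumerated() {
        buffer |= UInt32(unit) << UInt32(8 * index)
    }
    return buffer
}

// Hypothetical name; mirrors only the two checks quoted in the review,
// not the full validation routine.
func isValidTwoByteSequence(_ buffer: UInt32) -> Bool {
    // Require 10xx xxxx 110x xxxx: continuation in byte 1, lead in byte 0.
    if buffer & 0xc0e0 != 0x80c0 { return false }
    // Disallow xxxx xxxx xxx0 000x: lead bytes 0xC0 and 0xC1 are overlong
    // encodings of values that fit in 7 bits.
    if buffer & 0x001e == 0x0000 { return false }
    return true
}
```

With this layout, "é" (0xC3 0xA9) packs to 0xA9C3, so the mask 0x001e selects bits of the lead byte in the low half of the value. A mask of 0x1f00 would select bits of the continuation byte instead.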
|
@PatrickPijnappel This is brilliant! Please fix the build issue, and I'll run the benchmarks. |
|
@PatrickPijnappel Great stuff! 👍 |
This is a replacement for usages of UTF8._numTrailingBytes(). Note that the sanityCheck was redundant at both call sites.
The checks are technically different (the previous check only rejected malformed initial code units, not all malformed sequences). Which is more correct is debatable, but since _buffer is only filled by transcoding from UTF-16, it should always be well-formed anyway, so the difference is not very relevant.
Replaces the tests for the removed _numTrailingBytes().
|
@gribozavr OK fixed the issues and added some validation tests as well! |
|
@swift-ci Please test |
stdlib/public/core/Unicode.swift
}
}
return true
public static func _isValidUTF8(buffer: UInt32) -> Bool {
Please use public // @testable (like we do in other places), for documentation purposes, and to make it easy to fix up the code when @testable works for the standard library.
|
Added |
|
@swift-ci Please test |
|
Running benchmarks. |
|
I'm seeing >10% improvements for ErrorHandling, NSError, NSStringConversion, and SevenBoom. @PatrickPijnappel If you have a targeted microbenchmark, feel free to contribute it to a new file under |
[stdlib] Rewrite UTF8._isValidUTF8()
|
@gribozavr Added a UTF-8 benchmark (#1493). I'm in the process of simplifying/optimizing the other parts of UTF-8 decoding so it'll be useful to have a benchmark! |
What's in this pull request?
A rewrite of UTF8._isValidUTF8(), which further improves performance (mainly by removing branches) and reduces code size.
Tested against original implementation for all input values (0...0xffffffff), results are identical.
Before merging this pull request to apple/swift repository:
Triggering Swift CI
The swift-ci is triggered by writing a comment on this PR addressed to the github user @swift-ci. Different tests will run depending on the specific comment that you use. The currently available comments are:
Smoke Testing
Validation Testing
Note: Only members of the Apple organization can trigger swift-ci.
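The exhaustive comparison mentioned in the pull request summary (checking the rewrite against the original for every input value) can be sketched roughly as follows. This is not the author's actual test: `firstMismatch`, `referenceTwoByte`, and `candidateTwoByte` are hypothetical names, and the two validators are stand-ins that cover only the two-byte rule quoted in the review.

```swift
// Hypothetical harness: returns the first input on which the two validators
// disagree, or nil when they agree on every value in the range.
func firstMismatch(
    in range: ClosedRange<UInt32>,
    reference: (UInt32) -> Bool,
    candidate: (UInt32) -> Bool
) -> UInt32? {
    for buffer in range where reference(buffer) != candidate(buffer) {
        return buffer
    }
    return nil
}

// Stand-in reference: the two-byte rule written as explicit byte-range tests.
func referenceTwoByte(_ buffer: UInt32) -> Bool {
    let lead = UInt8(truncatingIfNeeded: buffer)
    let trail = UInt8(truncatingIfNeeded: buffer >> 8)
    guard (0xC2...0xDF).contains(lead) else { return false }
    return (0x80...0xBF).contains(trail)
}

// Stand-in candidate: the same rule expressed with the masks from the review.
func candidateTwoByte(_ buffer: UInt32) -> Bool {
    if buffer & 0xc0e0 != 0x80c0 { return false }
    if buffer & 0x001e == 0x0000 { return false }
    return true
}
```

Calling `firstMismatch(in: 0...0xffff, reference: referenceTwoByte, candidate: candidateTwoByte)` covers every two-byte pattern quickly. Widening the range to `0...0xffffffff`, as described in the summary, iterates about 4.3 billion values and is best run in an optimized build.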