@ethercrow
Contributor

Instead of introducing a specialized encodeASCII (#299), I managed to accelerate encodeUtf8 for ASCII input without slowing it down for non-ASCII input.

Before:

benchmarking EncodeUtf8/Text (non-ASCII)
time                 2.312 ms   (2.305 ms .. 2.317 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 2.340 ms   (2.331 ms .. 2.352 ms)
std dev              33.32 μs   (25.96 μs .. 40.96 μs)

benchmarking EncodeUtf8/Text (ASCII)
time                 295.6 μs   (294.6 μs .. 296.3 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 296.8 μs   (295.7 μs .. 298.8 μs)
std dev              4.785 μs   (2.933 μs .. 7.602 μs)

After:

benchmarking EncodeUtf8/Text (non-ASCII)
time                 2.321 ms   (2.308 ms .. 2.343 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 2.353 ms   (2.340 ms .. 2.381 ms)
std dev              66.32 μs   (42.43 μs .. 109.9 μs)
variance introduced by outliers: 14% (moderately inflated)

benchmarking EncodeUtf8/Text (ASCII)
time                 136.8 μs   (136.1 μs .. 137.4 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 137.0 μs   (136.6 μs .. 137.7 μs)
std dev              1.771 μs   (1.307 μs .. 2.713 μs)
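
For readers unfamiliar with the technique, the general idea of an ASCII fast path can be sketched roughly as below. This is not the code from this PR: the function name `encode_utf8_sketch` is hypothetical, the input is assumed to be valid UTF-16 code units, and a portable 64-bit word check stands in for SSE2 intrinsics so the example compiles on any platform. A block of code units is tested with a single mask; if all are below 0x80 they are narrowed to bytes directly, otherwise the encoder falls back to the usual per-code-point path.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper for illustration only (not the text library's API).
 * Encodes `len` UTF-16 code units from `src` as UTF-8 into `dst` and
 * returns the number of bytes written.
 * Assumptions: `src` is valid UTF-16 (no unpaired surrogates) and `dst`
 * has room for at least 3 * len bytes. */
size_t encode_utf8_sketch(const uint16_t *src, size_t len, uint8_t *dst)
{
    uint8_t *out = dst;
    size_t i = 0;

    while (i < len) {
        /* Fast path: load four code units as one 64-bit word. If no unit
         * has any bit above the low seven set, all four are ASCII. The
         * mask is identical in every 16-bit lane, so the test does not
         * depend on byte order. */
        while (i + 4 <= len) {
            uint64_t w;
            memcpy(&w, src + i, sizeof w);
            if (w & UINT64_C(0xFF80FF80FF80FF80))
                break;
            out[0] = (uint8_t)src[i];
            out[1] = (uint8_t)src[i + 1];
            out[2] = (uint8_t)src[i + 2];
            out[3] = (uint8_t)src[i + 3];
            out += 4;
            i += 4;
        }
        if (i >= len)
            break;

        /* Slow path: encode exactly one code point, then retry the
         * fast path on the next iteration of the outer loop. */
        uint32_t c = src[i++];
        if (c < 0x80) {
            *out++ = (uint8_t)c;
        } else if (c < 0x800) {
            *out++ = (uint8_t)(0xC0 | (c >> 6));
            *out++ = (uint8_t)(0x80 | (c & 0x3F));
        } else if (c >= 0xD800 && c <= 0xDBFF && i < len) {
            /* High surrogate: combine with the following low surrogate. */
            uint32_t lo = src[i++];
            c = 0x10000 + ((c - 0xD800) << 10) + (lo - 0xDC00);
            *out++ = (uint8_t)(0xF0 | (c >> 18));
            *out++ = (uint8_t)(0x80 | ((c >> 12) & 0x3F));
            *out++ = (uint8_t)(0x80 | ((c >> 6) & 0x3F));
            *out++ = (uint8_t)(0x80 | (c & 0x3F));
        } else {
            *out++ = (uint8_t)(0xE0 | (c >> 12));
            *out++ = (uint8_t)(0x80 | ((c >> 6) & 0x3F));
            *out++ = (uint8_t)(0x80 | (c & 0x3F));
        }
    }
    return (size_t)(out - dst);
}
```

An SSE2 version would follow the same shape with wider blocks. In this sketch, non-ASCII input pays only one extra mask test per code point before taking the slow path.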

@ethercrow ethercrow mentioned this pull request Oct 7, 2020
@phadej
Contributor

phadej commented Oct 15, 2020

There are three SSE2 PRs. Can they be combined into one? I'm confused, and I don't have time to cross-check how they differ.

@ethercrow
Contributor Author

They optimize three different functions individually. Sure, I can open a combined one.
