Serialize a float (as opposed to a double) when that yields the same value #6

cabo · 2013-02-24T22:05:01Z

msgpack-ruby always serializes floating point values as a double (0xcb).
When serializing the value as a float (0xca) yields the same result, this should be the preferred serialization.
The performance impact is one additional (shortening) assignment and one compare per serialization of a floating point value, probably balanced at least in part by writing fewer bytes in many cases.
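The check described above can be sketched in plain Ruby. This is only an illustration of the idea, not the patch itself (which works at the C level); `fits_float32?` and `encode_float` are hypothetical names, and the sketch assumes `Array#pack` with `g`/`G` produces IEEE 754 big-endian single/double values.

```ruby
# Hypothetical helper, not part of msgpack-ruby's API: true when x survives
# a round trip through a 32-bit float unchanged (the "shortening assignment
# and one compare" described above).
def fits_float32?(x)
  [x].pack('g').unpack('g').first == x
end

# Hypothetical encoder for a single Float, using the type bytes named above:
# 0xca (float, 4-byte payload) and 0xcb (double, 8-byte payload).
def encode_float(x)
  if fits_float32?(x)
    [0xca, x].pack('Cg')  # type byte + big-endian single precision
  else
    [0xcb, x].pack('CG')  # type byte + big-endian double precision
  end
end

puts encode_float(1.0).bytesize  # 5 bytes: 1.0 is exact in single precision
puts encode_float(1.1).bytesize  # 9 bytes: 1.1 needs the full double
```

Note that NaN never compares equal to itself, so in this sketch it always takes the double path.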


cabo · 2013-02-25T09:44:53Z

I don't like speculating about performance, so here are some actual measurements (Ruby 1.9.3p385):

    alma:msgpack-ruby cabo$ r9 -Ilib ~/tmp/msgpack-float-bench.rb  # with this patch
           user     system      total        real
    1.0 size: 5000005
    1.0 39.130000   2.370000  41.500000 ( 41.544960)
    1.1 size: 9000005
    1.1 64.770000   5.330000  70.100000 ( 70.481468)
    alma:msgpack-ruby cabo$ r9 ~/tmp/msgpack-float-bench.rb  # stock 0.5.3
           user     system      total        real
    1.0 size: 9000005
    1.0 58.630000   4.930000  63.560000 ( 63.648409)
    1.1 size: 9000005
    1.1 57.510000   4.810000  62.320000 ( 62.380963)
    alma:msgpack-ruby cabo$

So on my laptop it takes about 58 ns to serialize a Float into a double in 0.5.3, 65 ns with the new check indicating that a double is required, and 39 ns with the new check indicating that a float is all that is needed. So if 1/4 or more of the Floats fit in a 32-bit float, there is a net CPU gain from this patch. In any case, there is a net size reduction. This may also cause additional CPU gains in other parts of the system.

(Of course, most of this time is likely GC overhead, so take these measurements with a grain of salt. The numbers are also slightly different on Ruby 2.0 with its unboxed Flonums.)
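The break-even point can be worked out from the three per-value timings quoted above (58 ns, 65 ns, 39 ns). The following is a small arithmetic sketch; `break_even_fraction` is a name made up for this illustration.

```ruby
# Fraction p of Floats that must fit in a 32-bit float for the patched code
# to cost the same CPU time as stock 0.5.3. Solves
#   p * patched_float + (1 - p) * patched_double == stock
# for p. All arguments are per-value times in the same unit.
def break_even_fraction(stock, patched_double, patched_float)
  extra_cost = patched_double - stock  # lost on each value that needs a double
  saving     = stock - patched_float   # gained on each value that fits a float
  extra_cost / (extra_cost + saving)
end

# Timings from the Ruby 1.9.3 run above, in nanoseconds.
puts format('%.3f', break_even_fraction(58.0, 65.0, 39.0))  # 7/26
```

With these numbers the result is 7/26, a little over one quarter.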

cabo · 2013-02-25T10:17:11Z

I was too curious about Ruby 2.0, so here are the numbers:

    alma:msgpack-ruby cabo$ ruby -Ilib ~/tmp/msgpack-float-bench.rb
           user     system      total        real
    1.0 size: 5000005
    1.0 32.730000   2.630000  35.360000 ( 35.617198)
    1.1 size: 9000005
    1.1 50.070000   5.360000  55.430000 ( 55.560188)
    alma:msgpack-ruby cabo$ ruby ~/tmp/msgpack-float-bench.rb # stock 0.5.3
           user     system      total        real
    1.0 size: 9000005
    1.0 46.320000   4.790000  51.110000 ( 51.172017)
    1.1 size: 9000005
    1.1 48.910000   5.090000  54.000000 ( 54.150186)
    alma:msgpack-ruby cabo$

This makes the patch look even better. (And it shows nicely that benchmarking is hard.)

frsyuki · 2013-03-19T18:02:36Z

My reasoning when I wrote the code was:

- I was not sure the code would be portable across x86 (which always uses 80-bit values in registers), ARM, PowerPC, or other CPUs once compilers apply optimizations, including LTO.
- The behavior of a library should match what users expect, and I thought this behavior would not be easy to understand.
- I thought few Ruby users work with long arrays of floats.

Thus I thought serializing floating point numbers always using the double format is appropriate.

cabo · 2013-03-19T21:08:03Z

The library already chooses the shortest integer representation for integer values.
I would be surprised if anyone were surprised if the library started doing this for floating point values as well.
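For comparison, the shortest-representation rule for integers can be sketched as below. This covers non-negative integers only and follows the MessagePack format's positive fixint and uint 8/16/32/64 types; `encode_uint` is a hypothetical helper written for this illustration, not msgpack-ruby's code.

```ruby
# Hypothetical helper: pick the shortest MessagePack encoding for a
# non-negative integer. Negative integers are out of scope for this sketch.
def encode_uint(n)
  raise ArgumentError, 'non-negative integers only' if n < 0

  if n <= 0x7f
    [n].pack('C')            # positive fixint: the value is the type byte
  elsif n <= 0xff
    [0xcc, n].pack('CC')     # uint 8
  elsif n <= 0xffff
    [0xcd, n].pack('Cn')     # uint 16, big-endian
  elsif n <= 0xffff_ffff
    [0xce, n].pack('CN')     # uint 32, big-endian
  elsif n <= 0xffff_ffff_ffff_ffff
    [0xcf, n].pack('CQ>')    # uint 64, big-endian
  else
    raise RangeError, 'does not fit in 64 bits'
  end
end

[1, 200, 70_000, 2**40].each do |n|
  puts "#{n}: #{encode_uint(n).bytesize} bytes"
end
```

The float check proposed in this pull request applies the same idea to the 0xca/0xcb pair.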

While my performance test case uses a large array of floating point numbers, this was just to make sure this code doesn't cause a performance regression. As you can see, it actually improves performance in certain cases.
But the case I'm actually interested in is with some very short JSON-like structures. Doubles stick out in these.

Testing the code for more platforms than I tested it on is indeed advisable. The change shouldn't break portability except on platforms where double is an IEEE754 format (which you seem to rely on) and float isn't.
The existing code is not compatible with those ARM implementations using 32-bit doubles.

Just to be sure, I just tested the new code on PowerPC as well.
You already had some minimal test coverage for floating point values. This could be expanded.
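As a starting point for expanded coverage, here is a sketch of edge cases one might check. The list of values and the expected results are my own suggestion, assuming IEEE 754 single and double formats; `float32_exact?` is a hypothetical helper that mimics the round-trip check in plain Ruby rather than calling the patched C code.

```ruby
# Hypothetical helper: true when x is unchanged by a round trip through a
# 32-bit float.
def float32_exact?(x)
  [x].pack('g').unpack('g').first == x
end

# [value, expected result of the check] pairs, assuming IEEE 754 formats.
EDGE_CASES = [
  [0.0,              true],
  [-0.0,             true],
  [1.5,              true],   # needs only a few mantissa bits
  [0.1,              false],  # not exactly representable in single precision
  [16_777_216.0,     true],   # 2**24
  [16_777_217.0,     false],  # 2**24 + 1 needs 25 significant bits
  [2.0**-149,        true],   # smallest single-precision subnormal
  [Float::MAX,       false],  # out of single-precision range
  [Float::INFINITY,  true],
  [-Float::INFINITY, true],
  [Float::NAN,       false]   # NaN never compares equal to itself
].freeze

EDGE_CASES.each do |value, expected|
  actual = float32_exact?(value)
  status = actual == expected ? 'ok' : 'MISMATCH'
  puts format('%-24s float32? %-5s %s', value.inspect, actual, status)
end
```

A real test would also unpack each encoded value and confirm that the decoded Float equals the original.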

Benchmark on PowerPC (Ruby 1.8.7, with 1/10 the iterations used above):

    $ ruby -Imsgpack-ruby/lib msgpack-float-bench.rb
           user     system      total        real
    1.0 size: 5000005
    1.0 42.170000   5.440000  47.610000 ( 59.801824)
    1.1 size: 9000005
    1.1 59.250000   9.580000  68.830000 ( 87.395680)

frsyuki · 2013-08-17T08:57:53Z

I'm not convinced that the code is safe on all platforms. There are concerns that it may not work on some platforms and that such failures would be difficult to debug.

Write a FLOAT as a float (as opposed to a double) when that yields th…

b469013

…e same value

frsyuki closed this Aug 17, 2013

ojundt mentioned this pull request Dec 20, 2015

Possibility to write 32-bit single precision floats to MessagePack #101

Merged
