Skip to content
forked from ianic/flate

deflate compression algorithm in Zig

Notifications You must be signed in to change notification settings

squeek502/flate

 
 

Repository files navigation

Zig deflate compression/decompression implementation. It supports compression and decompression of gzip, zlib and raw deflate format. Andrew pushed for the implementation from the first principles so I give it a try.

Zig's implementation is ported from Go by huge effort of hdorio. Go implementation mainly follows original zlib implementation.

Here I used all those three as reference, but mostly started from scratch. Inflate (decompression) and deflate tokenization are implemented from the first principles. For deflate block writer I started from current std code.

Benchmark

Comparing this implementation with the one we currently have in Zig's standard library (std).
Std is roughly 1.5 times slower in decompression, and 1.17 times slower in compression. Compressed sizes are pretty much same in both cases.

Benchmark are done on aarch64/Linux (Apple M1 cpu):

Compression

Compression examples are using Zig repository tar ~177M file.

Compression time in comparison with std (current Zig standard library implementation):

level time [ms] std [ms] time/std
0 472.83 543.02 1.15
4 1019.52 1222.02 1.2
5 1401.85 1673.52 1.19
6 - default 1994.97 2325.23 1.17
7 2505.55 3141.31 1.25
8 4491.72 5118.71 1.14
9 6713.99 8243.63 1.23

Compressed size in comparison with std:

level size std size diff size/std
0 108398793 108397986 -807 1.0000
4 26610575 26557083 -53492 0.9980
5 25231037 25212703 -18334 0.9993
6 - default 24716324 24716123 -201 1.0000
7 24572126 24562137 -9989 0.9996
8 24419542 24425085 5543 1.0002
9 24370948 24389533 18585 1.0008

Decompression

Decompression time for few different files in comparison with std:

file size time [ms] std [ms] time/std
ziglang.tar.gz 177244160 353.34 519.44 1.47
war_and_peace.txt.gz 3359630 13.55 21.36 1.58
large.tar.gz 11162624 37.93 57.53 1.52
cantrbry.tar.gz 2821120 9.08 14.30 1.57

URLs from which tests files are obtained can be found here.

Note

I was also comparing with gzip/gunzip system tools and the results are pretty much similar.

To compare with gzip/gunzip:

zig build -Doptimize=ReleaseSafe
export FILE=tmp/ziglang.tar.gz
hyperfine -r 5 'zig-out/bin/gunzip $FILE' 'gunzip -kf $FILE'

export FILE=tmp/ziglang.tar
hyperfine -r 5 'zig-out/bin/gzip $FILE' 'gzip -kf $FILE'

Running same benchmarks on x86_64 GNU/Linux (Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz):

level time [ms] std [ms] time/std
0 602.97 740.39 1.23
4 1434.57 1837.89 1.28
5 1875.17 2454.04 1.31
6 2581.49 3353.04 1.3
7 3211.47 4749.88 1.48
8 6444.44 7359.56 1.14
9 8175.93 11447.38 1.4
file size time [ms] std [ms] time/std
ziglang.tar.gz 177244160 567.85 858.50 1.51
war_and_peace.txt.gz 3359630 25.96 36.04 1.39
large.tar.gz 11162624 65.68 99.80 1.52
cantrbry.tar.gz 2821120 17.69 26.85 1.52

Memory usage

This library uses static allocations for all structures, doesn't require allocator. That makes sense especially for deflate where all structures, internal buffers are allocated to the full size. Little less for inflate where we std version uses less memory by not preallocating to theoretical max size array which are usually not fully used.

For deflate this library allocates 395K while std 779K.
For inflate this library allocates 195K while std usually around 36K.

Inflate difference is because we here use 64K history instead of 32K, and main reason because in huffman_decoder.zig we use lookup of 32K places for 286 or 30 symbols!

Interface

Currently in std lib we have different wording for the same things. Type is called Decompressor, Decompress, DecompressStream (deflate, gzip, zlib). I'll suggest that we stick with Decompressor/Compressor and decompressor/compressor for initializers of that type. That will free compress/decompress word for one-shot action. Fn compress receives reader and writer, reads all plain data from reader and writes compressed data to the writer. I expect that many places can use this simple implementation.

All gzip/zlib/flate will have same methods, here on the gzip example:

const std = @import("std");
const gzip = std.compress.gzip;
const zlib = std.compress.zlib;
const flate = std.compress.flate;

/// Decompress compressed data from reader and write plain data to the writer.
gzip.decompress(reader: anytype, writer: anytype) !void

/// Create Decompressor which will read compressed data from reader.
gzip.decompressor(reader: anytype) gzip.Decompressor(@TypeOf(reader)) 

/// Decompressor type
gzip.Decompressor(comptime ReaderType: type) type {


/// Compress plain data from reader and write compressed data to the writer.
gzip.compress(reader: anytype, writer: anytype, level: Level) !void 

/// Create Compressor which outputs compressed data to the writer.
gzip.compressor(writer: anytype, level: Level) !gzip.Compressor(@TypeOf(writer)) 

/// Compressor type
gzip.Compressor(comptime WriterType: type) type 

References

Great materials for understanding deflate:

Bill Bird Video series
RFC 1951 - deflate
RFC 1950 - zlib
RFC 1952 - gzip
zlib algorithm explained
Mark Adler on stackoverflow
Faster zlib/DEFLATE
Reading bits with zero refill latency

About

deflate compression algorithm in Zig

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Zig 98.0%
  • Ruby 1.7%
  • Shell 0.3%