Crate fearless_simd

A helper library to make SIMD more friendly.

Fearless SIMD exposes safe SIMD with ergonomic multi-versioning in Rust.

Fearless SIMD uses “marker values” that serve as proofs of which target features are available on the current CPU. Each of these implements the Simd trait, which exposes a core set of SIMD operations implemented as efficiently as possible on each target platform.

Additionally, there are types for packed vectors of a specific width and element type (such as f32x4). Fearless SIMD does not currently support vectors narrower than 128 bits. These vector types implement some standard arithmetic traits (e.g. they can be added together using +, or multiplied by a scalar using *, among others), which are implemented as efficiently as possible using SIMD instructions. Vectors can be created in a SIMD context using the SimdFrom trait, or the from_slice associated function.

To call a function with the best available target features and get the associated Simd implementation, use the dispatch!() macro:

use fearless_simd::{Level, Simd, dispatch};

#[inline(always)]
fn sigmoid<S: Simd>(simd: S, x: &[f32], out: &mut [f32]) { /* ... */ }

// The stored level, which you should only construct once in your application.
let level = Level::new();

dispatch!(level, simd => sigmoid(simd, &[/*...*/], &mut [/*...*/]));

A few things to note:

sigmoid is generic over any Simd type.
The dispatch macro is used to invoke the given function with the target features associated with the supplied Level.
The function or closure passed to dispatch!() should be #[inline(always)]. The performance of the SIMD implementation may be poor if that isn’t the case. See the section on inlining for details.

The first parameter to dispatch!() is the Level. If you are writing an application, you should create this once (using Level::new), and pass it to any function which wants to use SIMD. This type stores which instruction sets are available for the current process, and the macro uses it to dispatch to the best available variant of the supplied function for this process.
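The “detect once, then pass it along” pattern can be illustrated without the crate. The following is a minimal std-only sketch: ToyLevel, detect_level and describe are hypothetical names used for illustration and are not part of Fearless SIMD, whose Level type performs this detection for you.

```rust
/// Hypothetical stand-in for a detected SIMD level (NOT the crate's `Level`).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ToyLevel {
    Fallback,
    Sse42,
    Avx2,
}

/// Query the CPU for its capabilities. Call this once at startup and
/// hand the returned value to every function that needs it.
pub fn detect_level() -> ToyLevel {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::arch::is_x86_feature_detected!("avx2")
            && std::arch::is_x86_feature_detected!("fma")
        {
            return ToyLevel::Avx2;
        }
        if std::arch::is_x86_feature_detected!("sse4.2") {
            return ToyLevel::Sse42;
        }
    }
    // Non-x86 targets, or x86 CPUs without the features checked above.
    ToyLevel::Fallback
}

/// Example consumer: receives the level as an argument instead of
/// re-running detection itself.
pub fn describe(level: ToyLevel) -> &'static str {
    match level {
        ToyLevel::Fallback => "scalar fallback",
        ToyLevel::Sse42 => "SSE4.2",
        ToyLevel::Avx2 => "AVX2 + FMA",
    }
}
```

Because the value is Copy, passing it by value into library code is cheap, and detection runs only where the application chooses to run it.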

§Inlining

Fearless SIMD relies heavily on Rust’s inlining support to create functions which have the given target features enabled. As such, most functions which you write when using Fearless SIMD should have the #[inline(always)] attribute.

The rules of thumb for working with Fearless SIMD are:

All SIMD functions need #[inline(always)].
Use dispatch! when calling SIMD code from non-SIMD code.
Use vectorize() when calling SIMD from SIMD if you don’t want to force inlining.

We currently don’t have docs explaining why this is the case. You can read this Zulip conversation for an informal, train-of-thought explanation.
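As a rough illustration of the general Rust mechanism involved (a std-only sketch, not Fearless SIMD’s actual implementation): a function with no target features of its own is only compiled with, say, AVX2 when it is inlined into a caller that has AVX2 enabled. The names scale, scale_avx2 and scale_dispatch below are hypothetical.

```rust
/// Plain kernel with no target features of its own. `#[inline(always)]`
/// lets it be compiled with whatever features its caller has enabled.
#[inline(always)]
fn scale(data: &mut [f32], factor: f32) {
    for x in data.iter_mut() {
        *x *= factor;
    }
}

/// AVX2-enabled entry point. Once `scale` is inlined here, the compiler
/// is free to use AVX2 instructions for its loop.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn scale_avx2(data: &mut [f32], factor: f32) {
    scale(data, factor)
}

/// Non-SIMD entry point: detects features at runtime and picks a version.
pub fn scale_dispatch(data: &mut [f32], factor: f32) {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::arch::is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was verified on the line above.
            unsafe { scale_avx2(data, factor) };
            return;
        }
    }
    // Baseline version, compiled with the target's default features.
    scale(data, factor)
}
```

If scale were not inlined into scale_avx2, the call would land in a copy compiled for the baseline target, and the AVX2 entry point would gain nothing.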

§WebAssembly

WASM SIMD doesn’t have feature detection, and so you need to compile two versions of your bundle for WASM, one with SIMD and one without, then select the appropriate one for your user’s browser. This can be done via the wasm-feature-detect library.

You can compile WebAssembly with the SIMD128 feature enabled via the RUSTFLAGS environment variable (RUSTFLAGS="-Ctarget-feature=+simd128"), or by adding the compiler flags in your Cargo config.toml:

[target.'cfg(target_arch = "wasm32")']
rustflags = ["-Ctarget-feature=+simd128"]
rustdocflags = ["-Ctarget-feature=+simd128"]

If you want to compile both SIMD and non-SIMD versions of your WebAssembly library, your best option right now is to create a shell script that builds it once with the RUSTFLAGS specified, and once without. Cargo currently does not allow specifying compiler flags per-profile.

§Relaxed SIMD

Fearless SIMD can make use of the relaxed SIMD WebAssembly instructions, if the requisite target feature is enabled. These instructions can return implementation-dependent results depending on what is fastest on the underlying hardware. They are only used for operations where we already give hardware-dependent results.

At the time of writing, relaxed SIMD is only supported in Chrome. To make use of it, you’ll need to build two versions of your library, one with relaxed SIMD enabled (RUSTFLAGS="-Ctarget-feature=+simd128,+relaxed-simd") and one with it disabled, and then feature-detect at runtime.
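When shipping several bundles, it can help to have each one report how it was built. The following is a small sketch using Rust’s compile-time cfg! macro; wasm_build_variant is a hypothetical helper, not part of Fearless SIMD. It reflects only the flags the binary was compiled with, so choosing which bundle to load still has to happen on the JavaScript side.

```rust
/// Hypothetical helper: reports which WebAssembly SIMD variant this
/// binary was compiled as. The answer is fixed at compile time.
pub const fn wasm_build_variant() -> &'static str {
    if cfg!(all(target_arch = "wasm32", target_feature = "relaxed-simd")) {
        // Built with -Ctarget-feature=+simd128,+relaxed-simd
        "relaxed-simd"
    } else if cfg!(all(target_arch = "wasm32", target_feature = "simd128")) {
        // Built with -Ctarget-feature=+simd128
        "simd128"
    } else {
        // No WASM SIMD flags, or not a wasm32 target at all.
        "baseline"
    }
}
```

Exporting such a function from each bundle gives a cheap sanity check that the loader picked the variant it intended to.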

§Credits

This crate was inspired by pulp and std::simd, among others in the Rust ecosystem, though it makes many decisions differently. It benefited from conversations with Luca Versari, though he is not responsible for any of the mistakes or bad decisions.

§Feature Flags

The following crate feature flags are available:

std (enabled by default): Get floating point functions from the standard library (likely using your target’s libc). Also allows using Level::new on all platforms, to detect which target features are enabled.
libm: Use floating point implementations from libm.
safe_wrappers: Include safe wrappers for (some) target feature specific intrinsics, beyond the basic SIMD operations abstracted on all platforms.
force_support_fallback: Force the scalar fallback to be supported, even if your compilation target has a better baseline.

At least one of std and libm is required; std overrides libm.

Modules§

core_arch: Access to architecture-specific intrinsics.
prelude: This prelude module re-exports every SIMD trait defined in this library. It’s useful for accessing trait methods.
x86 (x86 or x86-64 only): Implementations of Simd on x86 architectures (both 32 and 64 bit).

Macros§

dispatch: Access the applicable Simd for a given level, and perform an operation using it.

Structs§

Avx2 (x86 or x86-64 only): The SIMD token for the “AVX2” and “FMA” level.
Fallback: The SIMD token for the “fallback” level.
Sse4_2 (x86 or x86-64 only): The SIMD token for the “SSE4.2” level.
f32x4: A SIMD vector of 4 f32 elements.
f32x8: A SIMD vector of 8 f32 elements.
f32x16: A SIMD vector of 16 f32 elements.
f64x2: A SIMD vector of 2 f64 elements.
f64x4: A SIMD vector of 4 f64 elements.
f64x8: A SIMD vector of 8 f64 elements.
i8x16: A SIMD vector of 16 i8 elements.
i8x32: A SIMD vector of 32 i8 elements.
i8x64: A SIMD vector of 64 i8 elements.
i16x8: A SIMD vector of 8 i16 elements.
i16x16: A SIMD vector of 16 i16 elements.
i16x32: A SIMD vector of 32 i16 elements.
i32x4: A SIMD vector of 4 i32 elements.
i32x8: A SIMD vector of 8 i32 elements.
i32x16: A SIMD vector of 16 i32 elements.
mask8x16: A SIMD mask of 16 8-bit elements.
mask8x32: A SIMD mask of 32 8-bit elements.
mask8x64: A SIMD mask of 64 8-bit elements.
mask16x8: A SIMD mask of 8 16-bit elements.
mask16x16: A SIMD mask of 16 16-bit elements.
mask16x32: A SIMD mask of 32 16-bit elements.
mask32x4: A SIMD mask of 4 32-bit elements.
mask32x8: A SIMD mask of 8 32-bit elements.
mask32x16: A SIMD mask of 16 32-bit elements.
mask64x2: A SIMD mask of 2 64-bit elements.
mask64x4: A SIMD mask of 4 64-bit elements.
mask64x8: A SIMD mask of 8 64-bit elements.
u8x16: A SIMD vector of 16 u8 elements.
u8x32: A SIMD vector of 32 u8 elements.
u8x64: A SIMD vector of 64 u8 elements.
u16x8: A SIMD vector of 8 u16 elements.
u16x16: A SIMD vector of 16 u16 elements.
u16x32: A SIMD vector of 32 u16 elements.
u32x4: A SIMD vector of 4 u32 elements.
u32x8: A SIMD vector of 8 u32 elements.
u32x16: A SIMD vector of 16 u32 elements.

Enums§

Level: The level enum with the specific SIMD capabilities available.

Traits§

Bytes: Conversion of SIMD types to and from raw bytes.
Select: Element-wise selection between two SIMD vectors using self.
Simd: The main SIMD trait, implemented by all SIMD token types.
SimdBase: Base functionality implemented by all SIMD vectors.
SimdCombine: Concatenation of two SIMD vectors.
SimdCvtFloat: Construction of floating point vectors from integers.
SimdCvtTruncate: Construction of integer vectors from floats by truncation.
SimdElement: Types that can be used as elements in SIMD vectors.
SimdFloat: Functionality implemented by floating-point SIMD vectors.
SimdFrom: Value conversion, adding a SIMD blessing.
SimdInt: Functionality implemented by (signed and unsigned) integer SIMD vectors.
SimdInto: Value conversion, adding a SIMD blessing.
SimdMask: Functionality implemented by SIMD masks.
SimdSplit: Splitting of one SIMD vector into two.
WithSimd