Reduce on cuda #274
I have minor comments about tests, otherwise LGTM
crates/cubecl-cpp/src/shared/base.rs
    fn bfloat162_type_name(f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result;
    // warp instructions (all threads participating)
    fn warp_shuffle(input: &CppVariable<Self>, id: &CppVariable<Self>) -> String;
    fn warp_shuffle_indexed(
        input: &CppVariable<Self>,
        index: usize,
        id: &CppVariable<Self>,
    ) -> String;
    fn warp_shuffle_xor(out: &CppVariable<Self>) -> String;
    fn warp_shuffle_xor_indexed(out: &CppVariable<Self>, index: usize) -> String;
    fn warp_shuffle_down(out: &CppVariable<Self>) -> String;
    fn warp_shuffle_down_indexed(out: &CppVariable<Self>, index: usize) -> String;
    fn warp_all(out: &CppVariable<Self>) -> String;
    fn warp_all_indexed(out: &CppVariable<Self>, index: usize) -> String;
    fn warp_any(out: &CppVariable<Self>) -> String;
    fn warp_any_indexed(out: &CppVariable<Self>, index: usize) -> String;
    // Matrix-Multiple Accumulate
    fn mma_namespace() -> &'static str;
}
Before merging, I want to double-check that you are OK with the approach used here. An alternative is to merge the indexed and regular methods into a single method with the signature
fn do_something(var: &CppVariable<Self>, index: Option<usize>) -> String;
or to support only IndexedVariable:
fn do_something(var: &IndexedVariable) -> String;
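To make the first alternative concrete, here is a minimal sketch of what a merged method could look like for `warp_all`. It is only an illustration: `Var` is a hypothetical stand-in for `CppVariable<Self>`, the `.i_N` component accessor is an assumed naming scheme, and the full-warp mask `0xffffffff` is hard-coded for simplicity. None of these are taken from the PR.

```rust
// Hypothetical stand-in for the compiler's variable type. The real
// `CppVariable<D>` in cubecl-cpp is generic over a dialect and carries more data.
struct Var {
    name: String,
}

// Single method replacing the `warp_all` / `warp_all_indexed` pair.
// `None` targets the whole variable, `Some(i)` targets component `i`
// of a vectorized variable.
fn warp_all(out: &Var, index: Option<usize>) -> String {
    let operand = match index {
        // Assumed accessor syntax for one component of a vectorized value.
        Some(i) => format!("{}.i_{}", out.name, i),
        None => out.name.clone(),
    };
    // `__all_sync(mask, predicate)` is the CUDA warp-vote intrinsic.
    format!("__all_sync(0xffffffff, {})", operand)
}
```

With this shape, a call such as `warp_all(&v, None)` covers the scalar case and `warp_all(&v, Some(2))` covers one component of a vectorized variable, so the trait needs half as many methods. The cost is that every call site has to pass an `Option`.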
The initial goal was to support testing the reduction algorithms on CUDA. This led to some work on the warp operations in the compiler to support vectorization. I also added a number of related tests.