
accelerate clustering with sparse-dense vector and parallel sorting #183

Merged 1 commit into amzn:mainline on Nov 14, 2022

Conversation

yaushian
Contributor

Issue #, if available:

Description of changes: The current PECOS hierarchical clustering implementation stores cluster centers as dense vectors, which costs O(2^(L-1) · d) at layer L, where d is the total feature dimension and can be in the millions or more. On extremely large datasets, L is large at the bottom layers while the center vectors are sparse, which makes operations on dense vectors inefficient. Empirically, we find this makes the bottom layers more than 10x slower than the top layers on large datasets. We instead store centers as sparse-dense vectors (sdvec), whose time complexity is O(2^L · p), where p is the average number of non-zero elements of the sparse center vectors and is significantly smaller than d at the bottom layers.
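For illustration, here is a minimal sketch of the sparse-dense vector idea: dense value storage plus a list of "touched" indices, so iterating the nonzeros costs O(p) instead of O(d). The member names and layout below are assumptions for exposition, not PECOS's actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Sketch of a sparse-dense vector ("sdvec"): a dense value array plus a list
// of touched indices. Dense storage gives O(1) random access for dot products
// against sparse vectors; the touched list makes iteration O(p), not O(d).
// Names (sdvec, touched) are illustrative; PECOS's actual layout may differ.
struct sdvec {
    std::vector<float> val;         // dense storage, length d
    std::vector<uint32_t> touched;  // indices currently holding nonzero mass
    std::vector<bool> is_touched;   // O(1) membership test

    explicit sdvec(std::size_t d) : val(d, 0.0f), is_touched(d, false) {}

    // axpy-style update of one coordinate, recording first-time touches
    void add(uint32_t idx, float v) {
        if (!is_touched[idx]) {
            is_touched[idx] = true;
            touched.push_back(idx);
        }
        val[idx] += v;
    }

    // Dot product with a sparse vector given as (index, value) pairs:
    // O(x.nnz), because the dense val array gives O(1) random access.
    float dot(const std::vector<std::pair<uint32_t, float>>& x) const {
        float ret = 0.0f;
        for (const auto& p : x) ret += p.second * val[p.first];
        return ret;
    }
};
```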

With sdvec, the computational bottleneck shifts from the bottom layers to the top layer, since the top layer performs sorting over the whole dataset. We further accelerate clustering training via parallel sorting.
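A toy two-thread version of the parallel-sorting idea (not the routine used in this PR): sort the halves concurrently, then merge in place. Production implementations recurse over many threads, but the source of the speedup is the same.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Illustrative two-way parallel sort: the two halves are sorted on separate
// threads (disjoint ranges, so no synchronization is needed), then merged.
void parallel_sort2(std::vector<int>& a) {
    const std::size_t mid = a.size() / 2;
    std::thread t([&a, mid] { std::sort(a.begin(), a.begin() + mid); });
    std::sort(a.begin() + mid, a.end());  // this thread sorts the upper half
    t.join();
    std::inplace_merge(a.begin(), a.begin() + mid, a.end());
}
```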

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@OctoberChang OctoberChang changed the title sparse-dense vector and parallel sorting for acceleration accelerate clustering with sparse-dense vector and parallel sorting Nov 11, 2022
@yaushian yaushian force-pushed the clustering_acceleration branch 2 times, most recently from 7e9183d to b1fe2c8 Compare November 14, 2022 20:20
- std::vector<f32_dvec_t> center1; // need to be duplicated to handle parallel clustering
- std::vector<f32_dvec_t> center2; // for spherical kmeans
+ std::vector<f32_sdvec_t> center1; // need to be duplicated to handle parallel clustering
+ std::vector<f32_sdvec_t> center2; // for spherical kmeans
Contributor

remove extra whitespace

}

for(int thread_id = 0; thread_id < threads; thread_id++) {
do_axpy(1.0,center_tmp_thread[thread_id],cur_center);
Contributor

add white space between arguments, e.g., do_axpy(1.0, XXX, YYY).
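For context, the per-thread accumulation pattern under review can be sketched as follows. The do_axpy below is a simplified stand-in mimicking the PECOS helper's shape (y += alpha * x on dense float vectors), written with the argument spacing the reviewer asks for.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified stand-in for the merge step above: each thread accumulates
// partial centers into its own copy, and the copies are then reduced with
// an axpy (y += alpha * x). Not the actual PECOS do_axpy implementation.
static void do_axpy(float alpha, const std::vector<float>& x, std::vector<float>& y) {
    for (std::size_t i = 0; i < y.size(); i++) y[i] += alpha * x[i];
}

// Reduce per-thread center copies into a single center vector.
static std::vector<float> reduce_centers(const std::vector<std::vector<float>>& center_tmp_thread) {
    std::vector<float> cur_center(center_tmp_thread[0].size(), 0.0f);
    for (std::size_t thread_id = 0; thread_id < center_tmp_thread.size(); thread_id++) {
        do_axpy(1.0f, center_tmp_thread[thread_id], cur_center);
    }
    return cur_center;
}
```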

@@ -83,7 +141,7 @@ namespace pecos {
index_type* idx;
value_type* val;
sparse_vec_t(index_type nnz=0, index_type* idx=NULL, value_type* val=NULL): nnz(nnz), idx(idx), val(val) {}

Contributor

remove extra white space?

float32_t do_dot_product(const sparse_vec_t<IX, VX>& x, const sdvec_t<IY, VY>& y) {
float32_t ret = 0;
for(size_t s=0; s < x.nnz; s++) {
auto idx = x.idx[s];
Contributor

In other similar use cases (do_axpy), you are using auto&. Why the disparity?

return do_dot_product(y, x);
}
float32_t ret = 0;
for(size_t s=0; s < x.nr_touch; s++) {
Contributor

Consider using the same style of for loop, for (size_t s = 0; s < ...; s++), in all the functions for consistency.

@OctoberChang OctoberChang merged commit e4f61bf into amzn:mainline Nov 14, 2022