-
Notifications
You must be signed in to change notification settings - Fork 105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement C/C++ PairwiseANN w/ Python API #268
Conversation
pecos/ann/pairwise/model.py
Outdated
def train(cls, X, Y, train_params=None, pred_params=None): | ||
"""train and return the PairwiseANN indexer | ||
Args: | ||
X (numpy.array or smat.csr_matrix): database matrix to be indexed. (num_item x feat_dim). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing Y
in docstring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. See Revision 2.
pecos/ann/pairwise/model.py
Outdated
"""constructor of PairwiseANN class | ||
Args: | ||
model_ptr (c_void_p): pointer to C instance pecos::ann:PairwiseANN. | ||
num_item (int): number of item being indexed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please fix docstring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. See Revision 2.
pecos/core/ann/pairwise.hpp
Outdated
|
||
struct Searcher { | ||
typedef PairwiseANN<feat_vec_t, mat_t> pairwise_ann_t; | ||
typedef pecos::ann::heap_t<pair_t, std::less<pair_t>> max_heap_t; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't need to define this again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. See Revision 2.
pecos/core/ann/pairwise.hpp
Outdated
typedef pecos::ann::KeyValPair<index_type, value_type> pair_t; | ||
typedef pecos::ann::heap_t<pair_t, std::less<pair_t>> max_heap_t; | ||
|
||
struct Searcher { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indentation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. See Revision 2.
pecos/core/ann/pairwise.hpp
Outdated
} | ||
} | ||
|
||
void train(const mat_t &X_trn, const pecos::csr_t &Y_trn) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we just assume Y_trn
to be CSC here? We can support CSR/CSC at python level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. See Revision 2.
pecos/core/libpecos.cpp
Outdated
uint32_t* ret_Mmat, \ | ||
float* ret_Dmat, \ | ||
float* ret_Vmat, \ | ||
const bool is_sample_input) { \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be is_same_input
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. See Revision 2.
""" | ||
return copy.deepcopy(self.pred_params) | ||
|
||
def predict(self, input_feat, label_keys, searchers, pred_params=None, is_same_input=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if there's a more extendable design for the predict API. Now we are basically supporting two cases (1) is_samp_input=True
, a csr_vector and (2) is_same_input=False
, a coo matrix.
bdab871
to
6a3feba
Compare
Issue #, if available:
Description of changes:
Implement C/C++ PairwiseANN function with Python API.
Given a (input, label) pair, PairwiseANN finds top-K nearest queries in the indexed input-to-label graph.
The searcher return four arrays of size
K
:Imat
: top-K input indicesMmat
: the masking indicators. 1 indicates the kNN index is presented; otherwise 0.Dmat
: the kNN distances between the test input and top-K indexed input.Vmat
: the corresponding values stored in the input-to-label graph.Indexing usage
where
X_trn
is the dense or sparse input feature matrixY_csr
is the sparse input-to-label graphSave/load Usage
Searchers Usage
where
max_batch_size
defines the maximum number of rows in the returned arraymax_only_topk
defines the maximum number of cols in the returned arrayPrediction Usage
With same input (suitable for real-time inference):
query_vec
is a single feature vector of shape(1, feat_dim)
label_keys
is a np.array of shape(batch_size, )
With different inputs (suitable for batch prediction):
Query_mat
is a batch feature matrix of shape(batch_size, feat_dim)
label_keys
is a np.array of shape(batch_size, )
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.