Make sure metrics work with column-wise distributed training #9020
Conversation
@trivialfis I'll follow up with some refactoring to make the logic for row vs. column split, and horizontal vs. vertical federated learning, more encapsulated.
namespace xgboost {
namespace metric {

TEST(Metric, DeclareUnifiedTest(BinaryAUC)) {
  auto ctx = xgboost::CreateEmptyGenericParam(GPUIDX);
Is it necessary to move all this code to headers? Maybe copying the function declaration is enough?
I think the problem is that since we are including the .cc file in the .cu file, we'd get duplicate symbols if we put the implementation in the .cc file.
Since we are declaring all the test bodies in headers, I think we can remove the .cc file include trick, right?
The TESTs are still defined in the .cc file, so if you don't want to do the include trick, then we'd have to redefine all the TESTs in the .cu file as well.
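For concreteness, a minimal sketch of the layout being discussed; the file names and the VerifyBinaryAUC helper below are illustrative, not the actual xgboost test files.

// test_auc.h -- shared test body; `inline` so both translation units can
// include it without breaking the one-definition rule.
inline void VerifyBinaryAUC() {
  auto ctx = xgboost::CreateEmptyGenericParam(GPUIDX);
  // ... assertions shared by the CPU and GPU builds ...
}

// test_auc.cc -- the CPU build defines the TESTs, with DeclareUnifiedTest
// expanding to a CPU test name.
#include "test_auc.h"
TEST(Metric, DeclareUnifiedTest(BinaryAUC)) { VerifyBinaryAUC(); }

// test_auc.cu -- the GPU build includes the .cc, so the same TESTs are compiled
// again, with the macro expanding to a GPU-specific name. If VerifyBinaryAUC
// were a non-inline definition living in the .cc, this include would put the
// same symbol into two object files and the link would fail with duplicate
// symbols.
#include "test_auc.cc"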
We will probably do that in the future. Let's keep the trick for now. Just out of curiosity, do we really need to modify every test for this? Seems it takes quite a lot of work and deep understanding to keep the tests up-to-date. Can we have something similar to the objective tests that just compare the results from single-worker and multi-worker?
Would testing that property be easier than the current tests? My thought is we can use something similar to the objective tests for federated learning instead of adding new parameters to the existing metric tests.
The objective tests are at a higher level: we only run through each objective once to obtain the result, so it's simpler to compare it with the result from column split. The metric tests are lower level: we calculate the metric result repeatedly under different conditions, so it's harder to compare the results. I suppose we could refactor the tests more to make it easier to do comparisons, but with the current setup, this seems like the simplest approach.
Apologies for the many questions, but can we make it high-level? What's the downside of doing something like the following?
for m : all_metrics:
  local_data = random()
  column_split_data = copy(local_data)
  column_split_data.info.column_split = true
  expected_result = m.evaluate(local_data)
  // alternatively, `for world_size : {1, 3} {}`, and assert results are the same.
  column_split_result = run_with_communicator(world_size=3, []() { return m.evaluate(column_split_data) })
  assert_equal(expected_result, column_split_result)
The main issue is that for each metric, we are testing many variations of inputs (each GetMetricEval call is a different test case). In order to compare single node vs column split, we'd have to either:
- Only test a single input per metric, a happy path, thus missing a lot of corner cases.
- Or still test all the inputs, which means repeating all the tests, introducing a lot of duplication.
Since we've already calculated the expected metric outputs, there is really no need to compare column-split results with single-node ones.
This is slightly different from the objective tests, where the outputs depend on the randomly generated data. For metric tests we are manually specifying predictions and labels, so the outputs are fixed.
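For reference, a rough sketch of what one of these fixed-expectation cases looks like; the metric, the values, and the simplified GetMetricEval signature here are illustrative rather than copied from the actual tests.

TEST(Metric, DeclareUnifiedTest(RMSE)) {
  auto ctx = xgboost::CreateEmptyGenericParam(GPUIDX);
  std::unique_ptr<xgboost::Metric> metric{xgboost::Metric::Create("rmse", &ctx)};
  // Predictions and labels are written by hand, so the expected value is a
  // constant: sqrt((0.01 + 0.81 + 0.81 + 0.01) / 4) ~= 0.6403. The same
  // expectation holds whether or not the columns are split across workers.
  EXPECT_NEAR(GetMetricEval(metric.get(),
                            {0.1f, 0.9f, 0.1f, 0.9f},   // predictions
                            {0.0f, 0.0f, 1.0f, 1.0f}),  // labels
              0.6403f, 1e-4);
}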
Got it. For future reference, it would be great if we could hide this or make it more automatic (like having a metric test helper that can generate different test cases). Personally, I would like to keep simple things like metrics beginner friendly.
By beginner, I mean a person who knows only the mathematical equation of the metric and nothing else.
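A hypothetical shape for such a helper (the names below are made up for illustration and do not exist in xgboost): each metric test would only state its inputs and hand-computed expectations, while the helper owns the knowledge of devices and split modes.

#include <gtest/gtest.h>
#include <vector>

// Hypothetical test-case table, sketched for illustration only.
struct MetricTestCase {
  std::vector<float> preds;
  std::vector<float> labels;
  float expected;  // hand-computed from the metric's mathematical definition
};

template <typename Evaluator>
void CheckMetricCases(Evaluator&& evaluate, std::vector<MetricTestCase> const& cases) {
  for (auto const& tc : cases) {
    // The helper, not each individual test, would be the place to loop over
    // CPU/GPU and row-split/column-split configurations and replay the same
    // cases under each one.
    EXPECT_NEAR(evaluate(tc.preds, tc.labels), tc.expected, 1e-4);
  }
}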
This is mostly moving things around so that we can test the metrics under distributed training, for both CPU and GPU, and for both row split and column split.