To handle non-actionable steps in sklearn #866
Codecov Report
@@ Coverage Diff @@
## develop #866 +/- ##
===========================================
+ Coverage 88.38% 90.62% +2.23%
===========================================
Files 37 37
Lines 4298 6163 +1865
===========================================
+ Hits 3799 5585 +1786
- Misses 499 578 +79
I think your changes all make sense, but require some explanation in the code (and tests).
Could you please also add some error handling for earlier scikit-learn versions? Right now the tests fail due to a hard-to-interpret error message.
@mfeurer this concerns one of the old unit tests that I had to change. Should I try to change the code to see how this can be handled, or skip that particular unit test?
tests/test_extensions/test_sklearn_extension/test_sklearn_extension.py
You can skip that unit test.
Reference Issue
Addresses #825 and #480.
What does this PR implement/fix? Explain your changes.
The serialization expects each step to be a sklearn module, which eventually becomes an OpenMLFlow, or None. In a ColumnTransformer, the strings 'drop' and 'passthrough' can be given in place of a transformer to include a step without executing it. The changes in this PR add handling for these two string keywords when they appear in a ColumnTransformer call.
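To make the intended behaviour concrete, here is a minimal sketch of the kind of check described above. It is not the code from this PR: `NON_ACTIONABLE_STEPS`, `describe_step` and `DummyScaler` are hypothetical names used only for illustration, and the real extension builds OpenMLFlow objects rather than tuples.

```python
# Hypothetical sketch; not the actual openml-python implementation.
# Strings that stand in for a transformer without executing one.
NON_ACTIONABLE_STEPS = ("drop", "passthrough")


def describe_step(name, step):
    """Classify one (name, step) pair the way a serializer might."""
    if step is None:
        return (name, "none", None)
    if isinstance(step, str):
        if step in NON_ACTIONABLE_STEPS:
            # Keep the keyword as-is instead of treating it as an estimator.
            return (name, "keyword", step)
        # Any other string is unexpected; fail with a readable message.
        raise ValueError(f"Unsupported string step {step!r} for {name!r}")
    return (name, "estimator", type(step).__name__)


class DummyScaler:
    """Stand-in for a real sklearn transformer (assumption for the demo)."""


steps = [
    ("num", DummyScaler()),
    ("cat", "passthrough"),
    ("id", "drop"),
    ("unused", None),
]
described = [describe_step(name, step) for name, step in steps]
```

Raising on unknown strings, rather than passing them through silently, is one possible design choice here; it keeps unexpected input from being serialized as if it were a valid step.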
How should this PR be tested?
Any other comments?
This PR handles the specific case of a ColumnTransformer. I'm not sure whether there are other such keywords that should be handled equivalently. If so, we could maintain a keyword listing in extension.py, keyed on the sklearn version, and handle them with similar logic.
If there are more diverse cases that may arise, some examples would help in designing a generalized solution in that case.
We would also need to design relevant unit tests accordingly.
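The version-keyed keyword listing suggested above could look roughly like the sketch below. Everything in it is an assumption for discussion: `KEYWORDS_BY_MIN_VERSION`, `parse_version` and `supported_keywords` are hypothetical names, and the minimum version attached to 'drop' and 'passthrough' is a placeholder that should be checked against the sklearn changelog.

```python
# Hypothetical sketch of a version-keyed keyword listing for extension.py.


def parse_version(text):
    """Turn a version string such as '0.21.3' into a 3-tuple of ints."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        # Non-numeric pieces such as 'dev0' count as 0.
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


# (minimum sklearn version, keywords accepted from that version on).
# The version below is an assumed placeholder, not a verified value.
KEYWORDS_BY_MIN_VERSION = [
    ((0, 20, 0), {"drop", "passthrough"}),
]


def supported_keywords(sklearn_version):
    """Collect every keyword whose minimum version is satisfied."""
    version = parse_version(sklearn_version)
    keywords = set()
    for min_version, names in KEYWORDS_BY_MIN_VERSION:
        if version >= min_version:
            keywords |= names
    return keywords
```

New keywords would then be added as further entries in the list, and the serializer would only need to ask `supported_keywords` for the installed version.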