AutoDist cannot compute the PSI (with SimpleBucketer) when the (train) dataframe contains missing values.
Removing missing values before running AutoDist (df.dropna()) may have a large impact on the results, because it removes every row that contains at least one missing value in any column.
A solution could be to call dropna() inside the loop where the PSI is computed, so that each series is cleaned on its own.
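To illustrate why the two approaches differ, here is a minimal sketch using plain pandas on a made-up dataframe (the column names and values are assumptions for illustration only, not data from this issue). It compares the number of rows left after a global df.dropna() with the number of values left per column when each series is cleaned separately.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: each column has exactly one missing value,
# but the missing values sit in different rows.
df = pd.DataFrame(
    {
        "a": [1.0, np.nan, 3.0, 4.0],
        "b": [1.0, 2.0, np.nan, 4.0],
        "c": [1.0, 2.0, 3.0, np.nan],
    }
)

# Global dropna: a row is removed if ANY column is missing in that row.
rows_after_global_drop = len(df.dropna())

# Per-feature dropna: each series only loses its own missing values.
rows_per_feature = {col: len(df[col].dropna()) for col in df.columns}

print("rows after df.dropna():", rows_after_global_drop)
print("values kept per feature:", rows_per_feature)
```

In this toy case the global drop keeps a single row, while each individual series keeps three of its four values, which is the effect described above.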
We need to implement a check in AutoDist. I think it would be best to start with AutoDist and, for each feature, check whether there are any NaNs before applying the statistical tests. If there are, remove the rows with NaNs for that feature before applying the tests, and print a warning for each feature where this is done. We also need a unit test for this.
DistributionStatistic on its own will still fail if there are NaNs in the data, but I think that is okay. AutoDist is supposed to handle everything for the user, whereas DistributionStatistic would raise an error in that case and leave it to the user to decide what to do.
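The per-feature check proposed above could look roughly like the sketch below. This is not the AutoDist implementation: drop_nans_per_feature is a hypothetical helper name, and the warning text and the example data are assumptions. It only uses pandas and the standard warnings module, and it includes a small test in the spirit of the requested unit test.

```python
import warnings

import numpy as np
import pandas as pd


def drop_nans_per_feature(train, test, feature):
    """Hypothetical helper: return the two series for `feature` without NaNs.

    A UserWarning is emitted when any missing value had to be removed.
    """
    d1 = train[feature]
    d2 = test[feature]
    n_missing = int(d1.isna().sum() + d2.isna().sum())
    if n_missing > 0:
        warnings.warn(
            f"Feature '{feature}': removed {n_missing} missing value(s) "
            "before computing the statistic.",
            UserWarning,
        )
        d1 = d1.dropna()
        d2 = d2.dropna()
    return d1, d2


def test_drop_nans_per_feature():
    train = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": [1.0, 2.0, 3.0]})
    test = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [4.0, 5.0, 6.0]})

    # Feature with a NaN: the value is dropped and exactly one warning is raised.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        d1, d2 = drop_nans_per_feature(train, test, "x")
    assert len(d1) == 2 and len(d2) == 3
    assert len(caught) == 1

    # Feature without NaNs: nothing is dropped and no warning is raised.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        d1, d2 = drop_nans_per_feature(train, test, "y")
    assert len(d1) == 3 and len(d2) == 3
    assert len(caught) == 0


test_drop_nans_per_feature()
```

Calling such a helper inside the per-feature loop would keep the clean features untouched, while a lower-level statistic called directly would still see the raw data and could raise its own error.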