Releases: openml/openml-python
v0.15.1
Will clean up release notes later, highlights:
- Fix usage of environment variables for locating the default cache and configuration directories by @eddiebergman in #1359
- Allow skipping attempts to download parquet files by setting the `OPENML_SKIP_PARQUET` variable to `true` by @PGijsbers in #1388
- A lot of maintenance work by @eddiebergman and @LennartPurucker
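A minimal sketch of how the `OPENML_SKIP_PARQUET` switch might be used, assuming the variable is read from the process environment. The helper `parquet_skipped` is hypothetical and only illustrates one plausible way to interpret such a flag (a case-insensitive comparison with `true`); it is not openml-python's actual implementation.

```python
import os


def parquet_skipped(environ=os.environ):
    """Hypothetical helper: True when OPENML_SKIP_PARQUET is set to 'true'."""
    value = environ.get("OPENML_SKIP_PARQUET", "false")
    return value.strip().lower() == "true"


# Set the flag before any dataset is fetched
# (assumption: the library reads it at download time).
os.environ["OPENML_SKIP_PARQUET"] = "true"
print(parquet_skipped())
```

Setting the variable in the shell before starting Python should have the same effect as assigning it through `os.environ`.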
Thanks to everyone who contributed in any way ❤️
What's Changed
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #1329
- Bump codecov/codecov-action from 3 to 4 by @dependabot in #1328
- Disable docker release on PR by @LennartPurucker in #1360
- fix(datasets): Add code `111` for dataset description not found error by @eddiebergman in #1356
- Test Fixes for v0.15.1 by @LennartPurucker in #1358
- fix: Avoid Random State and Other Test Bug by @LennartPurucker in #1362
- fix/maint: Make Docs Work Again and Stop Progress.rst Usage by @LennartPurucker in #1365
- doc: README Rework by @LennartPurucker in #1361
- doc: make all examples use names instead of IDs as reference. by @LennartPurucker in #1367
- fix: avoid stripping whitespaces for feature names by @LennartPurucker in #1368
- fix: workaround for git test workflow for Python 3.8 by @LennartPurucker in #1369
- add: test for dataset comparison and ignore fields by @LennartPurucker in #1370
- fix: github workflows and pytest issue by @LennartPurucker in #1373
- feat: support for loose init model from run by @LennartPurucker in #1371
- fix/maint: avoid exit code (which kills the docs building) by @LennartPurucker in #1374
- ux: Provide helpful link to documentation when error due to missing API token by @eddiebergman in #1364
- ci: Docker/build-push-action from 5 to 6 by @dependabot in #1357
- ci: Bump peter-evans/dockerhub-description from 3 to 4 by @dependabot in #1326
- fix: resolve Sphinx style error by @LennartPurucker in #1375
- docs: fix broken links after openml.org rework by @LennartPurucker in #1376
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #1380
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #1381
- Mark test as production by @PGijsbers in #1384
- Patch release bump by @PGijsbers in #1389
Full Changelog: v0.15.0...v0.15.1
v0.15.0
What's Changed
- ADD #1335: Improve MinIO support.
  - Add progress bar for downloading MinIO files. Enable it by setting `show_progress` to true on either `openml.config` or the configuration file.
  - When using `download_all_files`, files are only downloaded if they do not yet exist in the cache.
- FIX #1338: Read the configuration file without overwriting it.
- MAINT #1340: Add Numpy 2.0 support. Update tests to work with scikit-learn <= 1.5.
- ADD #1342: Add HTTP header to requests to indicate they are from openml-python.
- ADD #1345: `task.get_dataset` now takes the same parameters as `openml.datasets.get_dataset` to allow fine-grained control over file downloads.
- MAINT #1346: The ARFF file of a dataset is now only downloaded if parquet is not available.
- MAINT #1349: Removed usage of the `distutils` module, which allows for Py3.12 compatibility.
- MAINT #1351: Image archives are now automatically deleted after they have been downloaded and extracted.
- MAINT #1352, #1354: When fetching tasks and datasets, file download parameters now default to not downloading the file. Files will be downloaded only when a user tries to access properties which require them (e.g., `dataset.qualities` or `dataset.get_data`).
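The deferred-download behaviour described for #1352 and #1354 can be pictured with a small sketch. `LazyDataset` and `fake_download` are hypothetical stand-ins written for illustration and are not part of openml-python; the point is only that construction does no download and the first access does.

```python
class LazyDataset:
    """Hypothetical stand-in for a dataset object that defers its download."""

    def __init__(self, dataset_id, fetch):
        self.dataset_id = dataset_id
        self._fetch = fetch  # callable that performs the expensive download
        self._data = None    # nothing is downloaded at construction time

    def get_data(self):
        # Download on first access only; later calls reuse the stored result.
        if self._data is None:
            self._data = self._fetch(self.dataset_id)
        return self._data


calls = []


def fake_download(dataset_id):
    calls.append(dataset_id)  # record each simulated download
    return [[5.1, 3.5], [4.9, 3.0]]


ds = LazyDataset(1, fake_download)  # arbitrary ID; no download happens here
rows = ds.get_data()                # first access triggers the download
rows_again = ds.get_data()          # served from the stored result
```

In this sketch a dataset that is only inspected for its ID never triggers a download, which is the practical benefit of the new defaults.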
New Contributors
- @BrunoBelucci made their first contribution in #1338
- @knyazer made their first contribution in #1345
Full Changelog: v0.14.2...v0.15.0
Version 0.14.2
This is a minor release to support several hotfixes and technical debt.
- MAINT #1280: Use the server-provided `parquet_url` instead of `minio_url` to determine the location of the parquet file.
- ADD #716: add documentation for remaining attributes of classes and functions.
- ADD #1261: more annotations for type hints.
- MAINT #1294: update tests to new tag specification.
- FIX #1314: Update fetching a bucket from MinIO.
- FIX #1315: Make class label retrieval more lenient.
- ADD #1316: add support for feature description ontologies.
- MAINT #1310/#1307: switch to ruff and resolve all mypy errors.
Version 0.14
IMPORTANT: This release paves the way towards a breaking update of OpenML-Python. From version 0.15, functions that had the option to return a pandas DataFrame will return a pandas DataFrame by default. This version (0.14) emits a warning if you still use the old access functionality.
More concretely:
- In 0.15 we will drop the ability to return dictionaries in listing calls and only provide pandas DataFrames. To disable warnings in 0.14 you have to request a pandas DataFrame (using `output_format="dataframe"`).
- In 0.15 we will drop the ability to return datasets as numpy arrays and only provide pandas DataFrames. To disable warnings in 0.14 you have to request a pandas DataFrame (using `dataset_format="dataframe"`).
Furthermore, from version 0.15, OpenML-Python will no longer download datasets and dataset metadata by default. This version (0.14) emits a warning if you don't explicitly specify the desired behavior.
Please see the pull requests #1258 and #1260 for further information.
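The transition pattern described above (warn unless the caller explicitly asks for the new format) can be sketched as follows. `list_items` is a hypothetical function, and the warning category and message are assumptions; only the `output_format="dataframe"` argument comes from the notes.

```python
import warnings


def list_items(output_format="dict"):
    """Hypothetical listing call that mimics the 0.14 transition behaviour."""
    if output_format != "dataframe":
        warnings.warn(
            "Dictionary output is deprecated; pass output_format='dataframe'.",
            FutureWarning,
            stacklevel=2,
        )
    return output_format


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    list_items()                           # old style: emits a warning
    list_items(output_format="dataframe")  # explicit request: silent
```

Passing the format explicitly keeps the call working unchanged once the default flips in the later release.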
- ADD #1081: New flag that allows disabling downloading dataset features.
- ADD #1132: New flag that forces a redownload of cached data.
- FIX #1244: Fixes a rare bug where task listing could fail when the server returned invalid data.
- DOC #1229: Fixes a comment string for the main example.
- DOC #1241: Fixes a comment in an example.
- MAINT #1124: Improve naming of helper functions that govern the cache directories.
- MAINT #1223, #1250: Update tools used in pre-commit to the latest versions (`black==23.3.0`, `mypy==1.3.0`, `flake8==6.0.0`).
- MAINT #1253: Update the citation request to the JMLR paper.
- MAINT #1246: Add a warning that warns the user that checking for duplicate runs on the server cannot be done without an API key.
Version 0.13.1
- ADD #1028: Add functions to delete runs, flows, datasets, and tasks (e.g., openml.datasets.delete_dataset).
- ADD #1144: Add locally computed results to the OpenMLRun object’s representation if the run was created locally and not downloaded from the server.
- ADD #1180: Improve the error message when the checksum of a downloaded dataset does not match the checksum provided by the API.
- ADD #1201: Make OpenMLTraceIteration a dataclass.
- DOC #1069: Add argument documentation for the OpenMLRun class.
- FIX #1197 #559 #1131: Fix the order of ground truth and predictions in the OpenMLRun object and in format_prediction.
- FIX #1198: Support numpy 1.24 and higher.
- FIX #1216: Allow unknown task types on the server. This is only relevant when new task types are added to the test server.
- MAINT #1155: Add dependabot github action to automatically update other github actions.
- MAINT #1199: Obtain pre-commit’s flake8 from github.com instead of gitlab.com.
- MAINT #1215: Support latest numpy version.
- MAINT #1218: Test Python3.6 on Ubuntu 20.04 instead of the latest Ubuntu (which is 22.04).
- MAINT #1221 #1212 #1206 #1211: Update github actions to the latest versions.
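As an illustration of the kind of check behind ADD #1180, the sketch below compares a downloaded payload against an expected checksum and raises a descriptive error on mismatch. `verify_checksum`, the message wording, and the use of SHA-256 are all assumptions made for this example; the notes do not state which hash the server provides.

```python
import hashlib


def verify_checksum(payload: bytes, expected: str, source: str) -> None:
    """Hypothetical check: raise a descriptive error on checksum mismatch."""
    actual = hashlib.sha256(payload).hexdigest()
    if actual != expected:
        raise ValueError(
            f"Checksum of the file downloaded from {source} is {actual}, "
            f"but the server reported {expected}. "
            "The file may be corrupted or outdated."
        )


payload = b"@relation example\n"
good = hashlib.sha256(payload).hexdigest()
verify_checksum(payload, good, "example.arff")  # matching checksum: no error
```

Including both checksums and the source in the message gives the user enough context to decide whether to retry the download or report the file.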
Version 0.13.0
- FIX #1030: `pre-commit` hooks should no longer issue a warning.
- FIX #1058, #1100: Avoid `NoneType` error when printing task without `class_labels` attribute.
- FIX #1110: Make arguments to `create_study` and `create_suite` that are defined as optional by the OpenML XSD actually optional.
- FIX #1147: `openml.flow.flow_exists` no longer requires an API key.
- FIX #1184: Automatically resolve proxies when downloading from minio. Turn this off by setting environment variable `no_proxy="*"`.
- MAINT #1088: Do CI for Windows on Github Actions instead of Appveyor.
- MAINT #1104: Fix outdated docstring for `list_task`.
- MAINT #1146: Update the pre-commit dependencies.
- ADD #1103: Add a `predictions` property to OpenMLRun for easy accessibility of prediction data.
- ADD #1188: EXPERIMENTAL. Allow downloading all files from a minio bucket with `download_all_files=True` for `get_dataset`.
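For the `no_proxy="*"` setting mentioned under FIX #1184, the sketch below shows how Python's standard library interprets that value. Whether openml-python consults these exact standard-library functions is an assumption; the example only demonstrates the meaning of the wildcard.

```python
import os
import urllib.request

# Disable proxy use for every host, as described for FIX #1184.
os.environ["no_proxy"] = "*"

# The standard library treats "*" as "bypass the proxy for all hosts".
bypass = urllib.request.proxy_bypass_environment("example.org")
print(bypass)
```

The variable has to be set before the download starts, for instance in the shell that launches Python.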
Version 0.12.1
- ADD #895/#1038: Measure runtimes of scikit-learn runs also for models which are parallelized via the joblib.
- DOC #1050: Refer to the webpage instead of the XML file in the main example.
- DOC #1051: Document existing extensions to OpenML-Python besides the shipped scikit-learn extension.
- FIX #1035: Render class attributes and methods again.
- FIX #1042: Fixes a rare concurrency issue with OpenML-Python and joblib which caused the joblib worker pool to fail.
- FIX #1053: Fixes a bug which could prevent importing the package in a docker container.
Version 0.12.0
0.11.1
- ADD #964: Validate `ignore_attribute`, `default_target_attribute`, `row_id_attribute` are set to attributes that exist on the dataset when calling `create_dataset`.
- ADD #979: Dataset features and qualities are now also cached in pickle format.
- ADD #982: Add helper functions for column transformers.
- ADD #989: `run_model_on_task` will now warn the user if the model passed has already been fitted.
- ADD #1009: Add the possibility to not download the dataset qualities. The cached version is used even if the download attribute is false.
- ADD #1016: Add scikit-learn 0.24 support.
- ADD #1020: Add option to parallelize evaluation of tasks with joblib.
- ADD #1022: Allow minimum version of dependencies to be listed for a flow, use more accurate minimum versions for scikit-learn dependencies.
- ADD #1023: Add admin-only calls for adding topics to datasets.
- ADD #1029: Add support for fetching dataset from a minio server in parquet format.
- ADD #1031: Generally improve runtime measurements, add them for some previously unsupported flows (e.g. BaseSearchCV derived flows).
- DOC #973: Change the task used in the welcome page example so it no longer fails when using a numerical dataset.
- MAINT #671: Improved the performance of `check_datasets_active` by only querying the given list of datasets in contrast to querying all datasets. Modified the corresponding unit test.
- MAINT #891: Changed the way that numerical features are stored. Numerical features that range from 0 to 255 are now stored as uint8, which reduces the storage space required as well as storing and loading times.
- MAINT #975, #988: Add CI through Github Actions.
- MAINT #977: Allow `short` and `long` scenarios for unit tests. Reduce the workload for some unit tests.
- MAINT #985, #1000: Improve unit test stability and output readability, and add load balancing.
- MAINT #1018: Refactor data loading and storage. Data is now compressed on the first call to `get_data`.
- MAINT #1024: Remove flaky decorator for study unit test.
- FIX #883 #884 #906 #972: Various improvements to the caching system.
- FIX #980: Speed up `check_datasets_active`.
- FIX #984: Add a retry mechanism when the server encounters a database issue.
- FIX #1004: Fixed an issue that prevented installation on some systems (e.g. Ubuntu).
- FIX #1013: Fixes a bug where `OpenMLRun.setup_string` was not uploaded to the server, prepares for `run_details` being sent from the server.
- FIX #1021: Fixes an issue that could occur when running unit tests and openml-python was not in `PATH`.
- FIX #1037: Fixes a bug where a dataset could not be loaded if a categorical value had listed nan-like as a possible category.
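A retry mechanism of the kind mentioned in FIX #984 could look roughly like the sketch below. `call_with_retries`, its parameters, and the use of `ConnectionError` as the retryable failure are assumptions for illustration, not the library's implementation.

```python
import time


def call_with_retries(request, n_retries=3, delay=0.0):
    """Hypothetical helper: retry `request` when it raises ConnectionError."""
    for attempt in range(1, n_retries + 1):
        try:
            return request()
        except ConnectionError:
            if attempt == n_retries:
                raise                    # out of attempts: surface the error
            time.sleep(delay * attempt)  # wait a little longer each time


attempts = []


def flaky_request():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("simulated database issue")
    return "ok"


result = call_with_retries(flaky_request)
```

Here the simulated request fails twice and succeeds on the third attempt; a real client would use a non-zero `delay` to give the server time to recover.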
Version 0.11.0
- ADD #753: Allows uploading custom flows to OpenML via OpenML-Python.
- ADD #777: Allows running a flow on pandas dataframes (in addition to numpy arrays).
- ADD #888: Allow passing a `task_id` to `run_model_on_task`.
- ADD #894: Support caching of datasets using feather format as an option.
- ADD #929: Add `edit_dataset` and `fork_dataset` to allow editing and forking of uploaded datasets.
- ADD #866, #943: Add support for scikit-learn's `passthrough` and `drop` when uploading flows to OpenML.
- ADD #879: Add support for scikit-learn's MLP hyperparameter `layer_sizes`.
- ADD #945: PEP 561 compliance for distributing Type information.
- DOC #660: Remove nonexistent argument from docstring.
- DOC #901: The API reference now documents the config file and its options.
- DOC #912: API reference now shows `create_task`.
- DOC #954: Remove TODO text from documentation.
- DOC #960: document how to upload multiple ignore attributes.
- FIX #873: Fixes an issue which resulted in incorrect URLs when printing OpenML objects after switching the server.
- FIX #885: Logger no longer registered by default. Added utility functions to easily register logging to console and file.
- FIX #890: Correct the scaling of data in the SVM example.
- MAINT #371: `list_evaluations` default `size` changed from `None` to `10_000`.
- MAINT #767: Source distribution installation is now unit-tested.
- MAINT #781: Add pre-commit and automated code formatting with black.
- MAINT #804: Rename arguments of list_evaluations to indicate they expect lists of ids.
- MAINT #836: OpenML supports only pandas version 1.0.0 or above.
- MAINT #865: OpenML no longer bundles test files in the source distribution.
- MAINT #881: Improve the error message for too-long URIs.
- MAINT #897: Dropping support for Python 3.5.
- MAINT #916: Adding support for Python 3.8.
- MAINT #920: Improve error messages for dataset upload.
- MAINT #921: Improve handling of the OpenML server URL in the config file.
- MAINT #925: Improve error handling and error message when loading datasets.
- MAINT #928: Restructures the contributing documentation.
- MAINT #936: Adding support for scikit-learn 0.23.X.
- MAINT #945: Make OpenML-Python PEP 561 compliant.
- MAINT #951: Converts TaskType class to a TaskType enum.
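To illustrate what converting a task-type class to an enum (MAINT #951) buys the caller, here is a small sketch. The member names and integer values below are assumptions chosen for the example and should not be read as the library's actual definition.

```python
from enum import Enum


class TaskType(Enum):
    """Illustrative enum; member names and values are assumptions."""

    SUPERVISED_CLASSIFICATION = 1
    SUPERVISED_REGRESSION = 2
    CLUSTERING = 5


def describe(task_type: TaskType) -> str:
    # Enum members carry both a readable name and the underlying integer id.
    return f"{task_type.name} (id={task_type.value})"


label = describe(TaskType(1))  # look up a member from its integer id
```

Compared with bare integers, enum members are self-describing in logs and reject invalid ids at lookup time with a `ValueError`.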