A collection of open datasets for industrial applications, organized by category. PRs are welcome. Links to papers describing the datasets are only included if the dataset page doesn't already link to the paper.
Google Dataset Search is now out of beta and it's one of the most powerful engines to search for datasets.
https://datasetsearch.research.google.com/
A dataset of more than 19,400 X-ray images for the development, testing and evaluation of image analysis and computer vision algorithms.
https://github.com/siemens/industrialbenchmark
https://www.openml.org/d/41170 (only the .arff file format is working at the moment)
https://itrust.sutd.edu.sg/itrust-labs_datasets/dataset_info/ The actual dataset can be accessed by filling in this request form: https://itrust.sutd.edu.sg/research/dataset/dataset_characteristics/#swat
The biggest data lake of open, continuously streaming industrial data https://openindustrialdata.com/
Especially interesting for IIoT applications; three theses and one paper have already been published about this project. http://cs.unibo.it/projects/us-tm2017/index.html
A big list of Machine Learning datasets. They're not necessarily related to industrial applications, but the list is kept very up to date and contains very large datasets which, through transfer learning, can help train models for industrial applications.
https://www.datasetlist.com/
The Data Repository of the UK Oil & Gas Authority, hosting a wealth of information about the UK Continental Shelf. Currently only a subset of the data is accessible to the wider public, but there are plans to extend access to the remaining data, which is currently available only to petroleum licensees.
An impressive collection of industrial multivariate time series from the biggest European aircraft company. These datasets were used for competitions on different learning tasks; this one in particular was about anomaly detection on real industrial data. The competitions are all closed now, but it's possible to contact Airbus to ask for access to the data.
https://aigym.airbus.com/
Minute-wise voltage data covering several years (2014-2019) for several users in India. A reading of 0 means a power outage, while missing data should be interpreted as unavailable data, not as an interruption.
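The distinction between a 0 reading and a missing reading matters when computing outage statistics: missing minutes must be excluded rather than counted as outages. A minimal sketch, assuming the readings for one user have already been loaded into a Python list where a missing value is represented as `None` (the function names and sample values are hypothetical, not part of the dataset):

```python
from typing import List, Optional, Sequence, Tuple


def classify_readings(readings: Sequence[Optional[float]]) -> List[str]:
    """Label each minute-wise reading following the dataset's convention:
    0 -> outage, missing (None) -> unavailable, anything else -> supply."""
    labels = []
    for value in readings:
        if value is None:
            labels.append("unavailable")  # no data, NOT an interruption
        elif value == 0:
            labels.append("outage")  # a 0 reading means the power was out
        else:
            labels.append("supply")
    return labels


def outage_summary(readings: Sequence[Optional[float]]) -> Tuple[int, int]:
    """Return (outage minutes, minutes with known data), ignoring missing minutes."""
    known = [v for v in readings if v is not None]
    outages = sum(1 for v in known if v == 0)
    return outages, len(known)


# Hypothetical five-minute excerpt: supply, outage, missing, outage, supply.
sample = [230.5, 0, None, 0, 228.0]
print(classify_readings(sample))
print(outage_summary(sample))
```

Dividing the two numbers returned by `outage_summary` gives an outage rate computed only over minutes for which data actually exists.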
This dataset is linked to a Kaggle competition to detect partial discharge patterns in signals acquired from these medium voltage overhead power lines.
Engine degradation simulation was carried out using C-MAPSS. Four different sets were simulated under different combinations of operational conditions and fault modes. Several sensor channels are recorded to characterize fault evolution. The data set was provided by the Prognostics CoE at NASA Ames.
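Because each training trajectory runs until failure, a common way to derive remaining-useful-life (RUL) targets is to subtract each cycle from the last cycle recorded for that engine. A minimal sketch, assuming whitespace-separated text rows where column 0 is the engine (unit) id and column 1 is the cycle number, followed by operational settings and sensor readings; this column layout is an assumption here, so check it against the dataset's own documentation. The sample rows are hypothetical and shortened to a single sensor column:

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def load_rows(lines: Sequence[str]) -> List[List[float]]:
    """Parse whitespace-separated rows (assumed layout: unit id, cycle, then features)."""
    return [[float(tok) for tok in line.split()] for line in lines if line.strip()]


def add_rul_labels(rows: List[List[float]]) -> List[Tuple[int, int, int]]:
    """Return (unit, cycle, RUL) per row of a run-to-failure trajectory.

    The last recorded cycle of each unit is taken as the failure point,
    so RUL = last cycle of the unit - current cycle.
    """
    last_cycle: Dict[int, int] = defaultdict(int)
    for row in rows:
        unit, cycle = int(row[0]), int(row[1])
        last_cycle[unit] = max(last_cycle[unit], cycle)
    return [(int(r[0]), int(r[1]), last_cycle[int(r[0])] - int(r[1])) for r in rows]


# Hypothetical rows: unit id, cycle, one sensor value.
sample = ["1 1 641.8", "1 2 642.1", "1 3 642.9", "2 1 641.5", "2 2 643.0"]
for unit, cycle, rul in add_rul_labels(load_rows(sample)):
    print(unit, cycle, rul)
```

This labeling only makes sense for trajectories that actually end at failure; trajectories truncated before failure need their true RUL from a separate source.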
The generation of data-driven prognostics models requires the availability of datasets with run-to-failure trajectories. To contribute to the development of these methods, this dataset provides new, realistic run-to-failure trajectories for a small fleet of aircraft engines under realistic flight conditions. The damage propagation modelling used to generate this synthetic dataset builds on the modelling strategy from previous work. The dataset was generated with the Commercial Modular Aero-Propulsion System Simulation (C-MAPSS) dynamical model. The data set has been provided by the Prognostics CoE at NASA Ames in collaboration with ETH Zurich and PARC.