Oil improves the tribological properties of surfaces in relative motion by reducing friction, temperature and wear. Oil degrades over time, which may have severe consequences for the asset. For example, contamination by leaking fuel or refrigerant may heavily affect the properties of the oil and consequently the asset's operation. An oil analysis provides insight into the oil's health and, indirectly, into some aspects of the asset's health.
The oil analysis in this case study comprises various assessments of the contaminating solid particles:
The oil analysis also comprises some tribological properties, such as the viscosity at various temperatures. Finally, it comprises an assessment of some chemical properties, such as water and fuel contamination, the total acid/base number (TAN/TBN) or the flash point.
Most of the samples in the data set come from oil that has been in operation in a vehicle for about a year. However, some samples come from fresh oil or from fresh oil after a short test ride. Hence, this case study does not contain any sequences of measurements taken during the operational life of the same oil, and repeated sampling of the same oil cannot explain the dependencies in the data set. Still, the vehicles have been deployed in various ways and their health states may well differ. The use and the health of the vehicle, together with the initial health of the oil, therefore most likely explain the outcome of the oil analysis at the age of one year.
The data set lends itself to both unsupervised and supervised machine learning, as illustrated in the demo scripts. These demo scripts are merely illustrations; both engineering knowledge and data science are needed to improve them. The data scientist's quest for better algorithms is as important as the engineer's quest for explanations. Unexplained observations or unobserved explanations limit data-driven decision support. To serve both data scientists and engineers, the identifiers of the measurements are meaningful to engineers, as opposed to many other publicly available data sets on similar cases.
Unsupervised machine learning comprises algorithms that learn patterns from unlabelled data. Oil may degrade in several ways and each way may result in a specific pattern in the analysed oil sample. Conversely, a specific pattern in the analysed oil sample indicates which types of degradation are predominant. Unsupervised machine learning can only recognise patterns in the analysed oil samples; the engineer should seek viable explanations for these patterns to make them practically meaningful. Such an explanation remains engineering judgement, as the type of degradation is not recorded in the data set.
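The pattern recognition described above can be sketched with a plain k-means clustering. This is only an illustration and not necessarily the method used in the demo scripts. The two measurements (viscosity and flash point), their values and the choice of two clusters are made-up assumptions, and the code uses only the Python standard library.

```python
import math
import statistics


def standardise(rows):
    """Scale every column to zero mean and unit standard deviation.

    Assumes no column is constant (otherwise the division fails).
    """
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    stdevs = [statistics.pstdev(c) for c in columns]
    return [
        tuple((v - m) / s for v, m, s in zip(row, means, stdevs))
        for row in rows
    ]


def kmeans(points, k, iterations=20):
    """Plain k-means; deterministic because the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each sample to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Synthetic, made-up samples: (viscosity at 40 °C in cSt, flash point in °C).
samples = [
    (95.0, 220.0), (60.0, 170.0), (98.0, 225.0),
    (62.0, 175.0), (96.0, 218.0), (58.0, 168.0),
]

labels, centroids = kmeans(standardise(samples), k=2)
print(labels)  # cluster index per sample; the index itself carries no meaning
```

The algorithm returns only cluster indices. Whether a cluster corresponds to a particular type of degradation is not something the code can tell; that interpretation is left to the engineer.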
Supervised machine learning comprises algorithms that learn a function from labelled data. The resulting function maps inputs (data) to outputs (labels) and is learnt from pairs of inputs and outputs. Some of the inputs may turn out to be highly predictive of the output. Again, the engineer should seek viable explanations for the dependencies found, because inputs that strongly associate with the output are not necessarily its cause.
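Learning a function from input-output pairs can be sketched with a decision stump, that is, a single threshold on a single input. This is an illustrative stand-in, not necessarily the method used in the demo scripts. The inputs (viscosity and fuel content), the binary label and all values are hypothetical; the code uses only the Python standard library.

```python
def fit_stump(X, y):
    """Return (feature index, threshold, polarity, accuracy) of the best one-input rule."""
    best = None
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            # Polarity +1 predicts 1 above the threshold, -1 predicts 1 below it.
            for polarity in (1, -1):
                preds = [int(polarity * (row[f] - t) > 0) for row in X]
                acc = sum(p == label for p, label in zip(preds, y)) / len(y)
                if best is None or acc > best[3]:
                    best = (f, t, polarity, acc)
    return best


def predict(stump, row):
    """Apply the learnt rule to one new sample."""
    f, t, polarity, _ = stump
    return int(polarity * (row[f] - t) > 0)


# Synthetic, made-up pairs: inputs are (viscosity at 40 °C in cSt, fuel content in %),
# the output is a hypothetical label (1 = sample flagged, 0 = not flagged).
X = [(95.0, 0.5), (60.0, 4.0), (98.0, 0.3), (88.0, 3.5), (70.0, 0.6), (64.0, 5.0)]
y = [0, 1, 0, 1, 0, 1]

stump = fit_stump(X, y)
print("most predictive input:", stump[0], "training accuracy:", stump[3])
print(predict(stump, (90.0, 4.2)))
```

In this toy data the second input separates the labels perfectly, which makes it highly predictive. That says nothing about causation: the stump reports an association only, and explaining it remains the engineer's task.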