Data cyclicity #2: an example

After reading this article, you will have a better understanding of our work on predictive maintenance.

In the previous article, we explained how to recognize a problem with cyclic data. Now we will work through a concrete case of a problem involving cyclic data. We will use our feature generation software designed for cyclic problems, and observe what such a feature generator brings when combined with classic anomaly detection models.

The data for this problem come from vibration measurements on a rotating system, here a ball bearing (link to the dataset).

Data handling

Figure 3: Dataset files

The dataset is made up of many files: some contain measurements in a healthy state, others measurements for various types of possible faults. Here, we simplify the problem by taking only the H-A-1.mat file, which contains the reference data for the healthy state, and O-A-1.mat, which contains the data for the fault to be detected. There will thus be only one type of fault, located on the outer race of the ball bearing.

Each file contains 10 seconds of data sampled at 200,000 Hz. Two sensors are available: an accelerometer providing vibration data, and a measurement of the bearing's rotational speed.
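As an illustration, here is a minimal sketch of loading these files with SciPy. The key names Channel_1 and Channel_2 are assumptions about the .mat structure; inspect the loaded dictionary to confirm them.

```python
from scipy.io import loadmat

FS = 200_000  # sampling frequency: 200,000 Hz for 10 s of measurement

healthy = loadmat("H-A-1.mat")  # reference healthy state
faulty = loadmat("O-A-1.mat")   # outer race fault

# "Channel_1" / "Channel_2" are assumed key names; print(healthy.keys())
# to check the actual structure of the files.
vib_healthy = healthy["Channel_1"].ravel()    # accelerometer signal
vib_faulty = faulty["Channel_1"].ravel()
speed_healthy = healthy["Channel_2"].ravel()  # rotational speed channel
```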

Figure 4: First values of accelerometer data during healthy and abnormal operation

Here, the problem we choose is the following: using only the healthy data, build a model able to distinguish healthy data from abnormal data.

Models used

To deal with this problem, we will use three classic models available in the sklearn machine learning library, among them the One Class SVM and the Local Outlier Factor.

The objective is not to go into the details of these methods, or to optimize the resulting models as much as possible. Rather, the point is to see how a suitable feature generation tool can improve the performance of the trained models.

These 3 models, like many machine learning models, do not work natively on time series. Indeed, they learn to associate one fixed-size input with one output. For example, they can associate a segment of 10 seconds of measurements with a “healthy” or “anomaly” label. But they are not designed to follow the state of a system continuously over time, producing a time series that would describe the evolution of our system's health.

A particularity of these models is that they are anomaly detection models: they learn using only healthy data. When evaluating new data, the trained models return a “healthy” or “anomaly” label, the “anomaly” label being returned whenever a deviation from the healthy state is detected.
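As an illustration, here is a minimal sketch of how such models can be built with sklearn and fitted on healthy data only. The article names the One Class SVM and the Local Outlier Factor; the Isolation Forest is our assumption for the third, unnamed model.

```python
# A minimal sketch of the three anomaly detection models, assuming
# Isolation Forest as the third (the article only names the first two).
from sklearn.svm import OneClassSVM
from sklearn.neighbors import LocalOutlierFactor
from sklearn.ensemble import IsolationForest

models = {
    "One Class SVM": OneClassSVM(nu=0.05),
    # novelty=True lets the model score data unseen during training
    "Local Outlier Factor": LocalOutlierFactor(novelty=True),
    "Isolation Forest": IsolationForest(random_state=0),  # assumption
}

def fit_and_score(models, X_train, X_test):
    """Fit each model on healthy segments only, then score new segments.

    X_train: (n_healthy_segments, segment_length) healthy segments.
    X_test:  segments to evaluate (healthy and faulty mixed).
    Returns a dict of anomaly scores (higher = more anomalous).
    """
    scores = {}
    for name, model in models.items():
        model.fit(X_train)  # the model learns the healthy state only
        scores[name] = -model.decision_function(X_test)
    return scores
```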

To be able to apply these models to the problem, we will split the healthy and abnormal data into small segments. These segments will be the models' inputs, and each will receive a “healthy” or “anomaly” label depending on where it is extracted from. The models will be trained using a fraction (here 12.5%) of the healthy data, and will be validated on the rest of the data.

Regular segmentation

A first possible approach is to cut both the healthy data and the fault data into segments of fixed size. Here, we choose segments of 10,000 points (therefore 10 times the window of Figure 4). Each segment is associated with its label: “healthy” or “abnormal”.
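Continuing the sketches above, this is one way the fixed-size segmentation and the 12.5% training split could look (the non-overlapping windowing is our assumption about the cutting details):

```python
import numpy as np

SEG_LEN = 10_000  # segment length in points (10x the window of Figure 4)

def segment(signal, seg_len=SEG_LEN):
    """Cut a 1-D signal into non-overlapping, fixed-length segments."""
    n = len(signal) // seg_len
    return signal[: n * seg_len].reshape(n, seg_len)

seg_healthy = segment(vib_healthy)  # segments labeled "healthy"
seg_faulty = segment(vib_faulty)    # segments labeled "anomaly"

# Train on 12.5% of the healthy segments, validate on the rest + faults
n_train = len(seg_healthy) // 8  # 1/8 = 12.5%
X_train = seg_healthy[:n_train]
X_test = np.vstack([seg_healthy[n_train:], seg_faulty])
y_test = np.array([0] * (len(seg_healthy) - n_train) + [1] * len(seg_faulty))
```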

Once this cutting is done, we can apply the 3 previously mentioned models to separate healthy data from abnormal data, training only on part of the healthy data.

To visualize the results concisely, we will use a ROC curve. It is a classic tool showing the different false positive rate / true positive rate trade-offs that a model can achieve. False positives are false alarms, and the true positive rate is the anomaly detection rate. For each model, the trade-off is explored by varying an anomaly detection threshold.

To read the results for a model, set an acceptable false positive rate (e.g. 0.05 for 5% false positives) and read off the associated detection rate (here, 42% for the Local Outlier Factor). The higher the detection rate at a given false alarm rate, the better the model.

A value for comparing models globally is the area under the curve (AUC). The closer this value is to 1, the better the model.
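As an illustration, the ROC curves and AUC values can be computed with sklearn.metrics from the anomaly scores of the previous sketches:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

# models, X_train, X_test, y_test come from the previous sketches
scores = fit_and_score(models, X_train, X_test)
for name, s in scores.items():
    fpr, tpr, _ = roc_curve(y_test, s)  # sweeps the detection threshold
    plt.plot(fpr, tpr, label=f"{name} (AUC = {roc_auc_score(y_test, s):.2f})")

plt.xlabel("False positive rate (false alarms)")
plt.ylabel("True positive rate (detection rate)")
plt.legend()
plt.show()
```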

Figure 5: ROC curves for a simple segmentation

Without detailing the results, we observe that the best model is the One Class SVM, with an area under the curve of 0.92. Whether such a result is good or not depends entirely on the context in which the model is used and the difficulty of the problem. We do not seek to interpret this value, but simply to use it to easily compare the generated models.

Now that we have established our benchmarks, let’s see if there is room for improvement. Our DiagFit software includes a feature generation module specially adapted to the cyclic case. Although designed for cyclic data, this tool can be applied to any time series, which is useful when the cyclicity of the data is not obvious. The results obtained by this method are shown in the figure below.

Figure 6: Simple segments + Automatic feature generation (cyclic)

After evaluating the models built from simple segmentation plus cyclic feature generation, we can see that the AUC of the models does not improve. Training the models on the transformed data therefore did not improve performance.

In fact, the data for this problem is relatively far from an ideal case of cyclic data. Our feature generation tool highlights fine differences in the evolution of data over time. In the case of vibrational data, this approach may work, but by its nature this data contains a lot of random noise, which can mask finer changes in the time course of the data.

In addition, the speed of rotation increases in the studied data set, so each segment constructed via regular segmentation over time can contain a variable number of revolutions. Each segment therefore corresponds to a slightly different physical phenomenon, which takes us further away from an ideal cyclical case.

Fortunately, the latter problem can be fixed by performing smarter data segmentation.

Segmentation by cycle

In the previous part, the results were obtained by dividing the data into regular segments over time. This simple method has the disadvantage of creating segments where the number of rotations is not constant.

Fortunately, the dataset provides a second piece of information that allows us to construct segments containing the same number of turns of the studied bearing: the second sensor measures the bearing’s rotational speed via a tachometer.

Figure 7: Cutting into turns using the second signal provided

From this signal, it is possible to make a cut where each segment contains exactly the same number of turns of the bearing; the segments are then resampled to a common length (here 1,024 points). The segments are not significantly different in size: before resampling, the average length of a cycle is 10,092 points (compared to 10,000 with simple segmentation). The advantage is that each segment obtained via this cutting now corresponds to better comparable physical phenomena, bringing us closer to the ideal case of cyclic data.
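As a rough sketch of this idea, assuming the second channel directly gives the instantaneous rotational speed in Hz (if it is a raw tachometer pulse train, the revolution boundaries would instead be detected from the pulses) and taking one turn per segment for simplicity:

```python
import numpy as np

RESAMPLED_LEN = 1_024  # common segment length after resampling

def segment_by_revolution(vib, speed_hz, fs=200_000, out_len=RESAMPLED_LEN):
    """Cut `vib` into one-revolution segments, resampled to `out_len`."""
    # Cumulative shaft angle in revolutions = integral of the speed
    angle = np.cumsum(speed_hz) / fs
    segments = []
    for k in range(1, int(angle[-1]) + 1):
        start = np.searchsorted(angle, k - 1)
        stop = np.searchsorted(angle, k)
        rev = vib[start:stop]
        # Resample each revolution to a common length by interpolation
        x_old = np.linspace(0.0, 1.0, len(rev))
        segments.append(np.interp(np.linspace(0.0, 1.0, out_len), x_old, rev))
    return np.array(segments)

cyc_healthy = segment_by_revolution(vib_healthy, speed_healthy)
```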

First of all, we can observe that if we directly apply the same 3 methods as in the previous section to these synchronized segments, the gain provided by this segmentation is not obvious:

Figure 8: ROC curves for segmentation by number of turns

Indeed, we do not observe any significant change in the AUC of the models. However, by using our automatic feature generation tool in the cyclic case, we observe this time a major gain in the performance of the models, contrary to what had been obtained with a simple segmentation:

Figure 9: ROC curves of the models via a cyclic approach

Thanks to this segmentation by number of turns, rather than by segment length, we obtain much more interesting results. The area under the curve of all models is greatly improved. For example, at a false alarm rate of 5%, we go from a detection rate of 42% to 99% for a Local Outlier Factor model. The results for the other models are similar, with a sharp increase in model quality. Approaching the problem using tools specific to cyclic data therefore produced much better results than a basic approach.

Note: Of course, if the goal was to accurately assess performance, this assessment would have to be done on the entire dataset, with different healthy measurement files, and different types of faults. It is possible that we would then see a slight decrease in the performance of the models, the problem becoming more complex due to the different contexts and types of faults.

The final word

Now you can better understand our breakthrough approach to predictive maintenance.

We have seen an example of a problem where smarter data slicing allows our feature generation software for cyclic problems to work better. In this example, by cutting by number of rotations rather than by time, the performance of the best model goes from an AUC of 0.87 to an AUC of 0.995.

The improvement in model performance shows that tackling problems with algorithms specifically designed for the cyclic case can, in some cases, achieve very good results.

In practice, cycles are regularly present in the data. It is not always necessary to look for them, as was done in the example accompanying this article. But for cases where the cycles are not explicitly given, DiagFit will integrate, in a future version, our new automatic cycle detection algorithm. Thus, whether explicit or implicit, the cyclicity of the data will be used to improve anomaly detection.

Other feature generation techniques and other failure prediction models exist beyond the three models used here. In a future article, we will compare the results obtained via different methods on several datasets.

Definition

Feature: a feature, translated literally, is a descriptor. At its simplest, it is a value (or a set of values) that describes something, expressing information about the data in an interesting form. If we take a class’s grades on a math test, the class mean is one feature and the variance is another. The advantage of generating features is to re-express the information from the raw data (the individual grades) in a form more suited to machine learning models (the class mean). In DiagFit, we do the same thing, except that instead of grades we take cyclic time series.
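To make the grades analogy concrete, here is a toy sketch that re-expresses each raw segment as a few summary statistics (DiagFit's cyclic feature generation is of course much richer than this):

```python
import numpy as np

def simple_features(segments):
    """Map each row (raw segment) to a small vector of descriptors."""
    return np.column_stack([
        segments.mean(axis=1),                  # "class mean" equivalent
        segments.var(axis=1),                   # "variance" equivalent
        np.sqrt((segments ** 2).mean(axis=1)),  # RMS energy of the segment
    ])
```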

Read part 1: Data cyclicity, understand what it is

Learn more about DiagFit, our blind failure prediction software: https://www.amiraltechnologies.com/actualite/2020/09/diagfit-1-5/

Article by Vincent Heurtin, Data Scientist at Amiral Technologies
