
Analyze, clean and transform industrial time series: from discovery to virtual sensor creation, to maximize data utilization

Industry is moving towards more advanced use of data from industrial equipment. This data, often in the form of complex time series, requires careful preparation and thorough exploration by domain experts.

When industrial users embark on analyzing their data to solve a specific problem (such as detecting specific failures, anticipating machine downtime, etc.), they may be unsure which approach to take.

How should raw equipment data be approached? Which elements should be examined to extract useful, actionable information? And how do you move from analysis to value creation for the company?

In this article, we will explore some of the operations that our expert data scientists can carry out on your data during pilot phases of your use cases.

Analyze industrial time series

Visualization and comparison of raw data

To begin data analysis, it can be interesting to visualize the raw signals over time. This visualization allows the domain expert to become aware of signal variations, possible recurring patterns and correlations between sensor signals. 

Simple visualization tools, such as signal comparison (superimposing the signals of two or more sensors of the same nature on the same axis) or cycle superposition (for example, in the case of a rotating machine), make it possible to spot obvious drifts or potential correlations between signals.

Figure 1 Time series of two sensors on the same axis
Sometimes, the domain expert can be helped by more advanced analysis features, such as the correlation matrix or the visualization of spectra and spectrograms (tools we will discuss in a future article).
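As an illustration, a correlation matrix between sensors can be computed in a few lines with pandas. This is a sketch on synthetic data; the sensor names and values are hypothetical:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
t = np.arange(1000)
# Hypothetical sensors: two closely related temperatures and an unrelated pressure
temp_a = 20 + 2 * np.sin(0.01 * t)
temp_b = temp_a + 0.1 * rng.standard_normal(t.size)   # tracks temp_a closely
pressure = 5 + rng.standard_normal(t.size)            # independent signal

df = pd.DataFrame({"temp_a": temp_a, "temp_b": temp_b, "pressure": pressure})
corr = df.corr()          # Pearson correlation matrix between all sensors
print(corr.round(2))
```

A value close to 1 (here between the two temperatures) flags a candidate correlation worth showing to the domain expert; values close to 0 suggest independent behavior.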

Labeling of visible incidents​

Thanks to this visual analysis, users who know their data well can often identify certain malfunctions visible in the signals. These anomalous areas can then be marked and, if necessary, classified to generate a complete and usable dataset.

This labeling can be very useful in two ways:

  • it allows these areas to be excluded in advance from the healthy data used to characterize normality, making that characterization more relevant
  • it provides known fault areas, allowing partial validation of the indicators learned later in the process
Figure 2 Identification of an area as “unhealthy” and possibility of classification
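A minimal sketch of such labeling with pandas, assuming the domain expert supplies anomaly intervals as timestamp ranges (all names, dates and the fault class below are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical one-hour signal sampled at 1 Hz
t = pd.date_range("2024-01-01", periods=1000, freq="s")
signal = pd.Series(np.sin(np.linspace(0, 20, 1000)), index=t)

# Labels provided by the domain expert: (start, end, class) per anomalous area
labels = [("2024-01-01 00:06:40", "2024-01-01 00:08:20", "bearing_fault")]

mask = pd.Series(False, index=signal.index)
for start, end, _cls in labels:
    mask.loc[start:end] = True          # timestamp slicing is inclusive

healthy = signal[~mask]   # used to characterize normality
faulty = signal[mask]     # kept aside to validate the indicators later
print(len(healthy), len(faulty))
```

The `healthy` subset feeds the normality model, while the `faulty` areas serve as a small validation set for the indicators learned afterwards.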

Clean Data

Analyze missing values and impute NaNs

It is crucial to check whether the file or database contains missing data. This may be due to different sampling frequencies between the sensors, or to partial capture of the data.

In the case of different sampling frequencies between sensors, it is necessary to impute these missing values in order to synchronize all the sensors. Several mathematical methods can be used for this.

Figure 3 Table of data before resampling

Use the slowest frequency  

This consists of undersampling the sensors that have a faster acquisition frequency.

  • Advantage: the data used are real measurements
  • Disadvantage: loss of information and sensitivity on the fast sensors

Use the highest frequency 

This consists of oversampling the sensors with a slower acquisition frequency.

  • Advantage: the responsiveness of the model is maximal
  • Disadvantage: oversampled data are calculated, not measured, and can introduce bias into the interpretation
Figure 4 Table of data after undersampling
Figure 5 Table of data after oversampling with the previous value
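Both resampling options can be sketched with pandas on a synthetic two-sensor example (the sensor names, frequencies and aggregation choices below are illustrative; in practice they should be chosen with the domain expert):

```python
import numpy as np
import pandas as pd

# Hypothetical: a fast sensor sampled at 1 s and a slow sensor at 5 s
t_fast = pd.date_range("2024-01-01", periods=20, freq="s")
t_slow = pd.date_range("2024-01-01", periods=4, freq="5s")
fast = pd.Series(np.arange(20, dtype=float), index=t_fast, name="fast")
slow = pd.Series([0.0, 1.0, 2.0, 3.0], index=t_slow, name="slow")

df = pd.concat([fast, slow], axis=1)   # union of timestamps -> NaNs appear

# Option 1: undersample to the slowest frequency (here, mean per 5 s window)
down = df.resample("5s").mean()

# Option 2: oversample to the fastest frequency, filling with previous value
up = df.ffill()

print(down.shape, up["slow"].isna().sum())
```

Forward filling (`ffill`) reproduces the "previous value" strategy of Figure 5; the same pipeline accepts more advanced interpolation methods when preserving inter-sensor correlation matters.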

💡FYI: if we later build a failure prediction model on data undersampled this way, its responsiveness will be reduced. The slower the acquisition frequency, the more delayed the predictions, and the less the model will be able to anticipate the failure. This choice may even cause a delay in detection!

💡FYI: with the highest frequency, by contrast, the model can exploit very weak signals (in terms of deviation from normality) and detect deviations very early.

Be careful, however: it is important to maintain the correlation between the sensors, and a simple "filling" (as illustrated) can be problematic. It is recommended to use a more advanced filling algorithm that meets this need.

In the case of missing values linked to partial capture of the data, imputation will require a thorough understanding of the cause of the missing data (sensor failure, external factor, etc.).

Detection and removal of outliers

When exploring data, it is important to spot and remove outliers that could degrade the quality of certain models.

However, depending on the nature of the signal, outlier removal is not trivial and requires expertise in time series.

Figure 6 Time series with outliers

After removal, the quality of the signal is greatly improved, especially if the goal is to build an efficient fault prediction model.

Figure 7 Time series without outliers
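One common approach, among many, is a rolling-median filter: a point far from its local median is flagged and re-imputed. The article does not prescribe a specific algorithm, so this is only a sketch on synthetic data, with an arbitrary window and threshold:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
signal = pd.Series(np.sin(np.linspace(0, 10, 500)) + 0.05 * rng.standard_normal(500))
signal.iloc[[50, 200, 350]] = [8.0, -7.0, 9.0]     # inject obvious outliers

# A point is an outlier if it deviates strongly from its local median
med = signal.rolling(window=21, center=True, min_periods=1).median()
resid = (signal - med).abs()
threshold = 5 * resid.median()      # crude, scale-adaptive cut-off
outliers = resid > threshold

cleaned = signal.mask(outliers)     # outliers replaced by NaN...
cleaned = cleaned.interpolate()     # ...then re-imputed by interpolation
print(outliers.sum())
```

The rolling median is robust: a lone spike barely moves the local median, so the spike itself stands out while its healthy neighbors remain unflagged.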

Transform data

Splitting the signal from a sensor with a trend

The overall analysis of the sensor curves makes it possible to see whether or not they present a temporal trend.

These trends, though natural for certain data measured, can sometimes hide abnormal variations, signs of anomalies. It may therefore be interesting to take this trend into account explicitly, in order to carry out a more complete analysis. It is then advisable to split the signal in order to obtain two curves, one with the trend and the other without. The two signals thus obtained contain important and different information, allowing a more in-depth analysis of this same signal.

When generating a predictive model from a single sensor (univariate model), it is even more interesting to analyze the detrended curve.

In the context of multivariate models, this splitting technique also makes it possible to capture inter-sensor relationships that the trend risks hiding.

Figure 8 Time series with a trend
Figure 9 Time series without the trend, with zoom on a visible anomaly

The curve with trend (often associated with low frequencies) makes it possible to analyze possible correlations between the trends of two different sensors. In the event of a breakdown, one of the possible causes could be a de-correlation of these trends.

 

On the other hand, the trendless curve (often associated with high frequencies), containing the more "stationary" information, makes it possible to analyze variations that would not have been visible if the trend were still present.

 

Thanks to this splitting of the signal, we obtain a much more in-depth analysis of the behavior of a single sensor.

Virtual sensors - mathematical operators applied to one or several signals

When exploring data, it may be appropriate to perform operations on raw signals in order to extract new time series (called “Virtual Sensors”), thus revealing new exploitable information. Basic operations such as addition, subtraction, multiplication and division between two signals can be used for this purpose.

For example:

  • Multiplying the voltage by the current gives the electrical power (P = U × I, a reminder from high-school physics)
  • Dividing signal A by signal B can simulate the transfer function of a system, or reveal a de-correlation between two sensors measuring the same quantity

It is also possible to go further using more complex operations such as the derivative, which can be relevant for modeling a dynamic system of the form Ẋ=AX.

For example:

  • The derivative of the speed gives the acceleration
  • The derivative of a volume gives the flow rate
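These virtual sensors are straightforward to build with pandas and NumPy. The signals, units and column names below are hypothetical, purely for illustration:

```python
import numpy as np
import pandas as pd

t = np.linspace(0, 10, 1001)                  # hypothetical 100 Hz time base
voltage = 230 + 5 * np.sin(2 * np.pi * 0.5 * t)
current = 10 + np.sin(2 * np.pi * 0.5 * t)
speed = 3 * t                                 # m/s, constant acceleration

df = pd.DataFrame({"U": voltage, "I": current, "speed": speed}, index=t)

# Virtual sensors built from mathematical operators on the raw signals
df["P"] = df["U"] * df["I"]                   # electrical power, P = U x I
df["ratio"] = df["U"] / df["I"]               # A/B ratio, e.g. transfer function
df["accel"] = np.gradient(df["speed"], t)     # derivative of speed -> acceleration
```

Each new column is itself a time series and can go through the same analysis pipeline (visualization, labeling, cleaning) as the raw sensors.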

Conclusion

Maximizing the value of industrial data relies on in-depth analysis and sometimes judicious transformation of time series. Thanks to a rigorous approach, which begins with the visualization of raw signals and continues with data cleaning as well as the creation of virtual sensors, the exploitation of data becomes a powerful lever for improving the performance of industrial equipment.

These operations are only the first steps of our methodology, proven over the years on various use cases. In a future article, we will explore the three-dimensional analysis of signals (sensors, time and frequency), thus broadening our perspectives for even finer exploitation of industrial time series.

 

New release – DiagFit 3.0

With DiagFit 3.0, performing all the operations mentioned above is now at your fingertips, without writing a single line of code. All of the proposed operations (outlier removal, trend detection, etc.) are based on proprietary algorithms proven on industrial time series. The intuitive interface is designed to make users' lives easier, offering smooth navigation within time series. Guided by our solid methodology, users explore their data to extract relevant information, essential to implementing an effective predictive maintenance approach.

 
