Examining time series in the industrial context can sometimes be complex. In a previous article entitled “*Analyzing, cleaning and transforming industrial time series*”, we discussed how a simple analytics approach, combined with cleaning techniques and the creation of virtual sensors, can extract more value from time series.

**How can we go even further in data exploration?**

**How can we make the most of data when creating predictive models?**

In this new article, we’ll look at more advanced operations based on our methodology, such as sensor correlation and frequency exploration. Note that, due to the nature of the data, the “temporal” dimension is almost always present in these analysis functions. We’ll end the article with a few key steps leading from analysis to model creation.

## Time/Frequency analysis

The first step involves analyzing the time series of each sensor meticulously through the frequency dimension. To do this, various methods can be used. Here, we will illustrate the analysis of spectra and spectrograms.

### Fourier Transform method, transitioning from the time domain to the frequency domain

A commonly used mathematical method for frequency analysis is the Fourier Transform. This technique, named after the French mathematician Jean-Baptiste Joseph Fourier, enables the decomposition of a time function into a sum or integral of sine and cosine functions of different frequencies. This facilitates the transition from the time domain to the frequency domain.

The Fourier Transform breaks down a complex signal into its constituent frequency components, thus providing a precise representation of the signal in the frequency domain.

The result of this operation notably enables the generation of associated spectra and spectrograms for a thorough frequency analysis.
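As a minimal sketch of this transition from the time domain to the frequency domain (the signal, its two frequency components, and the 2 Hz sampling rate are all invented for the example), a one-sided amplitude spectrum can be computed with NumPy:

```python
import numpy as np

# Illustrative signal: a slow 0.03 Hz component plus a faster 0.4 Hz one,
# sampled at 2 Hz (all values are assumptions for this sketch)
fs = 2.0
t = np.arange(0, 600, 1 / fs)              # 10 minutes of data
signal = np.sin(2 * np.pi * 0.03 * t) + 0.5 * np.sin(2 * np.pi * 0.4 * t)

# One-sided amplitude spectrum via the real-input Fourier Transform
spectrum = np.abs(np.fft.rfft(signal)) / len(signal)
freqs = np.fft.rfftfreq(len(signal), d=1 / fs)

dominant = freqs[np.argmax(spectrum)]      # peak sits at the 0.03 Hz component
```

Plotting `spectrum` against `freqs` gives exactly the kind of spectrum discussed in the next section.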

### Analysis of spectra

Spectral analysis involves studying the characteristics of a signal by decomposing it into frequency bands to study its properties.

In the realm of time series, spectral analysis **helps to understand the frequency structure of signals**, which can reveal important information about their origin, content, or behavior. Time is removed from the equation to focus only on the frequency component.

Identifying the different “frequency bands” associated with time signals will notably help to determine if there is a concentration of data around certain frequencies. This analysis also allows for the specific identification of frequency bands composing a certain type of anomaly.

With this approach, it becomes interesting to divide a single time series into several time series associated with different frequency bands. This division enables separate analysis and **extraction of relevant and useful information**.

To illustrate this approach, signals from two different sensors are used: temperature and accelerometer.

Let’s consider the following temperature signal:

The associated spectrum illustrates that a large portion of the data is concentrated below 0.047 Hz, which corresponds to a time scale of approximately twenty seconds.

Then it’s possible to perform a “high-pass” operation at 0.05 Hz; the user will then obtain a signal filtered of all physical phenomena slower than the scale mentioned above (cf. Illustration 3).
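Such a high-pass operation can be sketched as follows with a Butterworth filter from SciPy. Only the 0.05 Hz cutoff comes from the text; the temperature-like signal and its 2 Hz sampling rate are invented for the example:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

# Invented temperature-like signal: a slow 0.01 Hz drift plus a faster
# 0.2 Hz oscillation, sampled at 2 Hz
fs = 2.0
t = np.arange(0, 600, 1 / fs)
slow = np.sin(2 * np.pi * 0.01 * t)          # slower than the ~20 s scale
fast = 0.3 * np.sin(2 * np.pi * 0.2 * t)     # faster than the ~20 s scale
signal = slow + fast

# 4th-order Butterworth high-pass at 0.05 Hz; sosfiltfilt applies the
# filter forward and backward, which avoids introducing a phase shift
sos = butter(4, 0.05, btype="highpass", fs=fs, output="sos")
filtered = sosfiltfilt(sos, signal)          # slow drift largely removed
```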

However, the analysis of the following accelerometer signal illustrates a different situation:

The spectrum corresponding to this signal demonstrates that the frequency decomposition is not the same as that of temperature. Here, the frequency segmentation will instead occur in 3 or 4 frequency bands, such as divisions at 0.15 Hz, 0.55 Hz, and 0.94 Hz, for example.

The first frequency band will filter out all physical phenomena faster than 7 seconds, the second will identify only phenomena evolving between 7 and 2 seconds, the third between 2 and 1 second, and the last will filter out all physical phenomena slower than one second.

With this decomposition, it will be possible to **detect the very precise signature of certain anomalies on certain frequency bands**. This would not have been easy with the preservation of a complete signal.
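This decomposition into sub-series can be sketched as follows. The band edges (0.15, 0.55 and 0.94 Hz) come from the example above; the stand-in signal and its 4 Hz sampling rate are assumptions:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 4.0                                 # accelerometer-like rate (assumed)
t = np.arange(0, 300, 1 / fs)
rng = np.random.default_rng(0)
signal = rng.standard_normal(len(t))     # stand-in for an accelerometer trace

# Band edges taken from the spectrum analysis above
edges = [0.15, 0.55, 0.94]

def split_bands(x, fs, edges, order=4):
    """Split a signal into low-pass / band-pass(es) / high-pass components."""
    bands = []
    sos = butter(order, edges[0], btype="lowpass", fs=fs, output="sos")
    bands.append(sosfiltfilt(sos, x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(order, [lo, hi], btype="bandpass", fs=fs, output="sos")
        bands.append(sosfiltfilt(sos, x))
    sos = butter(order, edges[-1], btype="highpass", fs=fs, output="sos")
    bands.append(sosfiltfilt(sos, x))
    return bands

bands = split_bands(signal, fs, edges)   # four sub-series, one per band
```

Each element of `bands` can then be analyzed as a time series in its own right.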

💡Information: Purists will argue that a spectrum should only be computed on a stationary signal. In practice, however, spectral analysis is commonly applied even when this condition is not strictly met. The time/frequency information lost in the process can be recovered by analyzing spectrograms.

### Analysis of spectrograms

The spectrogram, originally used in audio (sonogram), illustrates the energy distribution of the signal over time. This graphical representation of the signal **enables the identification of the temporal evolution of the signal’s frequency content**. The energy, represented in red tones on the spectrogram, allows the observer to visualize how the signal behaves over time from a frequency perspective.

This can be relevant for detecting changes in context (such as a change in the operating mode of the equipment). The change in context will be represented by a vertical break in the spectrogram (cf. Illustration 8).

This representation of the signal **allows us to determine if the signal contains frequency clusters**, making the generation of frequency-level features relevant (see paragraph on feature generation).
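A minimal sketch with SciPy of how such a change in operating mode appears in a spectrogram (the two operating frequencies, the switch point, and the sampling rate are invented for the example):

```python
import numpy as np
from scipy.signal import spectrogram

# Illustrative "change of operating mode": the dominant frequency jumps
# from 0.1 Hz to 0.5 Hz halfway through the recording
fs = 2.0
t = np.arange(0, 1200, 1 / fs)
signal = np.where(t < 600,
                  np.sin(2 * np.pi * 0.1 * t),
                  np.sin(2 * np.pi * 0.5 * t))

freqs, times, Sxx = spectrogram(signal, fs=fs, nperseg=256)

# Dominant frequency per time slice: the mode change shows up as a
# vertical break in the time/frequency map
dominant = freqs[np.argmax(Sxx, axis=0)]
```

Rendering `Sxx` as a heatmap over `times` and `freqs` reproduces the usual spectrogram view, with the break clearly visible at the switch point.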

## Time/Sensor Analysis

During the exploration of time series data, it is often beneficial to examine inter-sensor dynamics. This analysis, based on a correlation matrix, involves comparing the signals from two or more sensors to determine whether the evolution of their data carries useful information for predictive model generation.

### The correlation matrix of sensors

The correlation matrix allows for a quick visualization of whether the **sensor signals evolve with synergies over time** across the entirety of the analyzed data. This correlation matrix can depend on the contexts being analyzed.

**On the upper part of the matrix**

A correlation coefficient is calculated (sensor A in relation to sensor B), allowing for a quick identification of whether the sensors:

- evolve positively together over time (represented in red in the illustration) – when the signal of one increases, the other also increases
- evolve negatively together (represented in blue in the image) – when the signal of one sensor changes, the other sensor’s signal changes in the opposite direction

**On the lower part of the matrix**

It is possible to observe the graphical representation of the measurements of one sensor in relation to those of the others. This representation visually identifies the synergies of the sensors over time.

💡Information: When calculating correlation coefficients, the method should be robust to outliers in the data (if they haven’t been removed) to prevent bias.

💡Information: The stronger the correlation coefficient, the more linear the relationship will be between the two sensors in the scatterplot.
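A minimal sketch of such a correlation matrix with pandas, on three synthetic sensors. Spearman’s rank correlation is used here as one option that is more robust to outliers than the classic Pearson coefficient:

```python
import numpy as np
import pandas as pd

# Three synthetic sensors: B follows A, C moves opposite to A
rng = np.random.default_rng(1)
n = 500
a = np.cumsum(rng.standard_normal(n))        # random-walk "sensor A"
b = a + 0.1 * rng.standard_normal(n)         # positively correlated with A
c = -a + 0.1 * rng.standard_normal(n)        # negatively correlated with A
df = pd.DataFrame({"sensor_A": a, "sensor_B": b, "sensor_C": c})

# Rank-based (Spearman) correlation matrix, less sensitive to outliers
corr = df.corr(method="spearman")
```

In `corr`, coefficients near +1 correspond to the “red” cells of the illustration and coefficients near -1 to the “blue” ones.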

Subsequently, analyzing correlations allows for the examination of sensors correlated together, particularly to **identify faults resulting from a modification in the correlation of these sensors.**

**Did you know?**

It is also possible to generate a correlation matrix from specific frequency bands of the sensors to identify whether correlations are persistent or if new correlations emerge.
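One way to sketch this band-wise correlation, on entirely synthetic data (the 0.2–0.4 Hz band and both signals are invented for the example): two sensors share a 0.3 Hz component that is hidden by independent low-frequency drifts, and filtering to the band makes the shared behavior visible.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 2.0
t = np.arange(0, 600, 1 / fs)
rng = np.random.default_rng(2)
shared = np.sin(2 * np.pi * 0.3 * t)                       # common component
s1 = shared + 0.05 * np.cumsum(rng.standard_normal(len(t)))
s2 = shared + 0.05 * np.cumsum(rng.standard_normal(len(t)))

def band(x, lo, hi):
    """Band-pass a signal between lo and hi (Hz)."""
    sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)

raw_corr = np.corrcoef(s1, s2)[0, 1]          # diluted by the drifts
band_corr = np.corrcoef(band(s1, 0.2, 0.4), band(s2, 0.2, 0.4))[0, 1]
```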

## From analysis to model creation

### Feature generation (“feature engineering”)

It should be noted that in the case of time series data, most machine learning algorithms cannot be directly applied to raw data. Therefore, **the solution often involves transforming the raw time series into a vector representation of features**, and then applying a predictive model to this new representation. The quality of defining these features will have a major impact on the performance of the associated model. Generating features manually is a task that can sometimes be very difficult and, above all, very time-consuming.

There are many extraction methods, more or less standard, such as calculating a minimum, a maximum, a mean, a median, etc. However, in the context of complex industrial time series, these methods can sometimes be limited. It then becomes interesting to turn to more advanced feature generators that will allow the exploitation of the physical laws governing the equipment or signal processing techniques. Thus, the extraction of discriminant features will identify potentially abnormal behaviors of the observed equipment.
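A minimal sketch of the standard window-based extraction mentioned above (the window length, the stand-in series, and the exact feature set are arbitrary choices for the example):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
series = rng.standard_normal(1000)   # stand-in for a raw sensor series
window = 100                         # illustrative window length

def extract_features(x, window):
    """Turn a raw series into one feature vector per non-overlapping window."""
    rows = []
    for start in range(0, len(x) - window + 1, window):
        w = x[start:start + window]
        rows.append({
            "min": w.min(),
            "max": w.max(),
            "mean": w.mean(),
            "median": np.median(w),
            "std": w.std(),
            "rms": np.sqrt(np.mean(w ** 2)),   # energy-type feature
        })
    return pd.DataFrame(rows)

features = extract_features(series, window)    # 10 windows x 6 features
```

The resulting `features` table is the vector representation on which a predictive model can then be trained.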

**This feature generation directly exploits the three dimensions mentioned earlier: time, sensors, and frequency.**

To limit “false positives,” it may sometimes be necessary to reduce the number of features used. This is called “feature selection.” The technique involves keeping only the relevant features – those that have a significant impact on the model’s performance – and is often done manually.
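A sketch of one generic, automated form of feature selection, using scikit-learn’s univariate `SelectKBest` on synthetic data (this is a standard technique chosen for illustration, not the specific selection method of the methodology described here):

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic dataset: 10 candidate features, but only feature 3 actually
# carries information about the label
rng = np.random.default_rng(4)
X = rng.standard_normal((200, 10))
y = (X[:, 3] + 0.1 * rng.standard_normal(200) > 0).astype(int)

# Keep the k features most related to the label (ANOVA F-test score)
selector = SelectKBest(score_func=f_classif, k=2)
X_reduced = selector.fit_transform(X, y)
selected = selector.get_support(indices=True)   # indices of kept features
```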

**About us:** *Amiral Technologies was founded on scientific inventions originating from the GIPSA laboratory of the French CNRS in Grenoble: extremely discriminant feature generators. Our feature generators also incorporate an automated “feature selection” step, enabling our users to save considerable time in generating their models.*

### Calculation of invariants

In a context where equipment and their associated data are constantly evolving, our team of Data Scientists focuses on identifying what we refer to as “invariants” within time series data.

These invariants, derived from the generated features, possess the unique characteristic of **remaining unchanged over time** and potential changes in context, hence their name. They thus hold strategic importance in the development of predictive models, as they enable the characterization of the constant elements underlying the temporal relationships between successive measurements from a sensor or group of sensors.

The search for anomalies associated with these invariants primarily involves **identifying any data that violates this rule of immutability**.
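As a rough, generic sketch of this idea (not the proprietary method itself; the data, the invariant feature, and the tolerance rule are all invented): learn the normal value of an invariant feature on healthy data, then flag any new value that breaks the rule of immutability.

```python
import numpy as np

# Invariant feature computed on 50 healthy windows: it fluctuates only
# slightly around a constant value (synthetic data)
rng = np.random.default_rng(5)
healthy = 5.0 + 0.1 * rng.standard_normal(50)

invariant_mean = healthy.mean()        # the "constant" the invariant encodes
tolerance = 4 * healthy.std()          # illustrative robustness margin

# New feature values: the third one violates the invariant
new_values = np.array([5.02, 4.95, 6.5, 5.1])
alerts = np.abs(new_values - invariant_mean) > tolerance
```

The margin (here four standard deviations) directly controls the trade-off discussed in the conclusion: too tight and every new point raises an alert, too loose and real anomalies slip through.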

## Conclusion

While this approach may seem intuitive at first glance, its effectiveness depends on two essential conditions. First, the feature generation must be capable of extracting these “invariants,” which is not always possible with standard generators. Second, the invariants must be sufficiently precise and robust to avoid triggering alerts with every new data point, thus ensuring effective detection of anomalies.

Analyzing across three dimensions – time, sensor, and frequency – provides a **wealth of information** that significantly enhances the relevance of the generated predictive model. **This rigorous approach**, combining correlation exploration and frequency analysis, **fully exploits the data**, becoming a **powerful lever** in developing a predictive maintenance strategy.

In an upcoming article, we will explore *model generation on subsets of sensors* derived from the analysis described above, providing an overview of the last two steps of our methodology.

**New release – DiagFit 3.0**

With DiagFit 3.0, performing all the operations mentioned above is now at your fingertips, **without the need to write a single line of code**. All the analyses and operations offered (correlation matrix, spectrum, frequency division) are based on proven proprietary algorithms **for industrial time series**. The **intuitive interface** is designed to simplify the users’ lives, offering **smooth navigation through time series data** and automatically generating the associated graphical representations of sensors. **With guidance supported by our robust methodology**, users explore their data to extract essential information necessary for implementing **an effective predictive maintenance approach**.