Deep learning for cyber-attack detection in industrial control systems

April 18, 2022

Industrial Control System (ICS) is a general term covering different types of control systems, which usually include the devices, systems, and networks used to operate and/or automate industrial processes. Today, ICSs are widely deployed in many industrial sectors and critical infrastructures such as manufacturing, energy, and transportation. By control principle, ICSs can be roughly divided into two groups: centralized control and distributed control. In centralized control systems, all data from the low-level devices is wired back to a high-level controller (usually a Programmable Logic Controller (PLC) or a microcontroller), which implements the control logic of the system (Fig. 1a). Distributed control, on the other hand, is achieved through coordinated work among devices deployed across the plant (Fig. 1b).

The Industry 4.0 paradigm is changing the way we manufacture through the transition from centralized to distributed control, with Cyber-Physical Systems (CPS) and the Industrial Internet of Things (IIoT) regarded as the main enablers of this transition. Devices that operate at a low level (sensors, actuators, etc.) are becoming smart through added computation and communication capabilities, and they exchange data with each other wirelessly. However, ubiquitous communication brings a large number of objects into the network, which greatly enlarges the surface exposed to threats and malicious cyber-attacks. An attack can have serious consequences, ranging from system malfunction to endangering human lives. Therefore, to address these security issues and keep the system safe and secure, defense mechanisms offering a high level of protection have to be developed. Attackers, on the other hand (especially when launching sophisticated attacks), tend to be stealthy, which makes cyber-attack detection a challenging task. In this game between attackers and security mechanisms, time plays an important role: timely detection and an appropriate response can mitigate or completely neutralize the impact of an attack.

Fig. 1. Cyber-attacks on: a) centralized control system; b) distributed control system

Generally, modeling of industrial processes follows one of two approaches: 1) design-driven and 2) data-driven. In the first approach, the system is modeled through mathematical equations (in analytical form), while the second approach uses data obtained from the system to model its behavior. Since complex processes are difficult to describe in analytical form, or can be described only under rough assumptions, data-driven models have found wide application in the design of attack detection mechanisms. Machine learning (ML) has emerged as a natural choice for working with such data, and deep learning (DL), a branch of ML, is attracting significant attention in the field of cyber security. Depending on the nature of the input data given to the learning algorithm, ML can be classified into three major categories: 1) supervised, 2) semi-supervised, and 3) unsupervised learning (Fig. 2).

Fig. 2. Classification of DL techniques depending on the data used during model creation [1]

Supervised learning is characterized by labeled input data, so the desired output of the ML model is known. In the semi-supervised approach, on the other hand, the model is fed with input data of known origin (for example, data known to be attack-free), but without the corresponding desired response at the output. Finally, in unsupervised learning, the model is built from unlabeled training data, and the ML algorithm needs to learn the inherent structure of the input data.

A commonly used semi-supervised approach to attack detection models the system's behavior under normal operating conditions, using training data that contain no attacks. The trained model then predicts new data (not used during training), and an attack is detected from the discrepancy between the predicted value and the value measured on the real process (Fig. 3).

Fig. 3. Attack detection algorithm
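The detection principle of Fig. 3 can be sketched in a few lines. This is a minimal, hypothetical example: a linear one-step-ahead predictor fitted by least squares stands in for the DL model, and the alarm threshold (mean plus three standard deviations of the prediction errors on attack-free data) is an assumed rule, not one prescribed here. The names `fit_predictor`, `calibrate_threshold`, and `detect_attacks` are illustrative.

```python
import math
from statistics import mean, pstdev

# Hypothetical stand-in for a trained DL predictor: a linear one-step-ahead
# model x[t] ~ a * x[t-1] + b, fitted by least squares on attack-free data.
def fit_predictor(normal):
    xs, ys = normal[:-1], normal[1:]
    mx, my = mean(xs), mean(ys)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

def prediction_errors(series, a, b):
    """Absolute discrepancy |measured - predicted| for every step t >= 1."""
    return [abs(y - (a * x + b)) for x, y in zip(series[:-1], series[1:])]

def calibrate_threshold(normal, a, b, k=3.0):
    """Assumed rule: threshold = mean + k * std of the errors on normal data."""
    errors = prediction_errors(normal, a, b)
    return mean(errors) + k * pstdev(errors)

def detect_attacks(series, a, b, threshold):
    """Return the indices t whose prediction error exceeds the threshold."""
    return [t + 1 for t, e in enumerate(prediction_errors(series, a, b)) if e > threshold]

# Train on a clean sinusoidal sensor signal (synthetic, for illustration only).
normal = [math.sin(0.1 * t) for t in range(200)]
a, b = fit_predictor(normal)
threshold = calibrate_threshold(normal, a, b)

# New data with one simulated false-data injection at index 50.
test = [math.sin(0.1 * t) for t in range(200, 300)]
test[50] += 1.0
alarms = detect_attacks(test, a, b, threshold)
print(alarms)  # -> [50, 51]
```

A single corrupted sample raises two alarms here because the falsified value is also used as the input for the next prediction; a real detector would typically add logic to merge such consecutive alarms.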

Many DL techniques are used today to build attack detection mechanisms, and the choice of an appropriate technique often depends on the specific application [2, 3]. Since the signals obtained from the plant are usually time series, we briefly discuss a few DL techniques that can be applied effectively to this type of data: 1) Deep Neural Networks (DNN), 2) Convolutional Neural Networks (CNN), 3) Recurrent Neural Networks (RNN), and 4) Autoencoders.
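Before a time series can be fed to any of these networks, it is usually cut into fixed-length windows. The sketch below assumes one-step-ahead prediction as the training target, which matches the prediction-based detection scheme of Fig. 3; the function name `make_windows` and the window length are illustrative choices, not part of the original text.

```python
def make_windows(series, window):
    """Split a time series into (input window, next value) training pairs."""
    if window < 1 or window >= len(series):
        raise ValueError("window must be in [1, len(series) - 1]")
    n = len(series) - window
    inputs = [series[i:i + window] for i in range(n)]   # consecutive samples
    targets = [series[i + window] for i in range(n)]    # value that follows each window
    return inputs, targets

signal = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
X, y = make_windows(signal, window=3)
print(X)  # [[0.0, 0.5, 1.0], [0.5, 1.0, 1.5], [1.0, 1.5, 2.0]]
print(y)  # [1.5, 2.0, 2.5]
```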

A DNN is a multi-layer perceptron in which all layers (an input layer, at least two hidden layers, and an output layer) are fully connected. This architecture performs well at extracting high-level features from raw data. However, the main disadvantage of DNNs is their computational complexity.
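The forward pass of such a fully connected network, and the way its parameter count grows with layer width, can be sketched in plain Python. This is an illustrative sketch only: the weights are random and untrained, the layer sizes `[10, 32, 32, 1]` are arbitrary assumptions, and a practical implementation would use a DL framework.

```python
import random

def relu(v):
    return max(0.0, v)

def dense(inputs, weights, biases, activation=None):
    """Fully connected layer: every output unit is connected to every input."""
    out = [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]
    return [activation(v) for v in out] if activation else out

def init_layer(n_in, n_out, rng):
    """Random (untrained) weights and zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def count_params(layer_sizes):
    """Number of weights plus biases in a fully connected network."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

rng = random.Random(0)
sizes = [10, 32, 32, 1]  # input, two hidden layers, output (illustrative sizes)
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]

def dnn_forward(x):
    """ReLU on hidden layers, linear output layer."""
    for i, (W, b) in enumerate(layers):
        is_last = i == len(layers) - 1
        x = dense(x, W, b, None if is_last else relu)
    return x

print(dnn_forward([0.1] * 10))      # one (meaningless, untrained) prediction
print(count_params(sizes))          # 352 + 1056 + 33 = 1441
print(count_params([10, 64, 64, 1]))  # 4929
```

Doubling the hidden width from 32 to 64 raises the parameter count from 1441 to 4929, because the hidden-to-hidden connections grow quadratically with width; this is one concrete source of the computational cost mentioned above.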

The basis of a CNN is convolution, an operation frequently used in image and digital signal processing. A CNN has at least one layer that convolves its input with a filter instead of performing general matrix multiplication. When fed with sequences of consecutive samples, a CNN model can effectively learn an internal representation of time-series data and achieve good prediction performance.
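A one-dimensional convolutional layer slides a small filter along the signal and takes a dot product at every position. In the sketch below the filter is hand-picked (a simple difference filter) purely for illustration; in a CNN the filter weights are learned during training. Like most DL libraries, the code does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
def conv1d(signal, kernel, bias=0.0):
    """'Valid' 1-D convolution as used in CNN layers: slide the kernel
    over the signal and take a dot product at each position (no flipping)."""
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k)) + bias
            for i in range(len(signal) - k + 1)]

def relu(values):
    return [max(0.0, v) for v in values]

# Hand-picked difference filter that responds to a sudden rise in a sensor signal.
window = [1.0, 1.0, 1.0, 4.0, 4.0, 4.0]
edge_filter = [-1.0, 1.0]
feature_map = relu(conv1d(window, edge_filter))
print(feature_map)  # [0.0, 0.0, 3.0, 0.0, 0.0]
```

The same two weights are reused at every position (weight sharing), which is why a convolutional layer needs far fewer parameters than a fully connected layer over the same window.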

An RNN is a network in which the current output vector depends not only on the present input (as in standard feedforward neural networks) but also on a recurrent input representing the previous hidden state. Using the current input and the previous state vector, an RNN computes an output for every element of a sequence. This property allows an RNN to learn the sequential nature of the data. Three commonly used types of RNN are: 1) Simple RNN, 2) Long Short-Term Memory (LSTM), and 3) Gated Recurrent Unit (GRU).
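The recurrence of a simple RNN can be written as h_t = tanh(W_x x_t + W_h h_{t-1} + b). The sketch below implements only this simple cell, with small hand-set weights chosen for illustration (they are not trained); LSTM and GRU cells replace the single tanh update with gated updates and are not shown.

```python
import math

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """Simple RNN cell: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""
    return [math.tanh(sum(wx * xi for wx, xi in zip(W_x[i], x_t))
                      + sum(wh * hj for wh, hj in zip(W_h[i], h_prev))
                      + b[i])
            for i in range(len(b))]

def rnn_forward(sequence, W_x, W_h, b):
    """Run the cell over a sequence and return the hidden state after each element."""
    h = [0.0] * len(b)  # initial hidden state
    states = []
    for x_t in sequence:
        h = rnn_step(x_t, h, W_x, W_h, b)
        states.append(h)
    return states

# 2 hidden units, 1 input feature (illustrative, untrained values).
W_x = [[0.5], [-0.3]]
W_h = [[0.8, 0.0], [0.1, 0.5]]
b = [0.0, 0.0]

# A single pulse followed by zeros: the state still "remembers" the pulse.
states = rnn_forward([[1.0], [0.0], [0.0]], W_x, W_h, b)
print(states[-1])
```

Although the last two inputs are zero, the final hidden state is non-zero, because the pulse seen at the first step is carried forward through W_h. An output layer would then map each h_t to a prediction.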

An autoencoder is a specific type of feedforward neural network with two symmetrical components, called the encoder and the decoder. The encoder extracts features from the input and thereby produces a code, whereas the decoder reconstructs the input using only this code. The reconstruction relies on hidden layers of lower dimensionality than the input and output layers. In this way, the input space is projected onto a lower-dimensional space with higher information density.
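The encoder-code-decoder structure, and how reconstruction error can signal an attack, can be sketched with a tiny linear autoencoder (2 inputs, 1-dimensional code, 2 outputs). The weights are hand-set rather than trained: they project onto the direction (1, 1)/sqrt(2), the subspace a linear autoencoder trained on two perfectly correlated sensors would be expected to span. The sensor values are made up for illustration.

```python
import math

# Illustrative linear autoencoder: 2 inputs -> 1-dimensional code -> 2 outputs.
# Hand-set (untrained) weights projecting onto the direction (1, 1)/sqrt(2).
s = 1 / math.sqrt(2)
ENCODER = [s, s]  # 2 -> 1
DECODER = [s, s]  # 1 -> 2

def encode(x):
    """Compress the input into a single code value."""
    return sum(w * xi for w, xi in zip(ENCODER, x))

def decode(code):
    """Reconstruct the input from the code alone."""
    return [w * code for w in DECODER]

def reconstruction_error(x):
    """Squared error between the input and its reconstruction."""
    x_hat = decode(encode(x))
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))

normal = [2.0, 2.0]    # the two sensors agree, as during normal operation
tampered = [2.0, 0.5]  # one reading manipulated
print(reconstruction_error(normal))    # ~0 (up to floating-point rounding)
print(reconstruction_error(tampered))  # about 1.125
```

Inputs that follow the structure captured by the code are reconstructed almost perfectly, while the manipulated reading breaks the correlation and yields a large reconstruction error, which can be compared against a threshold in the same way as the prediction error in Fig. 3.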

References:  

  1. Supervised, unsupervised, and semi-supervised learning with real-life usecase, [Online], Available: https://www.enjoyalgorithms.com/blogs/supervised-unsupervised-and-semisupervised-learning, Accessed on: Apr. 2022.
  2. Liu, H. and Lang, B., 2019. Machine learning and deep learning methods for intrusion detection systems: A survey. Applied Sciences, 9(20), p. 4396.
  3. Kwon, D., Kim, H., Kim, J., Suh, S.C., Kim, I. and Kim, K.J., 2019. A survey of deep learning-based network anomaly detection. Cluster Computing, 22(1), pp. 949-961.