1.1 Background
Machinery in modern industry is developing toward complex structures, large-scale designs, and intelligent functions. Under operating conditions such as heavy load and high temperature, core components of mechanical equipment such as gears and bearings may degrade, develop faults, or even fail completely. The failure of mechanical components can cause a series of problems, ranging from property damage to severe human injury. Fault diagnosis technology is therefore an indispensable part of modern industry: it monitors equipment in real time during operation, detects faults as early as possible, and provides a reliable basis for equipment maintenance.
Although fault diagnosis of industrial equipment has a long history and a large body of accumulated experience, mechanical equipment has rapidly become more complex and intelligent in recent years. As a result, intelligent fault diagnosis for multiple faults and multiple working conditions has attracted the attention of researchers. Modern industrial equipment is large and complex, with many monitoring points and sensors that generate a large amount of data for analysis and recognition. This big data brings new challenges to fault diagnosis, which can be summarized in the following two points:
- Manual fault detection cannot keep up with the large volume of data, so automatic and intelligent fault diagnosis algorithms are required.
- The data types are diverse, and samples may come from different machines, at different positions, under different working conditions, which increases the difficulty of feature mining and fault diagnosis.
Because of these two challenges, using massive data for feature mining to achieve efficient and accurate fault diagnosis is a complex problem. Artificial intelligence technology has developed rapidly in recent years. As important and promising branches of artificial intelligence, neural networks and deep learning have advanced quickly and have proved their value in areas such as automatic reasoning, cognitive modeling, and intelligent manufacturing. At the same time, intelligent fault diagnosis of mechanical equipment, which combines fault diagnosis theory with machine learning methods, is also thriving. Intelligent fault diagnosis methods perform feature mining on the large number of signals collected by various sensors to obtain feature information that reflects the fault, build a diagnosis model by analyzing this feature information, and use the model to diagnose faults in mechanical equipment accurately and reliably.
In summary, the combination of machine learning and fault diagnosis can make full use of the massive amounts of data to diagnose faults on industrial machinery without much human involvement. Therefore, the machine learning-based fault diagnosis of mechanical equipment has essential research significance and broad research prospects.
1.2 Related Methods
1.2.1 Back Propagation Neural Network
A back propagation neural network is a network formed by interconnecting multiple layers of neurons. It has a powerful nonlinear mapping capability and is the basis for many more complex network structures. The loss function and the gradient descent method are its core parts. The value of each neuron node in a layer can be calculated as follows:
$$x_j^{(l)} = f\left(\sum_i w_{ji}^{(l)}\, x_i^{(l-1)}\right) \tag{1.1}$$
where $x_j^{(l)}$ is the activation value of the $j$th neuron in the $l$th layer, $w_{ji}^{(l)}$ is the weight between the $i$th neuron in the $(l-1)$th layer and the $j$th neuron in the $l$th layer, and $f$ is the activation function. In practical applications, different activation functions can be chosen to suit the problem.
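The layer computation in Eq. (1.1) can be sketched in a few lines of plain Python. This is a minimal illustration, assuming the layer is a weighted sum of the previous layer's activations passed through $f$ with no bias term; the sigmoid activation, the function names, and the example weights are all hypothetical choices, not taken from the source.

```python
import math

def sigmoid(z):
    """Logistic activation, used here as one possible choice of f (an assumption)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(weights, prev_activations, f=sigmoid):
    """Compute x_j^(l) = f(sum_i w_ji^(l) * x_i^(l-1)) for every neuron j.

    weights[j][i] is the weight from neuron i in layer l-1 to neuron j in layer l.
    """
    return [
        f(sum(w_ji * x_i for w_ji, x_i in zip(row, prev_activations)))
        for row in weights
    ]

# Hypothetical layer with 3 inputs and 2 neurons.
x_prev = [1.0, 0.5, -1.0]
W = [
    [0.2, -0.4, 0.1],  # weights into neuron 0
    [0.0, 0.0, 0.0],   # weights into neuron 1 (all zero, so its output is f(0))
]
x_next = layer_forward(W, x_prev)
print(x_next)
```

Stacking several such calls, each feeding its output to the next, gives the forward pass of a multi-layer network.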
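Usually, the loss function of the back propagation neural network is the sum of squared errors between the actual and expected outputs, here written with the conventional factor of $\tfrac{1}{2}$ that simplifies the derivative:
$$E = \frac{1}{2}\sum_n \left(d_n - o_n\right)^2 \tag{1.2}$$
where $o_n$ is the actual output value and $d_n$ is the expected value of the $n$th output neuron.
The partial derivatives of the loss function with respect to the weights of the last layer and the neurons of the previous layer can then be computed as:
$$\frac{\partial E}{\partial w_n^{(L)}} = -\left(d_n - o_n\right) f'\!\left(w_n^{(L)} \cdot x^{(L-1)}\right) x^{(L-1)}, \qquad \frac{\partial E}{\partial x^{(L-1)}} = -\sum_n \left(d_n - o_n\right) f'\!\left(w_n^{(L)} \cdot x^{(L-1)}\right) w_n^{(L)} \tag{1.3}$$
where $w_n^{(L)}$ is the weight vector from the neurons of the previous layer to the $n$th neuron of the $L$th layer, $x^{(L-1)}$ is the vector of activation values of the previous layer, $o_n = f\left(w_n^{(L)} \cdot x^{(L-1)}\right)$, and $f'$ is the derivative of the activation function. Applying the same chain rule layer by layer, the weight gradients and neuron gradients of the entire network can be obtained.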
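As a rough numerical check of the last-layer gradients, the following sketch assumes a sigmoid activation and a loss of the form $E = \tfrac{1}{2}\sum_n (d_n - o_n)^2$ with $o_n = f(w_n^{(L)} \cdot x^{(L-1)})$. Both are assumptions made for illustration, as are all function names and values. It compares the analytic weight gradient with a central finite-difference estimate and then takes one gradient descent step.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def loss(W, x_prev, d):
    """E = 1/2 * sum_n (d_n - o_n)^2 with o_n = f(w_n . x_prev)."""
    outputs = [sigmoid(dot(w_n, x_prev)) for w_n in W]
    return 0.5 * sum((d_n - o_n) ** 2 for d_n, o_n in zip(d, outputs))

def last_layer_gradients(W, x_prev, d):
    """Return (dE/dW, dE/dx_prev) for the last layer via the chain rule."""
    grad_W = []
    grad_x = [0.0] * len(x_prev)
    for w_n, d_n in zip(W, d):
        z_n = dot(w_n, x_prev)
        # dE/dz_n = -(d_n - o_n) * f'(z_n)
        delta_n = -(d_n - sigmoid(z_n)) * sigmoid_prime(z_n)
        grad_W.append([delta_n * x_i for x_i in x_prev])
        for i, w_ni in enumerate(w_n):
            grad_x[i] += delta_n * w_ni
    return grad_W, grad_x

def numeric_grad_W(W, x_prev, d, eps=1e-6):
    """Central finite-difference estimate of dE/dW, used only as a check."""
    grad = []
    for n in range(len(W)):
        row = []
        for i in range(len(W[n])):
            W_plus = [r[:] for r in W]
            W_minus = [r[:] for r in W]
            W_plus[n][i] += eps
            W_minus[n][i] -= eps
            row.append((loss(W_plus, x_prev, d) - loss(W_minus, x_prev, d)) / (2 * eps))
        grad.append(row)
    return grad

# Hypothetical values: 3 neurons in the previous layer, 2 output neurons.
x_prev = [0.5, -1.0, 2.0]
W = [[0.1, 0.2, -0.3], [0.4, -0.5, 0.6]]
d = [1.0, 0.0]

grad_W, grad_x = last_layer_gradients(W, x_prev, d)
approx = numeric_grad_W(W, x_prev, d)
max_diff = max(abs(a - b) for ra, rb in zip(grad_W, approx) for a, b in zip(ra, rb))

# One gradient descent step on the last-layer weights.
learning_rate = 0.1
W_updated = [[w - learning_rate * g for w, g in zip(rw, rg)] for rw, rg in zip(W, grad_W)]
print(max_diff, loss(W, x_prev, d), loss(W_updated, x_prev, d))
```

The vector `grad_x` is what would be passed back to the previous layer, where the same two formulas are applied again.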
1.2.2 Convolutional Neural Network
A convolutional neural network is a multi-layer network built from multiple convolutional layers, and the idea is inspired by the way the visual cortex of the biological nervous system classifies visual input. The basic structure of a convolutional neural network includes four essential layer types: the convolutional layer, the pooling layer, the activation layer, and the fully connected layer. As a classic deep learning model, the convolutional neural network uses sparse local connections and weight sharing, which avoids the heavy weight computation that full connections cause in ordinary multi-layer networks. In recent years, convolutional neural networks have been applied with great success to machine vision, speech recognition, fault diagnosis, and other fields.
The convolutional layer is the core of the convolutional neural network and the main feature that distinguishes it from an ordinary neural network. Its neurons share weights, and its primary purpose is to extract features from the data. The convolutional layer applies a convolution kernel of a given size to local regions of the input signal (or of the feature vector provided by the previous layer) to obtain the data features. Sparse local connection means that each neuron in a layer perceives only part of the input data. Weight sharing means that the neurons use the same convolution kernel to compute their outputs. As the most essential property of the convolutional layer, weight sharing reduces the number of network parameters and thereby mitigates overfitting.
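Sparse local connection and weight sharing can be illustrated with a one-dimensional convolution in plain Python. This is a minimal sketch, assuming stride 1 and no padding; the signal, the kernel values, and the function name are hypothetical. As in most deep learning libraries, the kernel is not flipped, so the operation is technically a cross-correlation.

```python
def conv1d_valid(signal, kernel):
    """Slide one shared kernel over the signal (stride 1, no padding).

    Each output sees only len(kernel) neighbouring inputs (sparse local
    connection), and every output reuses the same kernel (weight sharing).
    """
    k = len(kernel)
    return [
        sum(kernel[m] * signal[t + m] for m in range(k))
        for t in range(len(signal) - k + 1)
    ]

# Hypothetical input signal and a simple difference kernel.
signal = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0]
kernel = [1.0, 0.0, -1.0]
features = conv1d_valid(signal, kernel)
print(features)  # [-2.0, -2.0, 0.0, 2.0, 2.0]

# Parameter count: one shared kernel versus a fully connected layer
# mapping the same 7 inputs to the same 5 outputs.
shared_params = len(kernel)
dense_params = len(signal) * len(features)
print(shared_params, dense_params)  # 3 35
```

The gap between the two parameter counts grows with the signal length, since the kernel size stays fixed while a fully connected layer needs one weight per input-output pair.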
To increase the representation ability of the network, an activation operation is applied to the features extracted by the convolution operation, mapping them nonlinearly into a feature space in which they are easier to separate linearly. The activation functions commonly used in neural networks include Sigmoid, Tanh (hyperbolic tangent), and ReLU (rectified linear unit).
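The three activation functions named above have standard definitions, sketched here in plain Python and applied element-wise to a hypothetical feature vector (the input values are made up for illustration).

```python
import math

def sigmoid(z):
    """Sigmoid: squashes z into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    """Hyperbolic tangent: squashes z into the interval (-1, 1)."""
    return math.tanh(z)

def relu(z):
    """Rectified linear unit: passes positive values, zeroes out the rest."""
    return max(0.0, z)

# Hypothetical features produced by a convolution step.
conv_features = [-2.0, -0.5, 0.0, 0.5, 2.0]

for name, f in (("sigmoid", sigmoid), ("tanh", tanh), ("relu", relu)):
    print(name, [round(f(z), 4) for z in conv_features])
```

Sigmoid and Tanh saturate for inputs of large magnitude, while ReLU stays linear for positive inputs.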
In the convolutional neural network, the pooling operation is a down-sampling process, and the primary purpose is to reduce the amount of calculation by reducing the size of the feature map, which is usually use...