sensors Article An Adaptive Multi-Sensor Data Fusion Method Based on Deep Convolutional Neural Networks for Fault Diagnosis of Planetary Gearbox Luyang Jing 1 , Taiyong Wang 1, *, Ming Zhao 2 and Peng Wang 1 1 2 * School of Mechanical Engineering, Tianjin University, Tianjin 300354, China; jingluyang@outlook.com (L.J.); pengwang@tju.edu.cn (P.W.) School of Mechanical Engineering, Xi’an Jiaotong University, Xi’an 710049, China; zhaomingxjtu@mail.xjtu.edu.cn Correspondence: taiyongwang@yahoo.com; Tel.: +86-22-8740-2033 Academic Editor: Vittorio M. N. Passaro Received: 30 December 2016; Accepted: 16 February 2017; Published: 21 February 2017 Abstract: A fault diagnosis approach based on multi-sensor data fusion is a promising tool to deal with complicated damage detection problems of mechanical systems. Nevertheless, this approach suffers from two challenges, which are (1) the feature extraction from various types of sensory data and (2) the selection of a suitable fusion level. It is usually difficult to choose an optimal feature or fusion level for a specific fault diagnosis task, and extensive domain expertise and human labor are also highly required during these selections. To address these two challenges, we propose an adaptive multi-sensor data fusion method based on deep convolutional neural networks (DCNN) for fault diagnosis. The proposed method can learn features from raw data and optimize a combination of different fusion levels adaptively to satisfy the requirements of any fault diagnosis task. The proposed method is tested through a planetary gearbox test rig. Handcraft features, manual-selected fusion levels, single sensory data, and two traditional intelligent models, back-propagation neural networks (BPNN) and a support vector machine (SVM), are used as comparisons in the experiment. The results demonstrate that the proposed method is able to detect the conditions of the planetary gearbox effectively with the best diagnosis accuracy among all comparative methods in the experiment. Keywords: fault diagnosis; multi-sensor data fusion; deep convolutional neural networks; feature learning; planetary gearbox 1. Introduction The planetary gearbox is a key component in mechanical transmission systems and has been widely used in wind turbines, helicopters and other heavy machineries [1]. The wide range of gear ratios, small room in power transmission line, and high transmission efficiency are the most significant advantages of planetary gearboxes [2]. Planetary gearboxes generally operate in tough working environments, which makes them suffer from a variety of failures and damages [1,3] causing unwanted downtime, economic losses, and even human casualties [4]. Therefore, the fault diagnosis of planetary gearboxes is necessary to guarantee a safe and efficient operation of mechanical transmission systems. Health conditions of planetary gearboxes can be reflected by various kinds of measurements, including vibration signal, acoustic signal, driven motor current, shaft speed, oil debris, etc. Different measurements have different drawbacks and are sensitive to different types of damage modes or operation conditions. Thus, combining and analyzing these measurements together should be an appropriate approach to detect various types of faults of complex systems. This approach, Sensors 2017, 17, 414; doi:10.3390/s17020414 www.mdpi.com/journal/sensors Sensors 2017, 17, 414 2 of 15 named multi-sensor data fusion, can achieve more accurate and reliable result than approaches with a single sensor [5–7]. Nevertheless, multi-sensor data fusion for fault diagnosis suffers from two challenging problems [6]. (1) One is the feature extraction of multi-sensory data. Generally, conventional fault diagnosis includes three steps [8]: signal acquisition, feature extraction, and fault classification. In the feature extraction step, fault sensitive features are extracted and selected from raw data through signal processing technologies and data analysis strategies, such as Fourier spectral analysis, wavelet transformation (WT), and principal component analysis (PCA). However, the multiple types of sensory data of multi-sensor data fusion may cause a number of issues that make the feature extraction with multi-sensor much more difficult than with a single sensor. These issues [6,7] include more imprecision and uncertainties in measurements, various noises, more conflicting or correlating data, higher data dimensions, etc. Extracting features from these kinds of data will be a very challenging and cruel task. At the same time, multiple types of sensory data also increases the difficulties and consumes much time to choose an optimal handcraft feature or manual feature extraction method, even though to date, the optimal handcraft feature or feature extraction method for a specific type of sensory data still remains unanswered [8,9]; (2) The other challenge of the multi-sensor data fusion is the selection of different fusion levels. Similarly, different fusion levels have their own advantages and disadvantages, and the suitable one for different fault diagnosis tasks is usually different [5]. Selecting an optimal fusion level for a specific fault diagnosis task always requires domain expertise, prior knowledge, and human labor. Deep neural networks (DNN), also known as deep learning, have been attracting increasing attention from researchers from various fields in recent years [10–12]. The key advantage of DNN is the feature learning ability [11], which can automatically discover an intricate structure and learn useful features from raw data layer by layer. A number of studies [11–13] have shown that DNN can fuse input raw data and extract basic information from it in its lower layers, fuse the basic information into higher representation information and decisions in its middle layers, and further fuse these decisions and information in its higher layers to form the final classification result. It can be seen that DNN itself is a fusion structure [11,13], which fuses the feature extraction, feature selection, data-level fusion, feature-level fusion, decision-level fusion, and classification into one single learning body. For this reason, a DNN-based low level fusion, e.g., data-level fusion, can not only learn features from fused raw data automatically, but also fuse these data, features and decisions adaptively through its deep-layered structure. DNN might be an ideal model to fuse multi-sensory data and detect faults of a mechanical system. However, although some applications [14–18] of DNN in feature learning and fault diagnosis with a single sensor have been found in recent years, no study, to the best of our knowledge, has investigated the effectiveness of DNN-based feature learning and the adaptive level fusion method for fault diagnosis. It is attractive and meaningful to investigate this adaptive fusion method, which can learn features from raw data automatically, and select and combine fusion levels adaptively. Deep convolutional neural networks (DCNNs), as one of the main types of DNN models, have been successfully used in mechanical fault diagnosis with automatic feature learning from single sensory data [9,14,19,20]. Benefitting from several unique structures, DCNN can achieve better results with less training time than standard neural networks. Firstly, DCNN has a large set of filter kernels in convolutional layers, which can capture representative information and patterns from raw data. Stacking these convolutional layers can further fuse information and build complex patterns; Secondly, DCNN is an unfully-connected network, where each filter shares the same weights. This structure can reduce both the training time and complication of the model. In addition, the pooling layer of DCNN further reduces the revolution of the input data as well as the training time, and improves the robustness of the extracted patterns (a detailed introduction to the DCNN model is presented in Section 2). Thus, DCNN should have great potential in processing the multi-sensory data of Sensors 2017, 17, 414 3 of 15 a mechanical system, which usually contains rich information in the raw data and is sensitive to training time as well as model size. Aiming to address the two problems of multi-sensor data fusion mentioned above, this paper proposes Sensors 2017, 17, 414 3 of 15 an adaptive data fusion method based on DCNN and applies it to detect the health conditions of a planetary Aiming from to address the two problems of the multi-sensor data fusion is mentioned above, thisfeatures paper from gearbox. Different conventional methods, proposed method able to (1) extract proposes an adaptive data fusion method based on DCNN and applies it to detect the health raw data automatically and (2) optimize a combination of different fusion levels adaptively for any specific conditions of a planetary gearbox. Different from conventional methods, the proposed method is fault diagnosis task with less dependence on expert knowledge or human labor. able to (1) extract features from raw data automatically and (2) optimize a combination of different Thefusion rest of the paper is organized as follows. In Section 2, the typical architecture of DCNN and levels adaptively for any specific fault diagnosis task with less dependence on expert an adaptive training method are briefly described. Section 3 illustrates the procedures of the proposed knowledge or human labor. method, theThe design of the DCNN model, and several comparative introduced to further rest of the paper is organized as follows. In Section 2, the typicalmethods architecture of DCNN and an adaptive training method are briefly described. Section 3 illustrates the procedures of the analyze the performance of the proposed method. In Section 4, an experimental system of a planetary method, thevalidate design ofthe the effectiveness DCNN model, and several comparative methods introduced to gearboxproposed test rig is used to of the proposed method. Finally, the conclusions further analyze the performance of the proposed method. In Section 4, an experimental system of a are drawn in Section 5. planetary gearbox test rig is used to validate the effectiveness of the proposed method. Finally, the conclusions are drawn in Section 5. 2. Deep Convolutional Neural Networks 2. Deep Convolutional Neural Networks 2.1. Architecture of Deep Convolutional Neural Networks 2.1. Architecture of Deep Convolutional Neural Networks DCNN is a type of DNN model inspired by visual system structure [11,21], and it has is a type of DNNfor model inspired by visual system [11,21], andinitimage has become become theDCNN dominant approach almost all recognition andstructure detection tasks and speech the dominant approach for almost all recognition and detection tasks in image and speech analysis analysis [22–24]. DCNN contains three kinds of layers [25], which are the convolutional layer, [22–24]. DCNN contains three kinds of layers [25], which are the convolutional layer, pooling layer, pooling layer, and fully-connected layer. As shown in Figure 1, the first several layers of a typical and fully-connected layer. As shown in Figure 1, the first several layers of a typical DCNN usually DCNN consist usuallyofconsist of a combination of two types of layers—convolutional followed by a combination of two types of layers—convolutional layers, followedlayers, by pooling poolinglayers—and layers—and is a fully-connected In the following part, we will thethe last last layerlayer is a fully-connected layer. Inlayer. the following part, we will describe thesedescribe threekinds kinds of layers detail. these three layersininmore more detail. Convolutional and pooling layer Input data Convolutional and pooling layer subsampling Convolutional filters Fully connected layer subsampling Feature maps Feature maps Softmax regression subsampling subsampling Convolution Convolution Feature maps Fully connect Feature maps subsampling subsampling Feature maps Feature maps Figure 1. A typical architecture of deep convolutional neural network (DCNN). Figure 1. A typical architecture of deep convolutional neural network (DCNN). The convolutional layer is composed of a number of two-dimensional (2D) filters with parameters. filters convolute with input data and obtain an output, as Theweighted convolutional layerThese is composed of a number of two-dimensional (2D) filtersnamed with weighted feature maps. Each filter shares the same weighted parameters for all the patches of the input data to maps. parameters. These filters convolute with input data and obtain an output, named as feature reduce the training time and complication of the model, which is different from a traditional neural Each filter shares the same weighted parameters for all the patches of the input data to reduce the network with different weighted parameters for different patches of the input data. Suppose the traininginput timeofand complication of the model, which is different from a traditional neural network the convolutional layer is 𝑿, which belongs to 𝑹𝐴×𝐵 , and A and B are the dimensions of the with different weighted different patches the inputasdata. Suppose the input of the input data. Then theparameters output of thefor convolutional layer can of be calculated follows [26]: A × B convolutional layer is X, which belongs to R 𝐶𝐶 , and A and B are the dimensions of the input data. 𝑙−1 𝑙 𝑙 follows [26]: Then the output of the convolutional layer be 𝑋calculated as (1) 𝐶𝑐𝑛 = can 𝑓(∑ 𝑐𝑐 ∗ 𝑊𝑐𝑛 + 𝐵 𝑐𝑛 ) 𝑐𝑐=1 CC Ccn = f ∑ cc=1 ! l −1 Xcc l ∗ Wcn + l Bcn (1) Sensors 2017, 17, 414 4 of 15 where Ccn is the cn-th output of the convolutional layer, and the output number is CN, which is also equal to the filter number; ∗ is an operator of convolution; Xcc represents the input data of cc-th channel l is the weight of cn-th filter of the current of previous layer l − 1, and the channel number is CC; Wcn layer l; the width and height of the filter are CW and CH, respectively; the cn-th bias is denoted with l ; f is an activation function, typically hyperbolic tangent or sigmoid function. bcn The pooling layer is a sub-sampling layer, which reduces the revolution of the input data and improves the robustness of learned features. A pooling layer generally follows a convolutional layer with a max pooling method and it outputs only the maximum of each sub-sampling patch of the feature maps to subsample the feature maps from the previous convolutional layer. The output can be described as follows [26]: Pcn = max Ccn (2) Ccn ∈S where Pcn is the cn-th output of the pooling layer, and the output number is CN; S is the pooling block size. This function will sum over each distinct S pooling block in the input data so that the output will become S times smaller along both spatial dimensions. The fully-connected layer is the last layer of the DCNN model. It follows several combinations of the convolutional layers and the pooling layers, and classifies the higher-level information from the previous layers. A fully-connected layer is similar to a traditional multilayer neural network with a hidden layer and a classification layer, typically using a softmax regression. Assuming that the task is a K-label problem, the output of the softmax regression can be calculated as follows: Oj = P(y = 1| x; θ ) P(y = 2| x; θ ) ... P(y = k | x; θ ) 1 = K ∑ j=1 exp θ ( j) x exp θ (1) x exp θ (2) x . . . exp θ (K ) x (3) where θ (1) , θ (2) , . . . θ (K ) are the parameters of the model, and O j is the final result of the DCNN. 2.2. Training Method l , and θ ( j) are the learnable parameters It can be seen from the previous description that wil , bcn and will be optimized through model training with gradient decent algorithms. Since a gradient decent algorithm is easily trapped into local optima, we introduce several enhancement methods, including stochastic gradient decent (SGD), cross-validation, and adaptive learning rate, to solve this problem. SGD updates gradient [27] based on a few training data instead of the entire training set. This approach not only increases the training speed, but also improves the training reliability. Cross-validation selects a validation set from training data to test the performance of the parameters of the model to avoid overfitting. Since a global constant learning rate easily causes either a slow convergence with a lower learning rate or a serious fluctuation of the convergence with a higher learning rate, an adaptive learning rate is employed. The adaptive learning rate has a high rate at first and decreases with the increase of the training epochs adaptively to obtain a fast and reliable convergence result. 3. Adaptive Multi-Sensor Data Fusion Method Based on DCNN for Fault Diagnosis 3.1. Procedure of the Proposed Method An adaptive multi-sensor data fusion method based on DCNN is presented to learn features from raw data automatically and combine fusion levels adaptively to detect faults of a planetary gearbox. Through its deep-layered structure, DCNN can fuse input data and extract basic features in the lower layers, fuse basic features into high level features and decisions in the middle layers, and further fuse these features and decisions in the higher layers to obtain the final diagnosis result. Sensors 2017, 17, 414 5 of 15 Figure 2 displays the flowchart of the proposed method: (1) four types of signals, including5vibration Sensors 2017, 17, 414 of 15 signal [3,28,29], acoustic signal [4,30], current signal [31–33], and instantaneous angular speed (IAS) (IAS)are signal [34,35],according are selected to published studies and acquired a planetary signalspeed [34,35], selected toaccording published studies and acquired from afrom planetary gearbox; gearbox; (2) data preprocessing applied to standardize each signal andthem divide them into segments; (2) data preprocessing is applied toisstandardize each signal and divide into segments; (3) each of (3) each of theof four of the fourare signal types are combined together simply one data the four segments thesegments four signal types combined together simply as one data as sample to form sample to form the data-level fused input data of the DCNN model; (4) DCNN is trained and tested the data-level fused input data of the DCNN model; (4) DCNN is trained and tested with these fused with these fused input data, and its output will be the diagnosis result of the planetary gearbox. The input data, and its output will be the diagnosis result of the planetary gearbox. The testing accuracy of testing accuracy of the output result is used to evaluate the effectiveness of the proposed method. It the output result is used to evaluate the effectiveness of the proposed method. It should be noted that should be noted that although we use a data-level fusion in the third step, data is fused again in the although we layers use a of data-level fusion in to thefurther third step, datathe is fused again inThe the DCNN startingimplicitly layers of the starting the DCNN model optimize data structure. DCNN model to further optimize the data structure. The DCNN implicitly contains data-level fusion, contains data-level fusion, feature-level fusion, and decision-level fusion through the deep-layered feature-level decision-level fusionofthrough the deep-layered structure and ittooptimizes structure fusion, and it and optimizes a combination these fusion levels adaptively according the characteristic of the fusion data itself. a combination of these levels adaptively according to the characteristic of the data itself. Multi-sensor Accelerometer Data preprocessing Data level fusion Deep convolutional neural networks based feature learning and data fusion Divide signals into segments Combine four segments into one data sample Input data samples Convolutional and pooling layer Diagnosis result Fully connected layer Microphone Current sensor Optical encoder Figure 2. Flowchart of the proposed method. Figure 2. Flowchart of the proposed method. 3.2. Model Design of DCNN 3.2. Model Design of DCNN The model of DCNN is adjusted to satisfy the characteristics of mechanical fault diagnosis. The model DCNN isof DCNN adjusted to satisfy the characteristics of mechanical Although mostofapplications in image recognition chose a 2D convolutional structure fault [11,22,36], and some researchers [20,37] of also used the way to diagnosechose mechanical faults, we diagnosis. Although most applications DCNN in same image recognition a 2D convolutional select[11,22,36], a one-dimensional (1D) convolutional structure with the a 1Dsame filter way bank to as diagnose the kernelmechanical of the structure and some researchers [20,37] also used DCNN model. In our opinion, the main reason behind the applications of the 2D convolutional faults, we select a one-dimensional (1D) convolutional structure with a 1D filter bank as the kernel of structure of DCNN in image analysis lies in the natural 2D space correlation in images. However, the DCNN model. In our opinion, the main reason behind the applications of the 2D convolutional most of the measurement data for mechanical fault diagnosis only correlate with time, which is a 1D structure of DCNN in image analysis lies in the natural 2D space correlation in images. However, parameter. Thus, 1D convolutional structure should be an appropriate choice for a DCNN-based most fault of the measurement data for mechanical fault diagnosis only correlate with time, which is a 1D diagnosis problem. In addition, we choose a larger filter size than conventional ones used in parameter. Thus, 1D convolutional be be anmore appropriate choice for of a DCNN-based image recognition. While a largerstructure size of theshould filter may expensive in terms computation, a fault diagnosis In addition, we choosebetween a largerthe filter thanaway conventional ones[38], used in image largerproblem. filter can capture more information datasize farther from each other which recognition. While a larger of the filter may be more expensive in terms of computation, a larger may be the features in thesize frequency domain. In spite of the information many benefitsbetween of the deep-layered structure DCNN, a “deep” structure filter can capture more the data farther awayoffrom each other [38], whichalso may be means complicated hyper-parameters as well as various choices of architectures, which increases the the features in the frequency domain. difficulty build an benefits appropriate anddeep-layered efficient model. Although there areaseveral In spite oftothe many of the structure of DCNN, “deep”researches structure [39,40] also means about the automatic optimization of parameters of DCNN, the optimized process is usually complicated hyper-parameters as well as various choices of architectures, which increases the difficulty time-consuming and easily converges into a local optimum due to the large number of parameters of to build an appropriate and efficient model. Although there are several researches [39,40] about the DCNN. Thus, we build the DCNN model initially based on a few general design principles [38,41]. automatic optimization of parameters of DCNN, the optimized process is usually time-consuming and Then several configurations of the network are tested using the experimental data, and the one with easilybest converges into aislocal optimum due toofthe performance selected as the setting thelarge finalnumber model. of parameters of DCNN. Thus, we build the DCNN model initially based on a few general design principles [38,41]. Then several configurations Sensors 2017, 17, 414 6 of 15 Sensors 17, 414are tested using the experimental data, and the one with best performance is selected 6 of 15 of the 2017, network as the setting of the final model. 3.3. Comparative Methods 3.3. Comparative Methods methods are employed to further test and confirm the performance of the Several comparative proposed method. The flowcharts comparative methods in Figure Several comparative methodsofare employed to furtherare testshown and confirm the3.performance of the To evaluate ability of learning features from the raw of the proposed proposed method.the The flowcharts of comparative methods aredata shown in Figure 3. method, manual feature extraction is used as a replacement and comparison of the feature learning method, in each To evaluate the ability of learning features from the raw data of the proposed comparative method. Eight time-domain featuresand andcomparison several frequency-domain features are manual feature extraction is used as a replacement of the feature learning in each extracted. Root mean square (RMS), kurtosis, crest factor, skewness, mean, minimum, maximum, comparative method. Eight time-domain features and several frequency-domain features are extracted. and variance are chosen the handcraft features in the time [34,37,42]. The characteristic Root mean square (RMS),as kurtosis, crest factor, skewness, mean,domain minimum, maximum, and variance frequencies ofthe the handcraft planetary features gearbox [1], including the rotating frequencies of the sun gear, planetary are chosen as in the time domain [34,37,42]. The characteristic frequencies of gear and the carrier, the pass frequency of the planetary gear and the meshing frequency of the the planetary gearbox [1], including the rotating frequencies of the sun gear, planetary gear and planetary are selected the handcraft features in the frequency domain well as their ten carrier, thegearbox, pass frequency of theasplanetary gear and the meshing frequency of the as planetary gearbox, sidebands of thefeatures sensor signals [34,43,44] except signal. For the current are selectedfor asall thetypes handcraft in the frequency domainfor asthe wellcurrent as their ten sidebands for all signal,ofthe frequency its sidebands generated the modulation of thesignal, characteristic types theline sensor signalsand [34,43,44] except [45] for the current by signal. For the current the line frequenciesand of its thesidebands planetary[45] gearbox are chosen its frequency-domain features.frequencies In addition, frequency generated by theas modulation of the characteristic of the fast Fourier-transform (FFT) energy of each sample, which is obtained by splitting the frequency planetary gearbox are chosen as its frequency-domain features. In addition, the fast Fourier-transform spectrum of each sample into 32 average bands and calculating the RMS of eachofband is also (FFT) energy of each sample, which is obtained by splitting the frequency spectrum each [46], sample into added intobands the handcraft features in the domain. features 32 average and calculating the RMS of frequency each band [46], is alsoWhile addedall intothe thedomain handcraft featuresare in processed together as the “handcraft features”, the “time-domain features” and “frequency-domain the frequency domain. While all the domain features are processed together as the “handcraft features”, features” are also tested respectively to reflect more features” information the sensitivity of the data. the “time-domain features” and “frequency-domain are about also tested respectively to reflect The comparison learningof features from data andbetween the handcraft features is marked more informationbetween about thethe sensitivity the data. Theraw comparison the learning features from in orange in Figure 3. raw data and the handcraft features is marked in orange in Figure 3. (a) Multi-sensor Data preprocessing (b) Feature learning Data preprocessing (c) Raw data Handcraft features DCNN, BPNN or SVM Results with manual feature extraction and data level fusion DCNN, BPNN or SVM Results with feature learning and feature level fusion Data preprocessing (d) Feature level fusion Feature level fusion DCNN, BPNN or SVM Results with manual feature extraction and feature level fusion Raw data DCNN, BPNN or SVM Decision level fusion Results with feature learning and decision level fusion Handcraft features DCNN, BPNN or SVM Decision level fusion Results with manual feature extraction and decision level fusion Raw data DCNN, BPNN or SVM Results with feature learning and single sensor data Handcraft features DCNN, BPNN or SVM Results with manual feature extraction and single sensor data Feature learning or Feature extraction Feature learning Data preprocessing DCNN or BPNN Handcraft features Feature learning Feature extraction Single sensor Results with feature learning and data level fusion Feature learning or Feature extraction Feature extraction Multi-sensor DCNN, BPNN or SVM Feature learning or Feature extraction Data level fusion Feature extraction Multi-sensor Raw data Feature learning Feature learning or Feature extraction Feature extraction Figure 3. Flowcharts of comparative methods: (a) Data-level fusion methods; (b) Feature-level fusion with single single sensory sensory data. data. methods; (c) Decision-level fusion methods; and (d) Methods with As a comparison of the DCNN model of the proposed method, two intelligent models, back-propagation neural networks (BPNN) and support vector machine (SVM), are introduced as replacements of DCNN in each comparative method. BPNN is built into a three-layer model with Sensors 2017, 17, 414 7 of 15 Sensorsactivation 2017, 17, 414 functions. SVM uses Gaussian radial basis function (RBF) as the kernel7 function of 15 sigmoid and the grid search method to optimize the parameters of the kernel. The three comparative models, DCNN, BPNN, and SVM, are marked in green 3. As a comparison of the DCNN model in ofFigure the proposed method, two intelligent models, For testing the performance of the different fusion levels, feature-level fusion back-propagation neural networks (BPNN) and support vector manual-selected machine (SVM), are introduced as and replacements decision-level explored and compared with is the data-level fusion ofmodel the proposed of fusion DCNN are in each comparative method. BPNN built into a three-layer with sigmoid functions. uses Gaussian radialfrom basis raw function (RBF) the kernel function method. Foractivation feature-level fusionSVM with feature learning data, onlyasDCNN and BPNN are and the method to optimize parameters of the kernel. The three comparative models, applied to grid learnsearch features from the rawthedata of each sensor, respectively. The outputs of the DCNN, BPNN, and SVM, are marked in green in Figure 3. second-last layers of DCNN are extracted as the learned features of each sensory data. Then, all the For testing the performance of the different fusion levels, manual-selected feature-level fusion and learned features from the four types of sensors are combined together as the feature-level fused decision-level fusion are explored and compared with the data-level fusion of the proposed method. features and used as the input of another DCNN for classification. In the same way, the outputs of For feature-level fusion with feature learning from raw data, only DCNN and BPNN are applied to the second-last layers of BPNN are extracted and fused. Then, the fused features are used as the learn features from the raw data of each sensor, respectively. The outputs of the second-last layers of inputDCNN of both andasSVM for classification. Thesensory comparison of different fusion features levels isfrom marked areBPNN extracted the learned features of each data. Then, all the learned in red in Figure 3. the four types of sensors are combined together as the feature-level fused features and used as the As a comparison of theformulti-sensory thethe proposed method, fault diagnosis input of another DCNN classification. input In the data same of way, outputs of the second-last layers ofwith single sensory data is also applied to the evaluate the effectiveness of the proposed method, which is BPNN are extracted and fused. Then, fused features are used as the input of both BPNN and SVM for classification. The comparison of different fusion levels is marked in red in Figure 3. marked in purple and shown in Figure 3d. As a comparison of the multi-sensory input data of the proposed method, fault diagnosis with single sensory is also applied to evaluate the effectiveness of the proposed method, which is 4. Experiment anddata Discussion marked in purple and shown in Figure 3d. 4.1. Experiment Setup 4. Experiment and Discussion An experimental system of a planetary gearbox test rig is established to evaluate the effectiveness of the proposed method. As shown in Figure 4, it consists of a one-stage planetary experimental of a planetary gearbox test rig is established to evaluate the20-tooth effectiveness gearbox, An a driven motor system and a magnetic brake. The planetary gearbox contains one sun gear of the proposed method. As shown in Figure 4, it consists of a one-stage planetary gearbox, a driven and three 31-tooth planetary gears, and transmits torque from the sun gear to the planetary carrier of motor andgears a magnetic The planetary gearbox contains one 20-tooth sun gear and three 31-tooth the planetary with brake. a standstill ring gear. planetary gears, and transmits torque from the sun gear to the planetary carrier of the planetary gears Figure 5 presents the four types of sensors employed in the experiment, including an with a standstill ring gear. accelerometer, microphone, current sensor, and optical encoder. Vibration signal, acoustic signal, Figure 5 presents the four types of sensors employed in the experiment, including an accelerometer, and motor current signal are measured by their corresponding sensors and acquired through a data microphone, current sensor, and optical encoder. Vibration signal, acoustic signal, and motor current acquisition box with a sampling frequency of 20 kHz and data length points. The of the signal are measured by their corresponding sensors and acquired throughofa 320 datakacquisition boxIAS with output shaft is calculated based on counting the number of high resolution pulses of the encoder a sampling frequency of 20 kHz and data length of 320 k points. The IAS of the output shaft is calculated [47]; based and down-sampling, using of the data acquisition box with the same frequencyusing and data on counting the number high resolution pulses of the encoder [47];sampling and down-sampling, theas data box with the same sampling frequency and data length as the other three signals. length theacquisition other three signals. 4.1. Experiment Setup Magnetic brake Motor Planetary gearbox Figure gearboxtest testrig. rig. Figure4.4.Planetary Planetary gearbox Sensors 2017, 17, 414 Sensors 2017, 17, 414 8 of 15 8 of 15 Sensors 2017, 17, 414 Accelerometer Microphone Accelerometer 8 of 15 Microphone Optical encoder Current sensor Optical encoder Current sensor Figure 5. 5. Four and their their installations. installations. Figure Four types types of of sensors sensors and Figure 5. Four types of sensors and their installations. Seven health conditions of the planetary gearbox are tested, including normal, pitting tooth, Seven health conditions of the planetary gearbox are tested, including normal, pitting tooth, Seven health conditions the crack planetary gearbox tested, including normal, pitting tooth, in chaffing tooth, chipped tooth,ofroot tooth, slight are worn tooth, and worn tooth. As shown chaffing tooth, chipped tooth, root crack tooth, slight worn tooth, and worn tooth. As shown in chaffing chipped tooth, in root tooth, gears. slight In worn tooth, and worn tooth. shown in Figure 6, alltooth, the faults occurred thecrack planetary each experiment, only oneAs planetary gear Figure 6, allall the faults in planetary gears. Ineach eachexperiment, experiment,only only one planetary gear Figure the faultsoccurred occurred inthe planetaryand gears. one planetary gear with one 6,kind of health condition isthe examined, allIn the other gears are normal. Experiments are with one kind of health condition is examined, and all the other gears are normal. Experiments are with one kind of health condition is examined, and all the other gears are normal. Experiments are conducted under three motor speeds (600 rpm, 1200 rpm and 1800 rpm) and a load-free condition. conducted under three speeds (600 rpm, 1200 1200rpm rpmand and1800 1800rpm) rpm)and and a load-free condition. conducted threemotor motor (600 rpm, aisload-free condition. The detailed under description for thespeeds datasets and pattern labels of the experiment shown in Table 1. The detailed description labels of ofthe theexperiment experimentisisshown showninin Table The detailed descriptionfor forthe thedatasets datasets and and pattern pattern labels Table 1. 1. (a) (a) (d) (d) (b) (b) (e) (e) (c) (c) (f) (f) Figure tooth; (b) (b) Chaffing Chaffingtooth; tooth;(c)(c)Chipped Chipped tooth; Root Figure6. 6.Faulty Faultyplanetary planetarygears: gears: (a) (a) Pitting Pitting tooth; tooth; (d)(d) Root Figure 6. Faulty planetary gears: (a) Pitting tooth; (b) Chaffing tooth; (c) Chipped tooth; (d) Root crack crack tooth;(e)(e)Slight Slightworn worntooth; tooth;and and (f) (f) Worn tooth. crack tooth; tooth. tooth; (e) Slight worn tooth; and (f) Worn tooth. Table1.1.Description Description of of the pattern pattern labels data. Table labelsof ofplanetary planetarygearbox gearbox data. Table 1. Description of the pattern labels of planetary gearbox data. PatternLabel Label Gearbox Gearbox Condition Condition Input Pattern InputSpeed Speed(rpm) (rpm) Load Load Pattern Label Gearbox Condition Input1200 Speed (rpm) Load 1 Normal 600, and 1800 Zero 1 Normal 600, 1200 and 1800 Zero Normal 600, 1200 and 1800 Pitting tooth 600, and 1800 Zero 22 1 Pitting tooth 600,1200 1200 and 1800 Zero Zero 2 Pitting tooth 600, 1200 and 1800 Zero 3 Chaffing tooth 600, 1200 and 1800 Zero 3 3 Chaffing tooth 600,1200 1200 and Zero Chaffing tooth 600, and 18001800 Zero 4 Chipped tooth 600, 1200and and 1800 Zero Zero Chipped tooth 600, 1800 4 4 Chipped tooth 600,1200 1200 and 1800 Zero 55 Root tooth 600, 1200and and 1800 Zero Zero Rootcracked cracked tooth 600, 1200 1800 5 6 Root cracked tooth 600, 600, 1200 and 1800 Zero Slightworn worn tooth 1800 6 Slight tooth 600, 1200 1200and and 1800 Zero Zero 6 7 Slight worn tooth 600,1200 1200 and Zero Worn tooth 600, and 18001800 Zero 7 Worn tooth 600, 1200 and 1800 Zero 7 Worn tooth 600, 1200 and 1800 Zero 4.2.4.2. Data Processing Data Processing 4.2. Data Processing The acquired signal, current currentsignal, signal,and andIAS IASsignal signal are preprocessed The acquiredvibration vibrationsignal, signal, acoustic acoustic signal, are preprocessed The acquired vibration signal, acoustic signal,method, current signal, and IAS signal are preprocessed following thethe steps in in Section 3.1. For the proposed the collected signals are standardized and following steps Section 3.1. For the proposed method, the collected signals are standardized following the steps in Section 3.1. For the proposed method, the collected signals are standardized divided into segments at first. A total of 1024 points are selected as a segment, so each type of signal and divided into segments at first. A total of 1024 points are selected as a segment, so each type of and divided into segments ateach first.health Afor total of 1024 points are motor selected as aand segment, so each of will contain segments condition under one speed 6552 segments in total signal will312 contain 312 for segments each health condition under one motor speed and type 6552 signal willhealth contain 312seven segments for each health under onefour motor speed 6552 segments in total for healththree conditions undercondition three motor speeds. Next, each of the four for seven conditions under motor speeds. Next, each of the segments ofand the four segments inof total for health under three speeds. Next, each of the four segments the fourseven signal typesasconditions are together asmotor one input data sample to the form the input signal types are combined together onecombined data sample to form the vectors of DCNN model. vectors of the DCNN model. In this way, each data sample will be a 4096-dimensional vector (four segments of the four signal types are combined together as one data sample to form the input In this way, each data sample will be a 4096-dimensional vector (four times the size of segments), times of thethe sizeDCNN of segments), there willeach be 6552 samples A total of 40% ofvector the 6552 vectors model. and In this way, datadata sample will in betotal. a 4096-dimensional (four times the size of segments), and there will be 6552 data samples in total. A total of 40% of the 6552 Sensors 2017, 17, 414 9 of 15 and there will be 6552 data samples in total. A total of 40% of the 6552 data samples are selected randomly as the training data set, 10% are used as the validation set, and the remaining 50% are selected as the testing data set. Eight trails are carried out to avoid particularity and contingency of the diagnosis result. The average testing accuracy of the eight trails is calculated as the final result. 4.3. Model Design The model of the DCNN is developed using the principles described in Section 3.2. For different input data, different model settings are tested, and the one with the best performance among all the tested settings is selected to process this input data. The model setting of the proposed method is displayed in Table 2, which consists of three convolutional layers, two pooling layers, and a fully-connected layer with softmax regression. The convolutional layer corresponds to Equation (1), the pooling layer to Equation (2) and the fully-connected layer to Equation (3). The DCNN model is developed based on C++. Table 2. Variables of parameters and structures of the deep convolutional neural networks. Layer Type Variables and Dimensions Training Parameters 1 2 Convolution Pooling CW = 65; CH = 1; CC = 1; CN = 10; B = 10 S=2 SGD minibatch size = 20 Initial learning rate = 0.05 3 Convolution CW = 65; CH = 1; CC = 10; CN = 15; B = 15 4 Pooling S=2 Decrease of learning rate after each ten epochs = 20% Momentum = 0.5 5 6 7 Convolution Hidden layer Softmax CW = 976; CH = 1; CC = 15; CN = 30; B = 30 Relu activation function 7 outputs Weight decay = 0.04 Max epochs = 200 Testing sample rate = 50% SDG = stochastic gradient decent; CW = filter width; CH = filter height; CC = filter channel; CN = number of filter in the bank; B = bias; S = sub-sampling rate; No overlapping of convolutional window and no padding. 4.4. Experimental Results 4.4.1. Results of Single Sensory Data Following the flowchart shown in Figure 3d in Section 3.3, methods with three intelligent models, feature learning and manual feature extraction methods, and four types of single sensor data are tested in the experiment. The results are displayed in Table 3. Table 3. Average testing accuracy of comparative methods with single sensory data. Sensory Data Model Feature Learning from Raw Data Vibration signal DCNN BPNN SVM Acoustic signal Manual Feature Extraction Time-Domain Features Frequency-Domain Features Handcraft Features 81.45% 42.56% 45.11% 55.84% 55.62% 56.35% 70.74% 69.03% 72.23% 73.64% 72.36% 73.86% DCNN BPNN SVM 66.23% 19.80% 26.54% 31.42% 35.89% 33.62% 76.45% 76.04% 77.36% 76.02% 75.79% 76.32% Current signal DCNN BPNN SVM 85.68% 52.36% 51.64% 60.73% 60.47% 63.74% 61.45% 61.21% 63.53% 76.85% 76.43% 78.76% Instantaneous angular speed (IAS) signal DCNN BPNN SVM 90.23% 51.37% 48.22% 75.34% 75.36% 75.68% 84.42% 85.22% 85.65% 88.34% 89.82% 89.85% DCNN = deep convolutional neural network; BPNN = back-propagation neural networks; SVM = support vector machine. Sensors 2017, 17, 414 Sensors 2017, 17, 414 10 of 15 10 of 15 Sensors 2017, 17, 414 10 of 15 4.4.2.Results ResultsofofMulti-Sensory Multi-SensoryData Data 4.4.2. 4.4.2. Results of Multi-Sensory Data Followingthe theflowcharts flowchartsshown shown in in Figure Figure 3a–c 3a–c in in Section Section 3.3, Following 3.3, methods methods with withthree threefusion fusionlevels, levels, Following the flowcharts shown in Figure 3a–c in Section 3.3, methods with three fusion levels, three intelligent models, two feature extraction methods, and multi-sensory data are tested three intelligent models, two feature extraction methods, and multi-sensory data are testedin inthe the three intelligent models,are twodisplayed feature extraction methods, andthe multi-sensory data are tested in the is experiment. The results in Table 4, in which result of the proposed method experiment. The results are displayed in Table 4, in which the result of the proposed method is marked experiment. TheFigure results are displayed in Tableresults 4, in which the result of thethe proposed method is in bold. presents the testing of the eight trails top three methods, inmarked bold. Figure 7 presents7 the testing results of the eight trails of the topofthree methods, which are marked in bold. Figure 7 presents the testing results of the eight trails of the top three methods, which are the proposed method, the DCNN model with feature learning and feature-level fusion, the proposed method, the DCNN model with feature learning and feature-level fusion, and the SVM which the proposed DCNNand model with feature learning and feature-level fusion, and the are SVM model with method, handcraftthe features feature-level fusion. model with handcraft features and feature-level fusion. andFinally, the SVMthe model with handcraft features and feature-level fusion. average testing accuracies of all the comparative methods in the experiment are Finally, thethe average testing accuracies of all methods in the are shown accuracies of the all comparative the comparative methods inexperiment the experiment are shownFinally, together inaverage Figure 8testing for a clearer comparison between each other. together in Figure 8 for a clearer comparison between each other. shown together in Figure 8 for a clearer comparison between each other. Table 4. Average testing accuracy of comparative methods with multi-sensory data. Table 4. 4.Average methodswith withmulti-sensory multi-sensorydata. data. Table Averagetesting testingaccuracy accuracyof of comparative comparative methods Manual Feature Extraction Feature Learning Manual Feature Manual Feature Extraction Extraction Handcraft Fusion Level Model Feature Learning Time-Domain Frequency-Domain Feature Learning from Raw Data Time-Domain Frequency-Domain Handcraft Fusion Model Fusion Level Level Model Time-Domain Frequency-Domain Handcraft Features Features Features from fromRaw RawData Data Features Features Features Features Features Features DCNN 99.28% 66.08% 87.63% 90.23% DCNN 99.28% 66.08% 87.63% 90.23% DCNN 99.28% 66.08% 87.63% 90.23% Data-level fusion BPNN 53.28% 65.95% 87.89% 91.22% Data-level BPNN 53.28% 65.95% 87.89% 91.22% BPNN 53.28% 65.95% 87.89% 91.22% Data-level fusion fusion SVM 51.62% 67.32% 87.28% 90.67% SVM 51.62% 67.32% 87.28% 90.67% SVM 51.62% 67.32% 87.28% 90.67% DCNN 98.75% 86.35% 92.34% 94.08% DCNN 98.75% 86.35% 92.34% 94.08% DCNN 98.75% 86.35% 92.34% 94.08% Feature-level fusion BPNN 64.74% 86.81% 92.15% 94.04% Feature-level BPNN 64.74% 86.81% 92.15% 94.04% BPNN 64.74% 86.81% 92.15% 94.04% Feature-level fusion fusion SVM 56.27% 86.74% 94.62% 95.80% SVM 56.27% 86.74% 94.62% 95.80% SVM 56.27% 86.74% 94.62% 95.80% DCNN 93.65% 84.65% 90.23% 92.19% DCNN 93.65% 84.65% 90.23% 92.19% DCNN 93.65% 84.65% 90.23% 92.19% Decision-level fusion BPNN 77.62% 84.47% 91.19% 93.42% Decision-level BPNN 77.62% 84.47% 91.19% 93.42% BPNN 77.62% 84.47% 91.19% 93.42% Decision-level fusion fusion SVM 76.17% 86.32% 90.98% 93.44% SVM 76.17% 86.32% 90.98% 93.44% SVM 76.17% 86.32% 90.98% 93.44% Figure Testing accuracyof ofeight eighttrials trialsof the top three the DCNN Figure Testing accuracy of eight trials of the the top top three three methods methods(the method, the DCNN Figure 7.7.7. Testing accuracy methods (theproposed proposedmethod, method, the DCNN model with feature learning and feature-level fusion, and the SVM model with handcraft features modelwith withfeature featurelearning learning and feature-level fusion, and SVM model with handcraft features model and feature-level fusion, and thethe SVM model with handcraft features and andfeature-level feature-level fusion). FL == feature feature learning; FE feature extraction; and fusion). FL learning; FE == manual manual feature extraction; DF= =data-level data-level feature-level fusion). FL = feature learning; FE = manual feature extraction; DF = DF data-level fusion; fusion; FF==feature-level feature-level fusion. fusion; FF FF = feature-level fusion. fusion. The proposed method The proposed method Figure 8. Testing accuracy of all the comparative methods. Figure8.8. Testing Testingaccuracy accuracy of of all all the the comparative comparative methods. Figure methods. Sensors 2017, 17, 414 Sensors 2017, 17, 414 11 of 15 11 of 15 4.5. Data and and Learned LearnedFeatures Features 4.5.Principal PrincipalComponent ComponentAnalysis Analysisof of the the Experimental Experimental Data PCA toanalyze analyzeand and visualize learned features the proposed As PCAisis employed employed to visualize the the learned features of the of proposed method.method. As shown shown in Figure 9, the labels 1 to 7 correspond to the seven conditions of the planetary gearbox in Figure 9, the labels 1 to 7 correspond to the seven conditions of the planetary gearbox described described Table andtwo theprincipal first two components principal components (PCs) arebyobtained by PCA tothe represent in Table 1,inand the1,first (PCs) are obtained PCA to represent useful the useful information hidden in the data. Figure 9a shows the result of the input data of the testing information hidden in the data. Figure 9a shows the result of the input data of the testing dataset of dataset of the proposed method the PCs. first two PCs. 9b illustrates result of the learned the proposed method along thealong first two Figure 9bFigure illustrates the result the of the learned features features with adaptive fusion levels of themethod proposed method testing the are dataset, which are with adaptive fusion levels of the proposed for testing thefor dataset, which extracted from extracted from thesecond-last output of layer the second-last layer the DCNN.the For comparison, the results the output of the of the DCNN. Forofcomparison, results of feature-level fusedof feature-level fused features learned DCNN andhandcraft feature-level fused handcraft features learned through DCNN andthrough feature-level fused features along the firstfeatures two PCsalong are the first two PCs are shown in Figure respectively. It should bedisplay noted that we only display the shown in Figure 9c,d, respectively. It 9c,d, should be noted that we only the first two PCs of the first two of thevisualization, data for a clearer which that there may some be overlaps between data forPCs a clearer whichvisualization, means that there maymeans be overlaps between categories and some categories andbemany of them can bePCs. divided into higher PCs. many of them can divided into higher (a) (b) (c) (d) Figure component analysis (PCA) of of thethe experimental data and learned features: (a) Figure9.9.Principal Principal component analysis (PCA) experimental data and learned features: Data-level fusedfused inputinput datadata of the proposed method; (b) (b) Features of the proposed method; (c) (a) Data-level of the proposed method; Features of the proposed method; Feature-level fused features learned through DCNN; features. (c) Feature-level fused features learned through DCNN;and and(d) (d)Feature-level Feature-levelfused fused handcraft handcraft features. 4.6. 4.6.Discussion Discussion 1.1. The is able able to to diagnose diagnosethe thefaults faultsofofthe the Theexperimental experimentalresults resultsshow showthat thatthe the proposed proposed method method is planetary best testing testingaccuracy accuracyininthe theexperiment. experiment.ItItcan can planetarygearbox gearboxtest testrig rigeffectively, effectively, yielding the best bebeseen from Table 4 and Figure 8 that the proposed method achieves the best testing accuracy seen from Table 4 and Figure 8 that method achieves the best testing accuracy 99.28% this result resultisissignificantly significantlycorrelated correlated 99.28%among amongall allthe thecomparative comparativemethods. methods. We We think that this with proposedmethod. method.DCNN DCNNcan canfuse fuseinput input withthe thedeep deeparchitecture architectureof ofthe the DCNN DCNN model model of the proposed dataand andlearn learnbasic basicfeatures featuresfrom from itit in in its its lower layers, fuse data fuse basic basic features featuresinto intohigher higherlevel level featuresorordecisions decisionsininits itsmiddle middlelayers, layers, and and further further fuse these features these features featuresand anddecisions decisionstotoobtain obtain thefinal finalresult resultininits itshigher higher layers. layers. Although Although there there is a data-level the data-level fusion fusionbefore beforeDCNN DCNNininthe the proposedmethod, method,DCNN DCNNstill stillactually actually fuses fuses the the data again in proposed in its its starting starting layers layerstotofurther further optimize the data structure. Optimized features and combinations of different level fusions are formed through this deep-layered model, which provides a better result than with manually selected features or fusion levels. Sensors 2017, 17, 414 2. 3. 4. 5. 6. 12 of 15 optimize the data structure. Optimized features and combinations of different level fusions are formed through this deep-layered model, which provides a better result than with manually selected features or fusion levels. The ability of automatic feature learning of the DCNN model with multi-sensory data is proven through the experiment. It can obviously be seen from Figure 8 that both the proposed method and the feature-level fusion method with feature learning through DCNN obtain a better result, 99.28% and 98.75%, than any other comparative methods with handcraft features or feature learning through BPNN. This result proves that the feature learning through DCNN with multi-sensory data can improve the performance of the multi-sensor data fusion method for fault diagnosis. In addition, the result also implies that the proposed method with adaptive fusion-level selection can achieve a better result 99.28% than the result 98.75% of the method with manual-selected feature-level fusion, which is the only difference between these two methods. However, the method with automatic feature learning of DCNN from the raw signal of a single sensor cannot achieve a better result than methods with handcraft features. Table 3 displays the diagnosis results using signals from a single sensor. Only with a vibration signal and current signal, can the DCNN-based feature learning method achieve better results than conventional methods with handcraft features. By contrast, the results of the DCNN-based feature learning method with an acoustic signal and IAS signal are worse than that of conventional methods. This implies that the DCNN-based method with learned features from single sensory data cannot provide stable improvements for all kinds of sensory data. We think that the performance of the DCNN-based feature learning is influenced by the characteristics of the input data. As can be seen from the results shown in Table 3, the performance of feature learning has a stronger positive correlation with the performance of time-domain features than frequency-domain features, which infers that the DCNN-based feature learning from a raw signal may be more sensitive to time-correlated features than frequency-correlated features. The effectiveness of the automatic feature learning and adaptive fusion-level selection of the proposed method is further confirmed through PCA. As can be seen from Figure 9a, most of the categories of the input raw data overlap each other, which makes it difficult to distinguish them. After the processing of the proposed method, the learned features with adaptive fusion levels along the first two PCs become distinguishable in Figure 9b. Meanwhile, Figure 9c,d presents the results of PCA with feature-level fused learned features and handcraft features as comparisons, respectively. The feature-level fused features learned through DCNN have just a slightly worse distinction between each category than the features of the proposed method, which not only verifies the feature learning ability of DCNN used in both methods, but also proves the better performance of the adaptive-level fusion of the proposed method than that of the manual-selected feature-level fusion. On the contrary, the fused handcraft features show a much worse distinction between different categories than the learned features of the proposed method. These analyses further demonstrate the effective performance of the automatic feature learning and adaptive fusion-level selection of the proposed method. While DCNN has a much better feature learning ability than BPNN, the three comparative models, DCNN, BPNN and SVM, obtain similar results with handcraft features. Figure 8 shows clearly that feature learning through DCNN achieves much better testing accuracies than through BPNN. Nevertheless, with handcrafts features, these three intelligent models provide similar accuracies, which suggests that DCNN cannot achieve much more improvements than conventional methods without using its ability of feature learning. Methods with multi-sensory data provide better results than those with single sensory data. It can be seen from Figure 8 that methods with multi-sensory data achieve higher testing accuracies than with single sensory data, no matter which fusion level or intelligent model is selected. This phenomenon indicates that multi-sensory data can improve the reliability and accuracy for fault diagnosis. Sensors 2017, 17, 414 13 of 15 5. Conclusions and Future Work This paper presents an adaptive data fusion method based on DCNN to detect the health conditions of planetary gearboxes. The processes of data-level fusion, feature-level fusion, decision-level fusion, feature learning, and fault diagnosis are all fused into one DCNN model adaptively. The proposed method can learn features from raw data, and fuse data, features, and decisions adaptively through the deep-layered structure of DCNN with fewer requirements of expert knowledge or human labor for feature extraction and fusion-level selection. The performance of the proposed method is evaluated through the experiment of the planetary gearbox fault test rig. As comparisons, feature-level fusion, decision-level fusion, handcraft features, single sensory data, and two traditional intelligent models, BPNN and SVM, are also tested in the experiment. The comparative results of the experiment verify the effectiveness of the proposed method, which achieves the best testing accuracy among all the comparative methods in the experiment. Our future work will focus on testing the DCNN model-based feature learning and data fusion approaches on more mechanical objects, fault modes, operation conditions, and sensor types, which can further confirm the effectiveness of approaches and help us to find out other useful application guidance. Moreover, due to the large number of parameters of deep learning models, manual parameter optimization often takes many trials-and-errors to find the best one, and conventional automatic searching methods are usually very time-consuming and easily converge into a local optimum. It is meaningful to investigate more effective and faster approaches to optimize the parameters automatically. Finally, combinations of different deep learning architectures should improve the effect of fault diagnosis. Adding recurrent architecture may make the model suitable to predict future fault conditions, and combining with auto-encoder architecture may improve the feature learning ability to capture more complex features. Acknowledgments: The authors would like to thank the National Natural Science Foundation of China (Grant No. 51475324), and the National Natural Science Foundation of China and Civil Aviation Administration of China jointly funded project (Grant No. U1533103) for their support. The authors would also like to thank the editors and anonymous reviewers for their constructive comments and suggestions. Author Contributions: Luyang Jing and Taiyong Wang conceived and designed the research; Ming Zhao designed and performed the experiment; Peng Wang contributed to the theory studies; Luyang Jing wrote the manuscript; and all the authors reviewed and revised the manuscript. Conflicts of Interest: The authors declare no conflict of interest. References 1. 2. 3. 4. 5. 6. 7. 8. Lei, Y.; Lin, J.; Zuo, M.J.; He, Z. Condition monitoring and fault diagnosis of planetary gearboxes: A review. Measurement 2014, 48, 292–305. [CrossRef] Khazaee, M.; Ahmadi, H.; Omid, M.; Banakar, A.; Moosavian, A. Feature-level fusion based on wavelet transform and artificial neural network for fault diagnosis of planetary gearbox using acoustic and vibration signals. Insight Non-Destr. Test. Cond. Monit. 2013, 55, 323–330. [CrossRef] Lei, Y.; Lin, J.; He, Z.; Kong, D. A Method Based on Multi-Sensor Data Fusion for Fault Detection of Planetary Gearboxes. Sensors 2012, 12, 2005–2017. [CrossRef] [PubMed] Li, C.; Sanchez, R.V.; Zurita, G.; Cerrada, M.; Cabrera, D.; Vásquez, R.E. Gearbox fault diagnosis based on deep random forest fusion of acoustic and vibratory signals. Mech. Syst. Signal Process. 2016, 76, 283–293. [CrossRef] Hall, D.L.; Llinas, J. An introduction to multisensor data fusion. Proc. IEEE 1997, 85, 6–23. [CrossRef] Khaleghi, B.; Khamis, A.; Karray, F.O.; Razavi, S.N. Multisensor data fusion: A review of the state-of-the-art. Inf. Fusion 2013, 14, 28–44. [CrossRef] Serdio, F.; Lughofer, E.; Pichler, K.; Buchegger, T.; Pichler, M.; Efendic, H. Fault detection in multi-sensor networks based on multivariate time-series models and orthogonal transformations. Inf. Fusion 2014, 20, 272–291. [CrossRef] Lei, Y.; Jia, F.; Lin, J.; Xing, S. An Intelligent Fault Diagnosis Method Using Unsupervised Feature Learning Towards Mechanical Big Data. IEEE Trans. Ind. Electron. 2016, 63, 3137–3147. [CrossRef] Sensors 2017, 17, 414 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 21. 22. 23. 24. 25. 26. 27. 28. 29. 30. 31. 32. 14 of 15 Ince, T.; Kiranyaz, S.; Eren, L.; Askar, M. Real-Time Motor Fault Detection by 1D Convolutional Neural Networks. IEEE Trans. Ind. Electron. 2016, 63, 7067–7075. [CrossRef] Bengio, Y.; Courville, A.; Vincent, P. Representation Learning: A Review and New Perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 1798–1828. [CrossRef] [PubMed] Lecun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [CrossRef] [PubMed] Schmidhuber, J. Deep learning in neural networks: An overview. Neural Netw. 2014, 61, 85–117. [CrossRef] [PubMed] Bengio, Y. Learning Deep Architectures for AI. Found. Trends® Mach. Learn. 2009, 2, 1–127. [CrossRef] Janssens, O.; Slavkovikj, V.; Vervisch, B.; Stockman, K.; Loccufier, M.; Verstockt, S.; Walle, R.V.D.; Hoecke, S.V. Convolutional Neural Network Based Fault Detection for Rotating Machinery. J. Sound Vib. 2016, 377, 331–345. [CrossRef] Jia, F.; Lei, Y.; Lin, J.; Zhou, X.; Lu, N. Deep neural networks: A promising tool for fault characteristic mining and intelligent diagnosis of rotating machinery with massive data. Mech. Syst. Signal Process. 2016, 72–73, 303–315. [CrossRef] Lu, C.; Wang, Z.Y.; Qin, W.L.; Ma, J. Fault diagnosis of rotary machinery components using a stacked denoising autoencoder-based health state identification. Signal Process. 2016, 130, 377–388. [CrossRef] Sun, W.; Shao, S.; Zhao, R.; Yan, R.; Zhang, X.; Chen, X. A sparse auto-encoder-based deep neural network approach for induction motor faults classification. Measurement 2016, 89, 171–178. [CrossRef] Zhao, R.; Yan, R.; Wang, J.; Mao, K. Learning to Monitor Machine Health with Convolutional Bi-Directional LSTM Networks. Sensors 2017, 17, 273. [CrossRef] [PubMed] Abdeljaber, O.; Avci, O.; Kiranyaz, S.; Gabbouj, M.; Inman, D.J. Real-Time Vibration-Based Structural Damage Detection Using One-Dimensional Convolutional Neural Networks. J. Sound Vib. 2017, 388, 154–170. [CrossRef] Guo, X.; Chen, L.; Shen, C. Hierarchical adaptive deep convolution neural network and its application to bearing fault diagnosis. Measurement 2016, 93, 490–502. [CrossRef] Ji, S.; Xu, W.; Yang, M.; Yu, K. 3D Convolutional Neural Networks for Human Action Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 221–231. [CrossRef] [PubMed] Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference On Computer Vision and Patten Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. Sermanet, P.; Eigen, D.; Zhang, X.; Mathieu, M.; Fergus, R.; Lecun, Y. Overfeat: Integrated recognition, localization and detection using convolutional networks. arXiv 2014, arXiv:1312.6229. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Adv. Neural Inf. Process. Syst. 2012, 25, 1097–1105. Nebauer, C. Evaluation of convolutional neural networks for visual recognition. IEEE Trans. Neural Netw. 1998, 9, 685–696. [CrossRef] [PubMed] Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533. [CrossRef] [PubMed] Lei, Y.; Han, D.; Lin, J.; He, Z. Planetary gearbox fault diagnosis using an adaptive stochastic resonance method. Mech. Syst. Signal Process. 2013, 38, 113–124. [CrossRef] Sun, H.; Zi, Y.; He, Z.; Yuan, J.; Wang, X.; Chen, L. Customized multiwavelets for planetary gearbox fault detection based on vibration sensor signals. Sensors 2013, 13, 1183–1209. [CrossRef] [PubMed] Amarnath, M.; Krishna, I.R.P. Local fault detection in helical gears via vibration and acoustic signals using EMD based statistical parameter analysis. Measurement 2014, 58, 154–164. [CrossRef] Immovilli, F.; Cocconcelli, M.; Bellini, A.; Rubini, R. Detection of Generalized-Roughness Bearing Fault by Spectral-Kurtosis Energy of Vibration or Current Signals. IEEE Trans. Ind. Electron. 2009, 56, 4710–4717. [CrossRef] Kar, C.; Mohanty, A.R. Monitoring gear vibrations through motor current signature analysis and wavelet transform. Mech. Syst. Signal Process. 2006, 20, 158–187. [CrossRef] Sensors 2017, 17, 414 33. 34. 35. 36. 37. 38. 39. 40. 41. 42. 43. 44. 45. 46. 47. 15 of 15 Lu, D.; Gong, X.; Qiao, W. Current-Based Diagnosis for Gear Tooth Breaks in Wind Turbine Gearboxes. In Proceedings of the Energy Conversion Congress and Exposition, Raleigh, NC, USA, 15–20 September 2012; pp. 3780–3786. Fedala, S.; Rémond, D.; Zegadi, R.; Felkaoui, A. Contribution of angular measurements to intelligent gear faults diagnosis. J. Intell. Manuf. 2015. [CrossRef] Moustafa, W.; Cousinard, O.; Bolaers, F.; Sghir, K.; Dron, J.P. Low speed bearings fault detection and size estimation using instantaneous angular speed. J. Vib. Control 2014, 22, 3413–3425. [CrossRef] Turaga, S.C.; Murray, J.F.; Jain, V.; Roth, F.; Helmstaedter, M.; Briggman, K.; Denk, W.; Seung, H.S. Convolutional networks can learn to generate affinity graphs for image segmentation. Neural Comput. 2010, 22, 511–538. [CrossRef] [PubMed] Chen, Z.Q.; Li, C.; Sanchez, R.V. Gearbox Fault Identification and Classification with Convolutional Neural Networks. Shock Vib. 2015, 2015, 390134. [CrossRef] Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Patten Recognition, Seattle, WA, USA, 27–30 June 2016. Bergstra, J.; Bardenet, R.; Kégl, B.; Bengio, Y. Algorithms for Hyper-Parameter Optimization. In Proceedings of the Advances in Neural Information Processing Systems, Granada, Spain, 12–15 December 2011; pp. 2546–2554. Zhang, L.; Suganthan, P.N. A Survey of Randomized Algorithms for Training Neural Networks. Inf. Sci. 2016, 364–365, 146–155. [CrossRef] Abdel-Hamid, O.; Li, D.; Dong, Y. Exploring Convolutional Neural Network Structures and Optimization Techniques for Speech Recognition. In Proceedings of the INTERSPEECH, Lyon, France, 25–29 August 2013; pp. 1173–1175. Kane, P.V.; Andhare, A.B. Application of psychoacoustics for gear fault diagnosis using artificial neural network. J. Low Freq. Noise Vib. Act.Control 2016, 35, 207–220. [CrossRef] Fedala, S.; Rémond, D.; Zegadi, R.; Felkaoui, A. Gear Fault Diagnosis Based on Angular Measurements and Support Vector Machines in Normal and Nonstationary Conditions. In Condition Monitoring of Machinery in Non-Stationary Operations; Springer: Berlin/Heidelberg, Germany, 2016; pp. 291–308. Li, C.; Sánchez, R.V.; Zurita, G.; Cerrada, M.; Cabrera, D. Fault Diagnosis for Rotating Machinery Using Vibration Measurement Deep Statistical Feature Learning. Sensors 2016, 16, 895. [CrossRef] [PubMed] Lu, D.; Qiao, W. Adaptive Feature Extraction and SVM Classification for Real-Time Fault Diagnosis of Drivetrain Gearboxes. In Proceedings of the Energy Conversion Congress and Exposition, Denver, CO, USA, 15–19 September 2013; pp. 3934–3940. Abumahfouz, I.A. A comparative study of three artificial neural networks for the detection and classification of gear faults. Int. J. Gen. Syst. 2005, 34, 261–277. [CrossRef] Li, Y.; Gu, F.; Harris, G.; Ball, A.; Bennett, N.; Travis, K. The measurement of instantaneous angular speed. Mech. Syst. Signal Process. 2005, 19, 786–805. [CrossRef] © 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).