MATEC Web of Conferences 44, 02010 (2016)
DOI: 10.1051/matecconf/20164402010
© Owned by the authors, published by EDP Sciences, 2016

Analysis of space payload operation modes based on divide-and-conquer clustering

Feng Si1,2,a, Bao Jun Lin1 and Shan Cong Zhang3

1 Academy of Opto-Electronics, Chinese Academy of Sciences, Beijing, China
2 University of Chinese Academy of Sciences, Beijing, China
3 Technology and Engineering Center for Space Utilization, Chinese Academy of Sciences, Beijing, China

Abstract. As space electronic technology develops, space payload operation modes are becoming increasingly complex, and manual interpretation is error-prone because of the heavy workload. A space payload's operation modes are generally reflected in its telemetry data. By analysing the characteristics of payload telemetry data, we propose an automatic method for analysing payload operation modes based on divide-and-conquer clustering, which combines division of the data into groups with incremental clustering. The principle of the method is introduced, and the method is validated on actual payload telemetry data. An improved method is then proposed to address the problems encountered. Experimental results show that divide-and-conquer clustering is computationally simple and classifies accurately when applied to payload operation modes. The method can also be extended to other areas of payload data processing.

1 Introduction

The operation modes of space payloads become more complex as system size grows [1]. They can be identified from the telemetry data of the payload in orbit. A large volume of telemetry data is produced every day, and in general it differs greatly between kinds of payloads. Traditional methods depend either on manual judgment or on rules derived from the designer's prior knowledge.
The rules are especially complex to design for the transients between operation modes. Furthermore, rules designed for a specific payload are often difficult to extend to other payloads. To reduce the error rate of manual interpretation and the complexity of the rule-based method, we present a novel method based on data mining technology that automatically finds the system structure hidden in telemetry data and the operation modes of the payload.

Data mining offers many kinds of techniques [2-3]. Given the characteristics of payload telemetry data, we choose data clustering to identify the payload operation modes. Data clustering [4-7] belongs to unsupervised learning: it is a discovery process that groups a data set so that intra-cluster similarity is maximized and inter-cluster similarity is minimized. In the absence of prior knowledge, we do not know how a payload works or how many operation modes it has. In this case, data clustering can help us find the patterns in the underlying telemetry data.

In this paper, we present a novel clustering algorithm called divide-and-conquer clustering, which applies grouping and incremental clustering to classify payload telemetry data [8-11]. We demonstrate the effectiveness of the divide-and-conquer algorithm on a large set of actual payload telemetry data. Experimental results show that the algorithm can discover the payload operation modes simply and effectively. Finally, we consider how the algorithm can be applied to other fields of payload data processing.

2 Divide-and-Conquer Algorithm

Under a payload's predefined working rules in orbit, a large amount of telemetry data is generated every day (in some cases more than one hundred thousand records). Clustering one day's records directly would require a great amount of calculation. To solve this problem, we adopt the divide-and-conquer clustering algorithm.
First, the telemetry data are grouped in order; we then calculate the similarity of the records within each group and complete the intra-group clustering. Second, within each clustered group, an eigenvector is calculated for each category. Third, the similarity between groups is calculated using the eigenvectors of the categories. The process iterates until all records are classified. The clustering process is as follows:

(1) Data pre-processing
Scan all the parameters in the data set and find the characteristics of each type of parameter, such as its data range, whether its values are discrete or continuous, and its variation. Voltage telemetry, temperature telemetry and some payload count values can be used in the calculation directly. For some state variables, however, the numerical value only denotes a state, so these variables cannot be used in the calculation directly. Such data must be pre-processed before calculation [12-13].

a Corresponding author: sifeng04@126.com
This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Article available at http://www.matec-conferences.org or http://dx.doi.org/10.1051/matecconf/20164402010

(2) Normalization
All data are normalized with the following formula, so that they lie in the range [0, 1] after normalization:

$$x'_{ik} = \frac{x_{ik} - \min_{1 \le i \le n} x_{ik}}{\max_{1 \le i \le n} x_{ik} - \min_{1 \le i \le n} x_{ik}}, \quad k = 1, 2, \ldots, m \tag{1}$$

where $x_{ik}$ is the value of the $k$-th parameter in the $i$-th record, $n$ is the number of records and $m$ is the number of parameters.

(3) Grouping
All the records in the data set are divided into groups. In general, space payload operation modes are continuous and consistent except under fault conditions, so the grouping can follow the order of the records in the data set.
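
Steps (2) and (3) above can be illustrated with a minimal sketch in plain Python. It assumes that each record is a list of numeric parameter values that have already been pre-processed. The function names `min_max_normalize` and `split_in_order`, the sample values, and the choice to map a constant parameter to 0.0 are illustrative assumptions, not details taken from the paper.

```python
from typing import List

Record = List[float]


def min_max_normalize(records: List[Record]) -> List[Record]:
    """Scale every parameter (column) into [0, 1], following Eq. (1)."""
    if not records:
        return []
    columns = list(zip(*records))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for rec in records:
        row = []
        for x, lo, hi in zip(rec, lows, highs):
            # Assumption: a constant parameter (hi == lo) carries no
            # information, so it is mapped to 0.0 to avoid dividing by zero.
            row.append((x - lo) / (hi - lo) if hi > lo else 0.0)
        normalized.append(row)
    return normalized


def split_in_order(records: List[Record], group_size: int) -> List[List[Record]]:
    """Cut the record sequence into consecutive groups of at most group_size."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    return [records[i:i + group_size] for i in range(0, len(records), group_size)]


# Hypothetical telemetry: four records with three parameters each.
telemetry = [
    [28.0, 10.0, 5.0],
    [29.0, 20.0, 5.0],
    [30.0, 40.0, 5.0],
    [30.0, 30.0, 5.0],
]
scaled = min_max_normalize(telemetry)
groups = split_in_order(scaled, group_size=2)
print(scaled)
print(len(groups))
```

Because grouping simply follows record order, consecutive records that belong to the same operation mode tend to fall into the same group, which is the property the grouping step relies on.
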
(4) Clustering intra-group
Calculate the similarity between every pair of records in the group and cluster the records according to the similarity results. After the intra-group clustering, calculate the eigenvector of each category, which represents the category's centroid. The similarity between two records can be measured with the fractional distance metric [14]:

$$\mathrm{dist}_d^f(x, y) = \left[ \sum_{i=1}^{d} |x_i - y_i|^f \right]^{1/f}, \quad f \in (0, 1) \tag{2}$$

(5) Clustering inter-group
After clustering within the groups, cluster between groups using the eigenvectors of the categories. If many categories remain after the inter-group clustering, they can be grouped again. Clustering continues until the termination condition is met.

(6) Evaluation of clustering result
To evaluate the classification performance, the result can be compared with other methods, such as manual inspection of the payload operation modes or a rule-based method.

3 Experimental Results and Discussion

3.1 The original clustering

In this section, we validate the divide-and-conquer algorithm on an actual telemetry data set from an earth observation payload. The data set contains several thousand telemetry records, of which we choose 300 for the test. Each record includes 206 parameters. We cluster this data set with the algorithm above: after data pre-processing and normalization, we divide the records into three groups of 100 records each. We then calculate the similarity between every pair of records in each group. From the similarity results, we can find the record with the highest similarity to each record. However, the records cannot be classified from these similarity results alone. Figure 1 shows the first group's 100 records and their partner records with the highest similarity.

Figure 1. The record and its partner record with highest similarity

3.2 Analysis of clustering failure

By analysing the original records in the first group, we found two main reasons for the clustering failure. The first reason is the disturbance of some irrelevant parameters.
Inspecting all the parameters in the records manually, we find some parameters that change drastically between records. These drastic changes do not reflect the system operation modes at the macroscopic level. For example, the payload heartbeat signal, which toggles between adjacent records, reflects only the normal state and not a change of operation mode. The second reason is the parameter multiplexing problem. Some parameters are multiplexed in the system design, so that the same parameter has a different physical meaning in each cycle. The numerical values can differ greatly between cycles, but this difference does not reflect a switch of system mode.

3.3 Modified clustering method

To solve the above problems, we take the following measures. First, we use weighted parameters instead of the original parameters in the similarity calculation, assigning each parameter a weight according to the characteristics of its distribution. Second, the parameter multiplexing problem can be avoided in the system planning and design phase; alternatively, for parameters that are multiplexed in the original system architecture, we use parametric subdivision, so that the subdivided parameters reflect the system operation modes.

With these two measures, we recalculate the similarity of the 100 records in the first group. The results show that the similarity between records of the same mode is very high, mostly greater than 0.999. By manual analysis, the 100 records in the first group can be divided into four clusters: records 1~24 for the compression-and-storage mode, records 25~50 for the compression system power-off mode, records 51~82 for the transmit system power-on mode, and records 83~100 for the system idle mode. The result of the modified clustering algorithm is consistent with the manual analysis.
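
The effect of parameter weighting on intra-group clustering can be illustrated with a short sketch. The paper does not give the weighted formula, the mapping from distance to similarity, or the exact cluster-assignment rule, so everything below is an assumption made for illustration: the weights are placed inside the fractional distance of Eq. (2) and rescaled to sum to 1, similarity is taken as 1 minus that distance (which stays in [0, 1] for data normalized to [0, 1]), and records are assigned with a simple leader-style incremental rule. The names `weighted_similarity`, `centroid` and `incremental_cluster` and the sample records are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def weighted_similarity(x: Sequence[float], y: Sequence[float],
                        weights: Sequence[float], f: float = 0.5) -> float:
    """Return 1 - weighted fractional distance for records scaled to [0, 1].

    Assumption: weights are non-negative and are rescaled to sum to 1, which
    keeps the distance, and hence the similarity, inside [0, 1].
    """
    if not 0.0 < f <= 1.0:
        raise ValueError("f is expected in (0, 1]")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    acc = sum((w / total) * abs(a - b) ** f for a, b, w in zip(x, y, weights))
    return 1.0 - acc ** (1.0 / f)


def centroid(members: List[Vector]) -> Vector:
    """Mean vector of a cluster, used here as its representative."""
    return [sum(col) / len(members) for col in zip(*members)]


def incremental_cluster(records: Sequence[Sequence[float]],
                        weights: Sequence[float],
                        threshold: float = 0.999,
                        f: float = 0.5) -> Tuple[List[int], List[Vector]]:
    """Join the most similar existing cluster, or open a new one."""
    clusters: List[List[Vector]] = []
    labels: List[int] = []
    for rec in records:
        best, best_sim = -1, -1.0
        for idx, members in enumerate(clusters):
            sim = weighted_similarity(rec, centroid(members), weights, f)
            if sim > best_sim:
                best, best_sim = idx, sim
        if best >= 0 and best_sim >= threshold:
            clusters[best].append(list(rec))
            labels.append(best)
        else:
            clusters.append([list(rec)])
            labels.append(len(clusters) - 1)
    return labels, [centroid(c) for c in clusters]


# Hypothetical normalized records: parameter 0 behaves like a heartbeat that
# toggles on every record; parameters 1 and 2 follow the operation mode.
records = [
    [0.0, 0.10, 0.90],
    [1.0, 0.10, 0.90],
    [0.0, 0.10, 0.90],
    [1.0, 0.80, 0.20],
    [0.0, 0.80, 0.20],
]
labels, centroids = incremental_cluster(records, weights=[0.0, 1.0, 1.0])
plain_labels, _ = incremental_cluster(records, weights=[1.0, 1.0, 1.0])
print(labels)        # heartbeat given zero weight
print(plain_labels)  # all parameters weighted equally
```

In this toy data, giving the heartbeat parameter zero weight lets the records separate by mode, whereas equal weights let the toggling parameter split records of the same mode. The same function could be run again on the centroids returned for each group as a rough stand-in for the inter-group step.
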
Figure 2 shows the first group's 100 records and their partner records with the highest similarity.

Figure 2. The record and its partner record with highest similarity (using modified clustering algorithm).

Following the same algorithm, we obtain the intra-group similarity of the other two groups. Figure 3 shows the intra-group similarity of the three groups. The first group is divided into four clusters, the second group into two clusters and the third group into three clusters.

Figure 3. Clustering of every group

After completing the intra-group clustering, we calculate the centroid of each category and use these centroids to calculate the similarity between categories. The result shows that the fourth and fifth categories can be merged into one category, as can the sixth and seventh categories. Thus the original nine categories can be combined into seven categories. Figure 4 shows the result of the second round of calculation.

Figure 4. Clustering of three groups

The result of manual analysis is basically the same as the result of the clustering algorithm. The following table shows the results obtained by manual analysis.

Table 1. The Result of Manual Analysis

Clustering Number   Record Number   Operation modes
1                   1~3             Data Download from Channel 1
2                   4~24            Channel 1 Close
3                   25~50           Compression Stop
4                   51~53           Compression Power Off
5                   54~82           Transmission Close
6                   83~188          Camera 1 Power Off
7                   189~253         Compression and Storage
8                   254~283         Idle
9                   284~300         Data Download from Channel 2

When the clustering algorithm above is applied to the rest of the data set, we also obtain a good classification and accurate payload operation modes.

3.4 Discussion on how to divide data set

In the preceding section, we showed that payload operation modes can be analysed automatically by applying divide-and-conquer clustering to the payload telemetry data.
However, how to choose the best dividing method remains a problem, which we discuss below. We take the processing time and the clustering accuracy as the evaluation criteria. Using the data set above, we obtain results for different dividing methods, shown in Figure 5. In the figure, the relative processing time is the ratio of the time spent with a given dividing method to the time spent without dividing. The clustering accuracy is the ratio of the number of successfully clustered records to the total number of records.

Figure 5. Relative processing time and clustering accuracy

The figure shows that as the number of records in each group increases, the processing time increases dramatically, while the clustering accuracy also improves. The processing time grows in step with the number of records in each group, but the clustering accuracy stops increasing once it reaches a balance point. We can therefore choose the transition point by balancing processing time against clustering accuracy. Figure 5 shows that when the data set is divided into 3 groups of 100 records each, the clustering accuracy reaches 100% and the processing time is only half that of the undivided method. This is the better choice for clustering the payload operation modes.

4 Conclusions

In this paper, we have presented a divide-and-conquer clustering algorithm for analysing payload operation modes. The algorithm combines grouping, weighted parameters and parametric subdivision, which solve the problems of mass data, disturbance parameters and parameter multiplexing, respectively. Experimental results on an actual telemetry data set show that the modified algorithm can identify the payload's operation modes automatically.
The method may be extended to real-time payload data processing by creating a library of payload characteristics. Furthermore, once all the operation modes and characteristics of a payload have been acquired, they can be used for fault detection and fault diagnosis of the payload.

References

1. Y.F. He, G.H. Zhao, C.M. Lv, Journal of University of Chinese Academy of Science, 28, 4 (2011)
2. J. Han, M. Kamber, Data mining: concepts and techniques, 187~198 (2006)
3. A.K. Jain, M.N. Murty, P.J. Flynn, ACM Computing Surveys, 31, 3 (1999)
4. P. Hansen, B. Jaumard, Math. Program., 79, 13 (1997)
5. A.K. Jain, R.C. Dubes, Englewood Cliffs Prentice Hall, 32, 2 (1988)
6. L. Kaufman, P. Rousseeuw, Finding groups in data: An introduction to cluster analysis (1990)
7. C.H. Cheng, IEEE Transactions on Computers, 24, 9 (1975)
8. M.N. Murty, G. Krishna, Pattern Recognition, 12 (1980)
9. W.F. Eddy, A. Mockus, S. Oue, Computational Statistics & Data Analysis, 23, 1 (1996)
10. G.A. Carpenter, S. Grossberg, Neur Netw, 3, 2 (1990)
11. S. Asharaf, M.N. Murty, Pattern Recognition, 36, 12 (2003)
12. H.L. Chen, K.T. Chuang, M.S. Chen, ICDM (2005)
13. H.L. Chen, M.S. Chen, S.C. Lin, IEEE Transactions on Knowledge and Data Engineering, 21, 5 (2009)
14. G. Karypis, E.H. Han, V. Kumar, IEEE Computer, 32, 8 (1999)