NTU/Intel M2M Project: Wireless Sensor Networks
Content Analysis and Management Special Interest Group
Data Analysis Team
Discriminant Classification Sub-Team
Graphical Learning Sub-Team
Anomaly Detection Sub-Team
Pattern Mining Sub-Team
Monthly Report: December, 2011

1. Team Organization
Principal Investigator: Shou-De Lin
Co-Principal Investigator: Yung-Jen (Jane) Hsu
Team Leader: Todd McKenzie
Team Members: Peng-Hua Gong, Hsun-Ping Hsieh, Fu-Chun Hsu, Chung-Yi Li, Ting-Wei Lin, WeiLun Su, En-Hsu Yen, Tu-Chun Yin

2. Discussion with Champions
Number of meetings with champion in current month: 2 (1/9, 1/16-1/18)

3. Progress between last month and this month

Graphical Model Team

For this period, we focused on developing an indexing technique for large-scale pattern recognition problems. Current state-of-the-art approaches in pattern recognition and machine learning (such as SVMs and graphical models) cannot scale to problems with a large number of patterns or observations. In our research, we try to exploit indexing techniques, which are widely used in search engines, to solve the scaling problem of pattern recognition in general. If such techniques can be applied successfully, many state-of-the-art machine learning models will become feasible in large-scale sensor networks.

In the current setup, we transform the target problem (e.g., detection or classification) into a nearest neighbor search problem in a high-dimensional feature space and obtain fast search through our index. We designed a new indexing method, "KernelSplitTree", which indexes observed data and parametric models in kernel space instead of working in an explicit feature space. In our system, given an observation we can quickly find the relevant pattern models, and given a model we can quickly detect the corresponding patterns among the indexed observations.
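To illustrate what searching "in kernel space" means, the sketch below computes distances between implicit feature vectors using kernel evaluations only, via the identity ||phi(x) - phi(y)||^2 = k(x,x) - 2k(x,y) + k(y,y). It is not the KernelSplitTree itself: it is a brute-force linear scan of the kind an index would be meant to accelerate. The RBF kernel, the gamma value, the function names, and the sample points are all illustrative assumptions.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel; the kernel choice and gamma are assumptions."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq)

def kernel_distance_sq(x, y, kernel=rbf_kernel):
    """Squared distance between phi(x) and phi(y), using kernel values only."""
    return kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)

def nearest_neighbor(query, points, kernel=rbf_kernel):
    """Brute-force scan: index of the stored point closest to the query."""
    return min(range(len(points)),
               key=lambda i: kernel_distance_sq(query, points[i], kernel))

# Hypothetical 2-D observations, for illustration only.
observations = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
best = nearest_neighbor((0.9, 1.2), observations)
```

The scan costs one kernel evaluation per stored point for every query, which is the per-query cost a tree-structured index aims to reduce.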
Compared with existing machine learning approaches, we obtain a speedup of several orders of magnitude in the training, detection, and classification phases, at an up-front cost of O(D * N_o ln N_o) and O(D * N_M ln N_M) time for building the indexes.

Inferring Social Relationships

1. Literature survey on co-location features and spatial-temporal co-occurrence:
• Exploiting Place Features in Link Prediction on Location-based Social Networks (KDD 2011)
  – Place features, social features and global features
  – Supervised learning framework
• Inferring social ties from geographic coincidences (PNAS 2010)
  – Spatial-temporal co-occurrence
  – Probabilistic model approach
• A Geo-Social Model: From Real-World Co-occurrences to Social Connections (Journal for Databases in Networked Information Systems)
  – Time series pattern to identify the similarity for each pair
• Finding Your Friends and Following Them to Where You Are (WSDM 2012)
  – Text, co-location and friendship-graph topology features from Twitter
  – Probabilistic model approach
2. Applied our prediction model to a large-scale dataset, the check-in dataset from Gowalla:
• 10-fold cross-validation (Liblinear): Precision: 85.2%, Recall: 43.5%
• Applied to testing data (Liblinear): Precision: 79.63%, Recall: 32.1%

Activity Inference from Sensor Network Data

This month, our sub-team continued to examine the problem of learning and inferring a user's activities using data from the event logs of other users. Two candidate methods are frequent-pattern-based profiles and transfer learning. The former generates a list of frequent episodes from the observed data and creates a weight matrix to value the event periods. The latter tries to find a linear mapping from the list of events happening at each location encountered by the original user to the lists of events happening at other locations encountered by the other users. Each has received considerable interest.
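As a rough illustration of the frequent-episode idea, the sketch below counts ordered event pairs that occur within a short time window and keeps those seen often enough. It is a deliberately simplified stand-in (length-2 serial episodes, raw occurrence counts) and not the sub-team's profile method; the function name, the window and support thresholds, and the event labels are hypothetical.

```python
from collections import Counter

def frequent_serial_pairs(events, window, min_support):
    """Count ordered pairs (a, b) where b follows a within `window` time units.

    `events` is a list of (timestamp, label) tuples sorted by timestamp.
    Returns only the pairs whose count reaches `min_support`.
    """
    counts = Counter()
    for i, (t_i, a) in enumerate(events):
        for t_j, b in events[i + 1:]:
            if t_j - t_i > window:
                break  # events are sorted, so later ones are even further away
            counts[(a, b)] += 1
    return {pair: c for pair, c in counts.items() if c >= min_support}

# Hypothetical event log: (minute, sensor event).
log = [(0, "door"), (1, "light"), (5, "door"), (6, "light"), (20, "tv")]
episodes = frequent_serial_pairs(log, window=2, min_support=2)
```

In this toy log, only the pair ("door", "light") recurs within the window, so it is the only episode retained.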
Finally, we obtained the tracking records of the other four users from the original dataset provided by Mr. Zhao. The data collected from these four users cover a shorter period, and their location trajectories map less distinctively onto the locations the original user travelled. How to employ the investigated methods effectively remains our current research direction, and the question will be examined more thoroughly in the coming month.

Classification

This month the classification sub-team continued to explore algorithms for recovering missing data in sensor networks. We have successfully incorporated temporal correlation into our matrix factorization model, and we are now designing the mathematical formulation to include spatial correlation and the correlation among different attributes such as temperature and humidity. We are also conducting experiments on different datasets to make sure our algorithm is general and to validate our findings.

Local Sensor Network Setup

We produce the data by controlling the voltage. There are four different scenarios in two environments, and the goal is for this data set to contain at least 100 data samples.
• Jan: 3.2V ~ 2.0V in 1 hour (0.2V/min)
  – Two environments: bathroom, balcony
  – Different scenarios:
    1 with full-charge power, 1 with artificial power
    1 with full-charge power, 2 with artificial power
    1 with battery power (full), 1 with artificial power
    1 with full-charge power, 2 with artificial power
  – 3.2V-2.0V, 2.6V-2.0V -> Different exhaustion rate

We surveyed nearly all the papers discussing sensor data imputation; several of them mention the spatial-temporal correlation of the data. Several baseline methods were surveyed and are listed here, all of which are easy to implement:
1. Linear interpolation: temporal
2. Moving average: temporal
3. Hybrid-KNN: spatial+temporal
4. Correlated imputation: spatial+temporal
5. Recent sliding window: temporal
6. Replaced by certain: temporal
7. Adaptive weight adjustment: temporal+spatial
8. Multiple regression: spatial
9. Support vector regression
Some of the above methods will be implemented as comparison baselines for our RF algorithm.

5. Research Byproducts

5.1 Papers: N/A
(1) International Journal
(2) International Conference
(3) Domestic Journal
(4) Domestic Conference
(5) Highly Cited Articles

5.2 Served on the Editorial Board of International Journals
Journal of Social Network Analysis and Mining

5.3 Invited Lectures
Intel/NTU Symposium on 1/17, Data Analysis Presentation by Professor Shou-De Lin

5.4 Significant Honors / Awards
N/A
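
Supplementary sketch for the imputation baselines surveyed in Section 3: the two simplest temporal methods, linear interpolation and moving average, might look roughly like the following. This is a minimal illustration under our own assumptions (missing readings encoded as None, gaps at either end filled with the nearest known value, a trailing window for the moving average); the surveyed papers may define these baselines differently, and the function names and sample readings are hypothetical.

```python
def linear_interpolate(series):
    """Fill None gaps linearly between the nearest known neighbors.

    Gaps at the start or end copy the nearest known value (an assumption).
    """
    known = [i for i, v in enumerate(series) if v is not None]
    out = list(series)
    if not known:
        return out  # nothing to interpolate from
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = series[right]
        elif right is None:
            out[i] = series[left]
        else:
            frac = (i - left) / (right - left)
            out[i] = series[left] + frac * (series[right] - series[left])
    return out

def moving_average_fill(series, window=3):
    """Fill each None with the mean of up to `window` preceding values.

    Previously filled values count as history; a gap with no history stays None.
    """
    out = list(series)
    for i in range(len(out)):
        if out[i] is None:
            history = [x for x in out[max(0, i - window):i] if x is not None]
            out[i] = sum(history) / len(history) if history else None
    return out

# Hypothetical temperature readings with missing entries.
readings = [20.0, None, None, 23.0, None]
interpolated = linear_interpolate(readings)
averaged = moving_average_fill(readings, window=2)
```

Both functions use only the time axis of a single sensor, which is why the survey labels them as temporal; the spatial and hybrid baselines would additionally need readings from neighboring sensors.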