NTU/Intel M2M Project: Wireless Sensor Networks Content Analysis and Management Special Interest Group Data Analysis Team Monthly Report

1. Team Organization
   Principal Investigator: Shou-De Lin
   Co-Principal Investigator: Mi-Yen Yeh
   Team Members: Chih-Hung Hsieh (postdoc), Yi-Chen Lo (PhD student), Perng-Hwa Kung (Graduate student), Ruei-Bin Wang (Graduate student), Yu-Chen Lu (Undergraduate student), Kuan-Ting Chou (Undergraduate student), Chin-en Wang (Graduate student)

2. Discussion with Champions
   a. Number of meetings with champion in current month: 1 (by phone)
   b. Major comments/conclusions from the discussion: submission to KDD and MM next year

3. Progress between last month and this month
   a. Topic 1: Video Summarization using MSWave
      1) Evaluation for single camera:
      2) Camera:
         - Method 1:
            i. Take a frame if d(q,f) > threshold
            ii. Result
         - Method 2:
            i. threshold *= 0.5
            ii. threshold *= 0.5: no frame retrieved
            iii. threshold *= 0.2: see next page
            iv. Result
      3) Euclidean Distance
         - Data: 200 samples of PAMAP (dim=105), Euclidean distance
         - Compare: inner product
   b. Topic 2: Distributed Nearest Neighbor Search of Time Series Using Dynamic Time Warping
      1) Framework 2 Initialization:
         - The following table shows the small difference in performance between the real cases and the ideal ones.
      2) Framework vs. Naive
         - Run each of the 45 UCR datasets 100 times with random parameters
         - 45 * 100 = 4500 instances
            i. 1434 "1" (32%): Framework better than Naive
            ii. 3066 "-1" (68%): Framework worse than Naive
            Reasons for the many "-1" results: small datasets, small T, large S / M, K > S / 2…
         - Machine Learning
            i. Feature selection: M, K are still dominant
            ii. LibLinear: 74.3111% accuracy
            iii. LibSVM: 88.8% accuracy
      3) Equal vs. Unequal Size
         - The following table shows the difference in performance between the equal-sized and unequal-sized segmentation methods.
   c.
      Topic 3: Intelligent Transportation System (ITS)
      Machine Learning: Predict whether a driver will stop at an intersection, without using video data.
      1) Building a model to predict whether a driver will stop at an intersection:
         - Instead of using only the data generated at intersections, we recently tried generating ground truths along the whole trajectory.
         - A sliding window scans the trajectory and labels the ground truth of each trajectory segment it covers.
         - However, this method produces an extremely unbalanced dataset (too many non-stopping cases) and may lose some important stopping cases that occur in the red-squared regions. The resulting model does not work well. We will therefore modify the ground-truth generation as follows: first scan for and generate all stopping cases without missing any of them, then randomly generate an equal number of non-stopping cases.
      2) Another issue to be addressed: An Efficient Way to Generate Datasets for Identifying Driving Behaviors
         - Problem Statement: To the best of our knowledge, few large-scale datasets exist for building an accurate intelligent transportation system that identifies driving behaviors, because going through whole trajectories and the corresponding time-series data to mark driving events by hand is very costly. Computer-aided labeling can reduce the cost of marking driving events across whole ITS trajectories, but existing methods either lack flexibility (sliding-window-based approaches) or are application-specific (computer-vision-based ones), so their applicability remains limited.
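The sliding-window labeling and the revised balanced sampling described in item 1) can be sketched roughly as below. This is only a minimal illustration, not the team's implementation. The function names, the use of a speed series, and the rule "a window is a stopping case if the speed drops to stop_speed or below" are all assumptions made for the example; the report does not state how a stop is detected.

```python
import random

def label_windows(speeds, window, step, stop_speed=0.5):
    """Slide a fixed-size window over a speed series and label each segment.

    Assumption (not from the report): a window counts as a stopping case
    if any sample inside it is at or below stop_speed.
    Returns a list of (start_index, label) pairs, label 1 = stop, 0 = no stop.
    """
    samples = []
    for start in range(0, len(speeds) - window + 1, step):
        segment = speeds[start:start + window]
        label = 1 if min(segment) <= stop_speed else 0
        samples.append((start, label))
    return samples

def balance(samples, seed=0):
    """Keep every stopping window, then draw an equal number of
    non-stopping windows at random (fewer if not enough exist)."""
    stops = [s for s in samples if s[1] == 1]
    non_stops = [s for s in samples if s[1] == 0]
    rng = random.Random(seed)
    k = min(len(stops), len(non_stops))
    return stops + rng.sample(non_stops, k)

# Hypothetical trajectory: cruising, a short stop, cruising again.
speeds = [10.0] * 20 + [0.0] * 2 + [10.0] * 20
samples = label_windows(speeds, window=4, step=2)
balanced = balance(samples)
```

Keeping all stopping windows and subsampling only the non-stopping ones mirrors the stated goal of not missing any stopping case while evening out the class sizes.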
           The most important issues to address in computer-aided ground-truth labeling are: 1) events occur as fragments that can start at any position in the trajectory; 2) instances of the same event, whether from the same driver or from different drivers, often vary in length.
         - Hypothesis: We believe that instances of the same event from different trajectories share patterns that are conserved, and can therefore be discovered, when the trajectories are aligned together.
         - Expected Contribution: We will propose a more efficient and general way to extract instances of pre-defined events from large amounts of raw trajectory and time-series data. In addition, the conserved patterns derived from the alignment results will provide useful knowledge for generating informative attributes that describe driving events, or for clustering drivers into categories that represent different tendencies. This derived information will be used to build an accurate model that identifies improper driving events and improves driving safety.

4. Brief plan for the next month
   a. We will continue the paper survey and refine our proposed approaches.
   b. We will implement our proposed approaches and evaluate their performance.

5. Research Byproducts
   a. Paper: N/A
   b. Served on the Editorial Board of International Journals: N/A
   c. Invited Lectures: N/A
   d. Significant Honors / Awards: N/A