NTU/Intel M2M Project: Wireless Sensor Networks Content Analysis and Management Special Interest Group
Data Analysis Team Monthly Report

1. Team Organization
   Principal Investigator: Shou-De Lin
   Co-Principal Investigator: Mi-Yen Yeh
   Team Members: Chih-Hung Hsieh (postdoc), Yi-Chen Lo (PhD student), Perng-Hwa Kung (Graduate student), Ruei-Bin Wang (Graduate student), Yu-Chen Lu (Undergraduate student), Kuan-Ting Chou (Undergraduate student), Chin-en Wang (Graduate student)

2. Discussion with Champions
   a. Number of meetings with champion in current month: many times (during F2F meeting)
   b. Major comments/conclusion from the discussion: discussed future topics and directions

3. Progress between last month and this month
   a. Topic 1: Video Summarization using MSWave
      1) About manuscript to submit:
         - Venue: 2014 IEEE International Conference on Image Processing
         - Title: Efficiency Multi-view Keyframe Extraction on Distributed Video Sensor Network
         - Estimated date of completion: 2/14
      2) Experiment Results:
         - Dataset: bl2, lobby, office
         - Recall: |events we found| / |all events|
         - Precision: |frames in event| / |frames we choose|
         - Bandwidth Saving: 1 - |mswave| / |naive|
         - [Figures: results for bl2, office, and lobby]
   b. Topic 2: Distributed Nearest Neighbor Search of Time Series Using Dynamic Time Warping
      1) Setup of experiments
         - Initialization Approaches
           i. Oracle (impossible in practice)
              1. Assume we know which sites hold the kNN
              2. Initialization: send the exact query to those sites
           ii. Our approach
              1. Assume the order of the lower bounds (LB) equals the order of the DTW distances
              2. Initialization: send the exact query to the sites that have the top K lower bounds
           iii. Naive approach
              1. Assume we have no idea how to choose sites for initialization
              2. Initialization: send the exact query to the sites that hold K randomly chosen time series
         - Initialization Comparison: [Figure: Top: Oracle; Middle: Ours; Bottom: Naive]
         - Pruning Site Order: [Figure: Top: LB order; Bottom: Random]
           Order: smallest 1st-level lower bound of each site
         - Big Data:
           i. Synthetic dataset
              1. 10000 time series of length 10000
              2. Random walk
           ii. Parameters
              1. S = 9999
              2. K = 10
              3. M = S / 2, S / 4, S / 8, S / 16, S / 32
           iii. Experiment process
              1. Randomly select a time series as the query
              2. Run 100 times for each group of parameters
              3. Bandwidth ratio = (Framework bandwidth) / (Naive approach bandwidth)
           iv. Big Data Performance: the framework still performs well
      2) Future work
         - Design the experiment presentation
         - Finish writing the paper
   c. Topic 3: Intelligent Transportation System (ITS)
      Machine Learning: predict whether a driver will stop at an intersection without using video data.
      1) Extract stop and non-stop cases from all users.
         - Total 135 users.
           i. Most of them do not have valid trajectories (we will re-check this).
           ii. 44 users provide the stop and non-stop cases.
           iii. Total: 65819 cases
              1. Positive samples = 38909 (stop)
              2. Negative samples = 26910 (non-stop)
         - Experiment Setup
           i. Random partition, training : testing = 2:1
           ii. Best 5-CV rate on the training set: 74.8%
           iii. Accuracy on the testing set = 0.699342
      2) PCA for feature reduction
         - Applying PCA to the 91 features of the training set.
           i. Transform the training and testing sets based on the resulting principal components.
         - Accuracy on the testing set.
           i. Original: 0.699342
           ii. The first 7 components: 0.5381
           iii. 91 PCA components: 0.58976
      3) Combine the driver type.
         - 39 drivers have a driver type
           i. 31 normal drivers
              1. 24731 stop cases
              2. 18956 non-stop cases
           ii. 8 aggressive drivers
              1. 2009 stop cases
              2. 712 non-stop cases
           iii. Labeling rule: a driver is labeled aggressive if the driver's ratio of aggressive trajectories is higher than a threshold t, where t is the mean plus 1 sigma computed from 81 drivers.
         - Experiment results
           i. 5-CV on 31 normal drivers:
              1. Avg. accuracy = 0.771488
           ii. 5-CV on 8 aggressive drivers:
              1. Avg. accuracy = 0.780228
           iii. 5-CV on 39 (all) drivers:
              1. Avg. accuracy = 0.774242
           iv. Currently, integrating the driver type does not seem to improve accuracy significantly.
      4) To-do list.
         - After discussion with Jin-Yao, the current assignments of driver type will be modified.
         - Other feature selection or feature reduction methods will be used and evaluated.
           i. Apply a wrapper feature selection method (Fselect.py) to a small dataset randomly sampled from the original dataset.
         - Generate datasets of other driving behaviors

4. Brief plan for the next month
   a. We will continue the paper survey and refine our proposed approaches.
   b. We will implement our proposed approaches and evaluate their performance.

5. Research Byproducts
   a. Paper: N/A
   b. Served on the Editorial Board of International Journals: N/A
   c. Invited Lectures: N/A
   d. Significant Honors / Awards: N/A
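
Note on the Topic 3 labeling rule: the rule that a driver is aggressive when the ratio of aggressive trajectories exceeds t (mean plus 1 sigma) could be sketched as below. This is only an illustration under stated assumptions: the function names, driver IDs, and ratio values are hypothetical, and the population standard deviation is assumed because the report does not say which estimator was used.

```python
from statistics import mean, pstdev


def aggressive_threshold(ratios):
    """Return t = mean + 1 sigma of the per-driver aggressive-trajectory ratios.

    Assumption: sigma is the population standard deviation (pstdev).
    """
    return mean(ratios) + pstdev(ratios)


def label_drivers(ratio_by_driver, threshold):
    """Label a driver 'aggressive' if the ratio is strictly above the threshold."""
    return {
        driver: "aggressive" if ratio > threshold else "normal"
        for driver, ratio in ratio_by_driver.items()
    }


# Hypothetical ratios for five drivers (not data from the report).
ratio_by_driver = {"d1": 0.0, "d2": 0.2, "d3": 0.4, "d4": 0.2, "d5": 0.2}

t = aggressive_threshold(list(ratio_by_driver.values()))
labels = label_drivers(ratio_by_driver, t)
print(round(t, 4), labels)
```

In the report the threshold is computed from 81 drivers and then applied to the drivers that have a type, so in practice the list passed to `aggressive_threshold` and the mapping passed to `label_drivers` may cover different sets of drivers.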