NTU/Intel M2M Project: Wireless Sensor Networks Content Analysis

NTU/Intel M2M Project: Wireless Sensor Networks
Content Analysis and Management Special Interest Group
Data Analysis Team
Monthly Report
1. Team Organization
Principal Investigator: Shou-De Lin
Co-Principal Investigator: Mi-Yen Yeh
Team Members: Chih-Hung Hsieh (postdoc), Yi-Chen Lo (PhD student), Perng-Hwa Kung
(Graduate student), Ruei-Bin Wang (Graduate student), Yu-Chen Lu (Undergraduate student),
Kuan-Ting Chou (Undergraduate student), Chin-en Wang (Graduate student)
2. Discussion with Champions
a. Number of meetings with champion in current month: 1 by phone
b. Major comments/conclusion from the discussion: submission to KDD and MM next year
3. Progress between last month and this month
a. Topic 1: Video Summarization using MSWave.
1) Evaluation for single camera:
2) Camera:
- Method 1:
i. Take a frame if d(q,f) > threshold
ii. Result
- Method 2:
i. threshold *= 0.5
ii. threshold *= 0.5: no frame retrieved
iii. threshold *= 0.2: see next page
iv. Result
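The selection rule of Method 1 can be sketched as follows. This is a minimal illustration, not the team's implementation: it assumes that q is the most recently kept frame, that d is the Euclidean distance between frame feature vectors, and that the first frame is always kept. The function name select_frames is hypothetical.

```python
import math

def select_frames(frames, threshold):
    """Return the indices of the frames kept by the rule d(q, f) > threshold.

    Assumptions (not stated in the report): q is the most recently kept
    frame, d is Euclidean distance, and the first frame is always kept.
    """
    if not frames:
        return []
    kept = [0]            # the first frame serves as the initial q
    q = frames[0]
    for i in range(1, len(frames)):
        f = frames[i]
        if math.dist(q, f) > threshold:   # take the frame if d(q, f) > threshold
            kept.append(i)
            q = f                         # the kept frame becomes the new q
    return kept

# Toy feature vectors standing in for real frame descriptors.
toy_frames = [[0, 0], [0.1, 0], [3, 4], [3, 4.1], [10, 10]]
kept_indices = select_frames(toy_frames, threshold=1.0)
```

Under these assumptions, near-duplicate frames are skipped and only frames that differ enough from the last kept one are retained.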
3) Euclidean Distance
- Data: 200 samples of PAMAP (dim=105)
Euclidean distance
- Compare: inner product
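The two measures compared here are linked by the identity ||q - f||^2 = <q,q> + <f,f> - 2<q,f>, so a Euclidean distance can be recovered from inner products alone. The sketch below checks this numerically. It does not use the PAMAP data: the 200 random Gaussian vectors of dimension 105 are placeholders chosen only to match the sizes quoted above.

```python
import math
import random

def euclidean(q, f):
    """Direct Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, f)))

def inner(q, f):
    """Inner (dot) product of two equal-length vectors."""
    return sum(a * b for a, b in zip(q, f))

def euclidean_from_inner(q, f):
    """Euclidean distance computed from inner products only."""
    squared = inner(q, q) + inner(f, f) - 2.0 * inner(q, f)
    return math.sqrt(max(squared, 0.0))   # clamp tiny negative rounding errors

# Placeholder data: 200 synthetic samples with dim=105 (NOT the PAMAP data).
rng = random.Random(0)
DIM = 105
samples = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(200)]

q = samples[0]
max_gap = max(abs(euclidean(q, f) - euclidean_from_inner(q, f))
              for f in samples[1:])
```

The two computations agree up to floating-point rounding, which is why a ranking by Euclidean distance and one by inner product differ only through the norm terms.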
b. Topic 2: Distributed Nearest Neighbor Search of Time Series Using Dynamic Time Warping
1) Framework 2 Initialization:
- The following table shows the small difference in performance between
the real cases and the ideal ones.
2) Framework vs. Naive
- Run each of the 45 UCR datasets 100 times with random parameters
- 45 * 100 = 4500 instances
i. 1434 “1” (32%): Framework better than Naive
ii. 3066 “-1” (68%): Framework worse than Naive
Reasons for many “-1”: small datasets, small T, large S / M, K > S / 2…
- Machine Learning
i. Feature selection: M, K are still dominant
ii. LibLinear: 74.3111% accuracy
iii. LibSVM: 88.8% accuracy
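The counts above can be checked with a few lines of arithmetic. The sketch also computes a majority-class baseline (always predicting "-1"), which is a useful reference point for the two classifier accuracies. It assumes the accuracies were measured on data with the same 1434 / 3066 label split, which the report does not state.

```python
# Counts taken from the report.
framework_better = 1434          # instances labeled "1"
framework_worse = 3066           # instances labeled "-1"
total = 45 * 100                 # 45 datasets x 100 random runs

pct_better = 100.0 * framework_better / total
pct_worse = 100.0 * framework_worse / total

# Accuracy of a trivial classifier that always predicts the majority label.
# Assumption: evaluation data has the same label split as the 4500 instances.
majority_baseline = max(pct_better, pct_worse)

liblinear_gain = 74.3111 - majority_baseline
libsvm_gain = 88.8 - majority_baseline

print(f"better: {pct_better:.2f}%, worse: {pct_worse:.2f}%")
print(f"majority baseline: {majority_baseline:.2f}%")
print(f"LibLinear gain: {liblinear_gain:.2f} points, LibSVM gain: {libsvm_gain:.2f} points")
```

Under that assumption, the linear model improves on the trivial baseline by about 6 points, while LibSVM improves on it by about 21 points.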
3) Equal vs. Unequal Size
- The following table shows the difference in performance between the
equal-sized and unequal-sized segmentation methods.
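As background for this topic, the dynamic time warping (DTW) distance and a single-machine nearest-neighbor search can be sketched as below. This is the textbook dynamic program with an absolute-difference local cost and no warping-window constraint. It is neither the distributed Framework nor the Naive baseline evaluated above, and the function names are hypothetical.

```python
import math

def dtw_distance(a, b):
    """Textbook DTW distance between two 1-D sequences (O(len(a) * len(b)))."""
    n, m = len(a), len(b)
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0                        # empty prefix matches empty prefix
    for i in range(1, n + 1):
        cur = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: insertion, deletion, match
            cur[j] = cost + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[m]

def nearest_neighbor(query, series_list):
    """Index of the series with the smallest DTW distance to the query."""
    return min(range(len(series_list)),
               key=lambda i: dtw_distance(query, series_list[i]))

candidates = [[5, 5, 5], [1, 2, 2, 3], [0, 0, 0]]
best = nearest_neighbor([1, 2, 3], candidates)
```

Because DTW allows one point to align with several points of the other series, [1, 2, 3] and [1, 2, 2, 3] have distance zero even though their lengths differ.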
c. Topic 3: Intelligent Transportation System (ITS) Machine Learning: Predict whether a
driver will stop at an intersection without using video data.
1) Building a prediction model of whether the driver will stop at an intersection:
- Previously we used only data generated at the intersection; recently, we tried to
generate ground truths along the whole trajectory.
- We use a sliding window to scan the trajectory and label the ground truth of each
trajectory segment covered by the window.
- However, this method produces an extremely unbalanced dataset (too
many non-stopping cases) and may lose some important stopping cases
that happen in the red-squared regions. The subsequent model does not work
well. Therefore, we will modify the ground-truth generation method as follows:
we first scan for and generate all the stopping cases without
missing any of them, then randomly generate an equal number of
non-stopping cases.
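The revised ground-truth generation can be sketched as follows. This is only an illustration of the idea under stated assumptions: the trajectory is represented by a hypothetical per-timestep list stop_flags (1 where the vehicle is stopped), a window counts as a stopping case if it covers any stopped timestep, and the function name balanced_windows is made up for this example.

```python
import random

def balanced_windows(stop_flags, window, seed=0):
    """Return (start_index, label) pairs with balanced labels.

    All stopping windows are kept; non-stopping windows are randomly
    sampled to the same number. If fewer non-stopping windows exist than
    stopping ones, all of them are used and the result is not balanced.
    """
    starts = range(len(stop_flags) - window + 1)
    positives = [s for s in starts if any(stop_flags[s:s + window])]
    negatives = [s for s in starts if not any(stop_flags[s:s + window])]

    rng = random.Random(seed)             # fixed seed for reproducibility
    k = min(len(positives), len(negatives))
    sampled_negatives = sorted(rng.sample(negatives, k))

    return [(s, 1) for s in positives] + [(s, 0) for s in sampled_negatives]

# Toy trajectory: one short stop in the middle of 103 timesteps.
flags = [0] * 50 + [1] * 3 + [0] * 50
dataset = balanced_windows(flags, window=5)
```

In the toy trajectory, 7 of the 99 windows cover the stop, so the plain sliding-window labeling would be heavily skewed, while the balanced version returns 7 stopping and 7 non-stopping windows.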
2) Another issue to be addressed: An Efficient Way to Generate Datasets for
Identifying Driving Behaviors
- Problem Statement:
To the best of our knowledge, because of the huge cost of going through whole
trajectories and the corresponding time-series data to mark driving events
by hand, there are few large-scale datasets for building an accurate
intelligent transportation system that identifies driving behaviors.
Computer-aided labeling can reduce the high cost of marking driving events
across whole ITS trajectories, but the existing methods either lack flexibility
(sliding-window-based approaches) or are application-specific
(computer-vision-based ones), so their applicability is still limited.
The most important issues to address when labeling ground truths with
computer assistance are the following: 1) events occur as fragments that can
start at any position in the whole trajectory; 2) instances of the same event,
whether or not they come from the same driver, often vary in length.
- Hypothesis:
We believe that instances of the same event coming from different
trajectories share patterns, and that these patterns are conserved and can
be discovered when we align the trajectories.
- Expected Contribution:
We will propose a more efficient and general way to extract instances of
pre-defined events from large amounts of raw trajectory and time-series data.
Further, the conserved patterns derived from the alignment result will
provide significant knowledge for generating informative attributes that describe
driving events or for clustering drivers into categories representing different
tendencies. The derived information will be used to build an
accurate model that identifies improper driving events and improves driving
safety.
4. Brief plan for the next month
a. We will continue the paper survey and refine our proposed approaches.
b. We will implement our proposed approaches and evaluate their performance.
5. Research Byproducts
a. Paper: N/A
b. Served on the Editorial Board of International Journals: N/A
c. Invited Lectures: N/A
d. Significant Honors / Awards: N/A