NTU/Intel M2M Project: Wireless Sensor Networks Content Analysis

NTU/Intel M2M Project: Wireless Sensor Networks
Content Analysis and Management Special Interest Group
Data Analysis Team
Monthly Report
1. Team Organization
Principal Investigator: Shou-De Lin
Co-Principal Investigator: Mi-Yen Yeh
Team Members: Chih-Hung Hsieh (postdoc), Yi-Chen Lo (PhD student), Perng-Hwa Kung
(Graduate student), Ruei-Bin Wang (Graduate student), Yu-Chen Lu (Undergraduate student),
Kuan-Ting Chou (Undergraduate student), Chin-en Wang (Graduate student)
2. Discussion with Champions
a. Number of meetings with champion in current month: we met the champion once, on 7/26 (face to face).
b. Major comments/conclusion from the discussion: we discussed the proposal for next year.
3. Progress between last month and this month
a. Topic 1: Clustering streams using MSWave.
1) Bounds from LEEWAVE do not seem to work well so far, based on the experimental results:
{a) Data set: pamap; b) Dimension: 105; c) Sample: 3514 instances from 3465000}.
2) Discussion:
- The Haar wavelet transform is usually applied to time series.
- Nevertheless, the data used in the paper is not a time series but a vector composed of features with different meanings. (It is not smooth when viewed as a time series.)
- This may hurt the pruning performance of the Haar wavelet approach.
- If the experimental results are bad, perhaps we can overcome the problem with some data preprocessing techniques.
- Most current work uses hashing functions to find the similarity between vectors. (The closer two vectors are, the more similar their hash results are.)
- After a discussion with En-Hsu, he said the approach we use to find the inner product through wavelets seems better, since we can get true bounds. (To some extent, the Haar wavelet transform is also a kind of hashing function.)
- Perhaps we can compare the two methods in different scenarios.
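The point about "true bounds" can be illustrated with a minimal sketch. It is not the LEEWAVE or MSWave algorithm itself: it only assumes an orthonormal Haar transform, under which the inner product of two vectors equals the inner product of their coefficients, and applies the Cauchy-Schwarz inequality to the coefficients that have not been compared yet. The function names `haar` and `inner_product_bounds` and the example vectors are hypothetical.

```python
import math

def haar(x):
    """Orthonormal Haar transform; len(x) must be a power of two.
    Returns coefficients ordered from coarsest to finest."""
    approx, details = list(x), []
    while len(approx) > 1:
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        # Finer details are produced first, so prepend each coarser level.
        details = [(p - q) / math.sqrt(2) for p, q in pairs] + details
        approx = [(p + q) / math.sqrt(2) for p, q in pairs]
    return approx + details

def inner_product_bounds(ca, cb, k):
    """Bounds on <a, b> using the first k coefficients plus the tail energies."""
    partial = sum(p * q for p, q in zip(ca[:k], cb[:k]))
    tail_a = math.sqrt(sum(p * p for p in ca[k:]))
    tail_b = math.sqrt(sum(q * q for q in cb[k:]))
    slack = tail_a * tail_b  # Cauchy-Schwarz bound on the unseen part
    return partial - slack, partial + slack

# Made-up vectors: a is smooth, b mixes unrelated scales like a feature vector.
a = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
b = [0.2, 9.0, -3.0, 0.5, 7.0, -1.0, 0.1, 4.0]
exact = sum(p * q for p, q in zip(a, b))
ca, cb = haar(a), haar(b)
for k in (1, 2, 4, 8):
    low, high = inner_product_bounds(ca, cb, k)
    print(k, round(low, 3), round(exact, 3), round(high, 3))
```

The interval always contains the exact inner product and collapses to it once all coefficients are used. When a vector is not smooth, much of its energy stays in the fine coefficients, so the interval shrinks slowly, which is one plausible reading of the weak pruning noted above.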
b. Topic 2: Exploiting Correlation among Sensors
1) Trials: Use closest similarity first to determine order
- Fixing the program: it samples too much data (25% -> 27~29%), which causes some sensors to be sent more frequently (but does it give good results compared to random sampling at the same rate?)
2) Summary
- Lower MAE than random sampling when using the determined order
- Problems
i. Over-sampled
1. Compare with random sampling at the same rate
ii. Some sensors are sent more often
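The two problems listed above can be checked with a few helper functions. The following is only a sketch under assumptions: the transmission log is represented as a list of unique (round, sensor) pairs, and all names and the tiny example log are hypothetical rather than taken from the project code.

```python
import random
from collections import Counter

def mae(truth, estimate):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(t - e) for t, e in zip(truth, estimate)) / len(truth)

def effective_rate(sent_log, n_sensors, n_rounds):
    """Fraction of all (round, sensor) readings that were actually sent."""
    return len(sent_log) / (n_sensors * n_rounds)

def per_sensor_counts(sent_log):
    """How many times each sensor was sent; exposes unevenly sampled sensors."""
    return Counter(sensor for _, sensor in sent_log)

def random_log_same_rate(sent_log, n_sensors, n_rounds, seed=0):
    """Random baseline that sends exactly as many readings as sent_log."""
    rng = random.Random(seed)
    slots = [(r, s) for r in range(n_rounds) for s in range(n_sensors)]
    return rng.sample(slots, len(sent_log))

# Made-up log: 4 sensors, 5 rounds, target rate 25% (5 readings), 6 sent.
ordered_log = [(0, 0), (1, 1), (2, 0), (3, 2), (4, 0), (4, 3)]
rate = effective_rate(ordered_log, n_sensors=4, n_rounds=5)
counts = per_sensor_counts(ordered_log)
baseline_log = random_log_same_rate(ordered_log, n_sensors=4, n_rounds=5)
print(rate, dict(counts), len(baseline_log))
```

Drawing the random baseline with the same number of readings as the ordered scheme actually sent keeps the MAE comparison fair even when the ordered scheme overshoots its target rate.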
c. Topic 3: Distributed Nearest Neighbor Search of Time Series Using Dynamic Time Warping
1) Progress:
- Rewriting the testing code of both frameworks
- New theoretical discovery
i. FTW-based lower / upper bounds must be increasing / decreasing
ii. We can save the signals and space used to keep the lower / upper bounds of the previous level
- Discussion on Framework 2
i. Threshold sent to site
1. Comparison with the threshold is done locally
2. The site returns a signal to indicate whether the server should continue to send the rest of the query
3. If the whole query is sent and the exact DTWs < threshold, the site returns the exact DTWs to the server to update the threshold
ii. Iteration order
1. Sites are visited from smaller to larger lower bounds
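The threshold exchange discussed for Framework 2 can be sketched as follows. This is an illustration under assumptions, not the project's pseudocode: each site holds a single series, the query is sent in fixed-size chunks, and a simple prefix-based early-abandoning lower bound (the minimum of the current DTW row) stands in for the FTW-based bounds. The names `Site`, `receive`, and `nearest_neighbor` are hypothetical, and sites are visited in the order given, so the caller would pre-sort them by lower bound.

```python
import math

def dtw(q, c):
    """Exact DTW distance with absolute-difference cost and no window."""
    prev = [0.0] + [math.inf] * len(c)
    for x in q:
        cur = [math.inf] * (len(c) + 1)
        for j, y in enumerate(c, start=1):
            cur[j] = abs(x - y) + min(prev[j], prev[j - 1], cur[j - 1])
        prev = cur
    return prev[-1]

class Site:
    """A remote site that compares against the threshold locally."""
    def __init__(self, series):
        self.series = series
        self.reset()

    def reset(self):
        self.row = [0.0] + [math.inf] * len(self.series)

    def receive(self, chunk, threshold, is_last):
        for x in chunk:  # extend the DTW table by one row per query point
            cur = [math.inf] * len(self.row)
            for j, y in enumerate(self.series, start=1):
                cur[j] = abs(x - y) + min(self.row[j], self.row[j - 1], cur[j - 1])
            self.row = cur
        if min(self.row[1:]) >= threshold:  # lower bound already too large
            return "stop", None
        if is_last:
            exact = self.row[-1]
            return ("exact", exact) if exact < threshold else ("stop", None)
        return "continue", None

def nearest_neighbor(query, sites, chunk_size=2):
    """Server loop: returns (best site index, best distance, points sent)."""
    threshold, best, sent = math.inf, None, 0
    for idx, site in enumerate(sites):
        site.reset()
        for start in range(0, len(query), chunk_size):
            chunk = query[start:start + chunk_size]
            sent += len(chunk)
            is_last = start + chunk_size >= len(query)
            status, value = site.receive(chunk, threshold, is_last)
            if status == "exact":
                threshold, best = value, idx  # tighten the threshold
            if status != "continue":
                break
    return best, threshold, sent

sites = [Site([5, 6, 7, 8]), Site([1, 2, 3, 5]), Site([9, 9, 9, 9])]
query = [1, 2, 4, 5]
best, dist, sent = nearest_neighbor(query, sites)
print(best, dist, sent)
```

Because the accumulated cost along any warping path never decreases, the minimum of the current row is a valid lower bound on the final DTW distance, so a site can stop the transfer early without missing the true nearest neighbor.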
2) Pseudocode of Framework 1 and Framework 2:
- Framework 1
- Framework 2
d. Topic 4: Intelligent Transportation System (ITS) Machine Learning Group.
1) Video, audio, and sensor data (accelerometer, magnetic, gyro, and GPS) are collected with a smartphone while riding a scooter.
2) Work: Predict whether the driver will stop at an intersection, using only sensor data.
- 117 driving cases at the intersection (25.024819, 121.543399) between Fuxing South Road and Hoping East Road.
i. 66/117 stop cases (56.4%);
ii. 51/117 non-stop cases (43.6%).
- Used features
i. GPS (longitude, latitude, altitude, GPS_speed, GPS_accuracy, GPS_bearing)
ii. ACCELEROMETER_x, ACCELEROMETER_y, ACCELEROMETER_z
iii. ORIENTATION_posx, ORIENTATION_oriw, ORIENTATION_posy, ORIENTATION_posz, ORIENTATION_orix, ORIENTATION_oriy, ORIENTATION_oriz
- Results
i. LibLinear
1. Accuracy = 71.7949% (84/117)
2. Cross Validation Accuracy = 61.5385%
ii. LibSVM with RBF kernel
1. Training Accuracy = 88.8889% (104/117)
2. Best c=512.0, g=0.03125, CV rate=67.5214%
- Now we focus on:
i. Collecting more data described by appropriate features.
ii. Trying to cluster drivers into “Aggressive”, “Conservative”, and “Neutral” clusters, and doing two-stage prediction.
iii. Finding interesting applications of Trajectory Pattern Mining.
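To put the accuracies reported above in context, the short calculation below compares them with a majority-class baseline, that is, always predicting "stop". It is an arithmetic check on the numbers in this report only; the baseline comparison itself is an addition for illustration, and the dictionary labels are hypothetical.

```python
total = 117
stop_cases = 66
non_stop_cases = total - stop_cases  # 51 non-stop cases

# Accuracy of always predicting the more frequent class ("stop").
majority_baseline = max(stop_cases, non_stop_cases) / total

reported = {
    "LibLinear accuracy": 84 / total,
    "LibLinear cross validation": 0.615385,
    "LibSVM RBF training": 104 / total,
    "LibSVM RBF cross validation": 0.675214,
}

for name, acc in reported.items():
    gain = acc - majority_baseline
    print(f"{name}: {acc:.2%} ({gain:+.2%} vs. majority baseline)")

# Gap between training and cross-validation accuracy for the RBF model.
rbf_gap = reported["LibSVM RBF training"] - reported["LibSVM RBF cross validation"]
print(f"LibSVM RBF train/CV gap: {rbf_gap:.2%}")
```

Both cross-validation accuracies sit only modestly above the baseline, and the RBF model's training accuracy is far higher than its cross-validation rate, which is consistent with the plan to collect more data.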
4. Brief plan for the next month
a. We will continue the paper survey and refine our proposed approaches.
b. We will implement our proposed approaches and evaluate their performance.
5. Research Byproducts
a. Paper: N/A
b. Served on the Editorial Board of International Journals: N/A
c. Invited Lectures: N/A
d. Significant Honors / Awards: N/A