NTU/Intel M2M Project: Wireless Sensor Networks
Content Analysis and Management Special Interest Group
Data Analysis Team
Monthly Report
1. Team Organization
Principal Investigator: Shou-De Lin
Co-Principal Investigator: Mi-Yen Yeh
Team Members: Chih-Hung Hsieh (postdoc), Yi-Chen Lo (PhD student), Perng-Hwa Kung (Graduate student), Ruei-Bin Wang (Graduate student), Yu-Chen Lu (Undergraduate student), Kuan-Ting Chou (Undergraduate student), Chin-En Wang (Graduate student)
2. Discussion with Champions
a. Number of meetings with champion in the current month: 2 (face-to-face)
b. Major comments/conclusions from the discussion: how to use the newly collected data
3. Progress between last month and this month
a. Topic 1: Clustering Streams Using MSWave
1) In order to test performance on large data, we generated synthetic data with the same random-walk data model used in the references (http://www.cs.ucr.edu/~eamonn/SIGKDD_trillion.pdf or http://www.cs.ucr.edu/~eamonn/UCRsuite.html). Each stream is a random walk whose step sizes are normally distributed random numbers with mean 0 and standard deviation 1. We generated 12,500 streams of length 12,500 with this model.
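A minimal sketch of this generator, assuming only what is stated above (i.i.d. N(0, 1) step sizes accumulated into a walk); the function name and the reduced example scale are ours:

```python
import numpy as np

def generate_random_walk_streams(n_streams, length, seed=0):
    """Generate synthetic streams as random walks whose step sizes are
    i.i.d. normal random numbers with mean 0 and standard deviation 1."""
    rng = np.random.default_rng(seed)
    steps = rng.normal(loc=0.0, scale=1.0, size=(n_streams, length))
    return np.cumsum(steps, axis=1)  # each row is one stream

# The report uses 12,500 streams of length 12,500; a smaller
# configuration is shown here to keep the example light.
streams = generate_random_walk_streams(n_streams=100, length=1000)
```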
2) From Fig. (a), we can see that both MSWave-L and MSWave-S still save more transmission cost than CP, PRP, and LEEWAVE-M even as the scale of the data set increases. Furthermore, the differences between the methods are more pronounced than on the temperature data, as the semi-log plot makes visible. In other words, the larger the data scale, the better MSWave-L and MSWave-S perform. Moreover, the gap between MSWave-L and MSWave-S also widens as |Q| grows, which is consistent with our earlier discussion.
3) Fig. (b) shows the pruning performance of MSWave-L. Although the scale of the data increased, pruning remained effective when the difference between k (here 30) and M (here 500) is large. Thanks to this pruning, the reduction in transmission cost is much more significant than on the temperature data.
[Figures (a) and (b): transmission cost comparison and pruning performance of MSWave-L]
b. Topic 2: Exploiting Correlation among Sensors
1) Trials (a sketch of the two grouping schemes in trial 2 follows after this list):
1. Use the closest similarity first to determine the order
   - Fixing the program: sampling too much data (25% -> 27~29%) makes some sensors send more frequently (but does this give better results than random sampling at the same rate?)
2. Change the way sensors are grouped into clusters by modulo
   - Ex: cluster size = 12, mod 5
   - Changed (1, 6, 11) (2, 7, 12) (3, 8) ... to (1, 2, 3) (4, 5, 6) (7, 8) ...
   - No improvement (worse)
3. Process the pair with the closest similarity first, without clustering
   - No improvement (worse)
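A minimal sketch of the two grouping schemes compared in trial 2, using the cluster size and modulus from the example above (the helper names are ours):

```python
def group_by_mod(sensor_ids, m):
    """Group sensors whose IDs are congruent modulo m,
    e.g. size 12, mod 5 -> (1, 6, 11) (2, 7, 12) (3, 8) ..."""
    groups = {}
    for s in sensor_ids:
        groups.setdefault(s % m, []).append(s)
    return list(groups.values())

def group_contiguous(sensor_ids, size):
    """Group consecutive sensors, e.g. (1, 2, 3) (4, 5, 6) ..."""
    return [sensor_ids[i:i + size] for i in range(0, len(sensor_ids), size)]

ids = list(range(1, 13))         # cluster size = 12
print(group_by_mod(ids, 5))      # [[1, 6, 11], [2, 7, 12], [3, 8], [4, 9], [5, 10]]
print(group_contiguous(ids, 3))  # [[1, 2, 3], [4, 5, 6], ..., [10, 11, 12]]
```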
c. Topic 3: Distributed Nearest Neighbor Search of Time Series Using Dynamic Time
Warping
1) FTW-based method: FTW with Coarse C and Coarse Q
- Original proposal from the FTW paper
- Both the candidates and the query use the reduced-length setting
- Each segment has only one node
- New segment size = old segment size / 2 at each step
- Reuses the min/max of the old segments (see the sketch after this list)
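A hedged sketch of this coarse representation, assuming each segment is summarized by a (min, max) pair and coarser levels are built by pairwise combination, so that walking from coarse to fine halves the segment size at each step while reusing min/max values already computed (the function name and API are ours):

```python
import numpy as np

def minmax_pyramid(stream, base_segment_size, levels):
    """Per-segment (min, max) summaries at several resolutions.
    Coarser levels combine pairs of finer segments, so refining from
    one level to the next halves the segment size and reuses the
    min/max values of the previous (coarser) segments."""
    x = np.asarray(stream, dtype=float)
    n = (len(x) // base_segment_size) * base_segment_size
    segments = x[:n].reshape(-1, base_segment_size)
    mins, maxs = segments.min(axis=1), segments.max(axis=1)
    pyramid = [(mins, maxs)]                          # finest level
    for _ in range(levels - 1):
        if len(mins) < 2:
            break
        m = (len(mins) // 2) * 2                      # drop an odd tail
        mins = np.minimum(mins[:m:2], mins[1:m:2])    # pairwise combine
        maxs = np.maximum(maxs[:m:2], maxs[1:m:2])
        pyramid.append((mins, maxs))
    return pyramid[::-1]  # coarse -> fine; segment size halves per step
```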
2) FTW-based method: FTW with Accurate C and Coarse Q
- Sites have all nodes of the candidates (sensors); the candidates keep their original streams
- Difference between the two query settings:
   - Reduced Length: the query uses the reduced-length setting as in the original FTW, so each segment has only one node
   - Original Length: the query uses the original-length setting by duplicating the min/max data of the segments, so each segment has multiple nodes
3) The experimental results show that the FTW-based method performs better than the AP-based method. The robustness and stability of the FTW-based methods will be evaluated in the coming weeks.
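For context, a simplified sketch of the coarse DTW-style distance such an FTW-based method computes over segment (min, max) envelopes: overlapping envelopes cost zero, and the dynamic program mirrors DTW. The segment-length weighting here is our simplification in the spirit of the FTW paper, not its exact formulation:

```python
import numpy as np

def lb_segment_dtw(q_env, c_env, seg_len):
    """DTW-style distance between two streams summarized by per-segment
    (min, max) envelopes. A cell costs zero when the envelopes overlap,
    otherwise the squared gap weighted by the segment length; this is
    intended to lower-bound the exact squared DTW cost (a simplified,
    equal-length-segment variant of the FTW coarse distance)."""
    (qmin, qmax), (cmin, cmax) = q_env, c_env
    nq, nc = len(qmin), len(cmin)
    D = np.full((nq + 1, nc + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, nq + 1):
        for j in range(1, nc + 1):
            gap = max(qmin[i - 1] - cmax[j - 1], cmin[j - 1] - qmax[i - 1], 0.0)
            D[i, j] = seg_len * gap ** 2 + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[nq, nc]

# Tiny usage example with two 2-segment envelopes of segment length 8.
q_env = (np.array([0.0, 1.0]), np.array([2.0, 3.0]))
c_env = (np.array([2.5, 0.5]), np.array([4.0, 1.5]))
print(lb_segment_dtw(q_env, c_env, seg_len=8))  # 2.0
```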
d. Topic 4: Learning a Sparse Model in an Online, Semi-Supervised Manner
1) According to the current experimental results (a sketch of the two loss functions follows after this list):
- The support vector machine model learnt with the ramp loss function is sparser than the one learnt with the hinge loss function. However, using the hinge loss in the initial phase of online learning provides a well-performing initial decision boundary, and switching to the ramp loss in the subsequent learning process helps avoid overfitting the outliers while keeping the learnt model as sparse as using the ramp loss alone.
- Adding unlabeled data to the training set in the semi-supervised manner did not improve performance compared with using only the labeled training data.
- In the following weeks, we will try to fix the problems in the current semi-supervised learning framework, either by using an improved variant of the ramp loss function or by adopting a new semi-supervised learning framework.
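A minimal sketch of the hinge and ramp losses on the margin z = y * f(x); the truncation point s of the ramp loss is an assumed parameter (the report does not specify it):

```python
import numpy as np

def hinge_loss(z):
    """Hinge loss max(0, 1 - z): unbounded for very negative margins,
    so outliers can dominate the objective and inflate the model."""
    return np.maximum(0.0, 1.0 - z)

def ramp_loss(z, s=-1.0):
    """Ramp loss min(1 - s, max(0, 1 - z)) with s < 1: the hinge loss
    truncated at margin s, so each outlier contributes at most 1 - s.
    The cap limits outlier influence and tends to give sparser models."""
    return np.minimum(1.0 - s, np.maximum(0.0, 1.0 - z))

margins = np.array([-5.0, -1.0, 0.0, 0.5, 2.0])
print(hinge_loss(margins))  # [6.  2.  1.  0.5 0. ]
print(ramp_loss(margins))   # [2.  2.  1.  0.5 0. ]
```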
4. Brief plan for the next month
a. We will continue the paper survey and refine our proposed approaches.
b. We will implement our proposed approaches and evaluate their performance.
5. Research Byproducts
a. Paper: N/A
b. Served on the Editorial Board of International Journals: N/A
c. Invited Lectures: N/A
d. Significant Honors / Awards: N/A