NTU/Intel M2M Project: Wireless Sensor Networks Content Analysis

NTU/Intel M2M Project: Wireless Sensor Networks
Content Analysis and Management Special Interest Group
Data Analysis Team
Monthly Report
1. Team Organization
Principal Investigator: Shou-De Lin
Co-Principal Investigator: Mi-Yen Yeh
Team Members: Sheng-Hua Chen (postdoc), Winny Lo (RA), Ruei-Jun Xiao (Undergraduate
student), Yu-Chen Lu (Undergraduate student), Jun-Kun Wang (RA), Han Xiao (intern
student), Jing Wang (intern student)
2. Discussion with Champions
a. Number of meetings with champion in current month: 1
b. Major comments/conclusion from the discussion: discussed the progress on air
quality inference.
3. Current Progress on Compressive Learning
a. Problem overview
Machine learning algorithms rely critically on the features used to represent data,
and the feature set provides the primary interface through which an algorithm can
reason about the data at hand. A typical pitfall for many learning problems is that
there are too many potential features to choose from. Intelligent selection is
essential in these scenarios because it can discard noise from irrelevant features,
thereby requiring fewer training examples and preventing overfitting.
Computationally, a smaller feature set is almost always advantageous as it requires
less time and space to train the algorithm and make inferences.
In recent years, compressed sensing has attracted considerable attention in
areas of applied mathematics, computer science, and electrical engineering by
suggesting that it may be possible to surpass the traditional limits of sampling theory.
Our goal is to apply techniques from compressed sensing to machine learning.
b. Method
We would like to use appropriate compressed sensing matrices such that, if the data
are approximately linearly separable in a high-dimensional space and have a sparse
representation, even in some unknown basis, then compressed sensing approximately
preserves the linear separability, and hence the learnability. In other words,
theoretical bounds guarantee that if the data are measured directly in the
compressed domain, a soft-margin SVM classifier trained on the compressed data
performs almost as well as the best possible classifier in the original
high-dimensional domain.
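The intuition behind this claim can be checked numerically. The following is a minimal sketch, not the team's implementation: it assumes a Gaussian random matrix as the compressed sensing matrix and synthetic k-sparse vectors, and the sizes d, m, k and all function names are hypothetical choices made for illustration. It shows that the norm of a sparse vector changes only slightly after compression, which is the property that lets margins, and hence linear separability, carry over to the compressed domain.

```python
import math
import random

def gaussian_matrix(m, d, rng):
    """Return an m x d matrix with i.i.d. N(0, 1/m) entries (assumed choice)."""
    scale = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(m)]

def sparse_vector(d, k, rng):
    """Return a synthetic k-sparse vector stored as {index: value}."""
    return {i: rng.gauss(0.0, 1.0) for i in rng.sample(range(d), k)}

def compress(A, x):
    """Compute y = A x, touching only the nonzero coordinates of x."""
    return [sum(row[i] * v for i, v in x.items()) for row in A]

def norm(values):
    return math.sqrt(sum(v * v for v in values))

rng = random.Random(0)
d, m, k = 1000, 200, 10      # original dimension, compressed dimension, sparsity
A = gaussian_matrix(m, d, rng)

# Ratio of compressed norm to original norm for 20 random sparse vectors.
ratios = []
for _ in range(20):
    x = sparse_vector(d, k, rng)
    y = compress(A, x)
    ratios.append(norm(y) / norm(x.values()))

print(min(ratios), max(ratios))
```

Ratios close to 1 indicate that distances are approximately preserved even though the dimension drops from d to m.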
c. Flowchart
(a) Paper survey
(b) Data collection: CNAE-9 data set, Farm Ads data set, Dexter data set, Dorothea
data set, colon-cancer data set, duke breast-cancer data set
(c) Generate compressed sensing matrices
(d) Compress the data
(e) Classify the compressed data (libsvm)
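Steps (c) to (e) can be sketched end to end as follows. This is an illustrative sketch under stated assumptions, not the project pipeline: the data are synthetic sparse samples rather than the data sets listed in (b), the compressed sensing matrix is assumed to be Gaussian, and a small Pegasos-style subgradient solver for the linear soft-margin SVM stands in for libsvm so that the example has no external dependencies. All function names and sizes are hypothetical.

```python
import math
import random

def make_sample(d, rng):
    """Return (sparse x as {index: value}, label); 10 nonzeros out of d."""
    label = rng.choice([-1, 1])
    x = {j: float(label) for j in range(5)}        # label-carrying features
    for j in rng.sample(range(5, d), 5):           # label-independent noise
        x[j] = rng.gauss(0.0, 1.0)
    return x, label

def gaussian_matrix(m, d, rng):
    """Step (c): m x d matrix with i.i.d. N(0, 1/m) entries (assumed choice)."""
    scale = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(m)]

def compress(A, x):
    """Step (d): y = A x, using only the nonzero coordinates of x."""
    return [sum(row[j] * v for j, v in x.items()) for row in A]

def to_dense(x, d):
    return [x.get(j, 0.0) for j in range(d)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_linear_svm(X, y, rng, lam=0.01, epochs=5):
    """Step (e) stand-in: Pegasos-style SGD on the soft-margin hinge loss."""
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * dot(w, X[i]) < 1.0
            w = [(1.0 - eta * lam) * wj for wj in w]
            if violated:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def accuracy(w, X, y):
    hits = sum(1 for xi, yi in zip(X, y) if yi * dot(w, xi) > 0)
    return hits / len(X)

rng = random.Random(0)
d, m, n_train, n_test = 500, 50, 200, 100
samples = [make_sample(d, rng) for _ in range(n_train + n_test)]
labels = [label for _, label in samples]

A = gaussian_matrix(m, d, rng)
X_full = [to_dense(x, d) for x, _ in samples]      # original domain
X_comp = [compress(A, x) for x, _ in samples]      # compressed domain

w_full = train_linear_svm(X_full[:n_train], labels[:n_train], rng)
w_comp = train_linear_svm(X_comp[:n_train], labels[:n_train], rng)
acc_full = accuracy(w_full, X_full[n_train:], labels[n_train:])
acc_comp = accuracy(w_comp, X_comp[n_train:], labels[n_train:])
print(acc_full, acc_comp)
```

Comparing acc_full with acc_comp gives a rough check of whether the classifier trained on m compressed measurements stays close to the one trained on all d features. On the real data sets, the Pegasos stand-in would be replaced by libsvm as in step (e).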
4. Brief plan for the next month
a. We will continue the paper survey and refine our proposed approaches.
b. We will analyze the real data sets and generate simulated data to understand how
the approach works.
5. Research Byproducts
a. Paper: N/A
b. Served on the Editorial Board of International Journals: N/A
c. Invited Lectures: N/A
d. Significant Honors / Awards: N/A