NTU/Intel M2M Project: Wireless Sensor Networks Content Analysis and Management Special Interest Group
Data Analysis Team Monthly Report

1. Team Organization
Principal Investigator: Shou-De Lin
Co-Principal Investigator: Mi-Yen Yeh
Team Members: Sheng-Hua Chen (postdoc), Winny Lo (RA), Ruei-Jun Xiao (undergraduate student), Yu-Chen Lu (undergraduate student), Jun-Kun Wang (RA), Han Xiao (intern student), Jing Wang (intern student)

2. Discussion with Champions
a. Number of meetings with champion in current month: 1
b. Major comments/conclusion from the discussion: discussed the progress on air quality inference

3. Current Progress on Compressive Learning
a. Problem overview
Machine learning algorithms rely critically on the features used to represent data; the feature set is the primary interface through which an algorithm can reason about the data at hand. A typical pitfall in many learning problems is that there are too many potential features to choose from. Intelligent feature selection is essential in these scenarios because it discards the noise contributed by irrelevant features, which reduces the number of training examples required and helps prevent overfitting. Computationally, a smaller feature set is almost always advantageous, since it requires less time and space to train the algorithm and to make inferences.
In recent years, compressed sensing has attracted considerable attention in applied mathematics, computer science, and electrical engineering by suggesting that it may be possible to surpass the traditional limits of sampling theory. Our goal is to apply techniques from compressed sensing to machine learning.
b. Method
We would like to use appropriate compressed sensing matrices such that, if the data are approximately linearly separable in a high-dimensional space and have a sparse representation, even in some unknown basis, then compressed sensing approximately preserves the linear separability, and hence the learnability.
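The separability claim can be illustrated with a small sketch. It is not the team's implementation. The report does not say which sensing matrices are used, so a dense Gaussian matrix is assumed here. The data are synthetic, and all function names and parameter values are hypothetical. The sketch builds sparse, linearly separable data in a high-dimensional space, compresses it, and compares the margins of a known separating direction before and after compression.

```python
import numpy as np


def make_sparse_separable_data(n_samples, dim, sparsity, rng,
                               margin=2.0, noise_scale=0.5):
    """Synthetic data (an assumption, not one of the report's data sets):
    sparse noise plus a shift of +/- margin along a sparse unit direction w."""
    w = np.zeros(dim)
    w[rng.choice(dim, size=sparsity, replace=False)] = rng.normal(size=sparsity)
    w /= np.linalg.norm(w)

    y = rng.choice([-1, 1], size=n_samples)
    X = np.zeros((n_samples, dim))
    for i in range(n_samples):
        support = rng.choice(dim, size=sparsity, replace=False)
        X[i, support] = rng.normal(scale=noise_scale, size=sparsity)
    X += margin * y[:, None] * w  # push each class to its side of the hyperplane
    return X, y, w


def gaussian_sensing_matrix(n_measurements, dim, rng):
    """Dense Gaussian matrix with entry variance 1/m (assumed matrix family)."""
    return rng.normal(scale=1.0 / np.sqrt(n_measurements),
                      size=(n_measurements, dim))


def compress(X, A):
    """Map each row x of X to its compressed measurement A @ x."""
    return X @ A.T


def margins(X, y, w):
    """Signed margins y_i * <w, x_i>; all positive means w separates the data."""
    return y * (X @ w)


rng = np.random.default_rng(0)
X, y, w = make_sparse_separable_data(n_samples=200, dim=2000, sparsity=10, rng=rng)
A = gaussian_sensing_matrix(n_measurements=400, dim=2000, rng=rng)
Z = compress(X, A)

orig = margins(X, y, w)      # margins in the original 2000-dimensional space
comp = margins(Z, y, A @ w)  # margins of the projected direction A w on compressed data

print("smallest margin, original domain:  ", orig.min())
print("smallest margin, compressed domain:", comp.min())
```

If the smallest compressed-domain margin stays positive, the projected direction `A @ w` still separates the compressed data. In an actual experiment the compressed matrix `Z` would be passed to a linear SVM trainer rather than evaluated against a known direction.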
In other words, theoretical bounds guarantee that if the data are measured directly in the compressed domain, a soft-margin SVM classifier trained on the compressed data performs almost as well as the best possible classifier in the original high-dimensional domain.
c. Flowchart
(a) Paper survey
(b) Data collection: CNAE-9 data set, Farm Ads data set, Dexter data set, Dorothea data set, colon-cancer data set, duke breast-cancer data set
(c) Generate compressed sensing matrices
(d) Compress the data
(e) Perform classification (libsvm)

4. Brief plan for the next month
a. We will continue the paper survey and refine our proposed approaches.
b. We will analyze the real data sets and generate some simulated data to understand how the approach behaves.

5. Research Byproducts
a. Paper: N/A
b. Served on the Editorial Board of International Journals: N/A
c. Invited Lectures: N/A
d. Significant Honors / Awards: N/A