Semi-Supervised Learning Can we improve the quality of our learning by combining labeled and unlabeled data Usually a lot more unlabeled data available than labeled Assume a set L of labeled data and U of unlabeled data (from the same distribution) Focus on Semi-Supervised Classification though there are many other variations – Aiding clustering with some labeled data – Regression – Model selection with unlabeled data (COD) Transduction vs Induction CS 678 - Ensembles and Bayes 1 How Semi-Supervised Works Approaches make strong model assumptions (guesses). If wrong can make things worse. Some common used assumptions – Clusters of data are from the same class – Data can be represented as a mixture of parameterized distributions – Decision boundaries should go through non-dense areas of the data – Model should be as simple as possible (Occam) CS 678 - Ensembles and Bayes 2 Unsupervised Learning of Domain Features PCA, SVD NLDR – Non-Linear Dimensionality Reduction Deep Learning – Deep Belief Nets – Sparse Auto-encoders – Self-Taught Learning CS 678 - Ensembles and Bayes 3 Self-Training (Bootstrap) Self-Training – Train supervised model on labeled data L – Test on unlabeled data U – Add the most confidently classified members of U to L – Repeat Multi-Model – Uses an ensemble to trained models for Self-Training – Co-Training Train two models with different independent features sets Add most confident instances from U of one model into L of the other – Multi-View training Find ensemble of multiple diverse models trained on L which also tend to all agree well on U CS 678 - Ensembles and Bayes 4 More Models Generative – Assume data can be represented by some mixture of parameterized models (e.g. Gaussian) and use EM to learn parameters (ala Baum-Welch) CS 678 - Ensembles and Bayes 5 Graph Models Graph Models – Neighbor nodes assumed to be similar with larger edge weights. – Force same class member in L to be close, while maintaining smoothness with respect to the graph for U. – Add in members of U as neighbors based on some similarity – Iteratively label U (breadth first) CS 678 - Ensembles and Bayes 6 TSVM Transductive SVM (TSVM) or Semi-Supervised SVM (S3VM) Maximize margin of both L and U. Decision surface placed in non-dense spaces – Assumes classes are "well-separated" – Can also try simultaneously maintain class proportion on both sides similar to labeled proportion CS 678 - Ensembles and Bayes 7 CS 678 - Ensembles and Bayes 8