Semi-Supervised Learning
Can we improve the quality of our learning by combining
labeled and unlabeled data?
Usually a lot more unlabeled data available than labeled
Assume a set L of labeled data and U of unlabeled data
(from the same distribution)
Focus on Semi-Supervised Classification though there are
many other variations
– Aiding clustering with some labeled data
– Regression
– Model selection with unlabeled data (COD)
Transduction vs Induction
CS 678 - Ensembles and Bayes
How Semi-Supervised Works
Approaches make strong model assumptions (guesses). If the
assumptions are wrong, unlabeled data can make things worse.
Some commonly used assumptions
– Clusters of data are from the same class
– Data can be represented as a mixture of parameterized distributions
– Decision boundaries should go through non-dense areas of the data
– Model should be as simple as possible (Occam)
Unsupervised Learning of Domain
NLDR – Non-Linear Dimensionality Reduction
Deep Learning
– Deep Belief Nets
– Sparse Auto-encoders
– Self-Taught Learning
Self-Training (Bootstrap)
– Train supervised model on labeled data L
– Test on unlabeled data U
– Add the most confidently classified members of U to L
– Repeat
– Can use an ensemble of trained models for Self-Training
– Co-Training
  • Train two models with different, independent feature sets
  • Add the most confident instances from U of one model into L of the other
– Multi-View training
  • Find an ensemble of multiple diverse models trained on L which also
    tend to all agree well on U
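
The basic Self-Training loop above can be sketched as follows. This is a minimal illustration, not a reference implementation: the nearest-centroid classifier, the distance-gap confidence measure, and the names `centroids`, `predict`, `self_train`, `per_round`, and `rounds` are all assumptions made for the example. Any supervised model that reports a confidence could take their place. The sketch assumes at least two classes in L.

```python
import math

def centroids(points, labels):
    """Fit a nearest-centroid 'model': the mean point of each class."""
    sums, counts = {}, {}
    for p, y in zip(points, labels):
        s = sums.setdefault(y, [0.0] * len(p))
        for k, v in enumerate(p):
            s[k] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}

def predict(model, p):
    """Return (label, confidence). Confidence (an assumed measure) is the
    gap between the distances to the two closest class centroids."""
    d = sorted((math.dist(p, c), y) for y, c in model.items())
    return d[0][1], d[1][0] - d[0][0]

def self_train(L_x, L_y, U, per_round=2, rounds=10):
    """Repeatedly train on L, classify U, and move the most confidently
    classified members of U into L."""
    L_x, L_y, U = list(L_x), list(L_y), list(U)
    for _ in range(rounds):
        if not U:
            break
        model = centroids(L_x, L_y)              # train on labeled data L
        scored = [(predict(model, p), p) for p in U]  # classify U
        scored.sort(key=lambda t: -t[0][1])      # most confident first
        for (label, _), p in scored[:per_round]:
            L_x.append(p)
            L_y.append(label)
            U.remove(p)
    return centroids(L_x, L_y), L_x, L_y

# Toy data: two labeled points and five unlabeled points in 2-D.
model, L_x, L_y = self_train(
    [(0.0, 0.0), (4.0, 4.0)], [0, 1],
    [(0.5, 0.2), (1.0, 1.2), (3.5, 3.8), (3.0, 2.9), (1.8, 1.5)],
)
```

Adding only a few instances per round keeps early mistakes from dominating; a mislabeled instance added to L is never revisited in this sketch, which is one way a wrong assumption can make things worse.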
More Models
Generative – Assume the data can be represented by some
mixture of parameterized models (e.g. Gaussians) and use
EM to learn the parameters (à la Baum-Welch)
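
A minimal sketch of the generative approach, assuming one 1-D Gaussian per class. Labeled points keep a fixed (one-hot) class responsibility, while unlabeled points receive soft responsibilities in each E-step. The function names, the variance floor of `1e-3`, and the fixed iteration count are assumptions made for this illustration.

```python
import math

def gauss(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def semi_supervised_em(labeled, unlabeled, n_classes=2, iters=50):
    """labeled: list of (x, class index); unlabeled: list of x.
    Returns (priors, means, variances), one entry per class."""
    prior, mu, var = [], [], []
    # Initialise each class from its labeled points only.
    for k in range(n_classes):
        xs = [x for x, y in labeled if y == k]
        m = sum(xs) / len(xs)
        mu.append(m)
        var.append(max(sum((x - m) ** 2 for x in xs) / len(xs), 1e-3))
        prior.append(len(xs) / len(labeled))
    for _ in range(iters):
        # E-step: hard responsibilities for L, soft responsibilities for U.
        resp = [(x, [1.0 if y == k else 0.0 for k in range(n_classes)])
                for x, y in labeled]
        for x in unlabeled:
            p = [prior[k] * gauss(x, mu[k], var[k]) for k in range(n_classes)]
            z = sum(p) or 1e-300  # guard against numerical underflow
            resp.append((x, [v / z for v in p]))
        # M-step: re-estimate parameters from all weighted points.
        for k in range(n_classes):
            w = sum(r[k] for _, r in resp)
            mu[k] = sum(r[k] * x for x, r in resp) / w
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2 for x, r in resp) / w, 1e-3)
            prior[k] = w / len(resp)
    return prior, mu, var

# Toy data: two well-separated classes, four labeled and eight unlabeled points.
prior, mu, var = semi_supervised_em(
    [(0.0, 0), (1.0, 0), (9.0, 1), (10.0, 1)],
    [-0.5, 0.5, 1.5, 0.2, 8.5, 9.5, 10.5, 9.8],
)
```

If the data do not actually follow the assumed mixture, the unlabeled points can pull the class parameters away from the true classes.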
Graph Models
– Neighboring nodes are assumed to be similar; larger edge weights
mean greater similarity.
– Force members of L with the same class to be close, while maintaining
smoothness with respect to the graph for U.
– Add in members of U as neighbors based on some similarity
– Iteratively label U (breadth first)
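
One common graph-based variant is iterative label propagation with clamped labels, sketched below. It is not necessarily the exact breadth-first procedure listed above. The Gaussian edge weights, the `sigma` parameter, and the function name `label_propagation` are assumptions for this illustration.

```python
import math

def label_propagation(points, labels, sigma=1.0, iters=100):
    """labels[i] is a class index for labeled points, None for unlabeled.
    Returns a predicted class for every point."""
    n = len(points)
    classes = sorted({y for y in labels if y is not None})
    # Edge weights: larger weight = more similar (Gaussian kernel on distance).
    W = [[0.0 if i == j else
          math.exp(-math.dist(points[i], points[j]) ** 2 / (2 * sigma ** 2))
          for j in range(n)] for i in range(n)]
    # Class scores: one-hot rows for L, uniform rows for U.
    F = [[(1.0 if y == c else 0.0) if y is not None else 1.0 / len(classes)
          for c in classes] for y in labels]
    for _ in range(iters):
        new_F = []
        for i in range(n):
            if labels[i] is not None:
                new_F.append(F[i])  # clamp: labeled points keep their class
                continue
            total = sum(W[i]) or 1.0
            # Unlabeled point takes the weighted average of its neighbors.
            new_F.append([sum(W[i][j] * F[j][k] for j in range(n)) / total
                          for k in range(len(classes))])
        F = new_F
    return [classes[max(range(len(classes)), key=lambda k: F[i][k])]
            for i in range(n)]

# Toy data: two clusters on a line, one labeled point in each.
pred = label_propagation(
    [(0.0,), (1.0,), (2.0,), (8.0,), (9.0,), (10.0,)],
    [0, None, None, None, None, 1],
)
```

This sketch uses a fully connected graph for brevity; with more data, a k-nearest-neighbor graph is the usual way to keep the weight matrix sparse.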
Transductive SVM (TSVM) or Semi-Supervised SVM
Maximize margin of both L and U. Decision surface
placed in non-dense spaces
– Assumes classes are "well-separated"
– Can also try to simultaneously keep the class proportion on both
sides similar to the labeled proportion
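
One common way to write this objective for a linear decision function is given below. It is a sketch under stated assumptions rather than the formulation used in the lecture: the labels are taken to be $y_i \in \{-1, +1\}$, and the trade-off constants $C$ and $C^{*}$ are symbols introduced here. The last sum penalizes unlabeled points that fall inside the margin on either side, which pushes the decision surface into non-dense regions, and the constraint is one way to express the class-proportion condition.

```latex
\min_{w,\,b}\;\; \tfrac{1}{2}\lVert w\rVert^{2}
  + C \sum_{i \in L} \max\bigl(0,\; 1 - y_i f(x_i)\bigr)
  + C^{*} \sum_{j \in U} \max\bigl(0,\; 1 - \lvert f(x_j)\rvert\bigr),
\qquad f(x) = w^{\top}x + b
\quad\text{subject to}\quad
\frac{1}{|U|}\sum_{j \in U} f(x_j) \;=\; \frac{1}{|L|}\sum_{i \in L} y_i
```

Unlike the standard SVM objective, the unlabeled term is non-convex, so solutions generally depend on initialization.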