An introduction to the 40 algorithms whose correct rate on the training set is higher than
50%.
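The algorithm names below match those of the corresponding WEKA classifiers, which the descriptions and references suggest were used. As a minimal, hedged sketch of how the correct rate on the training set can be computed for any of them with the WEKA Java API, the snippet below loads a hypothetical ARFF file (data.arff is a placeholder) and evaluates a classifier on its own training data; J48 stands in for any of the 40 algorithms.

    import weka.classifiers.Classifier;
    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class TrainingAccuracy {
        public static void main(String[] args) throws Exception {
            // Load an ARFF file (placeholder name) and mark the last attribute as the class.
            Instances data = DataSource.read("data.arff");
            data.setClassIndex(data.numAttributes() - 1);

            // Any of the 40 classifiers can be substituted here.
            Classifier cls = new J48();
            cls.buildClassifier(data);

            // Evaluate on the training data itself to obtain the correct rate.
            Evaluation eval = new Evaluation(data);
            eval.evaluateModel(cls, data);
            System.out.println("Correct rate on training set: " + eval.pctCorrect() + "%");
        }
    }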
BayesNet
BayesNet learns Bayesian networks under the assumptions of nominal attributes and no
missing values, with two different algorithms for estimating the conditional probability tables of
the network. The K2 or TAN algorithm, or more sophisticated methods, can be employed to search for network structures.
ComplementNaiveBayes
ComplementNaiveBayes builds and uses a Complement class Naive Bayes classifier. (Jason et
al. 2003)
NaiveBayes
NaiveBayes implements the probabilistic Naive Bayes classifier. A kernel density
estimator can be employed to model numeric attributes. (George and Pat 1995)
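As an illustration of the kernel-estimator option, the following hedged sketch enables it through the WEKA API; the file name data.arff is a placeholder.

    import weka.classifiers.bayes.NaiveBayes;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class NaiveBayesKernelSketch {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("data.arff");
            data.setClassIndex(data.numAttributes() - 1);

            NaiveBayes nb = new NaiveBayes();
            // Model numeric attributes with kernel density estimators
            // instead of a single Gaussian per class.
            nb.setUseKernelEstimator(true);
            nb.buildClassifier(data);
            System.out.println(nb);
        }
    }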
NaiveBayesMultinomial
NaiveBayesMultinomial implements the multinomial Naive Bayes classifier, a modified
form of Naive Bayes that accommodates word frequencies. (Andrew and Kamal 1998)
NaiveBayesSimple
NaiveBayesSimple builds and uses a simple Naive Bayes classifier. A normal distribution is
employed to model numeric attributes. (Richard and Peter 1973)
NaiveBayesUpdateable
NaiveBayesUpdateable is the updateable version of NaiveBayes, which processes instances
one at a time. It can use a kernel estimator but not discretization. (Jason et al. 2003)
Logistic
Logistic builds and uses a multinomial logistic regression model with a ridge estimator, which
guards against overfitting by penalizing large coefficients. (le Cessie and van Houwelingen 1992)
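A hedged sketch of setting the ridge penalty through the WEKA API follows; the ridge value 1e-4 and the file name are placeholders, not recommended settings.

    import weka.classifiers.functions.Logistic;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class LogisticRidgeSketch {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("data.arff");
            data.setClassIndex(data.numAttributes() - 1);

            Logistic logistic = new Logistic();
            // Larger ridge values penalize large coefficients more strongly.
            logistic.setRidge(1e-4);
            logistic.buildClassifier(data);
            System.out.println(logistic);
        }
    }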
MultilayerPerceptron
MultilayerPerceptron is a neural network classifier trained with backpropagation. The
network can be built by hand or constructed by an algorithm, and it can be monitored
and modified during training.
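The hedged sketch below configures the network by hand through the WEKA API; the hidden-layer size, learning rate, number of epochs, and file name are placeholders.

    import weka.classifiers.functions.MultilayerPerceptron;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class MLPSketch {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("data.arff");
            data.setClassIndex(data.numAttributes() - 1);

            MultilayerPerceptron mlp = new MultilayerPerceptron();
            mlp.setHiddenLayers("10");     // one hidden layer with 10 units
            mlp.setLearningRate(0.3);      // backpropagation step size
            mlp.setTrainingTime(500);      // number of training epochs
            mlp.buildClassifier(data);
        }
    }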
SimpleLogistic
SimpleLogistic builds linear logistic regression models. To fit these models, LogitBoost
with simple regression functions as base learners is employed. The optimal number of iterations
to perform is determined by cross-validation, which supports automatic attribute
selection. (Niels et al. 2005, Marc et al. 2005)
SMO
SMO implements John Platt's sequential minimal optimization algorithm for training a
support vector classifier, using polynomial or Gaussian kernels. (Platt 1998, Keerthi et al. 2001,
Trevor and Robert 1998)
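The hedged sketch below selects a Gaussian (RBF) kernel through the WEKA API, assuming a WEKA release in which SMO exposes setKernel; the gamma and complexity values and the file name are placeholders.

    import weka.classifiers.functions.SMO;
    import weka.classifiers.functions.supportVector.RBFKernel;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class SMOKernelSketch {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("data.arff");
            data.setClassIndex(data.numAttributes() - 1);

            SMO smo = new SMO();
            RBFKernel kernel = new RBFKernel(); // Gaussian kernel
            kernel.setGamma(0.01);
            smo.setKernel(kernel);              // the default is a polynomial kernel
            smo.setC(1.0);                      // complexity (soft-margin) parameter
            smo.buildClassifier(data);
        }
    }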
IB1
IB1 is a nearest-neighbour classifier. It uses normalized Euclidean distance to find the
training instance closest to the given test instance and predicts the same class as that training
instance. If several instances have the same (smallest) distance to the test instance, the first one
found is used. (Aha and Kibler 1991)
IBk
IBk is a k-nearest-neighbours classifier that uses the Euclidean distance metric. The number of
nearest neighbours can be determined automatically using leave-one-out cross-validation. (Aha
and Kibler 1991)
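A hedged sketch of enabling the automatic choice of k through the WEKA API follows; the upper bound of 10 and the file name are placeholders.

    import weka.classifiers.lazy.IBk;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class IBkSketch {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("data.arff");
            data.setClassIndex(data.numAttributes() - 1);

            IBk knn = new IBk();
            knn.setKNN(10);              // upper bound on k
            knn.setCrossValidate(true);  // pick k by leave-one-out cross-validation
            knn.buildClassifier(data);
        }
    }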
KStar
KStar is a nearest-neighbor classifier using a generalized distance function which is defined as
the complexity of transforming one instance into another. It uses an entropy-based distance
function which is different from other instance-based learners. (John and Leonard 1995)
BFTree
BFTree builds a best-first decision tree which uses binary splits for both nominal and numeric
attributes. (Shi 2007, Jerome et al. 2000)
J48
J48 generates a pruned or unpruned C4.5 decision tree. (Ross 1993)
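A hedged sketch of building a pruned tree through the WEKA API follows; the confidence factor and file name are placeholders.

    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class J48Sketch {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("data.arff");
            data.setClassIndex(data.numAttributes() - 1);

            J48 tree = new J48();
            tree.setUnpruned(false);          // build a pruned tree
            tree.setConfidenceFactor(0.25f);  // smaller values prune more aggressively
            tree.buildClassifier(data);
            System.out.println(tree);         // print the induced tree
        }
    }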
J48graft
J48graft generates a grafted (pruned or unpruned) C4.5 decision tree. (Geoff 1999)
NBTree
NBTree is a hybrid of decision trees and Naive Bayes: it creates trees whose leaves
are Naive Bayes classifiers for the instances that reach them. (Ron 1996)
RandomForest
RandomForest constructs random forests by bagging ensembles of random trees. (Leo 2001)
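The hedged sketch below sets the ensemble size through the WEKA API, assuming the older (3.6/3.7-era) option names; the values and file name are placeholders.

    import weka.classifiers.trees.RandomForest;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class RandomForestSketch {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("data.arff");
            data.setClassIndex(data.numAttributes() - 1);

            RandomForest forest = new RandomForest();
            forest.setNumTrees(100);   // number of random trees in the ensemble
            forest.setNumFeatures(0);  // 0 = use WEKA's default number of random attributes
            forest.buildClassifier(data);
        }
    }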
REPTree
REPTree builds a decision or regression tree using information gain or variance reduction, and
prunes it with reduced-error pruning.
SimpleCart
SimpleCart builds a CART decision tree with minimal cost-complexity pruning. Missing values
are handled with the method of fractional instances rather than surrogate splits. (Leo 1984)
DecisionTable
DecisionTable builds a simple decision table majority classifier. It evaluates feature
subsets using best-first search and uses cross-validation for evaluation. (Ron 1995)
JRip
JRip implements Repeated Incremental Pruning to Produce Error Reduction (RIPPER), which
is an optimized version of IREP. (William 1995)
PART
PART generates a PART decision list using separate-and-conquer. It builds a partial C4.5
decision tree in each iteration and makes the best leaf into a rule. (Eibe and Ian 1998)
AttributeSelectedClassifier
AttributeSelectedClassifier selects attributes to reduce the data’s dimensionality before
passing it to the classifier.
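The hedged sketch below wires a subset evaluator and a search method around a base classifier through the WEKA API; CfsSubsetEval, BestFirst, and J48 are WEKA's defaults and stand in for whatever configuration was actually used.

    import weka.attributeSelection.BestFirst;
    import weka.attributeSelection.CfsSubsetEval;
    import weka.classifiers.meta.AttributeSelectedClassifier;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class AttributeSelectionSketch {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("data.arff");
            data.setClassIndex(data.numAttributes() - 1);

            AttributeSelectedClassifier asc = new AttributeSelectedClassifier();
            asc.setEvaluator(new CfsSubsetEval());  // scores attribute subsets
            asc.setSearch(new BestFirst());         // searches the space of subsets
            asc.setClassifier(new J48());           // classifier run on the reduced data
            asc.buildClassifier(data);
        }
    }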
Bagging
Bagging bags a classifier to reduce variance. It can perform classification or regression,
depending on the base learner. (Leo 1996)
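A hedged sketch of bagging a base learner through the WEKA API follows; REPTree (WEKA's default base learner), the number of iterations, and the file name are placeholders.

    import weka.classifiers.meta.Bagging;
    import weka.classifiers.trees.REPTree;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class BaggingSketch {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("data.arff");
            data.setClassIndex(data.numAttributes() - 1);

            Bagging bagger = new Bagging();
            bagger.setClassifier(new REPTree());  // base learner
            bagger.setNumIterations(25);          // number of bootstrap replicates
            bagger.setBagSizePercent(100);        // each bag is as large as the training set
            bagger.buildClassifier(data);
        }
    }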
ClassificationViaClustering
ClassificationViaClustering uses a clusterer for classification. The clustering algorithm must
use a fixed number of clusters, and the number of clusters to generate is set equal to the number
of class labels in the dataset in order to obtain a useful model.
ClassificationViaRegression
ClassificationViaRegression performs classification using regression methods. The class is
binarized and one regression model is built for each class value. (Frank et al. 1998)
Dagging
Dagging creates a number of disjoint, stratified folds of the data and feeds each chunk to a
copy of the supplied base classifier. All generated base classifiers are placed in a Vote
meta-classifier, so majority voting is used for prediction. (Ting and Witten 1997)
Decorate
Decorate builds diverse ensembles of classifiers by using specially constructed artificial
training examples. (Melville and Mooney 2003, Melville and Mooney 2004)
END
END builds an ensemble of nested dichotomies to handle multi-class datasets with 2-class
classifiers. (Dong et al. 2005, Eibe and Stefan 2004)
EnsembleSelection
EnsembleSelection uses the ensemble selection method to combine several classifiers from
libraries of thousands of models, which are generated using different learning algorithms and
parameter settings. (Caruana et al. 2004)
FilteredClassifier
FilteredClassifier runs an arbitrary classifier on data that has been passed through an
arbitrary filter whose structure is based exclusively on the training data. Test instances are
processed by the filter without changing their structure.
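The hedged sketch below combines a filter and a classifier through the WEKA API; the Discretize filter, J48, and the file name are placeholders.

    import weka.classifiers.meta.FilteredClassifier;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;
    import weka.filters.unsupervised.attribute.Discretize;

    public class FilteredClassifierSketch {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("data.arff");
            data.setClassIndex(data.numAttributes() - 1);

            FilteredClassifier fc = new FilteredClassifier();
            fc.setFilter(new Discretize());  // filter is built from the training data only
            fc.setClassifier(new J48());
            fc.buildClassifier(data);        // test instances later pass through the same filter
        }
    }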
LogitBoost
LogitBoost performs additive logistic regression using a regression scheme as the base learner,
and it can handle multi-class problems. (Friedman et al. 1998)
MultiClassClassifier
MultiClassClassifier handles multi-class datasets with 2-class classifiers using any of the
following methods: one-versus-rest, pairwise classification with voting, exhaustive
error-correcting codes, and randomly selected error-correcting codes.
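A hedged sketch of wrapping a 2-class learner through the WEKA API follows; VotedPerceptron and the file name are placeholders, and the decomposition method is left at its one-versus-rest default.

    import weka.classifiers.functions.VotedPerceptron;
    import weka.classifiers.meta.MultiClassClassifier;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class MultiClassSketch {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("data.arff");
            data.setClassIndex(data.numAttributes() - 1);

            MultiClassClassifier mcc = new MultiClassClassifier();
            // Wrap a binary-only learner; other decomposition methods
            // can be chosen through the classifier's options.
            mcc.setClassifier(new VotedPerceptron());
            mcc.buildClassifier(data);
        }
    }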
RacedIncrementalLogitBoost
RacedIncrementalLogitBoost learns from large datasets by racing LogitBoosted committees
and operates incrementally by processing the data in batches.
RandomCommittee
RandomCommittee builds an ensemble of randomizable base classifiers, each built using
a different random number seed (but based on the same data). The final prediction is a straight
average of the predictions generated by the individual base classifiers.
RandomSubSpace
RandomSubSpace constructs a decision-tree-based classifier that maintains the highest accuracy
on training data and improves generalization accuracy as it grows in complexity. The classifier
consists of multiple trees constructed systematically by pseudorandomly selecting subsets of
components of the feature vector, that is, trees constructed in randomly chosen subspaces. (Tin
1998)
ClassBalancedND
ClassBalancedND handles multi-class datasets with 2-class classifiers by building a random
class-balanced tree structure. (Dong et al. 2005, Eibe and Stefan 2004)
DataNearBalancedND
DataNearBalancedND handles multi-class datasets with 2-class classifiers by building a
random data-balanced tree structure. (Dong et al. 2005, Eibe and Stefan 2004)
ND
ND handles multi-class datasets with 2-class classifiers by building a random tree structure.
(Dong et al. 2005, Eibe and Stefan 2004)
References:
Aha, D., Kibler, D. 1991. Instance-based learning algorithms. Machine Learning. 6:37-66.
Andrew Mccallum, Kamal Nigam. 1998. A Comparison of Event Models for Naive Bayes Text
Classification. In: AAAI-98 Workshop on 'Learning for Text Categorization'.
Caruana, Rich, Niculescu-Mizil, Alex, Crew, Geoff, Ksikes, Alex. 2004. Ensemble Selection from
Libraries of Models. In: The International Conference on Machine Learning (ICML'04).
Dong Lin, Eibe Frank, Stefan Kramer. 2005. Ensembles of Balanced Nested Dichotomies for
Multi-class Problems. In: PKDD, 84-95.
Eibe Frank, Ian H. Witten. 1998. Generating Accurate Rule Sets Without Global Optimization. In:
Fifteenth International Conference on Machine Learning, 144-151.
Eibe Frank, Stefan Kramer. 2004. Ensembles of nested dichotomies for multi-class problems. In:
Twenty-first International Conference on Machine Learning.
Frank, E., Wang, Y., Inglis, S., Holmes, G., Witten, I.H. 1998. Using model trees for classification.
Machine Learning. 32(1):63-76.
Friedman, J., Hastie, T., Tibshirani, R. 1998. Additive Logistic Regression: a Statistical View of
Boosting. Stanford University.
Geoff Webb. 1999. Decision Tree Grafting From the All-Tests-But-One Partition. In: Sixteenth
International Joint Conference on Artificial Intelligence, San Francisco, CA.
George H. John, Pat Langley. 1995. Estimating Continuous Distributions in Bayesian Classifiers.
In: Eleventh Conference on Uncertainty in Artificial Intelligence, San Mateo, 338-345.
Ian H. Witten, Eibe Frank. 2005. Data Mining: Practical Machine Learning Tools and Techniques
(Second Edition). Morgan Kaufmann Publishers.
Jason D. Rennie, Lawrence Shih, Jaime Teevan, David R. Karger. 2003. Tackling the Poor
Assumptions of Naive Bayes Text Classifiers. In: ICML, 616-623.
Jerome Friedman, Trevor Hastie, Robert Tibshirani. 2000. Additive logistic regression : A
statistical view of boosting. Annals of statistics. 28(2):337-407.
John G. Cleary, Leonard E. Trigg. 1995. K*: An Instance-based Learner Using an Entropic
Distance Measure. In: 12th International Conference on Machine Learning, 108-114.
Keerthi, S.S., Shevade, S.K. C. Bhattacharyya, K.R.K. Murthy. 2001. Improvements to Platt's
SMO Algorithm for SVM Classifier Design. Neural Computation. 13(3):637-649.
le Cessie, S., van Houwelingen, J.C. 1992. Ridge Estimators in Logistic Regression. Applied
Statistics. 41(1):191-201.
Leo Breiman, Jerome H. Friedman, Richard A. Olshen, Charles J. Stone. 1984. Classification and
Regression Trees. Wadsworth International Group, Belmont, California.
Leo Breiman. 1996. Bagging predictors. Machine Learning. 24(2):123-140.
Leo Breiman. 2001. Random Forests. Machine Learning. 45(1):5-32.
Marc Sumner, Eibe Frank, Mark Hall. 2005. Speeding up Logistic Model Tree Induction. In: 9th
European Conference on Principles and Practice of Knowledge Discovery in Databases, 675-683.
Melville, Mooney, R.J. 2003. Constructing Diverse Classifier Ensembles Using Artificial Training
Examples. In: Eighteenth International Joint Conference on Artificial Intelligence, 505-510.
Melville, Mooney, R.J. 2004. Creating Diversity in Ensembles Using Artificial Data. Information
Fusion: Special Issue on Diversity in Multiclassifier Systems.
Niels Landwehr, Mark Hall, Eibe Frank. 2005. Logistic Model Trees.
Platt, J. 1998. Fast Training of Support Vector Machines using Sequential Minimal Optimization.
In: B. Schoelkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods - Support Vector Learning.
Richard Duda, Peter Hart. 1973. Pattern Classification and Scene Analysis. Wiley, New York.
Ron Kohavi. 1995. The Power of Decision Tables. In: 8th European Conference on Machine
Learning, 174-189.
Ron Kohavi. 1996. Scaling Up the Accuracy of Naive-Bayes Classifiers: A Decision-Tree Hybrid.
In: Second International Conference on Knowledge Discovery and Data Mining, 202-207.
Ross Quinlan. 1993. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, San
Mateo, CA.
Shi Haijian. 2007. Best-first decision tree learning. Hamilton, NZ.
Tin Kam Ho. 1998. The Random Subspace Method for Constructing Decision Forests. IEEE
Transactions on Pattern Analysis and Machine Intelligence. 20(8):832-844.
Ting, K. M., Witten, I. H. 1997. Stacking Bagged and Dagged Models. In: Fourteenth
international Conference on Machine Learning, San Francisco, CA, 367-375.
Trevor Hastie, Robert Tibshirani. 1998. Classification by Pairwise Coupling. In: Advances in
Neural Information Processing Systems.
William W. Cohen. 1995. Fast Effective Rule Induction. In: Twelfth International Conference on
Machine Learning, 115-123.