Classification Problems Sign up Classification is a central topic in machine learning that has to do with teaching machines how to group together data by particular criteria. Classification is the process where computers group data together based on predetermined characteristics — this is called supervised learning. There is an unsupervised version of classification, called clustering where computers find shared characteristics by which to group data when categories are not specified. A common example of classification comes with detecting spam emails. To write a program to filter out spam emails, a computer programmer can train a machine learning algorithm with a set of spam-like emails labelled as spam and regular emails labelled as not-spam. The idea is to make an algorithm that can learn characteristics of spam emails from this training set so that it can filter out spam emails when it encounters new emails. Classification is an important tool in today’s world, where big data is used to make all kinds of decisions in government, economics, medicine, and more. Researchers have access to huge amounts of data, and classification is one tool that helps them to make sense of the data and find patterns. While classification in machine learning requires the use of (sometimes) complex algorithms, classification is something that humans do naturally everyday. Classification is simply grouping things together according to similar features and attributes. When you go to a grocery store, you can fairly accurately group the foods by food group (grains, fruit, vegetables, meat, etc.) In machine learning, classification is all about teaching computers to do the same. Intuition Classification Algorithms Choosing the right classification algorithm is very important. An algorithm Sign up that performs classification is called a classifier. A classifier algorithm should be fast, accurate, and sometimes, minimize the amount of training data that it needs. Generally, the more parameters a set of data has, the larger the training set for an algorithm must be. Different classification algorithms basically have different ways of learning patterns from examples. [1] More formally, classification algorithms map an observation v to a concept/class/label ω . Many times, classification algorithms will take in data in the form of a feature vector which is basically a vector containing numeric descriptions of various features related to each data object. For example, if the algorithm deals with sorting images of animals into various classes (based on what type of animal they are, for example), the feature vector might include information about the pixels, colors in the image, etc. Here are some common classification algorithms and techniques: Linear Regression A common and simple method for classification is linear regression. Linear regression is a technique used to model the relationships between observed variables. The idea behind simple linear regression is to "fit" the observations of two variables into a linear relationship between them. Graphically, the task is to draw the line that is "best-fitting" or "closest" to the points (xi , yi ), where xi and yi are observations of the two variables which are expected to depend linearly on each other. The best-fitting linear relationship between the variables x and y . [2] EXAMPLE Sign up Which of these lines, H1, H2, and H3, represents the worst classifier algorithm? (The classifier algorithms identify and label data and place them on one side of the line or the other according to the results). [3] Show answer Perceptrons A perceptron is an algorithm used to produce a binary classifier. That is, the algorithm takes binary classified input data, along with their classification and outputs a line that attempts to separate data of one class from data of the other: data points on one side of the line are of one class and data points on the other side are of the other. Binary classified data is data where the label is one thing or another, like "yes" or "no"; 1 or 0; etc. The perceptron algorithm returns values of w0 , w1 , ..., wk and b such that data points on one side of the line are of one class and data points on the other side are of the other. Mathematically, the values of w and b are used by the binary classifier in the following way. If w ⋅ x + b > 0, the classifier returns 1; otherwise, it returns 0. Note that 1 represents membership of one class and 0 represents membership of the other. This can be seen more clearly with the AND operator, replicated below for convenience. Sign up The AND operation between two numbers. A red dot represents one class (x1 AND x2 = 0) and a blue dot represents the other class (x1 AND x2 = 1). The line is the result of the perceptron algorithm, which separates all data points of one class from those of the other. The perceptron algorithm is one of the most commonly used machine learning algorithms for binary classification. Some machine learning tasks that use the perceptron include determining gender, low vs high risk for diseases, and virus detection. Naive Bayes Classifier Naive Bayes classifiers are probabilistic classifiers with strong independence assumptions between features. Unlike many other classifiers which assume that, for a given class, there will be some correlation between features, naive Bayes explicitly models the features as conditionally independent given the class. Because of the independence assumption, naive Bayes classifiers are highly scalable and can quickly learn to use high dimensional (many parameters) features with limited training data. This is useful for many real world datasets where the amount of data is small in comparison with the number of features for each individual piece of data, such as speech, text, and image data. Decision Trees Another way to do a classification is to use a decision tree. Sign up EXAMPLE Say you have the following training data set of basketball players that includes information about what color jersey they have, which position they play, and whether or not they are injured. The training set is labelled according to whether or not a player will be able to play for Team A. Person Jersey Offense or Injured? Will they play for Color Defense John Blue Offense No Yes Steve Red Offense No No Sarah Blue Defense No Yes Rachel Blue Offense Yes No Richard Red Defense No No Alex Red Defense Yes No Lauren Blue Offense No Yes Carol Blue Defense No Yes Team A? What is the rule for whether or not a player may play for Team A? Show answer To use a decision tree to classify this data, select a rule to start the tree. Here we will use “jersey color” as the root node. Sign up Next, we will include a node that will distinguish between injured and uninjured players. Other types of classification algorithms include support vector machines (SVMs), Bayesian, and logistic regression. Use of Statistics In Input Data Classification algorithms often include statistics data. EXAMPLE Let's say a company is trying to write software that will take in as input the full text of a book. From the frequency of certain words, the program will determine which genre the book belongs to. Perhaps if the word "detective" appears very frequently, the program will label the book as a "mystery." If the book contains the word "magic" or "wizard" many times, perhaps the software should label the book as "fantasy." And so forth. (In this example, the language of the books is English). Let's say that the computer program goes through each book and keeps track of the number of times each word occurs. The algorithm might find that across all genres, the words "the," "is," "and,", "I," and Sign up other very common English words occur with about the same frequency. Classifying the novels based on these word frequencies would probably not be very helpful. However, if the algorithm notices that a particular subset of words tend to occur more often in science-fiction novels and fantasy novels than in mystery novels or non-fiction novels, the algorithm can use this information to sort future book instances. Error In the basketball team example above, the rules for determining if a player would play for Team A were fairly straightforward with just two binary data points to consider. In book genre example, a historical-fiction novel might contain the word "detective" many times if its topic has to do with a famous unsolved crime. It is possible that the machine learning algorithm would classify this novel as a mystery book. This is called error. Many times, error can be reduced by feeding the algorithm more training examples. However, eliminating error completely is very difficult to do, so in general, a good classifier algorithm will have as low an error rate as possible. Conclusion Classification, and its unsupervised learning counterpart, clustering, are central ideas behind many other techniques and topics in machine learning. Being able to classify and recognize certain kinds of data allows computer scientists to expand on knowledge and applications in other machine learning fields such as computer vision, natural language processing, deep learning, building predictive economic, market, and weather models, and more. See Also References Sign up About Careers Testimonials Help Terms Brilliant 2023 Privacy California Privacy Policy ©