Uploaded by Barnabas861

Classification Problems Brilliant Math & Science Wiki

Classification Problems
Classification is a central topic in machine learning that has to do with
teaching machines how to group together data by particular criteria.
In classification, computers group data together based on predetermined
categories. Because the categories are given in advance, this is a form of
supervised learning. The unsupervised counterpart of classification is
clustering, where computers find shared characteristics by which to group
data when categories are not specified.
A common example of classification comes with detecting spam emails. To
write a program to filter out spam emails, a computer programmer can
train a machine learning algorithm with a set of spam-like emails labelled
as spam and regular emails labelled as not-spam. The idea is to make an
algorithm that can learn characteristics of spam emails from this training
set so that it can filter out spam emails when it encounters new emails.
Classification is an important tool in today’s world, where big data is used
to make all kinds of decisions in government, economics, medicine, and
more. Researchers have access to huge amounts of data, and classification
is one tool that helps them to make sense of the data and find patterns.
Intuition
While classification in machine learning requires the use of (sometimes)
complex algorithms, classification is something that humans do naturally
every day. Classification is simply grouping things together according to
similar features and attributes. When you go to a grocery store, you can
fairly accurately group the foods by food group (grains, fruit, vegetables,
meat, etc.). In machine learning, classification is all about teaching
computers to do the same.
Classification Algorithms
Choosing the right classification algorithm is very important. An algorithm
that performs classification is called a classifier. A classifier algorithm
should be fast and accurate and, in some settings, should minimize the
amount of training data that it needs. Generally, the more parameters a set
of data has, the larger the training set for an algorithm must be. Different
classification algorithms have different ways of learning patterns from
examples. [1]
More formally, classification algorithms map an observation v to a
concept/class/label ω.
Many times, classification algorithms will take in data in the form of a
feature vector, which is a vector containing numeric descriptions of
various features related to each data object. For example, if the algorithm
sorts images of animals into classes based on what type of animal they
show, the feature vector might include information about the pixels, the
colors in the image, etc.
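As a rough illustration of what such a feature vector could look like, the sketch below flattens a tiny image into a list of numbers. The input format (rows of RGB tuples), the function name, and the choice to append per-channel means are all assumptions made for this example, not part of any particular library or of the original article.

```python
def image_to_feature_vector(pixels):
    """Flatten a small RGB image into one numeric feature vector.

    `pixels` is a list of rows; each row is a list of (r, g, b) tuples
    with channel values in 0..255 (a hypothetical input format).
    """
    # Scale every channel value to the range 0..1 and flatten row by row.
    flat = [channel / 255.0 for row in pixels for rgb in row for channel in rgb]
    n_pixels = len(flat) // 3
    # Append the mean of each color channel as three summary features.
    means = [sum(flat[c::3]) / n_pixels for c in range(3)]
    return flat + means


# A made-up 2x2 image: three red pixels and one blue pixel.
tiny_image = [
    [(255, 0, 0), (255, 0, 0)],
    [(0, 0, 255), (255, 0, 0)],
]
features = image_to_feature_vector(tiny_image)
print(len(features), features[-3:])
```

Real systems usually use richer features than raw pixel values, but the idea is the same: each data object becomes a fixed-length list of numbers that a classifier can consume.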
Here are some common classification algorithms and techniques:
Linear Regression
A common and simple technique that can be adapted for classification is
linear regression. Linear regression is a technique used to model the
relationships between observed variables. The idea behind simple linear
regression is to "fit" the observations of two variables into a linear
relationship between them. Graphically, the task is to draw the line that
is "best-fitting" or "closest" to the points (x_i, y_i), where x_i and y_i
are observations of the two variables which are expected to depend linearly
on each other.
The best-fitting linear relationship between the variables x and y. [2]
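One way to make the "best-fitting line" concrete is the ordinary least-squares formula for a single input variable. The sketch below fits such a line and then, as an assumption of this example rather than something the article specifies, turns it into a classifier by thresholding the fitted value at 0.5. The data and function names are made up.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def classify(x, slope, intercept, threshold=0.5):
    """Label x as class 1 if the fitted value exceeds the threshold, else 0."""
    return 1 if slope * x + intercept > threshold else 0


# Made-up one-dimensional data: small x values are class 0, large ones class 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0, 0, 0, 1, 1, 1]
slope, intercept = fit_line(xs, ys)
predictions = [classify(x, slope, intercept) for x in xs]
print(predictions)
```

Thresholding a regression line is only one simple option; logistic regression is a common alternative when the target is a class label.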
EXAMPLE
Which of these lines, H1, H2, and H3, represents the worst classifier algorithm? (The classifier algorithms identify and label data and place
them on one side of the line or the other according to the results).
[3]
Perceptrons
A perceptron is an algorithm used to produce a binary classifier. That is, the
algorithm takes binary classified input data, along with their
classifications, and outputs a line that attempts to separate data of one
class from data of the other: data points on one side of the line are of one
class and data points on the other side are of the other. Binary classified
data is data where the label is one thing or another, like "yes" or "no", or
1 or 0.
The perceptron algorithm returns values of w0, w1, ..., wk and b such that
data points on one side of the line are of one class and data points on the
other side are of the other. Mathematically, the values of w and b are used
by the binary classifier in the following way. If w ⋅ x + b > 0, the
classifier returns 1; otherwise, it returns 0. Note that 1 represents
membership of one class and 0 represents membership of the other. This can
be seen more clearly with the AND operator, replicated below for convenience.
The AND operation between two numbers. A red dot represents one class
(x1 AND x2 = 0) and a blue dot represents the other class (x1 AND x2 = 1).
The line is the result of the perceptron algorithm, which separates all data
points of one class from those of the other.
The perceptron algorithm is one of the most commonly used machine
learning algorithms for binary classification. Some machine learning tasks
that use the perceptron include determining gender, low vs high risk for
diseases, and virus detection.
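The decision rule "return 1 if w ⋅ x + b > 0, otherwise 0" can be tried out on the AND data described above. The following is a minimal sketch of the classic perceptron update rule; the function names, the learning rate, and the epoch limit are assumptions chosen for this example.

```python
def predict(w, b, x):
    """Return 1 if w . x + b > 0, otherwise 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def train_perceptron(samples, labels, learning_rate=1.0, max_epochs=100):
    """Learn weights w and bias b with the perceptron update rule."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, target in zip(samples, labels):
            error = target - predict(w, b, x)  # -1, 0, or +1
            if error != 0:
                # Move the line toward (or away from) the misclassified point.
                w = [wi + learning_rate * error * xi for wi, xi in zip(w, x)]
                b += learning_rate * error
                mistakes += 1
        if mistakes == 0:  # every training point is classified correctly
            break
    return w, b


# The four input pairs for AND and their labels.
and_inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_labels = [0, 0, 0, 1]
w, b = train_perceptron(and_inputs, and_labels)
and_predictions = [predict(w, b, x) for x in and_inputs]
print(and_predictions)
```

Because the AND data is linearly separable, the update loop stops once a separating line is found. On data that no line can separate, the loop would simply run until `max_epochs`.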
Naive Bayes Classifier
Naive Bayes classifiers are probabilistic classifiers with strong
independence assumptions between features. Unlike many other classifiers,
which allow for some correlation between features within a given class,
naive Bayes explicitly models the features as conditionally independent
given the class.
Because of the independence assumption, naive Bayes classifiers are highly
scalable and can quickly learn to use high-dimensional features (many
parameters) with limited training data. This is useful for many real-world
datasets where the amount of data is small in comparison with the number
of features for each individual piece of data, such as speech, text, and
image data.
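The conditional independence assumption can be written out as a formula. The notation below is an assumption of this note, not taken from the article: C stands for a class and x_1, ..., x_n for the feature values of one data object.

```latex
P(C \mid x_1, \dots, x_n) \;\propto\; P(C) \prod_{i=1}^{n} P(x_i \mid C),
\qquad
\hat{C} = \arg\max_{C} \; P(C) \prod_{i=1}^{n} P(x_i \mid C)
```

Each factor P(x_i | C) can be estimated separately from the training data, which is why the number of quantities to learn grows only linearly with the number of features.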
Decision Trees
Another way to do a classification is to use a decision tree.
EXAMPLE
Say you have the following training data set of basketball players that
includes information about what color jersey they have, which position
they play, and whether or not they are injured. The training set is labelled according to whether or not a player will be able to play for Team
A.
Person   Jersey Color  Offense or Defense  Injured?  Will they play for Team A?
John     Blue          Offense             No        Yes
Steve    Red           Offense             No        No
Sarah    Blue          Defense             No        Yes
Rachel   Blue          Offense             Yes       No
Richard  Red           Defense             No        No
Alex     Red           Defense             Yes       No
Lauren   Blue          Offense             No        Yes
Carol    Blue          Defense             No        Yes
What is the rule for whether or not a player may play for Team A?
To use a decision tree to classify this data, select a rule to start the tree.
Here we will use “jersey color” as the root node.
Next, we will include a node that will distinguish between injured and
uninjured players.
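The two-level tree described above (jersey color at the root, then injury status) can be sketched as nested checks and compared against the training table. The function name and data layout are assumptions of this example. The rule it encodes, "blue jersey and not injured", is one that happens to be consistent with all eight rows of the table.

```python
# Training rows: (name, jersey color, position, injured?, plays for Team A?)
players = [
    ("John", "Blue", "Offense", False, True),
    ("Steve", "Red", "Offense", False, False),
    ("Sarah", "Blue", "Defense", False, True),
    ("Rachel", "Blue", "Offense", True, False),
    ("Richard", "Red", "Defense", False, False),
    ("Alex", "Red", "Defense", True, False),
    ("Lauren", "Blue", "Offense", False, True),
    ("Carol", "Blue", "Defense", False, True),
]


def will_play(jersey_color, injured):
    """Walk the two-node decision tree and return the predicted label."""
    # Root node: jersey color.
    if jersey_color != "Blue":
        return False
    # Second node: injured or not.
    return not injured


tree_predictions = [will_play(color, injured) for _, color, _, injured, _ in players]
tree_labels = [label for *_, label in players]
print(tree_predictions == tree_labels)
```

Note that the position column (offense or defense) is never consulted: in this data set it carries no information about the label, so the tree does not need a node for it.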
Other types of classification algorithms include support vector machines
(SVMs), Bayesian classifiers, and logistic regression.
Use of Statistics In Input Data
Classification algorithms often take statistical data, such as counts and
frequencies, as input.
EXAMPLE
Let's say a company is trying to write software that will take in as input
the full text of a book. From the frequency of certain words, the program will determine which genre the book belongs to. Perhaps if the
word "detective" appears very frequently, the program will label the
book as a "mystery." If the book contains the word "magic" or "wizard"
many times, perhaps the software should label the book as "fantasy."
And so forth. (In this example, the language of the books is English).
Let's say that the computer program goes through each book and
keeps track of the number of times each word occurs. The algorithm
might find that across all genres, the words "the," "is," "and," "I," and
other very common English words occur with about the same frequency.
Classifying the books based on these word frequencies would
probably not be very helpful. However, if the algorithm notices that a
particular subset of words tends to occur more often in science-fiction
novels and fantasy novels than in mystery novels or non-fiction books,
the algorithm can use this information to sort future book instances.
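The word-counting step in this example could be sketched as follows. The keyword lists, the function names, and the scoring rule (sum of the relative frequencies of a genre's keywords) are hypothetical choices made for illustration. A real system would learn which words matter from labelled books rather than hard-coding them.

```python
import re
from collections import Counter

# Hypothetical keyword lists, hand-picked from the example in the text.
GENRE_KEYWORDS = {
    "mystery": ["detective"],
    "fantasy": ["magic", "wizard"],
}


def relative_frequencies(text):
    """Map each word in the text to its share of all words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words)
    return {word: count / total for word, count in counts.items()}


def guess_genre(text):
    """Pick the genre whose keywords make up the largest share of the text."""
    freqs = relative_frequencies(text)
    scores = {
        genre: sum(freqs.get(word, 0.0) for word in keywords)
        for genre, keywords in GENRE_KEYWORDS.items()
    }
    return max(scores, key=scores.get)


sample = "The detective asked the detective"
sample_freqs = relative_frequencies(sample)
print(guess_genre(sample))
print(guess_genre("The wizard cast magic. The wizard smiled."))
```

Using relative rather than raw counts keeps long and short books comparable. If no keyword appears at all, every score is zero and `max` simply returns the first genre, so a practical version would need an explicit "unknown" case.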
Error
In the basketball team example above, the rules for determining if a player
would play for Team A were fairly straightforward, with just two binary
features to consider. In the book genre example, a historical-fiction novel
might contain the word "detective" many times if its topic has to do with a
famous unsolved crime. It is possible that the machine learning algorithm
would classify this novel as a mystery book. A misclassification like this
is called error. Many times, error can be reduced by feeding the algorithm
more training examples. However, eliminating error completely is very
difficult to do, so in general, a good classifier algorithm will have as low
an error rate as possible.
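An error rate is commonly measured as the fraction of examples that the classifier labels incorrectly. A minimal sketch, using made-up labels that mirror the historical-fiction example above:

```python
def error_rate(predicted, actual):
    """Return the fraction of predictions that differ from the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    wrong = sum(1 for p, a in zip(predicted, actual) if p != a)
    return wrong / len(actual)


# Made-up labels: the third book is historical fiction but gets
# classified as a mystery because of the word "detective".
actual_genres = ["mystery", "fantasy", "historical fiction", "mystery"]
predicted_genres = ["mystery", "fantasy", "mystery", "mystery"]
rate = error_rate(predicted_genres, actual_genres)
print(rate)  # one wrong out of four
```

To get an honest estimate, the error rate is usually measured on examples that were not used for training.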
Conclusion
Classification, and its unsupervised learning counterpart, clustering, are
central ideas behind many other techniques and topics in machine learning. Being able to classify and recognize certain kinds of data allows computer scientists to expand on knowledge and applications in other machine
learning fields such as computer vision, natural language processing, deep
learning, building predictive economic, market, and weather models, and
more.
© Brilliant 2023