• The goal of machine learning is to build computer systems that can adapt and learn from their experience.
• Hypothesis: a concept (i.e., a classification function) belonging to the hypothesis space of a learning algorithm.
• Hypothesis Space: the space of classifiers from which the learning algorithm selects a hypothesis.
• Inductive reasoning moves from specific instances to a generalized conclusion, while deductive reasoning moves from generalized principles that are known to be true to a true and specific conclusion. The accuracy of inductive reasoning is questionable.
• The inductive bias of a learning algorithm is the set of assumptions that the learner uses to predict outputs given inputs that it has not encountered.
• Examples of machine learning:
1) Learning to recognize spoken words (Lee, 1989; Waibel, 1989).
2) Learning to drive an autonomous vehicle (Pomerleau, 1989).
3) Learning to classify new astronomical structures (Fayyad et al., 1995).
4) Learning to play world-class backgammon (Tesauro, 1992, 1995).
• Supervised learning: use specific examples to reach general conclusions or extract a general rule. 1) Classification 2) Regression
• Unsupervised learning (clustering): unsupervised identification of natural groups in data.
• Reinforcement learning: feedback (positive or negative reward) is given at the end of a sequence of steps.

Inductive reasoning and Deductive reasoning
• Deduction: reasoning from general premises, which are known or presumed to be known, to more specific, certain conclusions.
  All men are mortal. (Premise) John is a man. (Premise) John is mortal. (Conclusion)
• Induction: reasoning from specific cases to more general, but uncertain, conclusions.
  John and Tim are in the college basketball team. (Premise) John and Tim are tall. (Premise) All basketball team members are tall. (Conclusion)

Validity of Conclusions
• Deductive: conclusions can be proven to be valid if the premises are known to be true.
  Example: All politicians believe in the inclusive idea of their nations. X is a politician. Therefore, X believes in the inclusive idea of his country.
• Inductive: conclusions may be incorrect even if the premises are true.
  Example: in the basketball argument above, the conclusion "All basketball team members are tall" may be false even though both premises are true.

Hypothesis
• Target function: in machine learning, we want to learn or approximate a particular function that maps an input x to f(x). For example, suppose we want to distinguish spam from non-spam email. The target function f(x) = y is the true function f that we want to model.
• Hypothesis: a hypothesis is a function that we believe (or hope) is similar to the true function, i.e., the target function that we want to model. In the context of email spam classification, it would be the rule we came up with that allows us to separate spam from non-spam emails.

What is a Concept?
• A concept is a subset of objects or events defined over a larger set. [Example: the concept of a bird is the subset of all objects (i.e., the set of all things or all animals) that belong to the category of bird.]
• Alternatively, a concept is a Boolean-valued function defined over this larger set. [Example: a function defined over all animals whose value is true for birds and false for every other animal.]
[Diagram: the set of all Things contains Animals (with Birds as a subset) and Cars.]

What is Concept-Learning?
• Given a set of examples labeled as members or non-members of a concept, concept-learning consists of automatically inferring the general definition of this concept.
• In other words, concept-learning consists of approximating a Boolean-valued function from training examples of its input and output.
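To make the concept / hypothesis distinction concrete, here is a minimal Python sketch of a concept as a Boolean-valued function over a set of animals, together with an imperfect hypothesis that tries to approximate it. The attribute names (has_feathers, lays_eggs, can_fly) and the candidate rule are illustrative assumptions, not part of the notes.

```python
# Sketch: a concept is a Boolean-valued function over a larger set (here, animals).
# Attribute names and the candidate rule are made up for illustration.

def is_bird(animal: dict) -> bool:
    """Target concept: true for birds, false for every other animal."""
    return animal.get("has_feathers", False) and animal.get("lays_eggs", False)

def hypothesis(animal: dict) -> bool:
    """A candidate hypothesis we *hope* approximates the target concept:
    'anything that can fly is a bird' (deliberately imperfect)."""
    return animal.get("can_fly", False)

animals = [
    {"name": "sparrow", "has_feathers": True,  "lays_eggs": True,  "can_fly": True},
    {"name": "penguin", "has_feathers": True,  "lays_eggs": True,  "can_fly": False},
    {"name": "bat",     "has_feathers": False, "lays_eggs": False, "can_fly": True},
]

for a in animals:
    print(a["name"], "target:", is_bird(a), "hypothesis:", hypothesis(a))
```

Running this shows the hypothesis disagreeing with the target concept on the penguin and the bat; reducing exactly this kind of disagreement on labeled examples is what concept-learning is about.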
Example of a Concept Learning task
• Concept: Good Days for Water Sports
• Attributes/Features:
  • Sky (values: Sunny, Cloudy, Rainy)
  • AirTemp (values: Warm, Cold)
  • Humidity (values: Normal, High)
  • Wind (values: Strong, Weak)
  • Water (values: Warm, Cool)
  • Forecast (values: Same, Change)
• Target concept, i.e., whether the day is good for water sports (values: Yes, No)
• Example of a training point: <Sunny, Warm, High, Strong, Warm, Same, Yes>

Concept Learning as Search
• Concept learning can be viewed as the task of searching through a large space of hypotheses implicitly defined by the hypothesis representation.
• Selecting a hypothesis representation is an important step since it restricts (or biases) the space that can be searched. [For example, the hypothesis "If the air temperature is cold or the humidity is high then it is a good day for water sports" cannot be expressed in our chosen representation.]

Sl. No | x1  | x2   | y
  1    | 0.7 | 0.7  | 0
  2    | 0.8 | 0.9  | 1
  3    | 0.8 | 0.25 | 0
  4    | 1.2 | 0.8  | 1
  5    | 0.6 | 0.4  | 0
  6    | 1.3 | 0.5  | 1
  7    | 0.9 | 0.5  | 0
  8    | 0.9 | 1.1  | 1

The inductive bias of a learning algorithm is the set of assumptions that the learner uses to predict outputs given inputs that it has not encountered.

Symbols used in propositional logic (connectives):
• ∧ : conjunction (and)
• ∨ : disjunction (or)
• ¬ : negation (not)
• → : implication (if ... then)
• ↔ : equivalence (equivalent to)
• ⊕ : exclusive or (xor)
• ⊥, ⊤ : False, True

Decision Trees
• The most commonly used classification technique.
• Use supervised learning.
• Easy for us to understand the learned results.
• Can deal with missing values and irrelevant features.
• Computationally cheap to use.

1) Decision trees classify instances by sorting them down the tree from the root to some leaf node.
2) Each node specifies a test of some attribute of the instance.
3) Each branch from the node corresponds to one of the possible values of this attribute.
4) An instance is classified by starting at the root node of the tree, testing the attribute specified by this node, then moving down the tree branch corresponding to the value of the attribute in the given example.
5) This process is repeated for the subtree rooted at the new node.

A Decision Tree for PlayTennis
• Example instance: (Outlook = Sunny, Temperature = Hot, Humidity = High, Wind = Strong)
• The tree classifies an instance as YES when:
  YES = (Outlook = Sunny ∧ Humidity = Normal) ∨ (Outlook = Overcast) ∨ (Outlook = Rain ∧ Wind = Weak)
  (A small classification sketch in code appears at the end of these notes.)

• Select the attribute which partitions the learning set into subsets that are as "pure" as possible.
• Entropy of a collection of examples S with c classes: Entropy(S) = Σ_{i=1..c} −p_i log2(p_i), where p_i is the proportion of S belonging to class i (by convention, 0 · log2(0) = 0).
• Information gain of an attribute A relative to a collection of examples S:
  Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) · Entropy(S_v)
  where S_v is the subset of S for which attribute A has value v. (A short computation sketch appears at the end of these notes.)

• Repeat these steps for each non-terminal descendant node, using only the training examples associated with that node. Attributes that have been incorporated higher in the tree are excluded.
• This process continues until either of two conditions is met: (1) every attribute has already been included along this path through the tree, or (2) the training examples associated with this leaf node all have the same target attribute value.

• The depth of a node is the number of edges from the node to the tree's root node. A root node has a depth of 0.
• The height of a node is the number of edges on the longest path from the node to a leaf. A leaf node has a height of 0.
• The height of a tree is the height of its root node, or equivalently, the depth of its deepest node.
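The following is a minimal sketch, in Python, of how an instance is sorted down the PlayTennis tree described above. The nested-dict encoding of the tree and the function name `classify` are illustrative assumptions, not taken from the notes.

```python
# Sketch: classify an instance by sorting it down a decision tree.
# Internal nodes are dicts (an attribute test plus branches); leaves are class labels.

play_tennis_tree = {
    "attribute": "Outlook",
    "branches": {
        "Sunny": {"attribute": "Humidity", "branches": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain": {"attribute": "Wind", "branches": {"Strong": "No", "Weak": "Yes"}},
    },
}

def classify(tree, instance):
    """Start at the root, test the node's attribute, follow the matching branch,
    and repeat until a leaf (a class label) is reached."""
    node = tree
    while isinstance(node, dict):            # internal node: apply its attribute test
        value = instance[node["attribute"]]  # value of the tested attribute
        node = node["branches"][value]       # move down the matching branch
    return node                              # leaf: the predicted label

instance = {"Outlook": "Sunny", "Temperature": "Hot", "Humidity": "High", "Wind": "Strong"}
print(classify(play_tennis_tree, instance))  # -> "No"
```

For the example instance above, the walk goes root (Outlook = Sunny) → Humidity node (High) → leaf "No", which agrees with the YES rule given earlier.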
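As a hedged illustration of the entropy and information-gain formulas above, here is a small Python sketch; the four-example mini-dataset and the attribute/label names are made up for the example.

```python
# Sketch: Entropy(S) and Gain(S, A) as defined in the notes, on a tiny toy dataset.
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy(S) = sum over classes of -p_i * log2(p_i); 0 * log2(0) is taken as 0."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values() if c > 0)

def information_gain(examples, attribute, target="PlayTennis"):
    """Gain(S, A) = Entropy(S) - sum over values v of A of (|S_v|/|S|) * Entropy(S_v)."""
    labels = [e[target] for e in examples]
    gain = entropy(labels)
    n = len(examples)
    for value in set(e[attribute] for e in examples):
        subset = [e[target] for e in examples if e[attribute] == value]
        gain -= (len(subset) / n) * entropy(subset)
    return gain

examples = [
    {"Wind": "Weak",   "PlayTennis": "Yes"},
    {"Wind": "Strong", "PlayTennis": "No"},
    {"Wind": "Weak",   "PlayTennis": "Yes"},
    {"Wind": "Strong", "PlayTennis": "Yes"},
]
print(entropy([e["PlayTennis"] for e in examples]))  # entropy of the whole set, about 0.811
print(information_gain(examples, "Wind"))            # gain from splitting on Wind, about 0.311
```

For this toy data, splitting on Wind lowers the weighted entropy from about 0.811 to 0.5, so the gain is about 0.311; ID3-style tree growing would pick the attribute with the largest such gain at each node.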
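Finally, a tiny sketch of the height definition above, using the same nested-dict tree encoding as the classification sketch (the encoding itself is an assumption; the small tree here is illustrative).

```python
# Sketch: the height of a node is the number of edges on the longest downward path
# to a leaf; leaves have height 0, and the height of a tree is its root's height.

def height(node):
    if not isinstance(node, dict):   # leaves are plain labels
        return 0
    return 1 + max(height(child) for child in node["branches"].values())

# A small two-level example tree: the root tests A, one branch tests B.
tree = {
    "attribute": "A",
    "branches": {
        "x": "Yes",
        "y": {"attribute": "B", "branches": {"u": "No", "v": "Yes"}},
    },
}
print(height(tree))  # -> 2: root at depth 0, the B node at depth 1, its leaves at depth 2
```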