Principles of Artificial Intelligence (18AI55)
Module 1

Artificial Intelligence
Artificial intelligence (AI) is the study of how to make computers do things which, at the moment, people do better. – Rich and Knight, 1991
The study of the computations that make it possible to perceive, reason and act. – Patrick Henry Winston, 1992
AI is a branch of computer science which deals with the automation of intelligent behaviour. – Luger & Stubblefield, 1993
The automation of activities that we associate with human thinking. – Bellman, 1978

More formal definition of AI: Artificial Intelligence (AI) is the study of making machines perform tasks that typically require human intelligence. AI allows machines to model, and even improve upon, the capabilities of the human mind.

History of AI
Year 1943: The first work now recognized as AI was done by Warren McCulloch and Walter Pitts, who proposed a model of artificial neurons.
Year 1950: English mathematician Alan Turing published "Computing Machinery and Intelligence", in which he proposed a test of a machine's ability to exhibit intelligent behaviour equivalent to that of a human. This test is called the Turing test.
Year 1956: The term "Artificial Intelligence" was first adopted by American computer scientist John McCarthy at the Dartmouth Conference, where AI was established as an academic field for the first time.
Year 1966: Researchers emphasized developing algorithms that could solve mathematical problems. Joseph Weizenbaum created the first chatbot, named ELIZA.
Year 1972: The first intelligent humanoid robot, WABOT-1, was built in Japan.
Year 1980: Expert systems were programmed to emulate the decision-making ability of a human expert. In the same year, the first national conference of the American Association for Artificial Intelligence was held at Stanford University.
Year 1997: IBM's Deep Blue defeated world chess champion Garry Kasparov, becoming the first computer to beat a reigning world chess champion.
Year 2006: By 2006, AI had entered the business world; companies such as Facebook, Twitter, and Netflix started using AI.
Year 2012: Google launched the Android app feature "Google Now", which could provide information to the user as a prediction.
Year 2018: IBM's "Project Debater" debated complex topics with two master debaters and performed extremely well.

AI has now developed to a remarkable level. Deep learning, big data, and data science are booming, and companies such as Google, Facebook, IBM, and Amazon are working with AI to create remarkable devices. The future of AI is promising.

Dr. Navaneeth Bhaskar, Associate Professor, CSE (Data Science), SCEM Mangalore 1

Characteristics of AI systems
o Learn new concepts and tasks.
o Draw useful conclusions about the world around us.
o Understand ideas and help computers communicate in natural languages.
o Plan sequences of actions to complete a goal.
o Offer advice based on rules and situations.
o May not necessarily imitate human senses and thought processes; by performing some tasks differently, they may even exceed human abilities.
o Perform intelligent tasks effectively and efficiently.
o Perform tasks that require high levels of intelligence.

Understanding of AI (Techniques of AI)
AI techniques and ideas seem harder to understand than most things in computer science. AI works best on complex problems for which general principles do not help much, though a few useful general principles exist. AI is also difficult to understand by its content: its boundaries are not well defined.
AI is often regarded as advanced software engineering: sophisticated software techniques for hard problems that cannot be solved in any easy way. AI programs, like people, are usually not perfect and can make mistakes. AI involves working in domains where the problem is often poorly understood. AI programs can be treated as non-numeric (common-sense) ways of solving problems; they need not be perfect but can give reasonably good solutions. Understanding AI also requires an understanding of related terms such as intelligence, knowledge, reasoning, thought, cognition, and learning, as well as a number of other computer-related terms.

Intelligence
Intelligence is the ability to acquire and apply knowledge. This includes the ability to:
o Reason
o Plan
o Solve problems
o Think abstractly
o Understand ideas and language
o Learn

Intelligent system: There are many challenges in building systems that mimic the behaviour of the human brain, which is made up of billions of neurons. The earliest well-known method of measuring the intelligence of a system is the Turing test, proposed by Alan Turing in 1950.

Turing test: A system is said to pass the Turing test if a human questioner, after repeated questions of any kind, cannot determine whether he or she is talking to another person or to a machine.

ELIZA
ELIZA is often described as the first intelligent system to pass the Turing test, in an informal sense. It was developed by Joseph Weizenbaum in the mid-1960s and communicated with the user in English. People were amazed by the program. It appeared able to converse about any subject because it stored subject information in data banks. Another feature of ELIZA was its ability to pick up speech patterns from the user's questions and build responses using those patterns. In this mode, ELIZA mostly rephrased the user's statements as questions and posed them back to the user.
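The keyword-and-template mechanism described above can be sketched in a few lines of Python. This is a minimal illustration, not Weizenbaum's original program or script: the patterns in RULES, the REFLECTIONS table and the function names are hypothetical choices made for this example.

```python
import re

# Hypothetical keyword -> template table in the spirit of ELIZA's scripts.
# {0} is filled with the user's own words captured by the pattern.
RULES = [
    (re.compile(r"\bi need (.+)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bmy (.+)", re.IGNORECASE), "Tell me more about your {0}."),
]

# Swap first- and second-person words so the echoed phrase reads naturally.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are",
               "you": "I", "your": "my"}


def reflect(phrase: str) -> str:
    """Rewrite a captured phrase from the user's point of view to the program's."""
    return " ".join(REFLECTIONS.get(word.lower(), word) for word in phrase.split())


def respond(statement: str) -> str:
    """Return the template of the first rule whose keyword pattern matches."""
    text = statement.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(reflect(match.group(1)))
    return "Please go on."  # default reply when no keyword is recognized
```

For instance, respond("I am sad about my job") returns "How long have you been sad about your job?". The program has no idea what a job is; it only matched "i am" and echoed the rest with the pronouns swapped, which is the lack of understanding discussed in the characteristics that follow.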
Main characteristics of ELIZA
Simulation of intelligence:
o ELIZA programs are not intelligent at all in the real sense.
o They do not understand the meaning of an utterance.
o Instead, these programs simulate intelligent behaviour quite effectively by recognizing keywords and phrases.
o The response to a question is chosen using a lookup table.
Quality of response:
o Performance depends on how well the program can process the input text.
o The number of available templates is a serious limitation.
o Success depends heavily on the user having a fairly restricted view of the expected response from the system.
Coherence:
o The earlier version of the system imposed no structure on the conversation.
o Each statement was based entirely on the current input; no information from earlier in the conversation was used.
o Advanced versions of ELIZA performed better.
o Any sense of intelligence depends strongly on the conversation as judged by the user.
Semantics:
o Such systems have no logical representation of the content of either the user's input or the reply.
o That is why we say they have no intelligent understanding of what we are saying.
o However, they imitate the human conversational style, which is why ELIZA is often said to have passed the Turing test.

Categories of AI/Intelligent Systems
o Systems that think like humans
o Systems that act like humans
o Systems that think rationally
o Systems that act rationally

Systems that think like humans
Systems that think like humans require cognitive modelling approaches: one has to know how the brain functions and how it processes information. This is an area of cognitive science. Stimuli are converted into mental representations, and cognitive processes manipulate these representations to build new representations that are used to generate actions. A neural network is a computing model that processes information in a way similar to the brain.
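The "computing model similar to the brain" idea goes back to the 1943 McCulloch-Pitts neuron mentioned in the history. A minimal sketch, assuming binary inputs, fixed weights and a fixed threshold (the function names are illustrative): the unit fires when the weighted sum of its inputs reaches the threshold, and changing only the threshold turns the same unit into logical AND or OR.

```python
def threshold_unit(inputs, weights, threshold):
    """McCulloch-Pitts style neuron: output 1 (fire) if the weighted sum of
    the binary inputs reaches the threshold, otherwise output 0."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0


def and_gate(a, b):
    # Both inputs must be active for the sum (1 + 1) to reach 2.
    return threshold_unit([a, b], [1, 1], threshold=2)


def or_gate(a, b):
    # A single active input is enough to reach a threshold of 1.
    return threshold_unit([a, b], [1, 1], threshold=1)
```

A neural network connects many such units; learning then amounts to adjusting the weights and thresholds instead of fixing them by hand as done here.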
Systems that act like humans
The overall behaviour of the system should be human-like, which can be judged by observation. The Turing test is an example.

Systems that think rationally
Such systems rely on logic, rather than on humans, to measure correctness. For thinking rationally or logically, logic formulas and theories are used to synthesize outcomes.
o For example, given that John is a human and that all humans are persons, one can logically conclude that John is a person.

Systems that act rationally
Rational behaviour means doing the right thing. Even if the method is illogical, the observed behaviour must be rational.

Components of AI Program
AI techniques must be independent of the problem domain as far as possible. An AI program should have:
o Knowledge base
o Control strategy
o Inference mechanism
Knowledge base: AI programs should learn and update their knowledge accordingly. A knowledge base consists of facts and rules. Characteristics of a knowledge base:
o It is huge and requires proper structuring.
o It may be incomplete and inaccurate.
o It may keep changing (dynamic).
Control strategy: The control strategy determines which rule is to be applied. Rules of thumb based on the problem domain may be used to choose the rule.
Inference mechanism: It requires search through the knowledge base and derives new knowledge from existing knowledge with the help of inference rules.

Foundations of AI
The foundations of AI are based on:
o Mathematics
o Neuroscience
o Control theory
o Linguistics
Mathematics: AI systems use formal logical methods and Boolean logic, analysis of the limits of what can be computed, probability theory and the treatment of uncertainty (which form the basis of most modern approaches to AI), fuzzy logic, etc.
Neuroscience: Neuroscience studies how brains function. In early studies, people with brain injuries and abnormalities were observed to understand how the brain works.
Recent studies use accurate sensors to correlate brain activity with human thought. Moore's law observes that the number of transistors on a chip doubles roughly every two years; extrapolating this trend, it was predicted that computers would have as many transistors as the human brain has neurons by around 2020. Researchers are working on building a mechanical brain; such systems will require parallel computation, remapping, and interconnections on a large scale.
Control theory: Control theory deals with the control of dynamical systems in engineered processes and machines. Machines can modify their behaviour in response to the environment (sense/action loop).
Linguistics: Speech demonstrates much of human intelligence. Analysis of human language reveals thought taking place in ways not understood in other settings. Children can create sentences they have never heard before. Language and thought are believed to be closely connected.

Sub-areas of AI
AI is an interdisciplinary area with numerous subfields. Some of the sub-areas of AI are:
o Knowledge representation models
o Game playing
o Fuzzy logic
o Natural language understanding
o Intelligent tutoring systems
o Robotics
o Neural networks
o Expert problem solving
o Web agents
o Computer vision
o Data mining
o Learning models, inference techniques, pattern recognition, search and matching, etc.

Development of AI languages
Building an AI solution requires not only a clear set of requirements but also the right selection of technologies and AI programming languages. AI languages emphasize knowledge representation schemes, pattern matching, flexible search, and programs as data. Examples of such languages are LISP, POP-2, ML and Prolog.
LISP (List Processing) is a functional language based on lambda calculus. LISP was the first language developed for AI. It is the second-oldest high-level programming language still in use, after Fortran, and is considered one of the pioneers in computer science and in the implementation of AI.
Prolog stands for Programming in Logic.
It is mainly used for AI and computational linguistics. It is a declarative language, particularly useful for symbolic reasoning, databases, language parsing, and natural language processing. Prolog was one of the most popular logic programming languages of its time, used in expert systems, theorem proving, type systems, and automated planning.
POP-2 is a stack-based language that provides great flexibility and is similar to LISP. POP-11 is a reflective, incrementally compiled programming language; it is an evolution of POP-2.
ML (Meta Language) is a general-purpose functional programming language. Today there are several languages in the ML family; the three most prominent are Standard ML (SML), OCaml and F#.
Python is a high-level, object-oriented programming language and one of the most frequently used languages, with applications in AI, machine learning, data science, web apps, desktop apps, networking apps, and scientific computing. Object-oriented means the language is organized around objects (which hold data) rather than around functions alone, and high-level means it is easy for humans to understand.

Current trends in AI
Over the past decade, AI has meshed into various industries, and the era has witnessed a dramatic increase in platforms based on AI and Machine Learning (ML). These technologies have impacted healthcare, manufacturing, law, finance, retail, real estate, accountancy, digital marketing, and several other areas.
The process of completing a particular task with the help of a computer or computing device is called computing. Soft computing and hard computing are the two computing methods. Hard computing needs predefined instructions and does not work beyond them; its principle relies on certainty and precision. Soft computing is a computing model evolved to solve non-linear problems that involve uncertain, imprecise and approximate solutions.
These types of problems are considered real-life problems, where human-like intelligence is required to solve them. Soft computing is based on learning from experimental data, so it does not require a precise mathematical model of the problem. It helps users solve real-world problems by providing approximate results where conventional and analytical models fail.
Current trends in AI are directed towards technologies that originate from biological or behavioural phenomena related to humans or animals, such as evolutionary computation.
Artificial neural networks (ANNs) were developed based on the functioning of the human brain to make predictions. ANNs, usually called neural networks or neural nets, are computing systems inspired by the biological neural networks that constitute animal brains. An ANN is based on a collection of connected units or nodes called artificial neurons, which loosely model the neurons in a biological brain.
Evolutionary techniques mostly involve meta-heuristic optimization algorithms such as evolutionary algorithms (comprising genetic algorithms, evolutionary programming, etc.) and swarm intelligence (comprising ant colony optimization and particle swarm optimization).
A genetic algorithm is inspired by Charles Darwin's theory of natural evolution. It reflects the process of natural selection, in which the fittest individuals are selected to produce the offspring of the next generation.
Swarm intelligence is an emerging field of biologically inspired AI based on the behavioural models of social insects. The inspiration often comes from nature, especially biological systems. The agents follow very simple rules, and although there is no centralized control structure dictating how individual agents should behave or interact, the local interactions between agents lead to the emergence of intelligent global behaviour.
Examples of swarm intelligence in natural systems include ant colonies, bee colonies, bird flocking, hawks hunting, animal herding, bacterial growth, fish schooling and microbial intelligence. Advantages of such systems are adaptability, robustness, reliability, simplicity, and the absence of central control.
AI-based expert systems (ES) are designed to solve complex problems and to provide decision-making ability like a human expert. An expert system does this by extracting knowledge from its knowledge base using reasoning and inference rules according to the user's queries. It is called an expert system because it contains the expert knowledge of a specific domain and can solve complex problems within that domain. These systems are designed for a specific domain, such as medicine or science. The performance of an expert system depends on the expert knowledge stored in its knowledge base: the more knowledge stored, the better the system performs. A common example of an ES is the suggestion of corrections for spelling errors while typing in the Google search box.

Applications of AI
o Business: Financial strategies, giving advice
o Engineering: Checking designs, offering suggestions to create new products, expert systems for engineering applications
o Manufacturing: Assembly, inspection, and maintenance
o Medicine: Monitoring, diagnosing, and prescribing
o Education: Teaching
o Fraud detection
o Object identification
o Space shuttle scheduling
o Information retrieval

Tic Tac Toe Board Game Playing
Tic-tac-toe (also called noughts and crosses, or Xs and Os) is a paper-and-pencil game for two players who take turns marking the spaces of a three-by-three grid with X or O, alternately placing their mark in one of the nine positions. The player who succeeds in placing three of their marks in a horizontal, vertical, or diagonal row is the winner.
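The winning condition just described can be checked by testing the eight possible lines of the grid. A minimal sketch, assuming the board is a Python list of nine cells holding "X", "O" or " " read row by row (a representation chosen for this example; the approaches that follow encode the board with numbers instead). WIN_LINES and winner are illustrative names.

```python
# The eight winning lines of a 3x3 board, as 0-based indices into a
# nine-element list laid out row by row (positions 0..8).
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def winner(board):
    """Return 'X' or 'O' if that player has three marks in a line, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None
```

Any program that never loses must, at minimum, watch these eight lines for both players; the approaches below differ mainly in how they encode the board and search for a move.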
We can play this game with one human player and the computer as the other player. The objective is to write a program that never loses. There are three approaches to implementing this game as a computer program.

Approach 1
The board is represented as a nine-element vector. The following symbols are used in Approach 1:
o 0 for blank
o 1 indicating an X player move
o 2 indicating an O player move
The computer may play as the X or the O player.
Move Table (MT): MT is a vector of 3^9 elements, each of which is a nine-element vector representing a board position. There are 3^9 = 19683 elements in MT. Every possible board position (the current board position) is stored along with its corresponding next best board position (the new board position). Once the table is designed, the program simply has to do a table lookup.
Algorithm:
o View the vector (board) as a ternary number and convert it to its corresponding decimal number.
o Use the computed number as an index into MT and access the vector stored there.
o The selected vector represents how the board will look after the move. Set the board equal to that vector.
Drawbacks of Approach 1: It is very efficient in terms of time but has several disadvantages:
o A lot of space is needed to store the move table.
o A lot of work is needed to specify all the entries in the move table.
o It is highly error-prone, as the data is large.
o Poor extensibility.
o Not appropriate for 3D tic-tac-toe (3^27 board positions would have to be stored).
o Not intelligent, as it does not meet AI requirements.

Approach 2
A nine-element vector B[1..9] represents the board. The following symbols are used in Approach 2:
o 2 indicates blank
o 3 indicates the X player
o 5 indicates the O player
There are at most 9 moves, each numbered by an integer from 1 to 9.
Strategy: The strategy a human applies in this game is: if he/she can win in the next move, play in the winning square. Otherwise, check whether the opponent can win in the next move.
If so, block that square; otherwise, try to make a valid move in any row, column, or diagonal.
The following three functions are used in this approach:
o Go(n): the computer makes a move in square n.
o Make_2: helps the computer set up two in a row; it returns 5 if the centre square is blank, and otherwise any blank non-corner square (2, 4, 6 or 8).
o PossWin(P): if player P can win in the next move, it returns the index (1 to 9) of the square that constitutes a winning move; otherwise it returns 0.
If PossWin(P) = 0 {P cannot win}, then find whether the opponent can win. If so, block it.
PossWin checks each row, column and diagonal, one at a time, by multiplying the values of its three squares:
o If the product is 3 × 3 × 2 = 18, player X can win.
o Else, if the product is 5 × 5 × 2 = 50, player O can win.
These procedures are used in the algorithm.

Algorithm:
Conditions: The first player always uses symbol X. There are 9 moves in the worst case. The computer is represented by C and the human by H.
o If C plays first (computer plays X, human plays O): C makes the odd moves.
o If H plays first (human plays X, computer plays O): C makes the even moves.
Complete algorithm (odd or even moves, for C playing first or second):
Move 1: Go(5) / Go(1)
Move 2: If B[5] is blank, then Go(5) else Go(1)
Move 3: If B[9] is blank, then Go(9) else Go(3) {make 2}
Move 4: {By now the human (playing X) has played 2 moves}
    If PossWin(X) then {block H} Go(PossWin(X)) else Go(Make_2)
Move 5: {By now the computer has played 2 moves}
    If PossWin(X) then {win} Go(PossWin(X))
    else {block H} if PossWin(O) then Go(PossWin(O))
    else if B[7] is blank then Go(7) else Go(3)
Move 6: {By now both have played 2 moves}
    If PossWin(O) then {win} Go(PossWin(O))
    else {block H} if PossWin(X) then Go(PossWin(X))
    else Go(Make_2)
Moves 7 & 9: {By now the human (playing O) has played 3 moves}
    If PossWin(X) then {win} Go(PossWin(X))
    else {block H} if PossWin(O) then Go(PossWin(O))
    else Go(Anywhere)
Move 8: {By now the computer has played 3 moves}
    If PossWin(O) then {win} Go(PossWin(O))
    else {block H} if PossWin(X) then Go(PossWin(X))
    else Go(Anywhere)
Features of Approach 2:
o Not as efficient as the first approach in terms of time, since several conditions are checked before each move.
o It is memory efficient.
o Easier to understand, and the complete strategy has been determined in advance.
o Cannot be generalized to 3D.

Approach 3
Same as Approach 2 except for one change in the representation of the board. The board is considered to be a 3 × 3 magic square, with its 9 blocks numbered by the numbers of the magic square, so that each row, column and diagonal adds up to 15.
Execution of Approach 3:
o Suppose H plays in block 8.
o Next, C plays in block 5.
o H plays in block 3.
o Now it is the computer's turn.
Strategy by computer: Since H has played two turns and C only one, C checks whether H can win.
o Compute the sum of the blocks played by H: S = 8 + 3 = 11
o Compute D = 15 – 11 = 4
o Block 4 is a winning block for H.
o So, C blocks it by playing in block 4.
o C's list is updated with block 4, as follows:
First player H: 8 3
Second player C: 5 4
Assume that H now plays in block 6. It is C's turn again.
o C checks whether it can win:
o Sum of the blocks played by C: S = 5 + 4 = 9
o D = 15 – 9 = 6
o Block 6 is not free, so C cannot win on this turn.
o Now C checks whether H can win:
o Compute the sums of the new pairs (8, 6) and (3, 6) from H's list: for (8, 6), S = 8 + 6 = 14 and D = 15 – 14 = 1.
o Block 1 is not used by either player, so C plays in block 1.
The updated lists after the 6th move are:
First player H: 8 3 6
Second player C: 5 4 1
Assume that H now plays in block 2. Using the same strategy, C checks its pairs (5, 1) and (4, 1) and finds block 9 {15 – 6 = 9}. Block 9 is free, so C plays in 9 and wins the game.
Features of Approach 3:
o This program requires more time than the other two methods.
o It has to search all possible move sequences before making each move.
o It could be extended to handle 3-dimensional tic-tac-toe.
o It can be used to design games more complicated than tic-tac-toe.

3-Dimensional Tic Tac Toe
The goal is still to be the first to get 3 in a row, but now the row can lie on any of the three levels or run between levels. To win, a player must place three symbols on three squares that line up vertically, horizontally, or diagonally on a single grid, or that are spaced evenly across all three grids. This game can use a magic cube, the three-dimensional equivalent of a magic square: the numbers 1 to 27 are arranged in a 3 × 3 × 3 pattern such that the numbers in each row and each column on the six outer surfaces of the cube, each row, each column and the two diagonals of the middle grid, and the four main space diagonals all add up to the same number (in this case 42), called the magic constant of the cube.

Problem Solving
Problem solving is a method of deriving solution steps, beginning from the initial description of the problem, to the desired solution. In AI, problems are frequently modelled as state space problems, where the state space is the set of all possible states from the start state to the goal states. Search in AI is the process of navigating from a starting state to a goal state by transitioning through intermediate states.
Almost any AI problem can be defined in these terms. Two types of problem-solving methods are generally followed: general-purpose and special-purpose methods. A general-purpose method is applicable to a wide variety of problems, whereas a special-purpose method is tailor-made for a particular problem and often exploits very specific features of that problem.

General-purpose problem-solving approaches:
Production system (PS)
A production system is a type of cognitive architecture that defines specific actions according to certain rules. The rules represent the declarative knowledge a machine uses to respond to different conditions. Many expert systems and automation methodologies rely on the rules of production systems.
A PS consists of a number of production rules, each with a left side that determines the applicability of the rule and a right side that describes the action to be performed if the rule is applied. The left side of a rule describes the current state, whereas the right side describes the new state obtained by applying the rule. These production rules operate on databases that change as the rules are applied. A PS also has control strategies that specify the order in which rules are applied when several rules match at once.
Advantages of PS in AI:
o Provides an excellent basis for structuring AI problems.
o Each rule can be added, removed or modified independently, which makes the system highly modular.
o Helpful in designing real-time applications.
o A good way to model the state-driven nature of intelligent machines.
o More flexible than algorithmic control.
o Language independent.

Water Jug Problem (Example of a Production System)
Problem statement: We have two jugs, a 5-gallon (5-g) jug and a 3-gallon (3-g) jug, with no measuring marks on them, and an endless supply of water from a tap. Our task is to get 4 gallons of water into the 5-g jug.
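The production-system idea can be sketched on this problem. In the sketch below, which is an illustration rather than a standard rule set, each hypothetical rule in RULES pairs a left side (a condition on the state (x, y), with x gallons in the 5-g jug and y in the 3-g jug) with a right side (the new state). The control strategy is an assumption made for this example: fire the first matching rule whose result has not been seen before, and never backtrack.

```python
# Production rules for the water jug problem. State: (x, y) = gallons in the
# 5-g jug and in the 3-g jug. Each rule is (name, left side, right side).
RULES = [
    ("fill 5-g", lambda x, y: x < 5, lambda x, y: (5, y)),
    ("fill 3-g", lambda x, y: y < 3, lambda x, y: (x, 3)),
    ("empty 5-g", lambda x, y: x > 0, lambda x, y: (0, y)),
    ("empty 3-g", lambda x, y: y > 0, lambda x, y: (x, 0)),
    ("pour 5-g into 3-g", lambda x, y: x > 0 and y < 3,
     lambda x, y: (x - min(x, 3 - y), y + min(x, 3 - y))),
    ("pour 3-g into 5-g", lambda x, y: y > 0 and x < 5,
     lambda x, y: (x + min(y, 5 - x), y - min(y, 5 - x))),
]


def run_production_system(start=(0, 0), goal_x=4, max_steps=50):
    """Fire the first rule whose left side matches and whose right side gives
    a state not seen before; stop at the goal or when no rule applies."""
    state, seen, trace = start, {start}, [start]
    for _ in range(max_steps):
        if state[0] == goal_x:
            return trace
        for _name, condition, action in RULES:
            if condition(*state):
                new_state = action(*state)
                if new_state not in seen:
                    state = new_state
                    seen.add(state)
                    trace.append(state)
                    break
        else:
            return None  # no rule produced a new state: the system is stuck
    return None
```

Run from (0, 0), this strategy reaches (4, 0) after 10 rule firings: (0,0), (5,0), (5,3), (0,3), (3,0), (3,3), (5,1), (0,1), (1,0), (1,3), (4,0). This is not the shortest solution; the "not seen before" check is what keeps the first-match strategy from cycling, and the rule order decides which path is found.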
Solution: The state space for this problem can be described as the set of ordered pairs of integers (X, Y), where X is the number of gallons of water in the 5-g jug and Y is the number in the 3-g jug.
o Start state: (0, 0)
o Goal state: (4, N) for any value of N ≤ 3
The possible operations that can be used in this problem are:
o Fill the 5-g jug from the tap, or empty the 5-g jug by throwing its water down the drain.
o Fill the 3-g jug from the tap, or empty the 3-g jug by throwing its water down the drain.
o Pour water from the 5-g jug into the 3-g jug until the 3-g jug is full (or the 5-g jug is empty).
o Pour some or all of the water in the 3-g jug into the 5-g jug.

Missionaries and Cannibals Problem
Problem statement: Three missionaries and three cannibals want to cross a river. There is a boat on their side of the river that can carry one or two persons. How should they use this boat to cross the river such that cannibals never outnumber missionaries on either side? If the cannibals ever outnumber the missionaries on either bank, the missionaries will be eaten. How can they all cross over without anyone being eaten?
Solution: The state space for this problem can be described as the set of ordered pairs of left and right banks of the river, (L, R), where each bank is represented as a list [nM, mC, B]. Here n is the number of missionaries M, m is the number of cannibals C, and B represents the boat.
o Start state: ([3M, 3C, 1B], [0M, 0C, 0B])
o Any state: ([n1M, m1C, _], [n2M, m2C, _]), with the constraints n1 ≥ m1 (unless n1 = 0), n2 ≥ m2 (unless n2 = 0), n1 + n2 = 3 and m1 + m2 = 3
o Goal state: ([0M, 0C, 0B], [3M, 3C, 1B])

State Space Search
State space search is another problem-solving method that, like a production system, supports easy search. In this method too, the problem is viewed as finding a path from the start state to a goal state.
A state space consists of four components:
o A set S containing the start states of the problem
o A set G containing the goal states of the problem
o A set of nodes (states) in the graph/tree, where each node represents a state
o A set of arcs connecting nodes, where each arc corresponds to an operator, that is, a step in the problem-solving process
A solution path is a path through the graph from a node in S to a node in G. The main objective of a search algorithm is to determine a solution path in the graph. There may be more than one solution path, as there may be more than one way of solving the problem.

Missionaries and Cannibals Problem using State Space Search
The possible operators in this problem are 2M0C, 1M1C, 0M2C, 1M0C and 0M1C, where M is a missionary and C is a cannibal; the digit before each letter gives the number of missionaries or cannibals in the boat. These operators can be used in both directions: if the boat is on the left bank, "Operator→" is applied, and if the boat is on the right bank, "Operator←" is applied.
Let us represent a state as (L : R), where L = n1M m1C 1B and R = n2M m2C 0B. Here B represents the boat, with 1 or 0 indicating its presence or absence.
Start state: (3M3C1B : 0M0C0B), or simply (331:000)
Goal state: (0M0C0B : 3M3C1B), or simply (000:331)

Eight-Puzzle Problem
Problem statement: The eight-puzzle has a 3 × 3 grid with 8 tiles, numbered 1 to 8 and arranged randomly, and one empty cell. At any point, a tile adjacent to the empty cell can move into it, creating a new empty cell. Solving the problem means arranging the tiles so that we reach the goal state from the start state.
The start state could be represented as (0 denotes the empty cell):
[[3, 7, 6],
 [5, 1, 2],
 [4, 0, 8]]
The goal state could be represented as:
[[5, 3, 6],
 [7, 0, 2],
 [4, 1, 8]]
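The move operators for the eight-puzzle can be sketched as follows, assuming each state is stored as a tuple of nine tiles read row by row with 0 for the empty cell (START, GOAL, MOVES, move and successors are illustrative names). Moving a tile into the gap is modelled as moving the empty cell up, down, left or right, and each call to successors gives the children of one node in the search tree.

```python
# A state is a tuple of 9 tiles read row by row; 0 marks the empty cell.
START = (3, 7, 6,
         5, 1, 2,
         4, 0, 8)
GOAL = (5, 3, 6,
        7, 0, 2,
        4, 1, 8)

# (row, column) offsets for moving the empty cell in each direction.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def move(state, direction):
    """Slide the empty cell one step; return the new state, or None if the
    move would leave the 3x3 grid."""
    i = state.index(0)
    row, col = divmod(i, 3)
    dr, dc = MOVES[direction]
    r, c = row + dr, col + dc
    if not (0 <= r < 3 and 0 <= c < 3):
        return None
    j = r * 3 + c
    cells = list(state)
    cells[i], cells[j] = cells[j], cells[i]  # the adjacent tile moves into the gap
    return tuple(cells)


def successors(state):
    """All states reachable in one move: the children in the search tree."""
    children = []
    for direction in MOVES:
        child = move(state, direction)
        if child is not None:
            children.append(child)
    return children
```

From the start state above, moving the empty cell up, up, left, down, right (five moves) produces the goal state, and successors(START) has only three children because the empty cell cannot move down from the bottom row.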
The operators move the empty cell {up, down, left, right}. A partial search tree is shown here; the search continues like this until the goal state is reached.

Control Strategies
A control strategy describes the order in which rules are applied to the current state. First, a control strategy should cause motion towards a solution. For example, in the water jug problem, if we apply the simple control strategy of always starting from the top of the rule list and selecting the first applicable rule, we will never move towards a solution.
The second requirement is that the control strategy should explore the solution space systematically. For example, if we select a rule at random from the applicable rules, the strategy certainly causes motion and will eventually lead to a solution. However, we are likely to arrive at the same state several times, because the control strategy is not systematic.
To solve real-world problems, an effective control strategy must be used. A problem can be solved by searching for a solution. There are two types of search strategy:
o Data-driven search, called forward chaining
o Goal-driven search, called backward chaining
Forward chaining: Forward chaining begins with known facts and works towards a conclusion. For example, in the eight-puzzle problem, we start from the start state and work forward to the conclusion, i.e., the goal state. In this case, we build a tree of move sequences with the start state as the root. This process continues until a configuration that matches the goal state is generated. The language OPS5 uses forward-chaining rules.
Backward chaining: Backward chaining is a goal-directed strategy that begins with the goal state and works backward, generating sub-goals that must also be satisfied to satisfy the main goal, until the start state is reached.
The Prolog language uses this strategy.
Characteristics of Problem
Before performing the search and finding the solution, we must analyse the problem based on the following characteristics.
Type of problem: There are three types of problems in real life.
o Ignorable: Problems in which solution steps can be ignored. For example, in proving a theorem, we may prove some lemma and later realize that it is not useful. We can then ignore this step and prove another lemma. Such problems can be solved using a simple control strategy.
o Recoverable: Problems in which solution steps can be undone. For example, in the water jug problem, if we have filled a jug, we can also empty it, and any state can be reached again by undoing steps. These problems are generally puzzles played by a single player.
o Irrecoverable: Problems in which solution steps cannot be undone. Games such as chess, card games, and snakes and ladders are examples of this category. Such problems can be solved by a planning process.
Decomposability of a problem: Check whether the problem can be divided into a set of independent, smaller sub-problems that are solved separately and whose solutions are combined into the final solution.
Role of knowledge: Knowledge plays an important role in solving any problem. It may take the form of rules and facts that help generate the search space for finding the solution.
Consistency of the knowledge base used in solving the problem: Make sure that the knowledge base used to solve the problem is consistent. An inconsistent knowledge base will lead to wrong solutions.
Requirement of solution: We should analyse whether the required solution is absolute or relative. A solution is absolute if we have to find an exact solution. It is relative if a reasonably good, approximate solution is acceptable.
o For example, in the water jug problem there may be more than one way to reach the goal. However, once we have followed one path successfully, there is no need to go back and look for a better one. In this case, the solution is absolute.
o In the travelling salesman problem, the goal is to find the shortest route. Unless all routes are known, it is difficult to know which one is the shortest. This is a best-path problem, whereas the water jug problem is an any-path problem.
Exhaustive Search Techniques
Exhaustive search is an algorithmic technique in which all possible solutions are first listed and then the most feasible one is selected. The different types of exhaustive search techniques are:
o Breadth-First Search (BFS)
o Depth-First Search (DFS)
o Depth-First Iterative Deepening (DFID)
o Bidirectional Search
Breadth-First Search (BFS)
Breadth-first search (BFS) expands all the states one step away from the start state, then all states two steps away, then three steps, and so on, until a goal state is reached. All successor states at the same depth are examined before going deeper. BFS always finds a solution with the fewest steps, which is optimal when every step has the same cost.
Let us see how the search tree is generated from the start state of the water jug problem using BFS. At each state, we apply the first applicable rule. If it generates a previously generated state, we cross that state out and try the next rule in the sequence, to avoid looping. If a new state is generated, it is expanded in breadth-first fashion. The water jug rules applied at each step are shown here inside curly brackets.
This search is implemented using two lists called OPEN and CLOSED. The OPEN list contains the states that are yet to be expanded, and the CLOSED list keeps track of the states already expanded. Here the OPEN list is maintained as a queue and the CLOSED list as a stack.
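As a rough sketch of the OPEN/CLOSED scheme just described, the Python code below runs BFS on the Missionaries and Cannibals problem. It uses the (331:000) notation from earlier, reduced to the left-bank triple (missionaries, cannibals, boat). It assumes the usual rules of the puzzle, which the notes do not spell out: the boat carries one or two people, and cannibals may never outnumber missionaries on a bank that has any missionaries. OPEN is a deque used as a FIFO queue. Instead of a CLOSED stack, a `parent` dictionary records every generated state with a back pointer to its parent. This serves two purposes: it stops previously generated states from being added again, and it already provides the path-recovery modification discussed next. The function names are illustrative only.

```python
from collections import deque

# Boat loads (missionaries, cannibals): 2M0C, 1M1C, 0M2C, 1M0C, 0M1C.
MOVES = [(2, 0), (1, 1), (0, 2), (1, 0), (0, 1)]


def is_safe(m, c):
    """m, c = people on the LEFT bank; check that both banks are valid and safe."""
    if not (0 <= m <= 3 and 0 <= c <= 3):
        return False
    left_ok = m == 0 or m >= c                      # missionaries not outnumbered on the left
    right_ok = (3 - m) == 0 or (3 - m) >= (3 - c)   # ... nor on the right
    return left_ok and right_ok


def successors(state):
    """Apply Operator-> (boat on the left) or Operator<- (boat on the right)."""
    m, c, b = state
    direction = -1 if b == 1 else 1                 # boat on left: people leave the left bank
    for dm, dc in MOVES:
        nm, nc = m + direction * dm, c + direction * dc
        if is_safe(nm, nc):
            yield (nm, nc, 1 - b)


def bfs(start, goal):
    open_queue = deque([start])                     # OPEN: FIFO queue of states to expand
    parent = {start: None}                          # every generated state -> its parent
    while open_queue:
        state = open_queue.popleft()
        if state == goal:                           # rebuild the path via back pointers
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in successors(state):
            if nxt not in parent:                   # skip previously generated states
                parent[nxt] = state
                open_queue.append(nxt)
    return None                                     # OPEN exhausted: no solution


solution = bfs((3, 3, 1), (0, 0, 0))
for m, c, b in solution:
    print(f"({m}{c}{b}:{3 - m}{3 - c}{1 - b})")      # same (L:R) notation as the notes
```

This prints a shortest solution of 11 crossings (12 states), from (331:000) to (000:331). Replacing `popleft()` with `pop()` turns OPEN into a stack, which gives the depth-first variant described below.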
For simplicity, the BFS algorithm in these notes only checks whether a goal node exists. It can be modified to return the path from the start node to a goal node by storing, with each state in CLOSED, a pointer back to its parent in the search tree.
Depth-First Search (DFS)
In depth-first search, we go as far down into the search tree/graph as possible before backing up and trying alternatives. DFS always generates a successor of the most recently expanded node, until some depth cut-off is reached. DFS is memory efficient, as it stores only a single path from the root to a leaf node, along with the remaining unexpanded siblings of each node on that path.
We can implement DFS using two lists called OPEN and CLOSED. The OPEN list contains the states that are yet to be expanded, and the CLOSED list keeps track of the states already expanded. Here both the OPEN and CLOSED lists are maintained as stacks. If the first element of OPEN is found to be a goal state, the search terminates successfully.
Comparison of BFS and DFS
o BFS is effective when the search tree has a low branching factor.
o BFS can work even in trees that are infinitely deep.
o BFS requires a lot of memory, as the number of nodes at each level of the tree increases exponentially.
o BFS is superior when the goal exists in the upper right portion of a search tree.
o BFS gives an optimal solution (in number of steps).
o DFS is best when the goal exists in the lower left portion of the search tree.
o DFS is memory efficient, as only the path from the start node to the current node is stored.
o DFS may not give an optimal solution.
o DFS is effective when there are few sub-trees in the search tree that have only one connection point to the rest of the states.
Depth-First Iterative Deepening (DFID)
DFID combines the advantages of BFS and DFS. For a given depth d, DFID performs a DFS that never searches deeper than d. If no solution is found, d is increased by 1 and the search is repeated. It therefore needs only the memory of DFS while, like BFS, finding the shallowest goal first.
Bidirectional Search
Bidirectional search is a graph search algorithm that runs two simultaneous searches. One moves forward from the start state and the other moves backward from the goal, and the search stops when the two meet in the middle. Consider the graph shown below: we can find the route/path from node 1 to node 16 using bidirectional search.
Analysis of Search Methods
The effectiveness of any search strategy in problem solving is measured in terms of the following:
o Completeness: An algorithm is complete if it is guaranteed to find a solution whenever one exists.
o Time Complexity: The time required by an algorithm to find a solution.
o Space Complexity: The space (memory) required by an algorithm to find a solution.
o Optimality: An algorithm is optimal if it finds the highest-quality solution when there are several different solutions to the problem.
Performance comparison of Exhaustive Search Techniques:
Travelling Salesman Problem
Problem Statement: In the travelling salesman problem (TSP), one is required to find the shortest route that visits every city exactly once and returns to the starting point. Assume that there are n cities and that the distance between each pair of cities is given. The problem seems simple, but this is deceptive. The TSP is one of the most intensely studied problems in computational mathematics, and yet no efficient solution method is known for the general case. A simple exhaustive approach fixes the start city and examines all (n - 1)! (i.e., factorial of n - 1) orderings of the remaining cities. We generate complete paths, keeping track of the shortest path found so far. We stop exploring any path as soon as its partial length becomes greater than the shortest path length found so far. Let us consider an example with five cities and see how we can find a solution. Assume that C1 is the start city.
The first complete path is taken as the minimum and marked with '√' next to its distance. If a distance (full or partial) is greater than the minimum calculated so far, the path is marked with 'x' to show that it has been pruned.
A partial path is pruned as soon as its distance exceeds the minimum tour distance computed so far. We continue until all the paths have been explored; in this case, there are 4! = 24 possible paths. There may be several ways of solving such a problem, and one would like to choose between the various solution paths based on some criterion of goodness.
Heuristic Search Techniques (Informed Search Algorithms)
A heuristic is a technique used to solve a problem faster than classic methods. These techniques try to solve problems with the minimum number of steps or the minimum cost. Heuristics are problem-solving techniques that give practical and quick solutions. The different types of heuristic search techniques are:
o Hill Climbing Algorithm
o Best First Search Algorithm
o Beam Search Algorithm
o Branch and Bound Search
o A* Algorithm
Hill Climbing Algorithm
Hill climbing is a technique for optimizing mathematical problems, and it is widely used when a good heuristic is available. It is a local search algorithm that continuously moves in the direction of increasing elevation/value to find the peak of the mountain, i.e., the best solution to the problem. It terminates when it reaches a peak where no neighbour has a higher value.
The travelling salesman problem is one of the most widely discussed examples of hill climbing; there, we need to minimize the distance travelled by the salesman. Hill climbing is also called greedy local search, as it looks only at its immediate neighbour states and not beyond them.
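The two TSP approaches above, exhaustive search with pruning and hill climbing, can be compared on a small instance. In the Python sketch below, the 5-city distance matrix `DIST` is hypothetical, because the distances from the example figure are not reproduced in the text. `exhaustive_with_pruning` fixes C1 as the start city and extends partial tours. It abandons a partial tour as soon as its length reaches the best complete tour found so far. `hill_climb` starts from an arbitrary tour and repeatedly moves to the best neighbour, stopping when no neighbour is shorter. Neighbours are obtained by swapping two cities. This swap neighbourhood is an assumption for illustration; other neighbourhoods are also used in practice.

```python
import itertools
import math

# Hypothetical symmetric distances between cities C1..C5 (indices 0..4).
DIST = [
    [0, 10, 8, 9, 7],
    [10, 0, 10, 5, 6],
    [8, 10, 0, 8, 9],
    [9, 5, 8, 0, 6],
    [7, 6, 9, 6, 0],
]
N = len(DIST)


def tour_length(tour):
    """Length of a closed tour that returns to its first city."""
    return sum(DIST[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


def exhaustive_with_pruning():
    """Enumerate tours starting at C1, pruning partial paths that are already too long."""
    best_len, best_tour = math.inf, None

    def extend(path, length):
        nonlocal best_len, best_tour
        if length >= best_len:                 # prune: this path cannot beat the best so far
            return
        if len(path) == N:                     # complete: add the edge back to C1
            total = length + DIST[path[-1]][path[0]]
            if total < best_len:
                best_len, best_tour = total, path[:]
            return
        for city in range(N):
            if city not in path:
                path.append(city)
                extend(path, length + DIST[path[-2]][city])
                path.pop()

    extend([0], 0)                             # C1 (index 0) is the start city
    return best_len, best_tour


def hill_climb(tour):
    """Greedy local search: move to the best swap neighbour while it is shorter."""
    tour = list(tour)
    current = tour_length(tour)
    while True:
        best_neighbour, best_len = None, current
        for i, j in itertools.combinations(range(1, N), 2):   # C1 stays first
            cand = tour[:]
            cand[i], cand[j] = cand[j], cand[i]
            length = tour_length(cand)
            if length < best_len:
                best_neighbour, best_len = cand, length
        if best_neighbour is None:             # no better neighbour: stop at this peak
            return current, tour
        tour, current = best_neighbour, best_len


print(exhaustive_with_pruning())
print(hill_climb([0, 1, 2, 3, 4]))
```

Because `DIST` is made up, no particular tour is claimed here. In general, the pruned exhaustive search always returns a shortest tour, whereas `hill_climb` can stop at a longer tour: a local optimum, one of the problems listed below.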
Problems with Hill Climbing in AI:
o Local maximum: All neighbouring states have values worse than the current state. Because the greedy approach never moves to a worse state, the process terminates even though a better solution may exist elsewhere. As a workaround, we use backtracking.
o Plateau: All neighbours of the current state have the same value, which makes it impossible to choose a direction. To avoid this, we make a big random jump.
o Ridge: At a ridge, movement in every possible direction is downward, so the state looks like a peak and the process terminates. To avoid this, we may apply two or more rules before testing.
Best First Search Algorithm
Best-first search always selects the path that appears best at that moment. It combines a heuristic function with search: at each step, it expands the node that is closest to the goal node, where closeness is estimated by the heuristic function.
Beam Search Algorithm
Beam search is a heuristic search algorithm in which the W best nodes at each level are expanded. It progresses level by level and moves downward only from the best W nodes at each level. Beam search uses breadth-first search to build its search tree. At each level, it generates all successors of the states at the current level and sorts them in order of increasing heuristic value. However, it keeps only W states at each level and ignores the others. The best nodes are decided by the heuristic cost associated with each node. W is called the width of the beam. If B is the branching factor, there will be only W × B nodes under consideration at any depth, of which only W will be selected. The smaller the beam width, the more states are pruned. If W = 1, beam search becomes hill climbing, where the best node is always chosen from the successor nodes.
If the beam width is infinite, no states are pruned and beam search is identical to breadth-first search.
Branch and Bound Search (Uniform Cost Search)
In the branch and bound search method, a cost function g(X) assigns to each node X the cumulative cost of the path from the start node to X, obtained by applying the sequence of operators. While generating the search space, the least-cost path obtained so far is expanded at each iteration until we reach the goal state. Since branch and bound search always expands the least-cost partial path, it is sometimes also called uniform cost search. If every operator has cost 1 (so that g(X) is simply the depth of X), branch and bound degenerates to simple breadth-first search. From the AI point of view, it is as bad as depth-first and breadth-first search, since it uses no estimate of the remaining cost to the goal. It can be improved by using dynamic programming, that is, by deleting paths that are redundant.
A* Algorithm
A* is a search algorithm used to find the shortest path between an initial point and a final point. It searches shorter paths first. This makes it complete and also optimal, provided its heuristic never overestimates the remaining cost. An optimal algorithm finds the least-cost outcome for a problem. A* uses an evaluation function, usually denoted f(N), which estimates the cost of the cheapest solution path that passes through node N. This helps in selecting the best node for expansion. The evaluation function for a node N is defined as:
f(N) = g(N) + h(N)
where g(N) is the cost of getting from the start node to the current node N, and h(N) is a heuristic estimate of the additional cost of getting from N to the goal node. The A* algorithm incrementally searches routes from the start node until it finds the shortest path to a goal. Starting from the start node, it always expands the node with the lowest f(N) value.
Example: Consider a tree diagram where the heuristic values of each state are given in the table.
We calculate f(n) for each state using the formula f(n) = g(n) + h(n), where g(n) is the cost of reaching node n from the start state. Here we use the OPEN and CLOSED lists.
Iteration 1: {(S → A, 4), (S → G, 10)}
Iteration 2: {(S → A → C, 4), (S → A → B, 7), (S → G, 10)}
Iteration 3: {(S → A → C → G, 6), (S → A → C → D, 11), (S → A → B, 7), (S → G, 10)}
Iteration 4 gives the final result. The path S → A → C → G, now the entry with the lowest f value, reaches the goal and is the optimal path, with cost 6.
Algorithm of A* search:
Step 1: Place the starting node in the OPEN list.
Step 2: If the OPEN list is empty, return failure and stop.
Step 3: Select the node n from the OPEN list with the smallest value of the evaluation function (g + h). If n is a goal node, return success and stop; otherwise, go to Step 4.
Step 4: Expand node n, generate all of its successors, and put n into the CLOSED list. For each successor n', if n' is in neither the OPEN nor the CLOSED list, compute its evaluation function and place it in the OPEN list.
Step 5: Otherwise, if n' is already in the OPEN or CLOSED list, attach it to the back pointer that gives the lowest g(n') value.
Step 6: Return to Step 2.
Advantages:
o A* search often gives better results than other search algorithms.
o A* search is optimal and complete (with an admissible heuristic).
o This algorithm can solve very complex problems.
Disadvantages:
o It does not always produce the shortest path, since it relies on heuristics and approximation (for example, when the heuristic overestimates).
o A* search has some complexity issues.
o The main drawback of A* is its memory requirement. It keeps all generated nodes in memory, so it is not practical for many large-scale problems.
Eight Puzzle Problem using A* Algorithm
A simple evaluation function f(X) is defined as:
f(X) = g(X) + h(X)
where h(X) is the number of tiles not in their goal position in a given state X, and g(X) is the depth of node X in the search tree.
Optimal Solution by A* Algorithm
The A* algorithm finds an optimal solution if the heuristic function is carefully designed to underestimate the actual cost.
Underestimation:
o If we can guarantee that h never overestimates the actual cost from the current node to the goal, then A* is guaranteed to find an optimal path to a goal, if one exists. Consider the following example.
o Assume that the h value of each node X is underestimated, i.e., the heuristic value is less than the actual cost from node X to the goal node.
o Start node A is expanded to B, C, and D with f values 4, 5, and 6, respectively.
o Node B has the minimum f value, so it is expanded to E, which has an f value of 5. Since the f value of C is also 5, we resolve the tie in favour of E, the path we are currently expanding.
o Next, node E is expanded to node F, with an f value of 6. Expansion along F stops here, as the f value of C is now the smallest. Thus, by underestimating the heuristic value, we have wasted some effort but eventually discovered that B was farther away than we thought.
o Now we go back, try another path, and find the optimal path.
Overestimation:
o Let us consider another situation, in which the heuristic value of each node in the graph/tree is overestimated.
o We expand B to E, E to F, and F to G, giving a solution path of length 4. But assume that there is a direct path from D to a solution, giving a path of length 2. Because the h value of D is also overestimated, we will never find it. We may find some other, worse solution without ever expanding D.
So, when h overestimates, A* is not guaranteed to find the shortest path.
Admissibility of A*
o A search algorithm is admissible if, for any graph, it always terminates with an optimal path from the start state to a goal state whenever such a path exists.
o If the heuristic function h underestimates (never overestimates) the actual cost from the current state to the goal state, A* is bound to give an optimal solution, and h is then called an admissible function.
Iterative Deepening A* (IDA*) Algorithm
Iterative deepening A* is a graph traversal and path search algorithm. It can find the shortest path between a designated start node and any member of a set of goal nodes in a weighted graph. It is a variant of iterative deepening depth-first search that borrows from A* the idea of using a heuristic function to estimate the remaining cost to the goal. Each iteration is a depth-first search that cuts off any node whose f = g + h exceeds the current threshold. The threshold for the next iteration is the smallest f value that exceeded the previous one. Because it is a depth-first search, its memory usage is lower than that of A*. Unlike ordinary iterative deepening search, however, it concentrates on exploring the most promising nodes, so it does not go to the same depth everywhere in the search tree. Unlike A*, IDA* does not use dynamic programming, so it often ends up exploring the same nodes many times. IDA* is a memory-constrained version of A*: it keeps the optimality of A* for finding the shortest path, but it uses less memory than A*.
Pros:
o It always finds the optimal solution (given an admissible heuristic).
o Various heuristics can be integrated into the algorithm without changing the basic code.
o It uses much less memory, which grows only linearly with depth. This is because it does not store explored nodes: it forgets them after reaching a certain depth and starts over again.
Cons:
o It does not keep track of visited nodes, and thus explores already explored nodes again.
o It is slower because it repeatedly explores the same nodes.
o It requires more processing power and time than A*.
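To make the A* steps and the IDA* description above concrete, here is a compact Python sketch. The graph `EDGES` and heuristic table `H` are an assumed reconstruction of the A* example, since its tree diagram and heuristic table are not reproduced in the text. The costs and h values were chosen to match the f values listed in iterations 1 to 3; h(S) = 5 is a guess, and B and D are given no successors. The same `a_star` function is then applied to the eight-puzzle instance from earlier, with h(X) = number of misplaced tiles and every move costing 1, so g(X) is the depth. OPEN is a binary heap ordered on f. The `best_g` and `parent` dictionaries play the role of CLOSED and the back pointers of Step 5, so a node is pushed and expanded again if a cheaper path to it turns up. All function names are illustrative.

```python
import heapq
import itertools
import math


def a_star(start, is_goal, successors, h):
    """Expand the OPEN node with the smallest f = g + h; return (path, cost)."""
    tie = itertools.count()                  # tie-breaker so states are never compared
    open_heap = [(h(start), next(tie), 0, start)]
    best_g = {start: 0}                      # cheapest g found so far for each state
    parent = {start: None}                   # back pointers used to rebuild the path
    while open_heap:
        _, _, g, node = heapq.heappop(open_heap)
        if g > best_g[node]:
            continue                         # stale entry: a cheaper path was found later
        if is_goal(node):
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1], g
        for nxt, cost in successors(node):
            new_g = g + cost
            if new_g < best_g.get(nxt, math.inf):
                best_g[nxt], parent[nxt] = new_g, node   # keep the lowest-g back pointer
                heapq.heappush(open_heap, (new_g + h(nxt), next(tie), new_g, nxt))
    return None, math.inf


def ida_star(start, is_goal, successors, h):
    """Depth-first search bounded by f = g + h; raise the bound until a goal is found."""
    path = [start]

    def search(g, bound):
        node = path[-1]
        f = g + h(node)
        if f > bound:
            return False, f                  # cut off; f is a candidate next bound
        if is_goal(node):
            return True, g
        next_bound = math.inf
        for nxt, cost in successors(node):
            if nxt in path:                  # avoid cycles along the current path only
                continue
            path.append(nxt)
            found, value = search(g + cost, bound)
            if found:
                return True, value
            next_bound = min(next_bound, value)
            path.pop()
        return False, next_bound

    bound = h(start)
    while True:
        found, value = search(0, bound)
        if found:
            return path, value
        if value == math.inf:
            return None, math.inf            # nothing left to explore: no solution
        bound = value                        # smallest f that exceeded the old bound


# Assumed reconstruction of the A* example graph (edge costs and h values).
EDGES = {"S": [("A", 1), ("G", 10)], "A": [("B", 2), ("C", 1)],
         "C": [("D", 3), ("G", 4)], "B": [], "D": [], "G": []}
H = {"S": 5, "A": 3, "B": 4, "C": 2, "D": 6, "G": 0}
graph_args = ("S", lambda n: n == "G", lambda n: EDGES[n], lambda n: H[n])
print(a_star(*graph_args))
print(ida_star(*graph_args))

# Eight puzzle: a state is a tuple of 9 tiles read row by row; 0 is the empty cell.
START = (3, 7, 6, 5, 1, 2, 4, 0, 8)
GOAL = (5, 3, 6, 7, 0, 2, 4, 1, 8)


def puzzle_moves(state):
    """Slide a neighbouring tile into the empty cell; each move costs 1."""
    i = state.index(0)
    r, c = divmod(i, 3)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < 3 and 0 <= nc < 3:
            j = nr * 3 + nc
            s = list(state)
            s[i], s[j] = s[j], s[i]
            yield tuple(s), 1


def misplaced(state):
    """h(X): number of tiles (not counting the empty cell) out of their goal place."""
    return sum(1 for t, g in zip(state, GOAL) if t != 0 and t != g)


moves, cost = a_star(START, lambda s: s == GOAL, puzzle_moves, misplaced)
print(cost)
```

With these assumed values, both searches return the path S → A → C → G with cost 6, as in iteration 4. A* solves the eight-puzzle instance in 5 moves. The misplaced-tiles heuristic never overestimates, so this is the minimum.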
Constraint Satisfaction
Many AI problems can be viewed as constraint satisfaction problems, in which the goal is to find a problem state that satisfies a given set of constraints. Examples of such problems are:
o Crypt-arithmetic puzzles
o N-Queens: Place N queens on an N × N board so that no two queens share a row, column, or diagonal, i.e., no two queens attack each other.
o Map colouring: Given a map, colour its regions using three colours (blue, red, and black) such that no two neighbouring regions have the same colour.
Crypt-Arithmetic Puzzle
A crypt-arithmetic puzzle is a problem in which the digits of some numbers are represented by letters (or symbols), each letter representing a unique digit. The goal is to find the digits such that a given mathematical equation holds.
Problem Statement: Solve the following puzzle, TO + GO = OUT, by assigning digits (0-9) so that each letter gets a unique digit and the addition is satisfied.
Constraints: No two letters have the same value.
Clearly, O = 1: the sum of two two-digit numbers is less than 200, so the leading digit O of the result can only be the carry generated by G + T.
In the units column, O + O = 1 + 1 = 2, so T = 2, with no carry into the tens column.
In the tens column, G + T = U + 10 (adding 10 because a carry is generated), i.e., G + 2 = U + 10. G + 2 becomes a two-digit number only if G = 8 or 9.
If G = 9, then U = 1, which is not valid since O = 1. So, G = 8 and U = 0.
Hence T = 2, O = 1, G = 8, U = 0, and the sum checks: 21 + 81 = 102.
Problem Statement: Solve the following puzzle by assigning digits (0-9) so that each letter gets a unique digit and the following addition is satisfied.
Constraints: No two letters have the same value.
Problem: The heuristic path algorithm is a best-first search in which the objective function is f(n) = (2 − w)g(n) + wh(n). For what values of w is this algorithm guaranteed to be optimal (assuming h is admissible)? What kind of search does it perform when w = 0? When w = 1? When w = 2?
Answer: For w < 2, (2 − w) is positive, so dividing f(n) by (2 − w) does not change the order in which nodes are chosen. The algorithm therefore behaves like A* with the heuristic (w / (2 − w))h(n). For 0 ≤ w ≤ 1, this factor is at most 1, so the scaled heuristic still never overestimates the distance to the goal state, and the algorithm is guaranteed to be optimal. If w > 1, the factor exceeds 1, so the scaled heuristic can overestimate the distance to the goal, making it inadmissible. When w = 0, f(n) = 2g(n), which orders nodes exactly as uniform cost search does. When w = 1, f(n) = g(n) + h(n), which is A* search. When w = 2, f(n) = 2h(n), which is greedy best-first search.
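The column-by-column reasoning used earlier for TO + GO = OUT can be cross-checked mechanically. The Python sketch below is plain generate-and-test, not the constraint propagation used in the worked solution. It tries every assignment of distinct digits to the letters and keeps those that satisfy the addition. The rule that no number may start with 0 is an assumption: it is a common convention that the notes do not state. For four letters there are only 10 × 9 × 8 × 7 = 5,040 assignments, but the count grows quickly with the number of letters. The function names are illustrative.

```python
from itertools import permutations


def word_value(word, assign):
    """Read a word as a decimal number under a letter -> digit assignment."""
    value = 0
    for ch in word:
        value = value * 10 + assign[ch]
    return value


def solve_cryptarithm(addends, result):
    """Return every assignment of distinct digits that makes sum(addends) == result."""
    letters = sorted(set("".join(addends) + result))
    if len(letters) > 10:
        return []                                # more letters than available digits
    leading = {word[0] for word in addends + [result]}
    solutions = []
    for digits in permutations(range(10), len(letters)):
        assign = dict(zip(letters, digits))      # each letter gets a unique digit
        if any(assign[ch] == 0 for ch in leading):
            continue                             # assumed rule: no leading zeros
        total = sum(word_value(w, assign) for w in addends)
        if total == word_value(result, assign):
            solutions.append(assign)
    return solutions


print(solve_cryptarithm(["TO", "GO"], "OUT"))
```

It finds exactly one assignment, G = 8, O = 1, T = 2, U = 0, which agrees with the reasoning worked out by hand.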