Tentamen TDDC17 Artificial Intelligence Example Exam 2011a

advertisement
Linköpings Universitet
Institutionen för Datavetenskap
Patrick Doherty
Tentamen
TDDC17 Artificial Intelligence
Example Exam 2011a
Points:
The exam consists of exercises worth 33 points.
To pass the exam you need 17 points.
Auxiliary help items:
Hand calculators.
Directions:
You can answer the questions in English or Swedish.
Use notations and methods that have been discussed in the course.
In particular, use the definitions, notations and methods in appendices 1-3.
Make reasonable assumptions when an exercise has been under-specified.
Begin each exercise on a new page.
Write only on one side of the paper.
Write clearly and concisely.
Jourhavande: Mariusz Wzorek, 0703887122. Mariusz will arrive for questions around 15.00.
1
1. Consider the following theory (where x, y and z are variables and history, lottery and john are constants):
∀x([P ass(x, history) ∧ W in(x, lottery)] ⇒ Happy(x))
(1)
∀x∀y([Study(x) ∨ Lucky(x)] ⇒ P ass(x, y))
(2)
¬Study(john) ∧ Lucky(john)
(3)
∀x(Lucky(x) ⇒ W in(x, lottery))
(4)
(a) Convert formulas (1) - (4) into clause form. [1p]
(b) Prove that Happy(john) is a logical consequence of (1) - (4) using the resolution proof procedure.
[2p]
• Your answer should be structured using a resolution refutation tree (as used in the book).
• Since the unifications are trivial, it suffices to simply show the binding lists at each resolution
step.
2. Constraint satisfaction problems consist of a set of variables, a value domain for each variable and a set
of constraints. A solution to a CS problem is a consistent set of bindings to the variables that satisfy
the contraints. A standard backtracking search algorithm can be used to find solutions to CS problems.
In the simplest case, the algorithm would choose variables to bind and values in the variable’s domain
to be bound to a variable in an arbitrary manner as the search tree is generated. This is inefficient and
there are a number of strategies which can improve the search. Describe the following three strategies:
(a) Minimum remaining value heuristic (MRV). [1p]
(b) Degree heuristic. [1p]
(c) Least constraining value heuristic. [1p]
Constraint propagation is the general term for propagating constraints on one variable onto other
variables. Describe the following:
(d) What is the Forward Checking technique? [1p]
(e) What is arc consistency? [1p]
3. The following questions pertain to the course article by Newell and Simon entitled Computer Science
as an Empirical Enquiry: Symbols and Search..
(a) What is a physical symbol system (PSS) and what does it consist of? [2p]
(b) What is the Physical Symbol System Hypothesis? [1p]
2
4. Use the Bayesian network in Figure 1 together with the conditional probability tables below to answer
the following questions. Appendix 2 may be helpful to use.
(a) Write the formula for the full joint probability distribution P (A, B, C, D, E) in terms of (conditional) probabilities derived from the bayesian network below. [1p]
(b) P (a, ¬b, c, ¬d, e)[1p]
(c) P (e | a, c, ¬b)[2p]
Winter
(A)
Sprinkler
(B)
Rain
(C)
Wet Grass
(D)
Slippery Road
(E)
Figure 1: Bayesian Network Example
A
T
F
P(A)
.6
.4
A
T
T
F
F
B
T
F
T
F
P (B | A)
.2
.8
.75
.25
A
T
T
F
F
C
T
F
T
F
P (C | A)
.8
.2
.1
.9
B
T
T
T
T
F
F
F
F
C
T
T
F
F
T
T
F
F
D
T
F
T
F
T
F
T
F
P (D | B, C)
.95
.05
.9
.1
.8
.2
0
1
C
T
T
F
F
E
T
F
T
F
P (E | C)
.7
.3
0
1
5. A∗ search is the most widely-known form of best-first search. The following questions pertain to A∗
search:
(a) Explain what an admissible heuristic function is using the notation and descriptions in (c). [1p]
(b) Suppose a robot is searching for a path from one location to another in a rectangular grid of
locations in which there are arcs between adjacent pairs of locations and the arcs only go in
north-south (south-north) and east-west (west-east) directions. Furthermore, assume that the
robot can only travel on these arcs and that some of these arcs have obstructions which prevent
passage across such arcs.
The Mahattan distance between two locations is the shortest distance between the locations
ignoring obstructions. Is the Manhattan distance in the example above an admissible heuristic?
Justify your answer explicitly. [2p]
(c) Let h(n) be the estimated cost of the cheapest path from a node n to the goal. Let g(n) be the
path cost from the start node n0 to n. Let f (n) = g(n) + h(n) be the estimated cost of the
cheapest solution through n.
Provide a general proof that A∗ using tree-search is optimal if h(n) is admissible. If possible, use
a diagram to structure the proof. [2p]
3
6. The following question pertains to Decision Tree Learning. Use the definitions and data table in
Figure 3, Appendix 3 to answer this question. Figure 2 shows a partial decision tree for the Table in
Figure 3 with target attribute PlayTennis.
(a) What attribute should be tested in the box with the question mark on the right branch of the decision tree in Figure 2? Justify your answer by computing the information gain for the appropriate
attributes in the Table in Figure 3 in Appendix 3. [3p]
[D1, D2, ...., D14]
[9+, 5-]
Outlook
Sunny
Overcast
Rain
[D1,D2,D8,D9,D11]
[D3,D7,D12,D13]
[D4,D5,D6,D10,D14]
[2+, 3-]
[4+,0-]
[3+,2-]
Humidity
Yes
?
Which attribute should be tested here?
Figure 2: Partial Decision Tree for PlayTennis
7. Domain-independent heuristics play an important part in many planning algorithms. Such heuristics
are often based on relaxing the original planning problem.
(a) Explain the concept of relaxing a problem and how this helps us find useful heuristics.[2p]
(b) Provide a concrete example of this technique using an action schema in a planning domain of your
choice. [2p]
8. Modeling actions and change in incompletely represented, dynamic worlds is a central problem in
knowledge representation. The following questions pertain to reasoning about action and change.
(a) What is the frame problem? Use the Wumpus world to provide a concrete example of the problem.
[2p]
(b) What is the ramification problem? Use the Wumpus world to provide a concrete example of the
problem. [2p]
(c) What is nonmonotonic logic? How can it be used to provide solutions to the frame problem? [2p]
4
Appendix 1
Converting arbitrary wffs to clause form:
1. Eliminate implication signs.
2. Reduce scopes of negation signs.
3. Standardize variables within the scopes of quantifiers (Each quantifier should have its own unique
variable).
4. Eliminate existential quantifiers. This may involve introduction of Skolem constants or functions.
5. Convert to prenex form by moving all remaining quantifiers to the front of the formula.
6. Put the matrix into conjunctive normal form. Two useful rules are:
• ω1 ∨ (ω2 ∧ ω3 ) ≡ (ω1 ∨ ω2 ) ∧ (ω1 ∨ ω3 )
• ω1 ∧ (ω2 ∨ ω3 ) ≡ (ω1 ∧ ω2 ) ∨ (ω1 ∧ ω3 )
7. Eliminate universal quantifiers.
8. Eliminate ∧ symbols.
9. Rename variables so that no variable symbol appears in more than one clause.
5
Appendix 2
A generic entry in a joint probability distribution is the probability of a conjunction of particular assignments
to each variable, such as P (X1 = x1 ∧ . . . ∧ Xn = xn ). The notation P (x1 , . . . , xn ) can be used as an
abbreviation for this.
The chain rule states that any entry in the full joint distribution can be represented as a product of conditional
probabilities:
P (x1 , . . . , xn ) =
n
Y
P (xi | xi−1 , . . . , x1 )
(5)
i=1
Given the independence assumptions implicit in a Bayesian network a more efficient representation of entries
in the full joint distribution may be defined as
P (x1 , . . . , xn ) =
n
Y
P (xi | parents(Xi )),
(6)
i=1
where parents(Xi ) denotes the specific values of the variables in Parents(Xi ).
Recall the following definition of a conditional probability:
P(X | Y ) =
P(X ∧ Y )
P(Y )
(7)
The following is a useful general inference procedure:
Let X be the query variable, let E be the set of evidence variables, let e be the observed values for them,
let Y be the remaining unobserved variables and let α be the normalization constant:
P(X | e) = αP(X, e) = α
X
P(X, e, y)
(8)
y
where the summation is over all possible y’s (i.e. all possible combinations of values of the unobserved
variables Y).
Equivalently, without the normalization constant:
P(X | e) =
P
P(X, e)
y P(X, e, y)
=P P
P(e)
x
y P(x, e, y)
(9)
6
Appendix 3
Definition 1
Given a collection S, containing positive and negative examples of some target concept, the entropy of S
relative to this boolean classification is
Entropy(S) ≡ −p⊕ log2 p⊕ − p log2 p ,
where p⊕ is the proportion of positive examples in S and p is the proportion of negative examples in S.
Definition 2
Given a collection S, containing positive and negative examples of some target concept, and an attribute A,
the information gain, Gain(S, A), of A relative to S is defined as
X
Gain(S, A) ≡ Entropy(S) −
v∈Values(A)
| Sv |
Entropy(Sv ),
|S|
where values(A) is the set of all possible values for attribute A and Sv is the subset of S for which the
attribute A has value v (i.e., Sv = {s ∈ S | A(s) = v}).
Day
D1
D2
D3
D4
D5
D6
D7
D8
D9
D10
D11
D12
D13
D14
Outlook
Sunny
Sunny
Overcast
Rain
Rain
Rain
Overcast
Sunny
Sunny
Rain
Sunny
Overcast
Overcast
Rain
Temperature
Hot
Hot
Hot
Mild
Cool
Cool
Cool
Mild
Cool
Mild
Mild
Mild
Hot
Mild
Humidity
High
High
High
High
Normal
Normal
Normal
High
Normal
Normal
Normal
High
Normal
High
Wind
Weak
Strong
Weak
Weak
Weak
Strong
Strong
Weak
Weak
Weak
Strong
Strong
Weak
Strong
PlayTennis
No
No
Yes
Yes
Yes
No
Yes
No
Yes
Yes
Yes
Yes
Yes
No
Figure 3: Sample Table for Target Attribute PlayTennis
For help in converting from one logarithm base to another (if needed):
loga x =
logb x
logb a
ln x = 2.303 log10 x
Note also that for the example, we define 0log0 to be 0.
7
Download