Artificial Intelligence
Rehearsal Lesson
Ram Meshulam 2004
Solving Problems with Search Algorithms
• Input: a problem P.
• Preprocessing:
– Define states and a state space
– Define Operators
– Define a start state and goal set of states.
• Processing:
– Activate a search algorithm to find a path from the
start state to one of the goal states.
Uninformed Search
• Uninformed search methods use only
information available in the problem
definition.
– Breadth First Search (BFS)
– Depth First Search (DFS)
– Iterative-deepening DFS (IDS, a.k.a. DFID)
– Bi-directional search
– Uniform Cost Search (a.k.a. Dijkstra alg.)
Breadth-First-Search Attributes
• Completeness – yes (b < ∞, d < ∞)
• Optimality – yes, if graph is unweighted.
• Time Complexity: $O(1 + b + b^2 + \dots + (b^{d+1} - b)) = O(b^{d+1})$
• Memory Complexity: $O(b^{d+1})$
– Where b is branching factor and d is the
solution depth
Depth-First-Search Attributes
• Completeness – No. Infinite loops or
infinite depth can occur.
• Optimality – No.
• Time Complexity: $O(b^m)$
• Memory Complexity: $O(bm)$
– Where b is branching factor and m is the
maximum depth of search tree
Limited DFS Attributes
• Completeness – Yes, if d≤l
• Optimality – No.
• Time Complexity: $O(b^l)$
– If d<l, it is larger than in BFS
• Memory Complexity: $O(bl)$
– Where b is branching factor and l is the
depth limit.
Depth-First Iterative-Deepening
[Figure: search tree whose nodes are numbered in the order DFID generates them; nodes regenerated in later iterations carry several numbers]
The numbers represent the order generated by DFID
Iterative-Deepening Attributes
• Completeness – Yes
• Optimality – yes, if graph is unweighted.
• Time Complexity: $O(d \cdot b + (d-1) \cdot b^2 + \dots + 1 \cdot b^d) = O(b^d)$
• Memory Complexity: $O(db)$
– Where b is branching factor and d is the maximum
depth of search tree
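A minimal Python sketch of depth-first iterative deepening, assuming the graph is an adjacency dictionary. The toy graph, the function names and the `max_depth` cap are illustrative assumptions, not part of the slides.

```python
def depth_limited(graph, node, goals, limit, path):
    """Depth-limited DFS: return a path to a goal within `limit` steps, or None."""
    if node in goals:
        return path
    if limit == 0:
        return None
    for succ in graph.get(node, []):
        if succ in path:              # avoid cycles along the current branch
            continue
        found = depth_limited(graph, succ, goals, limit - 1, path + [succ])
        if found is not None:
            return found
    return None

def iterative_deepening(graph, start, goals, max_depth=20):
    """Run depth-limited DFS with limits 0, 1, 2, ... up to max_depth."""
    for limit in range(max_depth + 1):
        found = depth_limited(graph, start, goals, limit, [start])
        if found is not None:
            return found
    return None

# Hypothetical toy graph: the goal g is at depth 2 via b and at depth 4 via a.
graph = {"s": ["a", "b"], "a": ["c"], "b": ["g"], "c": ["d"], "d": ["g"]}
print(iterative_deepening(graph, "s", {"g"}))
```

Plain DFS would follow the branch through a first and return the longer path; deepening the limit one level at a time returns the shallow one, at the price of regenerating the upper levels in every iteration.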
State Redundancies
• Closed list - a hash table which holds the
visited nodes.
• For example, BFS:
[Figure: BFS search tree with the closed list and the open list (frontier) marked]
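As a hedged illustration of the closed list, here is a small BFS in Python that keeps the open list as a FIFO queue of paths and the closed list as a hash set. The graph and all names are assumptions made for the example.

```python
from collections import deque

def bfs(graph, start, goals):
    """Return a path with the fewest edges from start to any goal, or None."""
    open_list = deque([[start]])      # frontier: FIFO queue of paths
    closed = set()                    # hash set of already expanded states
    while open_list:
        path = open_list.popleft()
        node = path[-1]
        if node in goals:
            return path
        if node in closed:            # state redundancy: already expanded
            continue
        closed.add(node)
        for succ in graph.get(node, []):
            if succ not in closed:
                open_list.append(path + [succ])
    return None

# Hypothetical toy graph containing a cycle (s <-> a) and two routes to c.
graph = {"s": ["a", "b"], "a": ["s", "c"], "b": ["c", "g"], "c": ["g"]}
print(bfs(graph, "s", {"g"}))
```

Without the closed set the cycle between s and a would be re-expanded at every level.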
Uniform Cost Search Attributes
• Completeness: yes, for positive weights
• Optimality: yes
• Time & Memory complexity: $O(b^{\lceil c/e \rceil})$
– Where b is branching factor, c is the optimal solution cost
and e is the minimum edge cost
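A short sketch of uniform cost search using Python's `heapq` as the priority queue, ordered by path cost g(n). The weighted graph is a made-up example and the function name is an assumption.

```python
import heapq

def uniform_cost_search(graph, start, goals):
    """Return (cost, path) of a cheapest path to any goal, or None."""
    open_list = [(0, start, [start])]         # priority queue ordered by g(n)
    closed = set()
    while open_list:
        cost, node, path = heapq.heappop(open_list)
        if node in goals:
            return cost, path
        if node in closed:
            continue
        closed.add(node)
        for succ, weight in graph.get(node, []):
            if succ not in closed:
                heapq.heappush(open_list, (cost + weight, succ, path + [succ]))
    return None

# Hypothetical graph: the direct edge s->b (4) is more expensive than s->a->b (3).
graph = {"s": [("a", 1), ("b", 4)], "a": [("b", 2), ("g", 6)], "b": [("g", 1)]}
print(uniform_cost_search(graph, "s", {"g"}))
```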
Best First Search Algorithms
• Principle: Expand node n with the best
evaluation function value f(n).
• Implement via a priority queue
• Algorithms differ in the definition of f:
– Greedy Search: $f(n) = h(n)$
– A*: $f(n) = g(n) + h(n)$
– IDA*: iterative deepening version of A*
– Etc.
Best-FS Algorithm Pseudo code
1. Start with open = [initial-state].
2. While open is not empty do
   1. Pick the best node on open.
   2. If it is the goal node then return with success.
      Otherwise find its successors.
   3. Assign the successor nodes a score using the
      evaluation function and add the scored nodes
      to open
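The pseudo code above can be sketched in Python with a pluggable evaluation function, so that the same loop yields Greedy Search or A*. The graph, the heuristic table `h` and the parameter name `f` are assumptions for illustration; a closed set is added to skip repeated states.

```python
import heapq

def best_first_search(graph, h, start, goal, f):
    """f(g_cost, h_value) scores a node; the lowest score is expanded first."""
    open_list = [(f(0, h[start]), 0, start, [start])]
    closed = set()
    while open_list:
        _, g, node, path = heapq.heappop(open_list)
        if node == goal:
            return g, path
        if node in closed:
            continue
        closed.add(node)
        for succ, weight in graph.get(node, []):
            g2 = g + weight
            heapq.heappush(open_list, (f(g2, h[succ]), g2, succ, path + [succ]))
    return None

# Hypothetical graph and a consistent heuristic h.
graph = {"s": [("a", 1), ("b", 2)], "a": [("g", 4)], "b": [("g", 2)]}
h = {"s": 2, "a": 1, "b": 2, "g": 0}

greedy = best_first_search(graph, h, "s", "g", lambda g, hv: hv)      # f = h
a_star = best_first_search(graph, h, "s", "g", lambda g, hv: g + hv)  # f = g + h
print(greedy, a_star)
```

On this example Greedy follows the lower heuristic value and returns the path of cost 5, while A* returns the path of cost 4.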
General Framework using Closedlist (Graph-Search)
GraphSearch(Graph graph, Node start, Vector goals)
1. O ← make_data_structure(start) // open list
2. C ← make_hash_table // closed list
3. While O not empty loop
   1. n ← O.remove_front()
   2. If goal(n) return n
   3. If n is found on C, continue
   4. // otherwise
   5. O ← successors(n)
   6. C ← n
4. Return null // no goal found
Greedy Search Attributes
• Completeness: No. Inaccurate heuristics can
cause loops (unless using a closed list), or
entering an infinite path
• Optimality: No. Inaccurate heuristics can
lead to a non-optimal solution.
• Time & Memory complexity: $O(b^m)$
[Figure: example graph with nodes s, a, b, g, weighted edges, and heuristic values h=1 and h=2]
A* Algorithm (1)
• Combines greedy h(n) and uniform cost g(n)
approaches.
• Evaluation function: f(n)=g(n)+h(n)
• Completeness:
– In a finite graph: Yes
– In an infinite graph: if all edge costs are finite and have
a minimum positive value, and all heuristic values are
finite and non-negative.
• Optimality:
– In tree-search: if h(n) is admissible
– In graph-search: if it is also consistent
Heuristic Function h(n)
• Admissible/Underestimate: h(n) never
overestimates the actual cost from n to the goal
• Consistent/monotonic (desirable):
h(m)-h(n) ≤ w(n,m) where m is the parent of n. This
ensures f(n) ≥ f(m).
A* Algorithm (2)
• Optimally efficient: A* expands the
minimal number of nodes possible with any
given (consistent) heuristic.
• Time and space complexity:
– Worst case: Cost function f(n) = g(n): $O(b^{c/e})$
– Best case: Cost function f(n) = g(n) + h*(n): $O(bd)$
Duplicate Pruning
• Do not re-enter the parent of the current state
– With or without using closed-list
• Using a closed-list, check the closed list before
entering new nodes to the open list
– Note: in A*, h has to be consistent!
– Do not remove the original check
• Using a stack, check the current branch and
stack status before entering new nodes
IDA* Algorithm
• Each iteration is a depth-first search that
keeps track of the cost evaluation f = g + h
of each node generated.
• The cost threshold is initialized to the
heuristic of the initial state.
• If a node is generated whose cost exceeds
the threshold for that iteration, its path is cut
off.
IDA* Attributes
• The cost threshold increases in each iteration to
the total cost of the lowest-cost node that was
pruned during the previous iteration.
• The algorithm terminates when a goal state is
reached whose total cost does not exceed the
current threshold.
• Completeness and Optimality: Like A*
• Space complexity: $O(c)$
• Time complexity*: $O(b^{c/e})$
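A compact sketch of IDA* as described on the last two slides: each iteration is a depth-first search cut off at the current threshold, and the next threshold is the lowest f value that was pruned. The graph, the heuristic table and the function names are assumptions.

```python
import math

def ida_star(graph, h, start, goal):
    """Return (cost, path) of an optimal path, or None if the goal is unreachable."""
    def search(node, g, threshold, path):
        f = g + h[node]
        if f > threshold:
            return f, None                    # pruned: report f for the next threshold
        if node == goal:
            return f, path
        lowest_pruned = math.inf
        for succ, weight in graph.get(node, []):
            if succ in path:                  # do not loop along the current branch
                continue
            t, found = search(succ, g + weight, threshold, path + [succ])
            if found is not None:
                return t, found
            lowest_pruned = min(lowest_pruned, t)
        return lowest_pruned, None

    threshold = h[start]                      # initial threshold: heuristic of the start
    while threshold < math.inf:
        threshold, found = search(start, 0, threshold, [start])
        if found is not None:
            return threshold, found
    return None

# Hypothetical graph and consistent heuristic.
graph = {"s": [("a", 1), ("b", 2)], "a": [("g", 4)], "b": [("g", 2)]}
h = {"s": 2, "a": 1, "b": 2, "g": 0}
print(ida_star(graph, h, "s", "g"))
```

Here the threshold starts at 2, rises to 4 after the first iteration, and the goal is found in the second iteration.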
Local Search – Cont.
• In order to avoid local maxima and plateaus, we permit
moves to states with lower values with probability p.
• The different algorithms differ in p.

Algorithm                  p
Hill Climbing, GSAT        p=0
Random Walk                p=1
Mixed Walk, Mixed GSAT     p=c (domain specific)
Simulated Annealing        p=acceptor(dh, T)
Hill Climbing
• Always choose the next best successor
• Stop when no improvement possible
• In order to avoid plateaus and local
maxima:
- Sideways move
- Stochastic hill climbing
- Random-restart algorithm
Simulated Annealing – Pseudo code (Cont.)
• Acceptor func. example: $e^{\Delta h / (c \cdot t)}$, where c is a constant between 0 and 1
• Schedule func. example: $c^{round} \cdot startTemp$, where 0 < c < 1
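A hedged Python sketch of the two example functions and of a loop that uses them. It assumes a maximization problem in which dh is the value of the candidate minus the value of the current state, so worse moves have dh < 0; the objective, the neighbor function and all default parameters are invented for the example.

```python
import math
import random

def acceptor(dh, t, c=1.0):
    """Probability of accepting a move whose value change is dh at temperature t."""
    if dh >= 0:
        return 1.0                      # improving or sideways moves are always taken
    return math.exp(dh / (c * t))       # worse moves: e^(dh / (c*t)) < 1

def schedule(round_no, start_temp, c=0.9):
    """Geometric cooling: c^round * startTemp with 0 < c < 1."""
    return (c ** round_no) * start_temp

def simulated_annealing(value, neighbors, state, start_temp=10.0, rounds=200, seed=0):
    rng = random.Random(seed)
    best = state
    for r in range(rounds):
        t = schedule(r, start_temp)
        candidate = rng.choice(neighbors(state))
        if rng.random() < acceptor(value(candidate) - value(state), t):
            state = candidate
        if value(state) > value(best):
            best = state
    return best

# Hypothetical objective with its maximum at x = 3, searched over the integers.
best = simulated_annealing(lambda x: -(x - 3) ** 2, lambda x: [x - 1, x + 1], 0)
print(best)
```

As the temperature drops, the acceptance probability for worse moves approaches 0 and the search behaves like hill climbing.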
Search Algorithms Hierarchy
• Global
– Informed: A*, IDA*, Greedy
– Uninformed: DFS, IDS, BFS, Uniform Cost
• Local
– GSAT, Hill Climbing, Random Walk, Mixed Walk, Mixed GSAT, Simulated Annealing
Exercise
• What are the different data structures used to
implement the open list in BFS, DFS and Best-FS?

Algorithm                                   Open list
BFS                                         Queue
DFS                                         Stack
Best-FS (Greedy, A*, Uniform-Cost Alg.)     Priority Queue
Minimax
• Perfect play for deterministic games
• Idea: choose move to position with highest minimax value
= best achievable payoff against best play
• E.g., 2-ply game:
Properties of minimax
• Complete? (=will not run forever) Yes (if tree is finite)
• Optimal? (=will find the optimal response) Yes (against an
optimal opponent)
• Time complexity? $O(b^m)$
• Space complexity? $O(bm)$ (depth-first exploration), $O(b^m)$
for saving the optimal response
• For chess, b ≈ 35, m ≈ 100 for "reasonable" games
⇒ exact solution completely infeasible
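A minimal minimax sketch in Python, assuming a game tree given as nested lists whose leaves are utilities for the maximizing player. The 2-ply tree below is an illustrative assumption and is not claimed to be the one in the missing figure.

```python
def minimax(node, maximizing=True):
    """Return the minimax value of node (a leaf number or a list of children)."""
    if not isinstance(node, list):
        return node                               # leaf: utility for MAX
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# Hypothetical 2-ply game: MAX picks a move, then MIN replies.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax(tree))
```

MIN's best replies are worth 3, 2 and 2 for the three moves, so MAX chooses the first move with value 3.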
α-β pruning example
[Figure sequence: α-β pruning applied step by step to a game tree, over five slides]
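Since the figures are missing, the following Python sketch shows α-β pruning on an assumed nested-list game tree. The optional `visited` list is a hypothetical helper that records which leaves are actually evaluated.

```python
import math

def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True, visited=None):
    """Minimax value of node, skipping branches that cannot affect the result."""
    if not isinstance(node, list):
        if visited is not None:
            visited.append(node)
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False, visited))
            alpha = max(alpha, value)
            if alpha >= beta:                 # MIN above will never allow this branch
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True, visited))
        beta = min(beta, value)
        if alpha >= beta:                     # MAX above already has a better option
            break
    return value

# Hypothetical 2-ply tree.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
visited = []
print(alphabeta(tree, visited=visited), visited)
```

After the first MIN node returns 3, the second MIN node is abandoned as soon as its first leaf (2) is seen, so the leaves 4 and 6 are never evaluated.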
Planning
• Traditional search methods do not scale to
large, real-world problems
• We want to use general knowledge
• We need a general heuristic
• Problem decomposition
STRIPS – Representation
• States and goal – sentences in FOL.
• Operators – consist of 3 parts:
– Operator name
– Preconditions – a sentence describing the conditions
that must hold so that the operator can be executed.
– Effect – a sentence describing how the world has
changed as a result of executing the operator. Has 2
parts:
• Add-list
• Delete-list
– Optionally, a set of (simple) variable constraints
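A small sketch of the operator representation in Python, simplified to ground (variable-free) facts stored as strings in sets. The `Operator` class, the `Go(A,B)` operator and the fact names are assumptions for illustration; variable constraints are left out.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Operator:
    name: str
    preconditions: frozenset      # facts that must hold before execution
    add_list: frozenset           # facts that become true
    delete_list: frozenset        # facts that stop being true

def applicable(state, op):
    return op.preconditions <= state

def apply_operator(state, op):
    """Return the successor state: remove the delete-list, then add the add-list."""
    if not applicable(state, op):
        raise ValueError(f"preconditions of {op.name} do not hold")
    return (state - op.delete_list) | op.add_list

# Hypothetical ground operator: move from location A to location B.
go_a_b = Operator("Go(A,B)", frozenset({"At(A)"}), frozenset({"At(B)"}), frozenset({"At(A)"}))
state = frozenset({"At(A)", "Holding(Key)"})
print(sorted(apply_operator(state, go_a_b)))
```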
Choosing an attribute
• Idea: a good attribute splits the examples into subsets that
are (ideally) "all positive" or "all negative"
• Patrons? is a better choice
Using information theory
• To implement Choose-Attribute in the DTL
algorithm
• Information Content of an answer (Entropy):
$I(P(v_1), \dots, P(v_n)) = \sum_{i=1}^{n} -P(v_i) \log_2 P(v_i)$
• For a training set containing p positive examples
and n negative examples:
$I\left(\frac{p}{p+n}, \frac{n}{p+n}\right) = -\frac{p}{p+n} \log_2 \frac{p}{p+n} - \frac{n}{p+n} \log_2 \frac{n}{p+n}$
Information gain
• A chosen attribute A divides the training set E into subsets
E1, … , Ev according to their values for A, where A has v
distinct values.
$remainder(A) = \sum_{i=1}^{v} \frac{p_i + n_i}{p + n} \, I\left(\frac{p_i}{p_i + n_i}, \frac{n_i}{p_i + n_i}\right)$
• Information Gain (IG) or reduction in entropy from the
attribute test:
$IG(A) = I\left(\frac{p}{p+n}, \frac{n}{p+n}\right) - remainder(A)$
• Choose the attribute with the largest IG
Information gain
For the training set, p = n = 6, I(6/12, 6/12) = 1 bit
Consider the attributes Patrons and Type (and others too):
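$IG(Patrons) = 1 - \left[\frac{2}{12} I(0,1) + \frac{4}{12} I(1,0) + \frac{6}{12} I\left(\frac{2}{6}, \frac{4}{6}\right)\right] = 0.541$ bits
$IG(Type) = 1 - \left[\frac{2}{12} I\left(\frac{1}{2}, \frac{1}{2}\right) + \frac{2}{12} I\left(\frac{1}{2}, \frac{1}{2}\right) + \frac{4}{12} I\left(\frac{2}{4}, \frac{2}{4}\right) + \frac{4}{12} I\left(\frac{2}{4}, \frac{2}{4}\right)\right] = 0$ bits
Patrons has the highest IG of all attributes and so is chosen by the DTL
algorithm as the root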
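The two gains can be re-computed with a few lines of Python. The function names are assumptions; the (positive, negative) counts per attribute value are read off the fractions in the formulas above.

```python
from math import log2

def entropy(*probs):
    """I(P(v1), ..., P(vn)) in bits; terms with probability 0 contribute 0."""
    return sum(-p * log2(p) for p in probs if p > 0)

def remainder(splits, total):
    """splits: one (p_i, n_i) pair of example counts per attribute value."""
    return sum((p + n) / total * entropy(p / (p + n), n / (p + n)) for p, n in splits)

def information_gain(p, n, splits):
    return entropy(p / (p + n), n / (p + n)) - remainder(splits, p + n)

patrons = [(0, 2), (4, 0), (2, 4)]            # three values of Patrons
type_ = [(1, 1), (1, 1), (2, 2), (2, 2)]      # four values of Type

print(round(information_gain(6, 6, patrons), 3))
print(round(information_gain(6, 6, type_), 3))
```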
Bayes’ Rule
P(B|A) = P(A|B) * P(B) / P(A)
Computing the denominator:
#1 approach - compute relative likelihoods:
• If M (meningitis) and W (whiplash) are two possible
explanations
#2 approach - Using M & ~M:
• Checking the probability of M, ~M given S
– P(M|S) = P(S|M) * P(M) / P(S)
– P(~M|S) = P(S|~M) * P(~M) / P(S)
• P(M|S) + P(~M|S) = 1 (must sum to 1)
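The second approach amounts to normalization: P(S) is the sum of the two numerators. A short Python sketch follows; the function name and the probabilities passed in are made-up values for illustration only.

```python
def posterior(p_s_given_m, p_m, p_s_given_not_m):
    """Return (P(M|S), P(~M|S)), computing P(S) by summing over M and ~M."""
    joint_m = p_s_given_m * p_m
    joint_not_m = p_s_given_not_m * (1.0 - p_m)
    p_s = joint_m + joint_not_m               # the denominator P(S)
    return joint_m / p_s, joint_not_m / p_s

# Hypothetical numbers: P(S|M) = 0.5, P(M) = 1/50000, P(S|~M) = 0.05.
p_m_given_s, p_not_m_given_s = posterior(0.5, 1 / 50000, 0.05)
print(p_m_given_s, p_not_m_given_s)
```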
Perceptrons
• Linear separability
– A set of (2D) patterns (x1, x2) of two classes is linearly
separable if there exists a line on the (x1, x2) plane
• w0 + w1 x1 + w2 x2 = 0
• Separates all patterns of one class from the other class
– A perceptron can be built with
• 3 inputs x0 = 1, x1, x2 with weights w0, w1, w2
– n dimensional patterns (x1,…, xn)
• Hyperplane w0 + w1 x1 + w2 x2 +…+ wn xn = 0 dividing the
space into two regions
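As a small illustration, a threshold perceptron classifies a pattern by the side of the line (or hyperplane) it falls on. The weights below, which separate logical AND, are an assumed example and do not come from the slides.

```python
def perceptron_output(weights, x):
    """weights = (w0, w1, ..., wn), x = (x1, ..., xn); x0 = 1 is the constant input."""
    total = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
    return 1 if total > 0 else 0

# Hypothetical weights: the line -1.5 + x1 + x2 = 0 separates AND.
and_weights = (-1.5, 1.0, 1.0)
outputs = [perceptron_output(and_weights, x) for x in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(outputs)
```

No single line separates the XOR patterns in the same way, which is why the backpropagation example uses a hidden layer.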
Backpropagation example
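[Figure: feed-forward network with inputs x1, x2, hidden units x3, x4 and output x5, connected by weights w13, w14, w23, w24, w35, w45]
Sigmoid as activation function, with the input scaled by a factor of 3:
• $g(in) = 1/(1 + e^{-3 \cdot in})$
• $g'(in) = 3\,g(in)(1 - g(in))$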
Adding the threshold
[Figure: the same network with constant inputs added: x0 = 1 feeding x3 and x4 through w03 and w04, and x6 = 1 feeding x5 through w65]
Training Set
• Logical XOR (exclusive OR) function
x1  x2  output
0   0   0
0   1   1
1   0   1
1   1   0
• Choose random weights
• <w03,w04,w13,w14,w23,w24,w65,w35,w45> =
<0.03,0.04,0.13,0.14,-0.23,-0.24,0.65,0.35,0.45>
• Learning rate: 0.1 for the hidden layers, 0.3 for the output layer
First Example
• Compute the outputs
• a0 = 1 , a1= 0 , a2 = 0
• a3 = g(1*0.03 + 0*0.13 + 0*-0.23) = 0.522
• a4 = g(1*0.04 + 0*0.14 + 0*-0.24) = 0.530
• a6 = 1, a5 = g(0.65*1 + 0.35*0.522 + 0.45*0.530) = 0.961
• Calculate ∆5 = 3*g(1.0712)*(1-g(1.0712))*(0-0.961) = -0.108
• Calculate ∆6, ∆3, ∆4
• ∆6 = 3*g(1)*(1-g(1))*(0.65*-0.108) = -0.010
• ∆3 = 3*g(0.03)*(1-g(0.03))*(0.35*-0.108) = -0.028
• ∆4 = 3*g(0.04)*(1-g(0.04))*(0.45*-0.108) = -0.036
• Update weights for the output layer
• w65 = 0.65 + 0.3*1*-0.108 = 0.618
• w35 = 0.35 + 0.3*0.522*-0.108 = 0.333
• w45 = 0.45 + 0.3*0.530*-0.108 = 0.433
First Example (cont)
• Calculate ∆0, ∆1, ∆2
• ∆0 = 3*g(1)*(1-g(1))*(0.03*-0.028 + 0.04*-0.036) = -0.001
• ∆1 = 3*g(0)*(1-g(0))*(0.13*-0.028 + 0.14*-0.036) = -0.006
• ∆2 = 3*g(0)*(1-g(0))*(-0.23*-0.028 + -0.24*-0.036) = 0.011
• Update weights for the hidden layer
• w03 = 0.03 + 0.1*1*-0.028 = 0.027
• w04 = 0.04 + 0.1*1*-0.036 = 0.036
• w13 = 0.13 + 0.1*0*-0.028 = 0.13
• w14 = 0.14 + 0.1*0*-0.036 = 0.14
• w23 = -0.23 + 0.1*0*-0.028 = -0.23
• w24 = -0.24 + 0.1*0*-0.036 = -0.24
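The first example can be re-traced with a short Python script. The variable names are assumptions; the weights, inputs, target and learning rates are the ones given above. The script works at full precision, so values may differ from the slides in the third decimal (for instance ∆5 comes out near -0.107 instead of -0.108).

```python
from math import exp

def g(x):
    """Sigmoid with the input scaled by 3, as defined above."""
    return 1.0 / (1.0 + exp(-3.0 * x))

def g_prime(x):
    return 3.0 * g(x) * (1.0 - g(x))

w = {"03": 0.03, "04": 0.04, "13": 0.13, "14": 0.14, "23": -0.23,
     "24": -0.24, "65": 0.65, "35": 0.35, "45": 0.45}
a0, a1, a2, a6, target = 1, 0, 0, 1, 0        # first XOR row: inputs (0, 0), output 0

# Forward pass
in3 = a0 * w["03"] + a1 * w["13"] + a2 * w["23"]
in4 = a0 * w["04"] + a1 * w["14"] + a2 * w["24"]
a3, a4 = g(in3), g(in4)
in5 = a6 * w["65"] + a3 * w["35"] + a4 * w["45"]
a5 = g(in5)

# Backward pass: output delta, then hidden deltas
delta5 = g_prime(in5) * (target - a5)
delta3 = g_prime(in3) * w["35"] * delta5
delta4 = g_prime(in4) * w["45"] * delta5

# Weight updates: rate 0.3 for the output layer, 0.1 for the hidden layer
new_w65 = w["65"] + 0.3 * a6 * delta5
new_w35 = w["35"] + 0.3 * a3 * delta5
new_w45 = w["45"] + 0.3 * a4 * delta5
new_w03 = w["03"] + 0.1 * a0 * delta3
new_w04 = w["04"] + 0.1 * a0 * delta4

print(round(a3, 3), round(a4, 3), round(a5, 3))
print(round(new_w65, 3), round(new_w35, 3), round(new_w45, 3))
```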
Second Example
• Compute the outputs
• a0 = 1, a1= 0 , a2 = 1
• a3 = g(1*0.027 + 0*0.13 + 1*-0.23) = 0.352
• a4 = g(1*0.036 + 0*0.14 + 1*-0.24) = 0.352
• a6 = 1, a5 = g(0.618*1 + 0.333*0.352 + 0.433*0.352) = 0.935
• Calculate ∆5 = 3*g(0.888)*(1-g(0.888))*(1-0.935) = 0.012
• Calculate ∆6, ∆3, ∆4
• ∆6 = 3*g(1)*(1-g(1))*(0.618*0.012) = 0.001
• ∆3 = 3*g(-0.203)*(1-g(-0.203))*(0.333*0.012) = 0.003
• ∆4 = 3*g(-0.204)*(1-g(-0.204))*(0.433*0.012) = 0.004
• Update weights for the output layer
• w65 = 0.618 + 0.3*1*0.012 = 0.622
• w35 = 0.333 + 0.3*0.352*0.012 = 0.334
• w45 = 0.433 + 0.3*0.352*0.012 = 0.434
Second Example (cont)
• Calculate ∆0, ∆1, ∆2
• Skipped, we do not use them
• Update weights for the hidden layer
• w03 = 0.027 + 0.1*1*0.003 = 0.027
• w04 = 0.036 + 0.1*1*0.004 = 0.036
• w13 = 0.13 + 0.1*0*0.003 = 0.13
• w14 = 0.14 + 0.1*0*0.004 = 0.14
• w23 = -0.23 + 0.1*1*0.003 = -0.23
• w24 = -0.24 + 0.1*1*0.004 = -0.24
Bayesian networks
• Syntax:
– a set of nodes, one per variable
– a directed, acyclic graph (link ≈ "directly influences")
– a conditional distribution for each node given its parents:
P (Xi | Parents (Xi))- conditional probability table (CPT)
Calculation of Joint Probability
• Given its parents, each node is conditionally independent
of everything except its descendants
• Thus,
P(x1x2…xn) = Pi=1,…,nP(xi|parents(Xi))
 full joint distribution table
• Every BN over a domain implicitly represents some joint
distribution over that domain
50
Ram Meshulam 2004
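A minimal sketch of the product formula in Python for Boolean variables. The three-node network, its CPT numbers and the data layout (each CPT maps a tuple of parent values to the probability that the node is true) are assumptions made for the example.

```python
def joint_probability(assignment, parents, cpt):
    """Multiply P(x_i | parents(X_i)) over all variables of a full assignment."""
    prob = 1.0
    for var, value in assignment.items():
        parent_values = tuple(assignment[p] for p in parents[var])
        p_true = cpt[var][parent_values]
        prob *= p_true if value else 1.0 - p_true
    return prob

# Hypothetical network: Rain -> Sprinkler, and both -> WetGrass.
parents = {"Rain": (), "Sprinkler": ("Rain",), "WetGrass": ("Sprinkler", "Rain")}
cpt = {
    "Rain": {(): 0.2},
    "Sprinkler": {(True,): 0.01, (False,): 0.4},
    "WetGrass": {(True, True): 0.99, (True, False): 0.9,
                 (False, True): 0.8, (False, False): 0.0},
}

p = joint_probability({"Rain": True, "Sprinkler": False, "WetGrass": True}, parents, cpt)
print(p)   # 0.2 * (1 - 0.01) * 0.8
```

Summing this function over all eight assignments gives 1, which is the sense in which the network represents a full joint distribution.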
Connection Types
Name             X ind. Z?          X ind. Z, given Y?
Causal chain     Not necessarily    Yes
Common Cause     No                 Yes
Common Effect    Yes                No
Reachability (the Bayes Ball)
• Shade evidence nodes
• Start at source node
• Try to reach target by search
• States: node, along with previous arc
• Successor function:
– Unobserved nodes:
• To any child of X
• To any parent of X if coming from a child
– Observed nodes:
• From parent of X to parent of X
• If you can’t reach a node, it’s conditionally independent of
the start node
Naive Bayes Classifiers
Task: Classify a new instance D based on a tuple of attribute values into
one of the classes $c_j \in C$
$D = \langle x_1, x_2, \dots, x_n \rangle$
$c_{MAP} = \arg\max_{c \in C} P(c \mid x_1, x_2, \dots, x_n)$
$= \arg\max_{c \in C} \frac{P(x_1, x_2, \dots, x_n \mid c) \, P(c)}{P(x_1, x_2, \dots, x_n)}$
$= \arg\max_{c \in C} P(x_1, x_2, \dots, x_n \mid c) \, P(c)$
CIS 391- Intro to AI
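A hedged Python sketch of the MAP rule above. It adds the usual naive assumption, not spelled out on the slide, that attributes are conditionally independent given the class, so P(x1,…,xn | c) becomes a product of per-attribute frequencies. The data set and function names are invented, and no smoothing is applied.

```python
from collections import Counter, defaultdict

def train(examples):
    """examples: list of (attribute_tuple, class_label). Returns count tables."""
    class_counts = Counter(label for _, label in examples)
    value_counts = defaultdict(Counter)       # (class, position) -> value counts
    for attrs, label in examples:
        for i, v in enumerate(attrs):
            value_counts[(label, i)][v] += 1
    return class_counts, value_counts

def classify(model, attrs):
    """Return argmax_c P(c) * prod_i P(x_i | c), estimated from the counts."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    def score(c):
        s = class_counts[c] / total
        for i, v in enumerate(attrs):
            s *= value_counts[(c, i)][v] / class_counts[c]
        return s
    return max(class_counts, key=score)

# Hypothetical training set: (outlook, temperature) -> play?
examples = [(("sunny", "hot"), "no"), (("sunny", "mild"), "no"),
            (("rainy", "mild"), "yes"), (("overcast", "hot"), "yes"),
            (("rainy", "cool"), "yes")]
model = train(examples)
print(classify(model, ("rainy", "mild")), classify(model, ("sunny", "hot")))
```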
Robots Environment Assumptions
• Static - to be able to guarantee completeness
• Inaccessible - greater impact on the on-line version
• Non-deterministic (move 5M, but able to move 5.1M)
• Continuous
– Exact cellular decomposition
– Approximate cellular decomposition
MSTC - Multi Robot Spanning Tree Coverage
• Complete - with approximate cellular decomposition
• Robust
– Coverage completed as long as one robot is alive
– The robustness mechanism is simple
• Off-line and On-line algorithms
– Off-line:
o Analysis according to initial positions
o Efficiency improvements
– On-line:
o Implemented on simulation of real-robots
Off-line Coverage, Basic Assumptions
• Area division – n cells
• k homogeneous robots
• Equal associated tool size
• Robots movement
STC: Spanning Tree Coverage
(Gabrieli and Rimon 2001)
• Area division
• Graph definition
• Building the spanning tree
Non-backtracking MSTC
• Initialization phase: Build STC, distribute to robots
• Distributed execution: Each robot follows its section
– Low risk of collisions
[Figure: robots A, B and C each follow their own section of the spanning tree until all three are done]
Backtracking MSTC
• Similar initialization phase
• Robots backtrack to assist others
• No point is covered more than twice