AI Unit3

Unit III covers adversarial search and game theory, focusing on optimal decision-making in games using algorithms like MinMax and Alpha-Beta pruning. It discusses the limitations of game search algorithms and introduces Monte Carlo Tree Search as a probabilistic method for decision-making in AI. The unit emphasizes the importance of evaluating game positions and strategies in competitive environments.

Uploaded by Tanuja mulla

UNIT III

ADVERSARIAL
SEARCH AND
GAMES
CONTENTS…
 Game Theory,
 Optimal Decisions in Games,
 Heuristic Alpha–Beta Tree Search,
 Monte Carlo Tree Search,
 Stochastic Games, Partially Observable
Games,
 Limitations of Game Search Algorithms,
 Constraint Satisfaction Problems (CSP),
Constraint Propagation: Inference in
CSPs, Backtracking Search for CSPs.
 Many applications for AI
 Computer vision, natural language processing, speech recognition, search …
 But games are some of the more interesting
 Opponents that are challenging, or allies that are helpful
 The agent is credited with acting on its own
 Human-level intelligence in general is too hard
 But under narrow circumstances a program can do pretty well (ex: chess and Deep Blue)
 Many games are tightly constrained (by the game rules)
 We cover competitive environments, in which the agents' goals are in conflict, giving rise to adversarial search problems, often known as games.
MINMAX - OVERVIEW
 MinMax is at the heart of almost every computer board game
 Applies to games where:
 Players take turns
 Players have perfect information
 Chess, Checkers, Tactics
 But it can also work for games without perfect information, or with chance
 Poker, Monopoly, dice games
 Can work in real time (i.e., not turn based) with a timer (iterative deepening, later)
MINMAX - OVERVIEW
 Search tree
 Squares represent decision states (i.e., after a move)
 Branches are decisions (i.e., the moves)
 Start at the root
 Nodes at the end are leaf nodes
 Ex: Tic-Tac-Toe (symmetrical positions removed)

• Unlike binary trees, nodes can have any number of children
– Depends on the game situation
• Levels are usually called plies (a ply is one level)
– At each ply the turn switches to the other player
• Players are called Min and Max (next)
MINMAX - ALGORITHM
 Named MinMax because of the algorithm behind the data structure
 Assign points to the outcome of a game
 Ex: Tic-Tac-Toe: X wins, value of 1. O wins, value of -1.
 Max (X) tries to maximize the point value, while Min (O) tries to minimize it
 Assume both players play to the best of their ability
 Each always makes the move that maximizes or minimizes points
 So, in choosing, Max will choose the best move to get the highest points, assuming Min will choose the best move to get the lowest points
MINMAX AND CHESS
 With the full tree, we can determine the best possible move
 However, the full tree is impossible to build for some games! Ex: Chess
 At a given time, chess has ~35 legal moves. Exponential growth:
 35 at one ply, 35^2 = 1225 at two plies … 35^6 ≈ 2 billion and 35^10 ≈ 2 quadrillion
 Games can last 40 moves (or more), so 35^40 … Stars in the universe: ~10^22
 For large games (Chess) we can't see the end of the game. Must estimate winning or losing from the top portion
 An Evaluate() function guesses the outcome given a board
 A numeric value, much smaller than victory (i.e., checkmate for Max is one million, for Min minus one million)
 So, a computer's strength at chess comes from:
 How deep it can search
 How well it can evaluate a board position
 (In some sense, like a human: a chess grandmaster can evaluate a board better and can look further ahead)
GAME TREE (2-PLAYER,
DETERMINISTIC, TURNS)

How do we search this tree to find the optimal move?


SEARCH VERSUS GAMES
 Search – no adversary
 Solution is (heuristic) method for finding goal
 Heuristics and CSP techniques can find optimal solution
 Evaluation function: estimate of cost from start to goal through given node
 Examples: path planning, scheduling activities

 Games – adversary
 Solution is strategy
 strategy specifies move for every possible opponent reply.

 Time limits force an approximate solution


 Evaluation function: evaluate “goodness” of game position

Examples: chess, checkers, Othello, backgammon
GAMES AS SEARCH
 Two players: MAX and MIN

 MAX moves first and they take turns until the game is over
 Winner gets reward, loser gets penalty.
 “Zero sum” means the sum of the reward and the penalty is a constant.

 Formal definition as a search problem:


 Initial state: Set-up specified by the rules, e.g., initial board configuration
of chess.
 Player(s): Defines which player has the move in a state.
 Actions(s): Returns the set of legal moves in a state.
 Result(s,a): Transition model defines the result of a move.
 (2nd ed.: Successor function: list of (move,state) pairs specifying legal
moves.)
 Terminal-Test(s): Is the game finished? True if finished, false otherwise.
 Utility function(s,p): Gives numerical value of terminal state s for player
p.
 E.g., win (+1), lose (-1), and draw (0) in tic-tac-toe.

 E.g., win (+1), lose (0), and draw (1/2) in chess.

 MAX uses search tree to determine next move.


AN OPTIMAL PROCEDURE:
THE MIN-MAX METHOD
Designed to find the optimal strategy for Max and find
best move:

 1. Generate the whole game tree, down to the leaves.

 2. Apply utility (payoff) function to each leaf.

 3. Back-up values from leaves through branch nodes:


 a Max node computes the Max of its child values
 a Min node computes the Min of its child values

 4. At root: choose the move leading to the child of


highest value.
GAME TREES
TWO-PLY GAME TREE
Minimax maximizes the utility for the worst-case outcome for Max

The minimax decision


THE MINIMAX ALGORITHM

 The minimax algorithm is a recursive (backtracking) algorithm used in decision-making and game theory.

 It provides an optimal move for the player, assuming that the opponent is also playing optimally.

 The minimax algorithm uses recursion to search through the game tree.

 Minimax is mostly used for game playing in AI, such as Chess, Checkers, Tic-Tac-Toe, Go, and various other two-player games. The algorithm computes the minimax decision for the current state.

 In this algorithm two players play the game; one is called MAX and the other is called MIN.

 The players are adversaries: each tries to get the maximum benefit for itself while leaving the opponent the minimum benefit. MAX selects the maximized value and MIN selects the minimized value.

 The minimax algorithm performs a depth-first exploration of the complete game tree.

 It proceeds all the way down to the terminal nodes of the tree, then backs values up the tree as the recursion unwinds.
function MINIMAX-DECISION(state) returns an action
  return arg max over a ∈ ACTIONS(state) of MIN-VALUE(RESULT(state, a))

function MAX-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← −∞
  for each a in ACTIONS(state) do
    v ← MAX(v, MIN-VALUE(RESULT(state, a)))
  return v

function MIN-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +∞
  for each a in ACTIONS(state) do
    v ← MIN(v, MAX-VALUE(RESULT(state, a)))
  return v
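The pseudocode above can be sketched as runnable Python. The nested-list game tree here is an illustrative stand-in for a real ACTIONS/RESULT interface: a number is a terminal utility for MAX, and a list holds the children reached by each legal move. The tree used is the classic two-ply example.

```python
def minimax_value(node, is_max):
    """Return the minimax value of a game-tree node.

    A node is either a number (terminal utility, from MAX's point
    of view) or a list of child nodes, one per legal move."""
    if isinstance(node, (int, float)):          # TERMINAL-TEST + UTILITY
        return node
    values = [minimax_value(child, not is_max) for child in node]
    return max(values) if is_max else min(values)

def minimax_decision(children):
    """Index of the root move with the highest backed-up value.
    MAX moves first, so each child is evaluated as a MIN node."""
    return max(range(len(children)),
               key=lambda i: minimax_value(children[i], is_max=False))

# Two-ply example: MAX picks a branch, MIN then picks the worst leaf.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
```

Here `minimax_decision(tree)` selects the first branch: MIN would reply 3, 2 and 2 respectively, and 3 is the best worst case for MAX.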
LIMITATION OF THE MINIMAX ALGORITHM:

 The main drawback of the minimax algorithm is that it gets really slow for complex games such as Chess, Go, etc.
 These games have a huge branching factor, and the player has many choices to decide among.
 This limitation of the minimax algorithm can be improved upon with alpha-beta pruning.
CODE
[Link]
THREE PLIES
ALPHA–BETA PRUNING
• Alpha-beta pruning is a modified version of the minimax algorithm.

• It is an optimization technique for the minimax algorithm.

• It computes the correct minimax decision without examining every node of the game tree; this technique is called pruning.
ALPHA–BETA PRUNING
The two parameters can be defined as:
Alpha: The best (highest-value)
choice we have found so far at any
point along the path of Maximizer. The
initial value of alpha is -∞.
Beta: The best (lowest-value) choice
we have found so far at any point
along the path of Minimizer. The initial
value of beta is +∞.
ALPHA-BETA ALGORITHM
 Depth-first search
 only considers nodes along a single path from the root at any time

 α = highest-value choice found at any choice point of the path for MAX
 (initially, α = −∞)
 β = lowest-value choice found at any choice point of the path for MIN
 (initially, β = +∞)

 Pass the current values of α and β down to child nodes during search.
 Update the values of α and β during search:
 MAX updates α at MAX nodes
 MIN updates β at MIN nodes
WHEN TO PRUNE

 Prune whenever α ≥ β.

 Prune below a Max node whose alpha value becomes greater than or equal to the beta value of its ancestors.
 Max nodes update alpha based on children's returned values.

 Prune below a Min node whose beta value becomes less than or equal to the alpha value of its ancestors.
 Min nodes update beta based on children's returned values.
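A minimal Python sketch of these pruning rules, on the same nested-list tree representation used earlier; the `pruned` list is just instrumentation added here to show which leaves were cut off.

```python
import math

def alphabeta(node, alpha, beta, is_max, pruned):
    """Minimax with alpha-beta pruning on a nested-list game tree.
    `pruned` collects the subtrees that were never examined."""
    if isinstance(node, (int, float)):          # terminal utility
        return node
    if is_max:
        v = -math.inf
        for i, child in enumerate(node):
            v = max(v, alphabeta(child, alpha, beta, False, pruned))
            alpha = max(alpha, v)
            if alpha >= beta:                   # remaining siblings cannot matter
                pruned.extend(node[i + 1:])
                break
        return v
    else:
        v = math.inf
        for i, child in enumerate(node):
            v = min(v, alphabeta(child, alpha, beta, True, pruned))
            beta = min(beta, v)
            if alpha >= beta:
                pruned.extend(node[i + 1:])
                break
        return v

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
pruned = []
value = alphabeta(tree, -math.inf, math.inf, True, pruned)
```

On this tree the root value is still 3, exactly as plain minimax computes, but the leaves 4 and 6 are skipped: once the second MIN node sees the leaf 2, its β drops to 2, α (3) ≥ β (2), and the rest of that branch is pruned.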
ALPHA-BETA EXAMPLE REVISITED
Do DF-search until first leaf
α, β initial values:
α = −∞, β = +∞

α, β passed to kids:
α = −∞, β = +∞
ALPHA-BETA EXAMPLE (CONTINUED)

MIN updates β based on kids:
α = −∞, β = 3
ALPHA-BETA EXAMPLE (CONTINUED)

MIN updates β based on kids. No change:
α = −∞, β = 3
ALPHA-BETA EXAMPLE (CONTINUED)

MAX updates α based on kids:
α = 3, β = +∞

3 is returned as node value.
ALPHA-BETA EXAMPLE (CONTINUED)

α, β passed to kids:
α = 3, β = +∞
ALPHA-BETA EXAMPLE (CONTINUED)

MIN updates β based on kids:
α = 3, β = 2
ALPHA-BETA EXAMPLE (CONTINUED)

α = 3, β = 2: α ≥ β, so prune.
ALPHA-BETA EXAMPLE (CONTINUED)
MAX updates α based on kids. No change: α = 3, β = +∞

2 is returned as node value.
ALPHA-BETA EXAMPLE (CONTINUED)

α, β passed to kids:
α = 3, β = +∞
ALPHA-BETA EXAMPLE (CONTINUED)

MIN updates β based on kids:
α = 3, β = 14
ALPHA-BETA EXAMPLE (CONTINUED)

MIN updates β based on kids:
α = 3, β = 5
ALPHA-BETA EXAMPLE (CONTINUED)

2 is returned as node value.
ALPHA-BETA EXAMPLE (CONTINUED)

Max calculates the same node value, and makes the same move!
EFFECTIVENESS OF ALPHA-BETA SEARCH
 Worst-Case
 branches are ordered so that no pruning takes place. In this case alpha-beta gives no improvement over exhaustive search

 Best-Case
 each player's best move is the left-most child (i.e., evaluated first)
 in practice, performance is closer to best-case than worst-case
 E.g., sort moves by the move values remembered from last time.
 E.g., expand captures first, then threats, then forward moves, etc.
 E.g., run Iterative Deepening search, sort by value from the last iteration.

 In practice we often get O(b^(d/2)) rather than O(b^d)
 this is the same as having a branching factor of sqrt(b), since (sqrt(b))^d = b^(d/2), i.e., we effectively go from b to sqrt(b)
 e.g., in chess we go from b ≈ 35 to b ≈ 6
 this permits much deeper search in the same amount of time
FINAL COMMENTS ABOUT ALPHA-BETA PRUNING
 Pruning does not affect final results

 Entire subtrees can be pruned.

 Good move ordering improves effectiveness of pruning

 Repeated states are again possible.


 Store them in memory = transposition table
MONTE CARLO TREE
SEARCH
Monte Carlo Tree Search (MCTS) is a
search technique in the field of Artificial
Intelligence (AI).

 It is a probabilistic and heuristic-driven search algorithm that combines classic tree search with the machine-learning principles of reinforcement learning.
The MCTS algorithm remains useful because, during the learning phase, it periodically evaluates alternatives other than the currently perceived optimal strategy by executing them. This is known as the "exploration-exploitation trade-off".

 Search can be broken down into four distinct steps, viz.,

1. selection,
2. expansion,
3. simulation, and
4. backpropagation.
•the MCTS algorithm traverses the
current tree from the root node using a
specific strategy.
•The strategy uses an evaluation
function to optimally select nodes with
the highest estimated value.
•MCTS uses the Upper Confidence
Bound (UCB) formula applied to trees
as the strategy in the selection process
to traverse the tree.
Sᵢ = x̄ᵢ + C · √(ln t / nᵢ)

where:
Sᵢ = value of a node i
x̄ᵢ = empirical mean reward of node i
C = a constant (the exploration parameter)
t = total number of simulations
nᵢ = number of visits of node i
When traversing a tree during the selection
process, the child node that returns the
greatest value from the above equation will
be one that will get selected.
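The selection rule can be sketched directly from the formula above. The symbol names mirror the legend; `c = sqrt(2)` is a commonly used default for the constant C, and giving unvisited children an infinite score (so each is tried at least once) is the usual convention.

```python
import math

def ucb1(mean, visits, total, c=math.sqrt(2)):
    """UCB1 score: exploitation (mean reward) plus an exploration
    bonus that shrinks as a node is visited more often.
    Unvisited nodes score +inf so every child is tried at least once."""
    if visits == 0:
        return math.inf
    return mean + c * math.sqrt(math.log(total) / visits)

def select_child(children, total):
    """children: list of (mean, visits) pairs for each child node.
    Return the index of the child with the highest UCB1 score."""
    return max(range(len(children)),
               key=lambda i: ucb1(children[i][0], children[i][1], total))
```

With equal visit counts the child with the higher mean wins; an unvisited child always wins.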
Expansion: In this process, a new child node
is added to the tree to that node which was
optimally reached during the selection
process.

Simulation: In this process, a simulation is


performed by choosing moves or strategies
until a result or predefined state is achieved.

Backpropagation: After determining the


value of the newly added node, the remaining
tree must be updated. So, the
backpropagation process is performed, where
it backpropagates from the new node to the
root node.
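The four steps can be sketched end-to-end in Python. The game used here — a tiny Nim variant where players take 1–3 stones and taking the last stone wins — is a hypothetical example chosen for brevity, not a game from the slides.

```python
import math, random

class Node:
    """One node of the MCTS tree for the toy Nim game."""
    def __init__(self, pile, to_move, parent=None, move=None):
        self.pile, self.to_move = pile, to_move
        self.parent, self.move = parent, move
        self.children = []
        self.untried = [m for m in (1, 2, 3) if m <= pile]
        self.visits, self.wins = 0, 0.0  # wins for the player who moved here

def mcts(pile, to_move=0, iters=5000):
    root = Node(pile, to_move)
    for _ in range(iters):
        node = root
        # 1. selection: descend by UCB1 while the node is fully expanded
        while not node.untried and node.children:
            log_n = math.log(node.visits)
            node = max(node.children,
                       key=lambda c: c.wins / c.visits
                       + math.sqrt(2 * log_n / c.visits))
        # 2. expansion: add one new child if any move is still untried
        if node.untried:
            m = node.untried.pop()
            child = Node(node.pile - m, 1 - node.to_move, node, m)
            node.children.append(child)
            node = child
        # 3. simulation: random playout to the end of the game
        stones, player = node.pile, node.to_move
        while stones > 0:
            stones -= random.choice([m for m in (1, 2, 3) if m <= stones])
            player = 1 - player
        winner = 1 - player              # the player to move at 0 stones lost
        # 4. backpropagation: update visit/win counts up to the root
        while node is not None:
            node.visits += 1
            if winner != node.to_move:   # a win for the player who moved here
                node.wins += 1
            node = node.parent
    # recommend the most-visited root move
    return max(root.children, key=lambda c: c.visits).move
```

From a pile of 5 the winning strategy is to leave a multiple of 4, so the search should settle on taking 1 stone.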
Monte Carlo Tree Search is a method usually used in games to predict the path (moves) that should be taken by the policy to reach the final winning solution.
These types of algorithms are particularly useful in turn-based games where there is no element of chance in the game mechanics, such as Tic-Tac-Toe, Connect 4, Checkers, Chess, Go, etc.
STOCHASTIC GAMES
Many games mirror real-world unpredictability by including a random element, such as the throwing of dice. We call these stochastic games.
Backgammon is a typical game that combines luck and skill.
 Dice are rolled at the beginning of a player's turn to determine the legal moves.
In backgammon, for example, White has rolled a 6–5 and has four possible moves.
P(1,1) = 1/36 (there are 36 ordered ways to roll two dice.)
Each of the 15 distinct non-double rolls has probability 1/18.
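These roll probabilities are easy to verify by enumerating all 36 ordered outcomes and collapsing them into unordered rolls, as backgammon does:

```python
from fractions import Fraction
from itertools import product

# Enumerate the 36 ordered outcomes of rolling two dice and count
# how many ordered outcomes map to each unordered roll.
counts = {}
for a, b in product(range(1, 7), repeat=2):
    roll = (min(a, b), max(a, b))
    counts[roll] = counts.get(roll, 0) + 1

# Exact probability of each of the 21 distinct rolls.
probs = {roll: Fraction(c, 36) for roll, c in counts.items()}
```

The six doubles occur one way each (probability 1/36); the 15 non-doubles occur two ways each (probability 1/18); the 21 probabilities sum to 1.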
THE STATE OF PLAY

 Checkers:
 Chinook ended 40-year-reign of human world champion
Marion Tinsley in 1994.

 Chess:
 Deep Blue defeated human world champion Garry Kasparov
in a six-game match in 1997.

 Othello:
 human champions refuse to compete against computers:
they are too good.

 Go:
 human champions refuse to compete against computers:
they are too bad
 b > 300 (!)

 See (e.g.) [Link] for more information


DEEP BLUE
 1957: Herbert Simon
 “within 10 years a computer will beat the world chess
champion”

 1997: Deep Blue beats Kasparov

 Parallel machine with 30 processors for “software” and 480


VLSI processors for “hardware search”

 Searched 126 million nodes per second on average


 Generated up to 30 billion positions per move
 Reached depth 14 routinely

 Uses iterative-deepening alpha-beta search with a transposition table
 Can explore beyond depth-limit for interesting moves
CONSTRAINT
SATISFACTION
PROBLEM
CSP
 Many problems in AI can be considered as problems of constraint satisfaction, in which the goal state satisfies a given set of constraints.
 Constraint satisfaction problems can be solved by using any of the search strategies.
 A constraint satisfaction problem (CSP) is a problem that requires its solution to lie within some limitations or conditions, also known as constraints; it consists of a finite variable set, a domain set, and a finite constraint set. ... A solution must satisfy all constraints.
EXAMPLE: MAP-COLORING

 Variables WA, NT, Q, NSW, V, SA, T

 Domains Di = {red, green, blue}

 Constraints: adjacent regions must have different colors


 e.g., WA ≠ NT

EXAMPLE: MAP-COLORING

 Solutions are complete and consistent assignments, e.g., WA = red, NT = green, Q = red, NSW = green, V = red, SA = blue, T = green
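A minimal backtracking solver for this map-colouring CSP can be sketched as follows (static variable order and no heuristics — those come in the next slides):

```python
# Adjacency for the Australia map-colouring CSP.
NEIGHBORS = {
    'WA': ['NT', 'SA'], 'NT': ['WA', 'SA', 'Q'],
    'SA': ['WA', 'NT', 'Q', 'NSW', 'V'],
    'Q': ['NT', 'SA', 'NSW'], 'NSW': ['Q', 'SA', 'V'],
    'V': ['SA', 'NSW'], 'T': [],
}
COLORS = ['red', 'green', 'blue']

def consistent(var, color, assignment):
    """The only constraint: adjacent regions must differ in colour."""
    return all(assignment.get(n) != color for n in NEIGHBORS[var])

def backtrack(assignment):
    if len(assignment) == len(NEIGHBORS):
        return assignment                       # complete and consistent
    var = next(v for v in NEIGHBORS if v not in assignment)  # static order
    for color in COLORS:
        if consistent(var, color, assignment):
            assignment[var] = color
            result = backtrack(assignment)
            if result is not None:
                return result
            del assignment[var]                 # undo, try the next colour
    return None                                 # dead end: backtrack

solution = backtrack({})
```

The returned assignment colours all seven regions so that no two neighbours share a colour.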
CONSTRAINT GRAPH
 Binary CSP: each constraint relates two
variables
 Constraint graph: nodes are variables, arcs
are constraints

BACKTRACKING EXAMPLE
(figures: successive partial assignments on the map)
IMPROVING BACKTRACKING
EFFICIENCY
 General-purpose methods can give huge
gains in speed:
 Which variable should be assigned next?
 In what order should its values be tried?
 Can we detect inevitable failure early?

MOST CONSTRAINED
VARIABLE
 Most constrained variable:
choose the variable with the fewest legal
values

 a.k.a. minimum remaining values (MRV)


heuristic
 Picks a variable which will cause failure
as soon as possible, allowing the tree to
be pruned.

MOST CONSTRAINING
VARIABLE
 Tie-breaker among most constrained
variables

 Most constraining variable:


 choose the variable with the most constraints on remaining variables (most edges in the graph)

LEAST CONSTRAINING
VALUE
 Given a variable, choose the least
constraining value:
 the one that rules out the fewest values in the remaining variables

 Leaves maximal flexibility for a solution.


 Combining these heuristics makes 1000
queens feasible
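The two heuristics can be sketched as small selection functions. The `domains` dictionary of remaining legal values and the `neighbors` adjacency map are assumed representations, matching the backtracking sketch earlier:

```python
def mrv_variable(unassigned, domains):
    """Minimum-remaining-values (most constrained variable):
    pick the unassigned variable with the fewest legal values left."""
    return min(unassigned, key=lambda v: len(domains[v]))

def lcv_order(var, domains, neighbors):
    """Least-constraining-value: order var's values so that those
    ruling out the fewest choices in neighbouring domains come first."""
    def ruled_out(value):
        return sum(value in domains[n] for n in neighbors[var])
    return sorted(domains[var], key=ruled_out)
```

MRV is fail-first (pick the variable likeliest to cause early failure, so the tree is pruned sooner); LCV is fail-last (pick the value that keeps the most options open).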

FORWARD CHECKING
 Idea:
 Keep track of remaining legal values for
unassigned variables
 Terminate search when any variable has no legal
values

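The forward-checking step itself can be sketched as follows, again assuming domains are stored as lists of remaining legal values:

```python
import copy

def forward_check(var, value, domains, neighbors):
    """After assigning `value` to `var`, prune `value` from the domains
    of var's neighbours. Returns the reduced domains, or None if some
    domain is wiped out (so this branch of the search can terminate)."""
    new_domains = copy.deepcopy(domains)   # keep the caller's domains intact
    new_domains[var] = [value]
    for n in neighbors[var]:
        if value in new_domains[n]:
            new_domains[n].remove(value)
            if not new_domains[n]:         # no legal values left: wipeout
                return None
    return new_domains
```

A `None` result is exactly the early-termination signal described above: some unassigned variable has no legal values, so the current partial assignment cannot be extended.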
CONSTRAINT
PROPAGATION
 Forward checking propagates information
from assigned to unassigned variables, but
doesn't provide early detection for all
failures:

 NT and SA cannot both be blue!


 Constraint propagation repeatedly enforces
constraints locally
CSP
ARC CONSISTENCY
 Simplest form of propagation makes each arc consistent
 X → Y is consistent iff for every value x of X there is some allowed y

 Constraint propagation propagates arc consistency on the graph.
ARC CONSISTENCY
 Simplest form of propagation makes each arc consistent
 X → Y is consistent iff for every value x of X there is some allowed y

 If X loses a value, neighbors of X need to be rechecked

 Arc consistency detects failure earlier than forward checking
 Can be run as a preprocessor or after each assignment

 Time complexity: O(n^2 d^3)
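A sketch of the AC-3 arc-consistency algorithm for the difference constraint (X ≠ Y) used in map colouring:

```python
from collections import deque

def revise(domains, x, y):
    """Delete values of x that have no supporting value in y (x != y).
    Return True iff some value was removed."""
    removed = False
    for vx in domains[x][:]:                    # iterate over a copy
        if not any(vx != vy for vy in domains[y]):
            domains[x].remove(vx)
            removed = True
    return removed

def ac3(domains, neighbors):
    """Enforce arc consistency on every arc X -> Y. Mutates `domains`;
    returns False if some domain is wiped out (no solution possible)."""
    queue = deque((x, y) for x in neighbors for y in neighbors[x])
    while queue:
        x, y = queue.popleft()
        if revise(domains, x, y):
            if not domains[x]:
                return False                    # failure detected early
            for z in neighbors[x]:              # recheck arcs into x
                if z != y:
                    queue.append((z, x))
    return True
```

Once WA is fixed to red, a single AC-3 pass removes red from both NT and SA, before any further search is done.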
JUNCTION TREE
DECOMPOSITIONS

LOCAL SEARCH FOR CSPS
 Note: The path to the solution is
unimportant, so we can
apply local search!

 To apply to CSPs:
 allow states with unsatisfied constraints
 operators reassign variable values

 Variable selection: randomly select any


conflicted variable

 Value selection by min-conflicts heuristic:


 choose value that violates the fewest constraints
 i.e., hill-climb with h(n) = total number of violated
constraints
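This hill-climbing scheme can be sketched for n-queens, with queens encoded one per column. The restart loop and fixed seed are additions for a reliable demonstration, since plain min-conflicts can occasionally plateau:

```python
import random

def conflicts(cols, col, row):
    """Number of queens attacking a queen placed at (col, row);
    cols[c] gives the row of the queen in column c."""
    return sum(1 for c, r in enumerate(cols)
               if c != col and (r == row or abs(r - row) == abs(c - col)))

def min_conflicts(n=8, max_steps=1000, restarts=20):
    """Local search with h(n) = number of attacked queens."""
    random.seed(0)                               # reproducible run
    for _ in range(restarts):
        cols = [random.randrange(n) for _ in range(n)]   # random start
        for _ in range(max_steps):
            conflicted = [c for c in range(n)
                          if conflicts(cols, c, cols[c]) > 0]
            if not conflicted:
                return cols                      # no violated constraints
            col = random.choice(conflicted)      # random conflicted variable
            # value selection: the row minimizing the number of conflicts
            cols[col] = min(range(n),
                            key=lambda r: conflicts(cols, col, r))
    return None
```

Note that the path to the solution is discarded entirely, as the slide says: only the final conflict-free assignment is returned.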
CRYPTARITHMETIC PROBLEM

 The Cryptarithmetic Problem is a type of constraint satisfaction problem in which digits are uniquely replaced by letters or other symbols. In a cryptarithmetic problem, the digits (0-9) are substituted by letters or symbols.
 The rules or constraints of a cryptarithmetic problem are as follows:
 Each letter must stand for a unique digit, and each digit must be assigned to only one letter.
 The result should satisfy the predefined arithmetic rules, i.e., 2+2=4, nothing else.
 Digits should be from 0-9 only.
 There should be only one carry forward while performing the addition operation on a problem.
 The problem can be solved from both sides, i.e., the left-hand side (L.H.S.) or the right-hand side (R.H.S.)
 Given a cryptarithmetic problem, i.e., SEND + MORE = MONEY
 Starting from the left-hand side (L.H.S.), the terms are S and M. Assign digits which could give a satisfactory result. Let's assign S->9 and M->1.
 Now, move ahead to the next terms E and O to get N as the output.
 Adding E and O as 5+0=0 is not possible, because according to the cryptarithmetic constraints we cannot assign the same digit to two letters. So we need to think more and assign some other value.
 Further, adding the next two terms N and R, we get:
But we have already assigned E->5. Thus, the above result does not satisfy the values,
 where 1 will be carried forward to the above term.
 Let's move ahead.
 Again, on adding the last two terms, i.e., the rightmost terms D and E, we get Y as the result.
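The assignment reached above can be checked mechanically. This brute-force sketch (not the hand-reasoning method described in the slides) enumerates digit assignments for SEND + MORE = MONEY:

```python
from itertools import permutations

def solve_send_more_money():
    """Find every assignment of distinct digits to S,E,N,D,M,O,R,Y
    with SEND + MORE = MONEY and no leading zeros (S != 0, M != 0)."""
    solutions = []
    for s, e, n, d, m, o, r, y in permutations(range(10), 8):
        if s == 0 or m == 0:
            continue
        send = 1000 * s + 100 * e + 10 * n + d
        more = 1000 * m + 100 * o + 10 * r + e
        money = 10000 * m + 1000 * o + 100 * n + 10 * e + y
        if send + more == money:
            solutions.append((send, more, money))
    return solutions
```

Exhaustive enumeration confirms the classic unique solution 9567 + 1085 = 10652, consistent with S->9 and M->1 above.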
SOLVE IT
CRYPTARITHMETIC PUZZLES

  TWO
+ TWO
-----
 FOUR
 We decided to look at the value of O first. If O = 0, then R would also be 0, so that doesn't work; and O can't be 1 because F = 1.

 If O = 2,

  TW2
+ TW2
-----
 12UR

then R = 4 and T = 6, and we also know that W < 5 because there can't be anything carried to the hundreds column. The only possible value of W that hasn't already been used is 3, but this would mean that U is 6, which is the same as T.
 If O = 3,

  TW3
+ TW3
-----
 13UR

then R = 6 and T = 6, which doesn't work.
 If O = 4,

  TW4
+ TW4
-----
 14UR

then R = 8 and T = 7, and we also know that W < 5 because there can't be anything carried to the hundreds column. So W could be 0, 2 or 3.

 W can't be 0 because then U would be 0, and it can't be 2 because U would be 4.
If W = 3, U = 6, which works: 734 + 734 = 1468.
 If O = 5,

  TW5
+ TW5
-----
 15UR

then R = 0 and T = 7, and we also know that W ≥ 5 because there has to be 1 carried to the hundreds column.

 W can't be 5 because O = 5.
If W = 6, U = 3, which works: 765 + 765 = 1530.
 So there are seven possible answers:
938+938=1876
928+928=1856
867+867=1734
846+846=1692
836+836=1672
765+765=1530
734+734=1468
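The case analysis above can be confirmed with a brute-force check over all digit assignments:

```python
from itertools import permutations

def solve_two_two_four():
    """Brute-force TWO + TWO = FOUR: assign distinct digits to the six
    letters T,W,O,F,U,R with no leading zeros (T != 0, F != 0)."""
    solutions = []
    for t, w, o, f, u, r in permutations(range(10), 6):
        if t == 0 or f == 0:
            continue
        two = 100 * t + 10 * w + o
        four = 1000 * f + 100 * o + 10 * u + r
        if two + two == four:
            solutions.append((two, four))
    return sorted(solutions)
```

Enumeration returns exactly the seven (TWO, FOUR) pairs listed above, from 734 + 734 = 1468 up to 938 + 938 = 1876.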
SUMMARY
 Game playing is best modeled as a search problem

 Game trees represent alternate computer/opponent moves

 Evaluation functions estimate the quality of a given board


configuration for the Max player.

 Minimax is a procedure which chooses moves by assuming that


the opponent will always choose the move which is best for
them

 Alpha-Beta is a procedure which can prune large parts of the


search tree and allow search to go deeper

 For many well-known games, computer algorithms based on


heuristic search match or out-perform human world experts.
SPPU QUESTIONS
 Comment on backtracking and look-ahead (forward checking) strategies in constraint satisfaction problems. [6]
 Apply cryptarithmetic to solve the problem and represent the state search space to solve TWO + TWO = FOUR. (OCT 2019)
