UNIT III
ADVERSARIAL
SEARCH AND
GAMES
CONTENTS…
Game Theory,
Optimal Decisions in Games,
Heuristic Alpha–Beta Tree Search,
Monte Carlo Tree Search,
Stochastic Games, Partially Observable
Games,
Limitations of Game Search Algorithms,
Constraint Satisfaction Problems (CSP),
Constraint Propagation: Inference in
CSPs, Backtracking Search for CSPs.
Many applications for AI
Computer vision, natural language processing,
speech recognition, search …
But games are some of the more interesting
Opponents that are challenging, or allies
that are helpful
Agent: a unit that is credited with acting on its own
Human-level intelligence too hard
But under narrow circumstances can do pretty
well (ex: chess and Deep Blue)
For many games, often constrained (by game
rules)
We cover competitive environments,
in which the agents' goals are in
conflict, giving rise to adversarial
search problems, often known as
games.
MINMAX - OVERVIEW
MinMax the heart of almost every computer
board game
Applies to games where:
Players take turns
Have perfect information
Chess, Checkers, Tactics
But can work for games without perfect
information or chance
Poker, Monopoly, Dice
Can work in real-time (ie- not turn based) with
timer (iterative deepening, later)
MINMAX - OVERVIEW
Search tree
Squares represent decision states (ie- after a move)
Branches are decisions (ie- the move)
Start at root
Nodes at end are leaf nodes
Ex: Tic-Tac-Toe (symmetrical positions removed)
• Unlike binary trees can have any number of children
– Depends on the game situation
• Levels usually called plies (a ply is one level)
– Each ply is where "turn" switches to other player
• Players called Min and Max (next)
MINMAX - ALGORITHM
Named MinMax because of algorithm
behind data structure
Assign points to the outcome of a game
Ex: Tic-Tac-Toe: X wins, value of 1. O wins,
value -1.
Max (X) tries to maximize point value, while
Min (O) tries to minimize point value
Assume both players play to best of their
ability
Always make a move to minimize or maximize
points
So, in choosing, Max will choose best move
to get highest points, assuming Min will
choose best move to get lowest points
MINMAX AND CHESS
With full tree, can determine best possible move
However, full tree impossible for some games! Ex: Chess
At a given time, chess has ~ 35 legal moves. Exponential
growth:
35 at one ply, 35^2 = 1225 at two plies … 35^6 ≈ 2 billion and
35^10 ≈ 2 quadrillion
Games can last 40 moves (or more), so 35^40 … Stars in
universe: ~ 2^28
For large games (Chess) can’t see end of the game. Must
estimate winning or losing from top portion
Evaluate() function to guess end given board
A numeric value, much smaller than victory (ie-
Checkmate for Max will be one million, for Min minus one
million)
So, computer’s strength at chess comes from:
How deep can search
How well can evaluate a board position
(In some sense, like a human – a chess grand master can
evaluate board better and can look further ahead)
GAME TREE (2-PLAYER,
DETERMINISTIC, TURNS)
How do we search this tree to find the optimal move?
SEARCH VERSUS GAMES
Search – no adversary
Solution is (heuristic) method for finding goal
Heuristics and CSP techniques can find optimal solution
Evaluation function: estimate of cost from start to goal through given node
Examples: path planning, scheduling activities
Games – adversary
Solution is strategy
strategy specifies move for every possible opponent reply.
Time limits force an approximate solution
Evaluation function: evaluate “goodness” of game position
Examples: chess, checkers, Othello, backgammon
GAMES AS SEARCH
Two players: MAX and MIN
MAX moves first and they take turns until the game is over
Winner gets reward, loser gets penalty.
“Zero sum” means the sum of the reward and the penalty is a constant.
Formal definition as a search problem:
Initial state: Set-up specified by the rules, e.g., initial board configuration
of chess.
Player(s): Defines which player has the move in a state.
Actions(s): Returns the set of legal moves in a state.
Result(s,a): Transition model defines the result of a move.
(2nd ed.: Successor function: list of (move,state) pairs specifying legal
moves.)
Terminal-Test(s): Is the game finished? True if finished, false otherwise.
Utility function(s,p): Gives numerical value of terminal state s for player
p.
E.g., win (+1), lose (-1), and draw (0) in tic-tac-toe.
E.g., win (+1), lose (0), and draw (1/2) in chess.
MAX uses search tree to determine next move.
AN OPTIMAL PROCEDURE:
THE MIN-MAX METHOD
Designed to find the optimal strategy for Max and find
best move:
1. Generate the whole game tree, down to the leaves.
2. Apply utility (payoff) function to each leaf.
3. Back-up values from leaves through branch nodes:
a Max node computes the Max of its child values
a Min node computes the Min of its child values
4. At root: choose the move leading to the child of
highest value.
GAME TREES
TWO-PLY GAME TREE
Minimax maximizes the utility for the worst-case outcome for max
The minimax decision
THE MINIMAX ALGORITHM
Mini-max algorithm is a recursive or backtracking
algorithm which is used in decision-making and game
theory.
It provides an optimal move for the player assuming
that opponent is also playing optimally.
Mini-Max algorithm uses recursion to search through
the game-tree.
Min-Max algorithm is mostly used for game playing in
AI.
Such as Chess, Checkers, tic-tac-toe, go, and various
two-players game. This Algorithm computes the
minimax decision for the current state.
In this algorithm two players play the game; one is called
MAX and the other is called MIN.
Each player tries to ensure the opponent gets the
minimum benefit while they get the maximum benefit.
Both players of the game are opponents of each other,
where MAX will select the maximized value and MIN will
select the minimized value.
The minimax algorithm performs a depth-first
exploration of the complete game tree.
The minimax algorithm proceeds all the way down to the
terminal nodes of the tree, then backs values up as the
recursion unwinds.
function MINIMAX-DECISION(state) returns an action
return argmax a ∈ ACTIONS(state) MIN-VALUE(RESULT(state, a))

function MAX-VALUE(state) returns a utility value
if TERMINAL-TEST(state) then return UTILITY(state)
v ← −∞
for each a in ACTIONS(state) do
v ← MAX(v, MIN-VALUE(RESULT(state, a)))
return v

function MIN-VALUE(state) returns a utility value
if TERMINAL-TEST(state) then return UTILITY(state)
v ← +∞
for each a in ACTIONS(state) do
v ← MIN(v, MAX-VALUE(RESULT(state, a)))
return v
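The pseudocode above can be sketched in Python over a tiny hand-built game tree. The nested-list representation (internal nodes are lists of children, leaves are utility values for MAX) is an assumption for illustration, not part of the slides.

```python
# Minimax over a nested-list game tree: lists are internal nodes,
# numbers are terminal utilities from MAX's point of view.

def minimax(node, maximizing):
    if not isinstance(node, list):        # terminal: return its utility
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

def minimax_decision(children):
    """Return (index of best root move, its backed-up value)."""
    values = [minimax(child, maximizing=False) for child in children]
    return values.index(max(values)), max(values)

# The classic two-ply tree: three MIN nodes with leaves (3, 12, 8),
# (2, 4, 6), (14, 5, 2). MIN values are 3, 2, 2, so MAX picks move 0.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax_decision(tree))   # (0, 3)
```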
LIMITATION OF THE MINIMAX ALGORITHM:
The main drawback of the minimax algorithm is
that it gets really slow for complex games such
as chess, Go, etc.
These games have a huge branching factor,
and the player has lots of choices to decide among.
This limitation of the minimax algorithm can be
mitigated by alpha-beta pruning.
CODE
[Link]
THREE PLIES
ALPHA–BETA PRUNING
•Alpha-beta pruning is a modified version of
the minimax algorithm.
• It is an optimization technique for the
minimax algorithm.
•It is possible to compute the correct
minimax decision without examining every
node of the game tree; this technique is
called pruning.
ALPHA–BETA PRUNING
The two parameters can be defined as:
Alpha: The best (highest-value)
choice we have found so far at any
point along the path of Maximizer. The
initial value of alpha is -∞.
Beta: The best (lowest-value) choice
we have found so far at any point
along the path of Minimizer. The initial
value of beta is +∞.
ALPHA-BETA ALGORITHM
Depth first search
only considers nodes along a single path from root at
any time
α = highest-value choice found at any choice point of path
for MAX
(initially, α = −∞)
β = lowest-value choice found at any choice point of path
for MIN
(initially, β = +∞)
Pass current values of a and b down to child nodes during
search.
Update values of a and b during search:
MAX updates at MAX nodes
MIN updates at MIN nodes
WHEN TO PRUNE
Prune whenever α ≥ β.
Prune below a Max node whose alpha value becomes
greater than or equal to the beta value of its ancestors.
Max nodes update alpha based on children’s
returned values.
Prune below a Min node whose beta value becomes less
than or equal to the alpha value of its ancestors.
Min nodes update beta based on children’s
returned values.
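The pruning rules above can be sketched in Python over the same assumed nested-list tree used for the minimax sketch; the leaf counter exists only to make the pruning visible.

```python
# Alpha-beta pruning over a nested-list game tree. Returns the minimax
# value; stats counts the leaves actually evaluated.

def alphabeta(node, alpha, beta, maximizing, stats):
    if not isinstance(node, list):
        stats["leaves"] += 1
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False, stats))
            alpha = max(alpha, value)
            if alpha >= beta:      # beta cutoff: a MIN ancestor forbids this
                break
        return value
    else:
        value = float("inf")
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True, stats))
            beta = min(beta, value)
            if alpha >= beta:      # alpha cutoff
                break
        return value

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
stats = {"leaves": 0}
value = alphabeta(tree, float("-inf"), float("inf"), True, stats)
print(value, stats["leaves"])   # 3 7
```

Plain minimax evaluates all 9 leaves of this tree; alpha-beta skips the 4 and 6 under the second MIN node, evaluating only 7.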
ALPHA-BETA EXAMPLE
REVISITED
Do DF-search until first leaf.
α, β initial values:
α = −∞
β = +∞
α, β passed to kids:
α = −∞
β = +∞
ALPHA-BETA EXAMPLE (CONTINUED)
At the MAX root: α = −∞, β = +∞.
MIN updates β based on kids:
α = −∞, β = 3.
ALPHA-BETA EXAMPLE (CONTINUED)
At the MAX root: α = −∞, β = +∞.
MIN updates β based on kids.
No change: α = −∞, β = 3.
ALPHA-BETA EXAMPLE (CONTINUED)
MAX updates α based on kids:
α = 3, β = +∞.
3 is returned
as node value.
ALPHA-BETA EXAMPLE (CONTINUED)
At the root: α = 3, β = +∞.
α, β passed to kids:
α = 3, β = +∞.
ALPHA-BETA EXAMPLE (CONTINUED)
At the root: α = 3, β = +∞.
MIN updates β
based on kids:
α = 3, β = 2.
ALPHA-BETA EXAMPLE (CONTINUED)
At the root: α = 3, β = +∞.
At the MIN node: α = 3, β = 2,
so α ≥ β: prune.
ALPHA-BETA EXAMPLE (CONTINUED)
MAX updates α based on kids.
No change: α = 3, β = +∞.
2 is returned
as node value.
ALPHA-BETA EXAMPLE (CONTINUED)
At the root: α = 3, β = +∞.
α, β passed to kids:
α = 3, β = +∞.
ALPHA-BETA EXAMPLE (CONTINUED)
At the root: α = 3, β = +∞.
MIN updates β
based on kids:
α = 3, β = 14.
ALPHA-BETA EXAMPLE (CONTINUED)
At the root: α = 3, β = +∞.
MIN updates β
based on kids:
α = 3, β = 5.
ALPHA-BETA EXAMPLE (CONTINUED)
At the root: α = 3, β = +∞.
2 is returned
as node value.
ALPHA-BETA EXAMPLE (CONTINUED)
MAX calculates the same root
value as plain minimax, and
makes the same move!
EFFECTIVENESS OF ALPHA-BETA
SEARCH
Worst-Case
branches are ordered so that no pruning takes place. In this case
alpha-beta gives no improvement over exhaustive search
Best-Case
each player’s best move is the left-most child (i.e., evaluated first)
in practice, performance is closer to best rather than worst-case
E.g., sort moves by the remembered move values found last time.
E.g., expand captures first, then threats, then forward moves, etc.
E.g., run Iterative Deepening search, sort by value last iteration.
In practice often get O(b^(d/2)) rather than O(b^d)
this is the same as having a branching factor of sqrt(b),
since (sqrt(b))^d = b^(d/2), i.e., we effectively go from b to
the square root of b
this permits much deeper search in the same amount of time
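A quick arithmetic check of the claims above, using the chess numbers from the slides:

```python
# With best-case move ordering, alpha-beta searching to depth d
# examines about b^(d/2) nodes: an effective branching factor of sqrt(b).
import math

b, d = 35, 10
print(f"minimax, depth {d}: ~{b**d:.2e} leaf nodes")
print(f"alpha-beta best case: ~{b**(d//2):.2e} leaf nodes")
print(f"effective branching factor: {math.sqrt(b):.1f}")  # ~5.9, i.e. ~6
```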
FINAL COMMENTS ABOUT ALPHA-BETA PRUNING
Pruning does not affect final results
Entire subtrees can be pruned.
Good move ordering improves effectiveness of pruning
Repeated states are again possible.
Store them in memory = transposition table
MONTE CARLO TREE
SEARCH
Monte Carlo Tree Search (MCTS) is a
search technique in the field of Artificial
Intelligence (AI).
It is a probabilistic and heuristic-driven
search algorithm that combines
classic tree search implementations
with machine learning principles from
reinforcement learning.
The MCTS algorithm is useful because it
continues to evaluate other alternatives
periodically during the learning phase by
executing them, instead of only the currently
perceived optimal strategy. This is known as
the "exploration-exploitation trade-off".
Search can be broken down into four distinct steps, viz.,
1. selection,
2. expansion,
3. simulation, and
4. backpropagation.
•The MCTS algorithm traverses the
current tree from the root node using a
specific strategy.
•The strategy uses an evaluation
function to optimally select nodes with
the highest estimated value.
•MCTS uses the Upper Confidence
Bound (UCB) formula applied to trees
as the strategy in the selection process
to traverse the tree:

S_i = x_i + C * sqrt(ln(t) / n_i)

where:
S_i = value of a node i
x_i = empirical mean (average reward) of a node i
C = a constant (the exploration parameter)
t = total number of simulations
n_i = number of visits of node i
When traversing the tree during the selection
process, the child node that returns the
greatest value from the above equation is the
one that gets selected.
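A small sketch of this selection rule. The per-child visit count n_i is needed to evaluate the formula; the example numbers below are assumptions for illustration.

```python
# UCB score for a child node: exploitation (mean) plus an exploration
# bonus that grows for rarely-visited children.
import math

def ucb_score(mean, visits, total_simulations, c=1.4):
    if visits == 0:
        return float("inf")       # unvisited children are tried first
    return mean + c * math.sqrt(math.log(total_simulations) / visits)

# Three children after t = 100 simulations: a well-explored strong node,
# a well-explored weak node, and a barely-explored node.
children = [(0.6, 60), (0.3, 35), (0.5, 5)]
t = 100
scores = [ucb_score(mean, n, t) for mean, n in children]
best = scores.index(max(scores))
print(best)   # 2: the barely-explored node wins via its exploration bonus
```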
Expansion: In this process, a new child node
is added to the tree to that node which was
optimally reached during the selection
process.
Simulation: In this process, a simulation is
performed by choosing moves or strategies
until a result or predefined state is achieved.
Backpropagation: After determining the
value of the newly added node, the remaining
tree must be updated. So, the
backpropagation process is performed, where
it backpropagates from the new node to the
root node.
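The four steps can be sketched end-to-end on a toy game. The game (a Nim variant: remove 1 or 2 stones, taking the last stone wins), the node fields, and the uniformly random rollout policy are all assumptions for illustration.

```python
# Minimal MCTS: selection via UCB, one-node expansion, random rollout,
# and backpropagation of the result to the root.
import math, random

class Node:
    def __init__(self, stones, player, parent=None):
        self.stones, self.player, self.parent = stones, player, parent
        self.children, self.visits, self.wins = {}, 0, 0.0

    def untried(self):
        return [m for m in (1, 2) if m <= self.stones and m not in self.children]

def ucb(child, parent, c=1.4):
    if child.visits == 0:
        return float("inf")
    return (child.wins / child.visits
            + c * math.sqrt(math.log(parent.visits) / child.visits))

def rollout(stones, player):
    while stones > 0:
        stones -= random.choice([m for m in (1, 2) if m <= stones])
        player = 1 - player
    return 1 - player          # the player who took the last stone wins

def mcts(root_stones, iters=2000):
    random.seed(0)
    root = Node(root_stones, player=0)
    for _ in range(iters):
        node = root
        # 1. selection: descend via UCB while fully expanded
        while not node.untried() and node.children:
            node = max(node.children.values(), key=lambda ch: ucb(ch, node))
        # 2. expansion: add one untried child
        if node.untried():
            m = random.choice(node.untried())
            node.children[m] = Node(node.stones - m, 1 - node.player, node)
            node = node.children[m]
        # 3. simulation: random playout to the end of the game
        winner = rollout(node.stones, node.player) if node.stones else 1 - node.player
        # 4. backpropagation: update stats from the new node up to the root
        while node is not None:
            node.visits += 1
            if winner == 1 - node.player:   # a win for the player who moved here
                node.wins += 1
            node = node.parent
    return max(root.children, key=lambda m: root.children[m].visits)

print(mcts(4))
```

With enough iterations the visit counts concentrate on the winning move: from 4 stones, taking 1 leaves the opponent a losing position (a multiple of 3).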
Monte Carlo Tree Search is a method
usually used in games to predict the
path (moves) that should be taken by
the policy to reach the final winning
solution.
These types of algorithms are particularly
useful in turn based games where there is no
element of chance in the game mechanics,
such as Tic Tac Toe, Connect 4, Checkers,
Chess, Go, etc.
STOCHASTIC GAMES
Many games mirror this
unpredictability by including a random
element, such as the throwing of dice.
We call these stochastic games.
Backgammon is a typical game that combines luck
and skill.
Dice are rolled at the beginning of a player’s turn to
determine the legal moves.
In backgammon, for example, White has rolled a 6–5
and has four possible moves.
P(1,1) = 1/36 (there are 36 ways to roll two dice).
Each of the 15 distinct non-double rolls has a 1/18 probability.
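These probabilities can be checked by enumerating the 36 ordered outcomes (the sorted-pair representation of a roll is an assumption for illustration):

```python
# Count distinct dice rolls: doubles occur 1/36, non-doubles 2/36 = 1/18.
from fractions import Fraction
from itertools import product

counts = {}
for a, b in product(range(1, 7), repeat=2):
    roll = (min(a, b), max(a, b))        # (6,5) and (5,6) are the same roll
    counts[roll] = counts.get(roll, 0) + 1

doubles = [r for r in counts if r[0] == r[1]]
others = [r for r in counts if r[0] != r[1]]
print(len(doubles), len(others))          # 6 15
print(Fraction(counts[(1, 1)], 36))       # 1/36
print(Fraction(counts[(5, 6)], 36))       # 1/18
```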
THE STATE OF PLAY
Checkers:
Chinook ended the 40-year reign of human world champion
Marion Tinsley in 1994.
Chess:
Deep Blue defeated human world champion Garry Kasparov
in a six-game match in 1997.
Othello:
human champions refuse to compete against computers:
they are too good.
Go:
human champions refuse to compete against computers:
they are too bad
b > 300 (!)
See (e.g.) [Link] for more information
DEEP BLUE
1957: Herbert Simon
“within 10 years a computer will beat the world chess
champion”
1997: Deep Blue beats Kasparov
Parallel machine with 30 processors for “software” and 480
VLSI processors for “hardware search”
Searched 126 million nodes per second on average
Generated up to 30 billion positions per move
Reached depth 14 routinely
Uses iterative-deepening alpha-beta search with
transposition tables
Can explore beyond depth-limit for interesting moves
CONSTRAINT
SATISFACTION
PROBLEM
CSP
Many problems in AI can be considered as
problems of constraint satisfaction, in which the
goal state satisfies a given set of constraints.
Constraint satisfaction problems can be solved
by using any of the search strategies.
A constraint satisfaction problem (CSP) is
a problem that requires its solution to be
within some limitations or conditions, also
known as constraints. It consists of a finite
variable set, a domain set, and a
finite constraint set. The solution
should satisfy all constraints.
EXAMPLE: MAP-COLORING
Variables WA, NT, Q, NSW, V, SA, T
Domains Di = {red,green,blue}
Constraints: adjacent regions must have different colors
e.g., WA ≠ NT
EXAMPLE: MAP-COLORING
Solutions are complete and consistent
assignments, e.g., WA = red, NT = green,Q =
red,NSW = green,V = red,SA = blue,T =
green
CONSTRAINT GRAPH
Binary CSP: each constraint relates two
variables
Constraint graph: nodes are variables, arcs
are constraints
BACKTRACKING EXAMPLE
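A minimal sketch of backtracking search for the map-coloring CSP above; the variable and color orderings are arbitrary assumptions.

```python
# Plain backtracking: assign one variable per level, backtrack as soon
# as an assignment conflicts with an already-colored neighbor.

NEIGHBORS = {
    "WA": ["NT", "SA"], "NT": ["WA", "SA", "Q"],
    "SA": ["WA", "NT", "Q", "NSW", "V"], "Q": ["NT", "SA", "NSW"],
    "NSW": ["Q", "SA", "V"], "V": ["SA", "NSW"], "T": [],
}
COLORS = ["red", "green", "blue"]

def consistent(var, color, assignment):
    return all(assignment.get(n) != color for n in NEIGHBORS[var])

def backtrack(assignment):
    if len(assignment) == len(NEIGHBORS):
        return assignment
    var = next(v for v in NEIGHBORS if v not in assignment)
    for color in COLORS:
        if consistent(var, color, assignment):
            assignment[var] = color
            result = backtrack(assignment)
            if result:
                return result
            del assignment[var]      # undo and try the next color
    return None                      # triggers backtracking in the caller

solution = backtrack({})
print(solution)
```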
IMPROVING BACKTRACKING
EFFICIENCY
General-purpose methods can give huge
gains in speed:
Which variable should be assigned next?
In what order should its values be tried?
Can we detect inevitable failure early?
MOST CONSTRAINED
VARIABLE
Most constrained variable:
choose the variable with the fewest legal
values
a.k.a. minimum remaining values (MRV)
heuristic
Picks a variable which will cause failure
as soon as possible, allowing the tree to
be pruned.
MOST CONSTRAINING
VARIABLE
Tie-breaker among most constrained
variables
Most constraining variable:
choose the variable with the most
constraints on remaining variables (most
edges in graph)
LEAST CONSTRAINING
VALUE
Given a variable, choose the least
constraining value:
the one that rules out the fewest values in
the remaining variables
Leaves maximal flexibility for a solution.
Combining these heuristics makes 1000
queens feasible
FORWARD CHECKING
Idea:
Keep track of remaining legal values for
unassigned variables
Terminate search when any variable has no legal
values
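The idea can be sketched on the same map-coloring CSP; copying the domain dictionary at each step is one possible implementation choice, assumed here for simplicity.

```python
# Forward checking: after each assignment, prune that color from every
# unassigned neighbor's domain, and fail early if any domain empties.

NEIGHBORS = {
    "WA": ["NT", "SA"], "NT": ["WA", "SA", "Q"],
    "SA": ["WA", "NT", "Q", "NSW", "V"], "Q": ["NT", "SA", "NSW"],
    "NSW": ["Q", "SA", "V"], "V": ["SA", "NSW"], "T": [],
}

def forward_check(var, color, domains):
    """Copy domains, assign var=color, prune neighbors; None on wipeout."""
    new = {v: list(d) for v, d in domains.items()}
    new[var] = [color]
    for n in NEIGHBORS[var]:
        if color in new[n]:
            new[n].remove(color)
            if not new[n]:
                return None          # a neighbor has no legal values left
    return new

def solve(domains, assignment):
    if len(assignment) == len(NEIGHBORS):
        return assignment
    var = next(v for v in NEIGHBORS if v not in assignment)
    for color in domains[var]:
        pruned = forward_check(var, color, domains)
        if pruned is not None:
            result = solve(pruned, {**assignment, var: color})
            if result:
                return result
    return None

solution = solve({v: ["red", "green", "blue"] for v in NEIGHBORS}, {})
print(solution)
```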
CONSTRAINT
PROPAGATION
Forward checking propagates information
from assigned to unassigned variables, but
doesn't provide early detection for all
failures:
NT and SA cannot both be blue!
Constraint propagation repeatedly enforces
constraints locally
ARC CONSISTENCY
Simplest form of propagation makes each arc
consistent
X → Y is consistent iff
for every value x of X there is some allowed y of Y
If X loses a value, neighbors of X need to be rechecked
Arc consistency detects failure earlier than forward
checking
Can be run as a preprocessor or after each assignment
Time complexity: O(n^2 d^3)
Constraint propagation propagates arc consistency on the graph.
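A sketch of AC-3, the standard arc-consistency algorithm this section describes, on the map-coloring constraints. The NT/SA demo at the end mirrors the failure that forward checking misses.

```python
# AC-3 on the "neighbors must differ" constraints of the Australia map.
from collections import deque

NEIGHBORS = {
    "WA": ["NT", "SA"], "NT": ["WA", "SA", "Q"],
    "SA": ["WA", "NT", "Q", "NSW", "V"], "Q": ["NT", "SA", "NSW"],
    "NSW": ["Q", "SA", "V"], "V": ["SA", "NSW"], "T": [],
}

def revise(domains, x, y):
    """Remove values of x that have no supporting value in y's domain."""
    removed = False
    for vx in list(domains[x]):
        if not any(vx != vy for vy in domains[y]):   # constraint: x != y
            domains[x].remove(vx)
            removed = True
    return removed

def ac3(domains):
    queue = deque((x, y) for x in NEIGHBORS for y in NEIGHBORS[x])
    while queue:
        x, y = queue.popleft()
        if revise(domains, x, y):
            if not domains[x]:
                return False                 # a domain was wiped out: failure
            for z in NEIGHBORS[x]:
                if z != y:
                    queue.append((z, x))     # x lost a value: recheck neighbors
    return True

# NT and SA both restricted to blue: forward checking would not notice,
# but arc consistency detects the inevitable failure.
domains = {v: ["red", "green", "blue"] for v in NEIGHBORS}
domains["NT"], domains["SA"] = ["blue"], ["blue"]
print(ac3(domains))    # False
```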
JUNCTION TREE
DECOMPOSITIONS
LOCAL SEARCH FOR CSPS
Note: The path to the solution is
unimportant, so we can
apply local search!
To apply to CSPs:
allow states with unsatisfied constraints
operators reassign variable values
Variable selection: randomly select any
conflicted variable
Value selection by min-conflicts heuristic:
choose value that violates the fewest constraints
i.e., hill-climb with h(n) = total number of violated
constraints
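A sketch of min-conflicts on n-queens, the classic local-search CSP example; the column-per-variable encoding, step limit, and random tie-breaking are assumptions.

```python
# Min-conflicts local search for n-queens: one queen per column, the
# variable's value is its row.
import random

def conflicts(rows, col, row):
    """Number of other queens attacking square (col, row)."""
    return sum(1 for c, r in enumerate(rows)
               if c != col and (r == row or abs(r - row) == abs(c - col)))

def min_conflicts(n, max_steps=100000, seed=0):
    rng = random.Random(seed)
    rows = [rng.randrange(n) for _ in range(n)]   # random complete assignment
    for _ in range(max_steps):
        conflicted = [c for c in range(n) if conflicts(rows, c, rows[c]) > 0]
        if not conflicted:
            return rows                           # no violated constraints
        col = rng.choice(conflicted)              # random conflicted variable
        # value that violates the fewest constraints, ties broken randomly
        rows[col] = min(range(n),
                        key=lambda r: (conflicts(rows, col, r), rng.random()))
    return None

solution = min_conflicts(8)
print(solution)
```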
CRYPTARITHMETIC PROBLEM
Cryptarithmetic Problem is a type of
constraint satisfaction problem where
the game is about digits and its unique
replacement either with alphabets or
other symbols. In cryptarithmetic
problem, the digits (0-9) get substituted
by some possible alphabets or symbols.
The rules or constraints of a cryptarithmetic
problem are as follows:
Each letter should be replaced by a unique
digit, and each digit assigned to only one letter.
The result should satisfy the predefined
arithmetic rules, i.e., 2+2 = 4, nothing else.
Digits should be from 0-9 only.
Only one carry should be forwarded while
performing the addition operation on a
column.
The problem can be solved from both sides,
i.e., the left-hand side (L.H.S.) or the right-hand
side (R.H.S.)
Given a cryptarithmetic problem, e.g., SEND + MORE = MONEY:
Starting from the left-hand side (L.H.S.), the
terms are S and M. Assign digits which could
give a satisfactory result. Let's assign S->
9 and M->1.
Now, move ahead to the next
terms E and O to get N as their output.
Adding E and O with E->5 and O->0 means 5+0 = 5, so N would equal E,
which is not possible because, according to the cryptarithmetic
constraints, we cannot assign the same digit to two letters. So we need
to think more and assign some other value.
Further, adding the next two
terms N and R, we should get E.
But we have already assigned E->5, so the result does
not satisfy the values
unless a 1 is carried forward to the
above term.
Let's move ahead.
Again, on adding the last two terms, i.e.,
the rightmost terms D and E, we
get Y as the result.
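The walkthrough above can be checked by brute force over digit assignments, assuming the puzzle is SEND + MORE = MONEY as the letters suggest.

```python
# Brute-force SEND + MORE = MONEY: try every assignment of distinct
# digits to the 8 letters.
from itertools import permutations

def solve_send_more_money():
    for digits in permutations(range(10), 8):
        S, E, N, D, M, O, R, Y = digits
        # leading digits are nonzero; M must be 1, since two 4-digit
        # numbers sum to less than 20000
        if S == 0 or M != 1:
            continue
        send = 1000*S + 100*E + 10*N + D
        more = 1000*M + 100*O + 10*R + E
        money = 10000*M + 1000*O + 100*N + 10*E + Y
        if send + more == money:
            return send, more, money
    return None

print(solve_send_more_money())   # (9567, 1085, 10652)
```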
SOLVE IT
CRYPTARITHMETIC PUZZLES
TWO
+ TWO
FOUR
We decided to look at the value of O again.
If O = 0, then R would also be 0 so that doesn’t work
and O can’t be 1 because F = 1.
If O = 2,
TW2
+TW2
−−−−−−−
12UR
then R = 4 and T = 6 and we also know that W < 5
because there can’t be anything carried to the
hundreds column. The only possible value of W that
hasn’t already been used is 3 but this would mean
that U is 6 which is the same as T.
If O = 3,
TW3
+TW3
−−−−−−−
13UR
then R = 6 and T = 6 which doesn’t
work.
If O = 4,
TW4
+TW4
−−−−−−−
14UR
then R = 8 and T = 7 and we also know that W <
5 because there can’t be anything carried to the
hundreds column. So W could be 0, 2 or 3.
W can’t be 0 because then U would be 0 and it
can’t be 2 because U would be 4.
If W = 3, U = 6 which works: 734 + 734 = 1468.
If O = 5,
TW5
+TW5
−−−−−−−
15UR
then R = 0 and T = 7 and we also know that
W ≥ 5 because there has to be 1 carried to
the hundreds column.
W can’t be 5 because O = 5.
If W = 6, U = 3 which works: 765 + 765 =
1530.
So there are seven possible answers:
938+938=1876
928+928=1856
867+867=1734
846+846=1692
836+836=1672
765+765=1530
734+734=1468
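The seven answers can be confirmed by brute force over the six letters:

```python
# Brute-force TWO + TWO = FOUR: enumerate distinct digits for T,W,O,F,U,R.
from itertools import permutations

def solve_two_two_four():
    sols = []
    for T, W, O, F, U, R in permutations(range(10), 6):
        if T == 0 or F == 0:             # no leading zeros
            continue
        two = 100*T + 10*W + O
        four = 1000*F + 100*O + 10*U + R
        if two + two == four:
            sols.append((two, four))
    return sorted(sols)

print(solve_two_two_four())
# [(734, 1468), (765, 1530), (836, 1672), (846, 1692),
#  (867, 1734), (928, 1856), (938, 1876)]
```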
SUMMARY
Game playing is best modeled as a search problem
Game trees represent alternate computer/opponent moves
Evaluation functions estimate the quality of a given board
configuration for the Max player.
Minimax is a procedure which chooses moves by assuming that
the opponent will always choose the move which is best for
them
Alpha-Beta is a procedure which can prune large parts of the
search tree and allow search to go deeper
For many well-known games, computer algorithms based on
heuristic search match or out-perform human world experts.
SPPU QUESTIONS
Comment on backtracking and look-ahead
(forward checking) strategies in constraint
satisfaction problems. [6]
Apply cryptarithmetic to solve the
problem and represent the state search
space to solve TWO + TWO = FOUR.
(Oct 2019)