Artificial Intelligence
Example: Suppose that the problem we want to solve deals with playing and winning a game. This game could be a simple one such as tic-tac-toe or a more complex one such as checkers, chess, or backgammon. The initial state is the starting configuration of the game (for tic-tac-toe, the empty board). An operator is an action applied to the current state, thereby creating a new state; in our case, this would be a player marking an empty space with either an X or an O.
The combination of the initial state and the set of operators makes up
the state space of the problem. The sequence of states produced by the
valid application of operators from the initial state is called the path in
the state space.
State space search characterizes problem solving as the process of finding a solution path from the start state to a goal state. However, for many real-world problems the search space can grow very large, so we need systematic search strategies to explore it.
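To make the idea of states and operators concrete, here is a small Python sketch (our own illustration, not part of the original notes; all names in it are made up) that represents a tic-tac-toe state and applies the operator of marking an empty square:

    from typing import List, Tuple

    # A tic-tac-toe board as a tuple of 9 cells, each "X", "O", or " " (empty).
    State = Tuple[str, ...]
    INITIAL_STATE: State = (" ",) * 9        # the initial state: an empty board

    def successors(state: State, player: str) -> List[State]:
        """Apply the operator 'player marks an empty square' in every legal way."""
        result = []
        for i, cell in enumerate(state):
            if cell == " ":
                result.append(state[:i] + (player,) + state[i + 1:])
        return result

    # From the initial state, X has nine possible moves, i.e. nine successor states.
    print(len(successors(INITIAL_STATE, "X")))   # prints 9

Every sequence of such operator applications starting from the initial state traces out a path in the state space.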
3. Search Strategies
Breadth-First Search
Depth-First Search
The breadth-first algorithm spreads out in a uniform manner from the start node. From
the start, it looks at each node one edge away. Then it moves out from those nodes to
all nodes two edges away from the start. This continues until either the goal node is
found or the entire tree is searched.
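The sketch below (our own illustration, not from the handout) implements this breadth-first expansion with a first-in, first-out queue; the GRAPH adjacency dictionary is reconstructed from the walkthrough that follows and only covers the connections mentioned there.

    from collections import deque

    # Adjacency lists reconstructed from the walkthrough below (an assumption;
    # only the connections actually mentioned in the example are included).
    GRAPH = {
        "Rochester":   ["Sioux Falls", "Minneapolis", "LaCrosse", "Dubuque"],
        "Sioux Falls": ["Fargo", "Rochester"],
        "Minneapolis": ["St. Cloud", "Wausau", "Duluth", "LaCrosse", "Rochester"],
        "LaCrosse":    ["Minneapolis", "Green Bay", "Madison", "Dubuque", "Rochester"],
        "Dubuque":     ["Rochester", "LaCrosse", "Rockford"],
        "Fargo":       ["Grand Forks", "St. Cloud", "Sioux Falls"],
    }

    def breadth_first_search(start, goal):
        """Expand nodes level by level until the goal is found or the queue empties.
        Like the walkthrough below, no visited set is kept, so duplicate nodes
        can appear on the queue (see the note after the walkthrough)."""
        queue = deque([start])
        while queue:
            node = queue.popleft()             # remove the front node
            if node == goal:
                return True                    # goal test succeeds
            queue.extend(GRAPH.get(node, []))  # add children to the back of the queue
        return False

    print(breadth_first_search("Rochester", "Wausau"))   # prints True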
Let’s walk through an example to see how breadth-first search could find a city on our
map.
1- Our search begins in Rochester (Start State), and we want to know if we can
get to Wausau (Goal State) from there.
2- The Rochester node is placed on the queue in step 1 of the previous algorithm. Next we enter our search loop at step 2. Queue = [Rochester].
3- We remove Rochester, the first node from the queue. Rochester does not contain our goal state (Wausau), so we expand it by taking each child node of Rochester and adding them to the back of the queue. Queue = [Sioux Falls, Minneapolis, LaCrosse, and Dubuque].
4- We remove the first node from the queue (Sioux Falls) and test it to see if it is
our goal state. It is not, so we expand it, adding Fargo and Rochester to the
end of our queue, which now contains [Minneapolis, LaCrosse, Dubuque,
Fargo, and Rochester].
5- We remove Minneapolis, the goal test fails, and we expand that node, adding
St.Cloud, Wausau, Duluth, LaCrosse, and Rochester to the search queue,
now holding [ LaCrosse, Dubuque, Fargo, Rochester, St.Cloud, Wausau,
Duluth, LaCrosse, and Rochester ].
6- We remove LaCrosse and then Dubuque in turn; neither contains the goal, so we expand each of them, adding their neighboring cities to the back of the queue.
7- At this point, we have tested every node which is one level in the tree away from the start node (Rochester). Our search queue contains the following nodes: [Fargo, Rochester, St.Cloud, Wausau, Duluth, LaCrosse, Rochester, Minneapolis, GreenBay, Madison, Dubuque, Rochester, Rochester, LaCrosse, and Rockford].
8- We remove Fargo, which is two levels away from Rochester, and add Grand
Forks, St. Cloud, and Sioux Falls.
9- Then we test and expand Rochester (Rochester to Sioux Falls to Rochester is two levels away from our start). Next is St. Cloud; again we expand that node.
10- Finally, we get to Wausau; our goal test succeeds and we declare success.
11- Our search order was Rochester, Sioux Falls, Minneapolis, LaCrosse,
Dubuque, Fargo, Rochester, St. Cloud, and Wausau as shown below:-
Note that this trace could have been greatly simplified by keeping track of nodes
which had been tested and expanded. This would have cut down on our time and
space complexity.
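A minimal sketch of that bookkeeping (our own illustration): keep a closed set of nodes that have already been expanded, and never expand or enqueue them a second time.

    from collections import deque

    def breadth_first_search(graph, start, goal):
        """BFS with a 'closed' set so each node is tested and expanded at most once."""
        queue = deque([start])
        closed = set()                     # nodes already tested and expanded
        while queue:
            node = queue.popleft()
            if node == goal:
                return True
            if node in closed:
                continue                   # already expanded earlier; skip it
            closed.add(node)
            for child in graph.get(node, []):
                if child not in closed:
                    queue.append(child)
        return False

    # e.g. breadth_first_search(GRAPH, "Rochester", "Wausau") with the map sketched above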
Characteristics of Depth First Search
The depth-first algorithm searches from the start or root node all the way
down to a leaf node. If it does not find the goal node, it backtracks up the tree
and searches down the next untested path until it reaches the next leaf.
If you imagine a large tree, the depth-first algorithm may spend a large amount
of time searching the paths on the lower left when the answer is really in the
lower right.
Depth-first search is a brute-force method: it will blindly follow this search pattern until it comes across a node containing the goal state or it has searched the entire tree.
Depth-first search has lower memory requirements than breadth-first search, but it is neither complete nor optimal.
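For comparison, a minimal depth-first sketch (our own illustration): newly generated children are placed on the front of the queue, so the search keeps diving down the most recently opened path, exactly as in the walkthrough that follows.

    def depth_first_search(graph, start, goal):
        """Children go on the front of the queue (a stack), so the search goes deep first.
        As in the walkthrough below, no visited set is kept, so on a map with two-way
        roads this version can loop between cities forever."""
        queue = [start]
        while queue:
            node = queue.pop(0)                       # take the node at the front
            if node == goal:
                return True                           # goal test succeeds
            children = graph.get(node, [])
            queue = list(reversed(children)) + queue  # push children onto the front
        return False

    # e.g. depth_first_search({"A": ["B", "C"], "B": ["D"]}, "A", "D") returns True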
Example
Let’s walk through a simple example of how depth-first search would work if we
started in Rochester and wanted to see if we could get to Wausau.
1. Starting with Rochester, we test and expand it, placing Sioux Falls, then Minneapolis, then LaCrosse, then Dubuque at the front of the search queue. Queue = [Dubuque, LaCrosse, Minneapolis, Sioux Falls].
2. We remove Dubuque and test it; it fails, so we expand it adding Rochester to
the front, then LaCrosse, then Rockford. Our search queue now looks like
[Rockford, LaCrosse, Rochester, LaCrosse, Minneapolis, Sioux Falls].
3. We remove Rockford, and add Dubuque, Madison, and Chicago to the front
of the queue in that order, yielding Queue=[ Chicago, Madison, Dubuque,
LaCrosse, Rochester, LaCrosse, Minneapolis, Sioux Falls].
4. We test Chicago and place Rockford and Milwaukee on the queue.
5. We take Milwaukee from the front and add Chicago, Madison, and Green
Bay to the search queue. It is now Queue=[ Green Bay, Madison, Chicago,
Rockford, Chicago, Madison, Dubuque, LaCrosse, Rochester, LaCrosse,
Minneapolis, Sioux Falls].
6. We remove Green Bay and add Milwaukee, LaCrosse, and Wausau to the queue in that order. Finally, Wausau is at the front of the queue; our goal test succeeds and our search ends. Our search order was Rochester, Dubuque, Rockford, Chicago, Milwaukee, Green Bay, and Wausau.
In this example, we again did not prevent tested nodes from being added to the search
queue. As a result, we had duplicate nodes on the queue. In the depth-first case, this
could have been disastrous. We could have easily had a cycle or loop where we tested
one city, then a second, then the first again, ad infinitum.
The following trace shows breadth-first search on an example graph, maintaining an open (frontier) list and a closed (already-expanded) list:
Open: [A]         Closed: [ ]
Open: [B, C]      Closed: [A]
Open: [C, D, E]   Closed: [A, B]
Time Complexity
In the worst case, breadth-first search examines on the order of b^d nodes, where b is the branching factor and d is the depth of the shallowest goal; depth-first search may examine on the order of b^m nodes, where m is the maximum depth of the tree.
Space Complexity
Breadth-first search must store the entire frontier, roughly b^d nodes, while depth-first search stores only the current path and the unexpanded siblings of nodes on it, roughly b·m nodes; this is why depth-first search has much lower memory requirements.
Example 3: Apply depth-first search to the example shown, with goal state [N].
Artificial Intelligence (AI): Definition
Artificial Intelligence (AI) is the field of designing, studying, and using artificial intelligence systems.
What is an Intelligent System?
It is a computer-based system with human-like intelligence.
Why Do We Need Artificial Intelligence?
Compared to human intelligence, artificial intelligence (AI) is:
Fast.
Accurate.
Scalable: once built for a small task, it can be extended to larger tasks.
Robust: it doesn't get tired and commits fewer mistakes.
Fearless: it can operate in dangerous situations.
Replaceable: once lost or damaged, it can easily be replaced by an exact copy of itself.
Cheaper than recruiting human operators.
Safer: for example in training simulations, virtual reality, etc.
Main aspects of Artificial Intelligence (AI)
When can we describe a system as being intelligent? Or, what are the important characteristics of an intelligent system? These are the main aspects, with some examples:
Can learn. (Neural networks, model estimation, parameter estimation from data, adaptive systems)
Can understand. (Computer vision, pattern recognition, natural language processing)
Can solve problems. (AI math problem solvers, chess players)
Can reason (think). (Chess players, e.g. the IBM Deep Blue system; inference engines)
Human-like communication: natural language processing. (Google Now, automatic report writers)
Autonomous: independent of a human operator. (Unmanned vehicles, robots, chess players)
Tolerates errors in input. (Google search engine, natural language processing)
AI is used in many application areas, for example:
Toys and games: the IBM Deep Blue chess system beat the world champion in chess.
Computer games: most computer games have some elements of AI.
Aviation: rule-based expert systems, computer vision.
AI is a very wide field and may contain many specialized subjects. Here is a list of some of the sub-fields of artificial intelligence:
Pattern Recognition
Neural Networks
Computer Vision
Speech Recognition
Robotics
Natural Language Processing
Knowledge based systems
Data Mining
Concept Mining
Game Theory
Fuzzy Logic
What is learning?
Learning here means acquiring a new skill that was not in the system originally. In general, it means trying to approximate a mathematical model or the underlying function that produces a specific output in response to a certain input.
The model or function can be learned through different ways:
1. Through direct memory: direct memory association or a look-up table. This is the simplest kind of learning; it is the most expensive in terms of resources but the easiest to design. Example: learning a mathematical operation such as adding or multiplying by memorizing as many of the input-output pairs as possible. Another example is searching for a word in a dictionary in a look-up-table manner: for each word there is only one meaning or set of meanings. This is called learning the examples.
2. Through learning the actual rule and applying it directly. For example, the multiplication rule for two numbers is to add the first number to itself as many times as the second number: A × B = A + A + ... + A (B times). This is called learning by explicit rules (a small sketch contrasting this with look-up-table learning follows this list).
3. Learning the model through generalization of individual examples. This is learning from examples or experience, and it is the essence of statistical learning.
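To make the first two kinds of learning concrete, here is a small Python sketch (our own illustration; the data in it is made up) contrasting learning by memorized examples, i.e. a look-up table, with learning by the explicit multiplication rule:

    # Learning the examples: a look-up table of memorized input-output pairs.
    memorized = {(a, b): a * b for a in range(10) for b in range(10)}

    def multiply_by_lookup(a, b):
        """Return the memorized answer, or None if this pair was never memorized."""
        return memorized.get((a, b))

    # Learning by explicit rules: A x B = A + A + ... + A (B times).
    def multiply_by_rule(a, b):
        total = 0
        for _ in range(b):
            total += a
        return total

    print(multiply_by_lookup(3, 4))    # 12   (this pair was memorized)
    print(multiply_by_lookup(12, 7))   # None (never memorized, so the table fails)
    print(multiply_by_rule(12, 7))     # 84   (the rule generalizes to unseen inputs)

The look-up table needs more and more storage as the range of inputs grows, while the explicit rule handles inputs it has never seen, which is exactly the trade-off described above.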
Understanding in Artificial Intelligence
Understanding is a complex process. It consists of many elements and prerequisites. It generally means turning incoming information into the context of the internal knowledge base of the system, so that it means something. In other words, understanding is making the incoming sensory information mean something.
Figure 1: Process of Understanding in an Intelligent System
Sensory information can take different forms. The most important kinds of
sensory information are visual and audio. Other sensory information can be
of wide range of physical nature such as voltage, pressure, temperature and
so on. All these kinds of information need to be processed by the intelligent
system. It is then divided into smaller parts; each part is identified as a meaningful entity and, together with the other parts, constitutes an idea or a situation that requires a certain action (Figure 1).
For example, visual information is turned into images, each image is
divided into a number of individual objects. Each object is then turned into
a meaningful entity such as human, tree, chair, bird and so on. Meaningful
objects are combined and turned into some idea or concept. For instance, the picture can be understood as a pleasing scene of a garden, a dangerous situation involving a speeding car, fire, or another kind of risk, or just a neutral situation that requires no action. The same can be applied to audio data. The overall series of data is identified as meaningful speech or sounds. If it is speech, then it has to be divided into smaller parts, each of which means something as a word. Then the individual words are linked together and the complete meaning of a sentence is formed and understood.
An important part of this process is called Pattern Recognition. The role of
pattern recognition is to process the incoming raw data acquired by the
sensors such as the camera or microphone in the case of machine intelligent
system, or the eye or ear in the case of humans, and identify or recognize
meaningful parts.
In general there are two approaches in pattern recognition. The first
approach is Statistical Pattern Recognition which relies on the statistical
information in the data to decide on how detected objects are classified or
identified. For example, the data that represents a human face can vary
significantly. An image of a person can be different from another image of
the same person. However, there are some statistically common aspects
that characterize the images of the same person. The system uses this
statistical information and reaches a decision about the identity of a person
in the image. The same thing can be said about the way a certain word is pronounced. The word "hello" can sound different when spoken by different speakers, or even by the same speaker at different times.
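As a toy illustration of this statistical idea (our own sketch, not from the notes, with made-up numbers): represent each class by the average of its training examples and classify a new sample by whichever class average it is closest to.

    import math

    # Made-up 2-D feature vectors for two classes (purely illustrative).
    training = {
        "person_A": [(1.0, 2.0), (1.2, 1.8), (0.9, 2.2)],
        "person_B": [(5.0, 5.5), (4.8, 5.2), (5.3, 5.0)],
    }

    def centroid(points):
        """Average the feature vectors belonging to one class."""
        n = len(points)
        return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

    centroids = {label: centroid(pts) for label, pts in training.items()}

    def classify(sample):
        """Assign the sample to the class whose centroid is nearest."""
        return min(centroids, key=lambda label: math.dist(sample, centroids[label]))

    print(classify((1.1, 2.1)))   # prints person_A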
Introduction to Probability
The Random Variable (RV)
A random variable is a variable whose value cannot be known without conducting an experiment; equivalently, it is a variable whose future value cannot be predicted with 100% certainty. Random variables are often associated with chance. For example, tossing a coin has two outcomes: head or tail. Throwing a die results in 1, 2, 3, 4, 5, or 6. Random variables can be discrete or continuous:
Discrete: can take one of a limited number of specific values. For example, it can be the outcome of throwing a die, tossing a coin, or the colour of a ball taken blindly from a box full of balls of different colours.
Continuous: can take an infinite number of values in a continuous range. For example, the height of a person in a population is a continuous random variable, since it can take any value between a minimum (say 100 cm) and a maximum (e.g. 220 cm). The same can be said about the temperature in a certain town, the weight of an apple, the contamination level in river water, and so on.
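A short Python illustration (our own, not from the notes) of sampling the two kinds of random variable, a discrete die roll and a continuous height drawn from a range:

    import random

    random.seed(0)                              # fixed seed so the sketch is repeatable

    die_roll = random.randint(1, 6)             # discrete: one of six possible values
    height_cm = random.uniform(100.0, 220.0)    # continuous: any value in [100, 220]
                                                # (uniform only for illustration)
    print(die_roll, round(height_cm, 1))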
Random Process
It is the process that generates or produces random variables. For example, throwing a die is a random process. Tossing a coin is another example. The weather in a certain country is a random process. All these examples generate random variables. In the case of a die, the random variable is the outcome of the throw. In the case of weather, the random variables can be the temperature, pressure, wind speed, and so on.
What is Probability?
It is a measure of the chance that something can happen. It reflects the
degree of uncertainty in the outcome of a random process.
Example 1: the probability of getting the number 2 when throwing a die is 1/6, or about 0.167. It means that if the die is thrown 100 times, the expected number of times that we get the number 2 is about 17 out of 100 trials.
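This can be checked by simulation. The short Python sketch below (our own, not from the notes) throws a simulated die many times and counts how often a 2 appears; the observed fraction comes out close to 1/6 ≈ 0.167.

    import random

    random.seed(1)                      # reproducible illustration
    trials = 100_000
    twos = sum(1 for _ in range(trials) if random.randint(1, 6) == 2)

    print(twos / trials)                # close to 1/6, i.e. about 0.167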
Example 2: the probability that the maximum temperature in Iraq is between 35 and 40 °C during the month of April is around 0.7. In July the probability drops to less than 0.02.
Department of Computer Engineering Techniques (Stage: 3)
Artificial Intelligence (Lecture 4)
MSc. Ahmed Adnan Hadi
Heuristic Search
The Traveling Salesman Problem (TSP), where a salesman makes a complete tour of
the cities on his route, visiting each city exactly once, while traveling the shortest
possible distance, is an example of a problem which has a combinatorial
explosion. As such, it cannot be solved using breadth-first or depth-first search for
problems of any realistic size. Unfortunately, there are many problems which have
this form and which are essentially intractable (they cannot be solved exactly in any practical amount of time). In these cases,
finding the best possible answer is not computationally feasible, and so we have to
settle for a good answer. In this section we discuss several heuristic search methods
which attempt to provide a practical means for approaching these kinds of search
problems.
Heuristic search methods are characterized by this sense that we have limited time
and space in which to find an answer to complex problems and so we are willing
to accept a good solution. As such, we apply heuristics or rules of thumb as we are
searching the tree to try to determine the likelihood (احتمال) that following one path or
another is more likely to lead to a solution. Note this is in stark contrast to brute-
force methods which chug along merrily regardless of whether a solution is anywhere
in sight.
Heuristic search methods use objective functions called heuristic functions to try to
gauge the value of a particular node in the search tree and to estimate the value of
following down any of the paths from the node.
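As an illustration (ours, not from the handout), a typical heuristic function for a route-finding problem like the city map earlier is the straight-line distance from a node to the goal; the coordinates below are invented purely to show the idea.

    import math

    # Hypothetical (x, y) map coordinates, invented purely for illustration.
    COORDS = {
        "Rochester":   (0.0, 0.0),
        "Minneapolis": (1.0, 2.0),
        "St. Cloud":   (0.5, 3.0),
        "Wausau":      (4.0, 3.5),
    }

    def straight_line_distance(city, goal="Wausau"):
        """Heuristic estimate of how far a node is from the goal (lower is better)."""
        return math.dist(COORDS[city], COORDS[goal])

    # A heuristic search would prefer to expand the nodes that look closest to the goal.
    print(sorted(COORDS, key=straight_line_distance))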
The generate and test algorithm is the most basic heuristic search method. The steps are, roughly:
1. Generate a possible candidate solution.
2. Test the candidate: compare it with the goal, or check whether it satisfies the goal condition.
3. If a solution has been found, stop; otherwise return to step 1 and generate the next candidate.
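A minimal Python sketch of generate and test (our own illustration): candidates are produced one after another and each is simply checked against the goal, with no guidance about where to look next.

    import itertools

    def generate_and_test(candidates, is_goal):
        """Generate candidate solutions one at a time and test each against the goal."""
        for candidate in candidates:
            if is_goal(candidate):
                return candidate           # success: this candidate passed the goal test
        return None                        # ran out of candidates without success

    # Toy usage: look for a number whose square is 144, generating 1, 2, 3, ... in order.
    print(generate_and_test(itertools.count(1), lambda n: n * n == 144))   # prints 12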
Example:-
Question for Students: what is the difference between this algorithm and depth
first search algorithm?
It may take an extremely long time. For small problems, generate and test can
be an effective algorithm, but for large problems, the undirected search
strategy leads to lengthy run times and is impractical.
The major weakness of generate and test is that we get no feedback on which
direction to search. We can greatly improve this algorithm by providing
feedback through the use of heuristic functions.
Hill Climbing
Hill climbing is an improved generate-and-test algorithm, where feedback from the tests is used to help direct the generation (and evaluation) of new candidate states. When a node state is evaluated by the goal test function, a measure or estimate of the distance to the goal state is also computed.
Pseudo-Code Algorithm
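The pseudo-code figure from the handout is not reproduced here; the following Python sketch (our own, assuming that lower heuristic values are better) captures the idea: from the current node, always move to the neighbour with the best heuristic value, and stop when no neighbour improves on the current node.

    def hill_climbing(start, neighbours, h):
        """Greedy local search: repeatedly move to the best-scoring neighbour.

        start      -- the initial state
        neighbours -- function returning the neighbouring states of a state
        h          -- heuristic function; lower values are assumed to be better
        """
        current = start
        while True:
            options = neighbours(current)
            if not options:
                return current
            best = min(options, key=h)
            if h(best) >= h(current):     # no neighbour improves on the current state
                return current            # stop: a local optimum (or the goal) is reached
            current = best

    # Toy usage: minimise h(x) = (x - 7)^2 over the integers, moving by +/- 1 each step.
    print(hill_climbing(0, lambda x: [x - 1, x + 1], lambda x: (x - 7) ** 2))   # prints 7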
Example
In this example, (a) is the initial state, (h) and (k) are final states, and the numbers near the states are the heuristic values.
What are disadvantages of this algorithm?
One problem with hill climbing search in general is that the algorithm can get
caught in local minima or maxima. Because we are always going in the
direction of least cost, we can follow a path up to a locally good solution,
while missing the globally excellent solution available just a few nodes away.
Once at the top of the locally best solution, moving to any other node would
lead to a node with lower goodness.
Another possibility is that a plateau or flat spot exists in the problem space.
Once the search algorithm gets up to this area all moves would have the same
goodness and so progress would be halted.
It isn't complete and can't guarantee finding the global maximum. The benefit, of course, is that it requires a fraction of the resources. In practice, applied to the right problems, it is a very effective solution.
Hill climbing doesn't generally backtrack, because it doesn't keep track of previously visited states (it is a local search), and backtracking would mean moving away from a maximum.
Solutions
A common way to avoid getting stuck in local maxima with hill climbing is to use random restarts. In the example above, if G is a local maximum, the algorithm would stop there and then pick another random node to restart from. So if J or C were picked (or possibly A, B, or D), you would find the global maximum at H or K. With enough restarts, it will find the global maximum or something close to it, depending on time/resource limitations and the problem space.
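A minimal sketch of random restarts (our own illustration over a made-up numeric landscape where higher scores are better): run plain hill climbing from several randomly chosen start states and keep the best result found.

    import random

    def hill_climb(start, neighbours, score):
        """Plain hill climbing: move to the best neighbour while it improves the score."""
        current = start
        while True:
            best = max(neighbours(current), key=score)
            if score(best) <= score(current):
                return current
            current = best

    def random_restart_hill_climb(restarts, random_start, neighbours, score):
        """Run hill climbing from several random starts; keep the best local optimum."""
        results = [hill_climb(random_start(), neighbours, score) for _ in range(restarts)]
        return max(results, key=score)

    def score(x):
        # A made-up landscape: a low local maximum at x = 10, the global maximum at x = 30.
        return -min(abs(x - 10) + 2, 0.5 * abs(x - 30))

    random.seed(2)
    best = random_restart_hill_climb(
        restarts=10,
        random_start=lambda: random.randint(0, 40),
        neighbours=lambda x: [x - 1, x + 1],
        score=score,
    )
    print(best)   # almost always 30: at least one restart lands in the right basin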
As another way to avoid getting trapped in suboptimal states, variations on the hill climbing strategy have been proposed. One is to inject noise into the evaluation function, with the initial noise level high and slowly decreasing over time. This technique, called simulated annealing, allows the search algorithm to go in directions which are not "best" but which allow a more complete exploration of the search space.
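A minimal simulated-annealing sketch (our own illustration, again over a made-up numeric landscape where higher scores are better): a worse neighbour is sometimes accepted, with probability exp(delta / T), and the temperature T is lowered gradually.

    import math
    import random

    def simulated_annealing(start, neighbours, score,
                            temperature=10.0, cooling=0.99, steps=2000):
        """Hill climbing with noise: worse moves are accepted with probability
        exp(delta / T), which shrinks as the temperature T is lowered."""
        current = start
        for _ in range(steps):
            candidate = random.choice(neighbours(current))
            delta = score(candidate) - score(current)
            if delta > 0 or random.random() < math.exp(delta / temperature):
                current = candidate       # accept better moves, and occasionally worse ones
            temperature = max(temperature * cooling, 1e-6)
        return current

    def score(x):
        # Same kind of landscape: a poor local maximum at x = 10, the global one at x = 30.
        return -min(abs(x - 10) + 2, 0.5 * abs(x - 30))

    random.seed(3)
    # Unlike plain hill climbing started at 0, the noisy search can escape the peak at 10.
    print(simulated_annealing(0, lambda x: [x - 1, x + 1], score))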