
2. Learning Processes

2.1 Introduction
• Central properties of a neural network:
- Learning from its environment.
- Improvement in performance through learning.

• In practice, a suitable learning algorithm must be used.

• The algorithm adjusts the synaptic weights (the free parameters of the
network) according to some meaningful rule.
Organization of Chapter 2

The chapter consists of three parts:


1. In Sections 2.2-2.6, five basic learning rules are discussed:
• error-correction learning;
• memory-based learning;
• Hebbian learning;
• competitive learning;
• Boltzmann learning.
2. Then various learning methodologies are studied:
• credit-assignment problem (Section 2.7);
• learning with a teacher (Section 2.8);
• learning without a teacher (Section 2.9).
3. Learning tasks, memory, and adaptation are studied in
Sections 2.10-2.12.
2.2 Error-Correction Learning
• Consider the simple case of a single neuron k.

• It lies in the output layer of a multilayer feedforward network.

• Neuron k is driven by a signal vector x(n) produced by the preceding
hidden layer(s).

• The argument n denotes discrete time, or more precisely the iteration
number of the learning algorithm.
• y_k(n) denotes the output signal of neuron k.

• It is compared to the desired response or target output d_k(n).

• Thus the error signal is defined by

e_k(n) = d_k(n) − y_k(n)

• The error signal e_k(n) is used to adjust the values of the synaptic
weights.

• The output signal y_k(n) should approach the desired response d_k(n)
in a step-by-step manner.

• This is achieved by minimizing a cost function or index of performance,
defined in this case by

E(n) = 0.5 [e_k(n)]²

• E(n) is the instantaneous value of the error energy.
• Learning is continued until the synaptic weights are essentially
stabilized.

• This type of learning process is called error-correction learning.

• Minimization of the cost function E(n) leads to the so-called delta
rule or Widrow-Hoff rule (1960):

∆w_kj(n) = η e_k(n) x_j(n).   (1)

• Here w_kj(n) is the j-th element of the weight vector w_k(n) of the
output neuron k.

• x_j(n) is the corresponding j-th component of the signal vector x(n).

• ∆w_kj(n) is the adjustment (update, correction) made to the weight
w_kj(n) at iteration step n.

• The learning-rate parameter η is a positive constant which determines
the amount of correction.

• The error signal must be directly measurable.
• This means that we must know the desired response d_k(n); in other
words, the learning process is supervised.

• The Widrow-Hoff learning rule is local.

• It uses information directly available to neuron k through its synaptic
connections.

• The new, updated value of the synaptic weight w_kj(n) is computed
from the rule

w_kj(n + 1) = w_kj(n) + ∆w_kj(n).   (2)

• This formula can also be represented in the form

w_kj(n) = z⁻¹[w_kj(n + 1)]   (3)

where z⁻¹ is the unit-delay operator (storage element) widely used in
digital signal processing.
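• As an illustration, here is a minimal Python sketch of one error-correction
update, combining equations (1) and (2) for a single neuron with a linear
(identity) activation, as in the Widrow-Hoff setting; the names (w, x, d,
eta) are illustrative, not from the text:

```python
import numpy as np

def delta_rule_step(w, x, d, eta=0.1):
    """One error-correction (delta rule / Widrow-Hoff) update.

    w   : weight vector w_k(n) of output neuron k
    x   : signal vector x(n) from the preceding hidden layer(s)
    d   : desired response d_k(n)
    eta : learning-rate parameter (a positive constant)
    """
    y = np.dot(w, x)        # output signal y_k(n); linear neuron assumed
    e = d - y               # error signal e_k(n) = d_k(n) - y_k(n)
    w = w + eta * e * x     # w_kj(n+1) = w_kj(n) + eta * e_k(n) * x_j(n)
    return w, e
```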
• Signal-flow graph representation of the error-correction learning process.

• Actually, this is a closed-loop feedback system.
• The learning-rate parameter η is crucial to the performance of
error-correction learning in practice.

• It affects:
- the stability of the learning algorithm;
- the convergence speed;
- the final accuracy achieved.

• Error-correction learning is discussed in much greater detail in
Chapters 3 and 4.
2.3 Memory-Based Learning
• In memory-based learning, all (or most) past experiences are explicitly
stored in a large memory.

• The memory consists of correctly classified input-output examples

{(x_1, d_1), (x_2, d_2), . . . , (x_N, d_N)}.

• Again, x_i denotes the i-th input vector and d_i the corresponding
desired response.

• Without loss of generality, d_i can be restricted to be a scalar.

• Typically, d_i is the index of the pattern class.

• Consider now the classification of a test vector x_test not seen before.

• This is done by retrieving and analyzing the training data in a local
neighborhood of x_test.

• All memory-based learning algorithms involve two parts:
1. The criterion used for defining the local neighborhood of the test
vector x_test.
2. The learning rule applied to the training examples in the local
neighborhood of the test vector x_test.

• A simple yet effective memory-based learning algorithm is known as
the nearest-neighbor rule.

• The vector x'_N belonging to the set of training vectors {x_1, x_2, . . . , x_N}
is the nearest neighbor of x_test if

min_i d(x_i, x_test) = d(x'_N, x_test)

where d(x_i, x_test) is the Euclidean distance between the vectors x_i
and x_test.

• x_test is classified into the same class as its nearest neighbor x'_N.

• The nearest-neighbor rule is independent of the underlying distribution.
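• A minimal Python sketch of the nearest-neighbor rule (the names
X_train, d_train, x_test are illustrative):

```python
import numpy as np

def nearest_neighbor_classify(X_train, d_train, x_test):
    """Classify x_test into the class of its nearest training vector.

    X_train : (N, dim) array of training vectors x_1, ..., x_N
    d_train : length-N array of class labels d_1, ..., d_N
    x_test  : test vector to be classified
    """
    # Euclidean distances d(x_i, x_test) to every training vector
    dists = np.linalg.norm(X_train - x_test, axis=1)
    i_min = int(np.argmin(dists))   # index of the nearest neighbor x'_N
    return d_train[i_min]           # same class as the nearest neighbor
```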
• Assume that:
- The training samples are independently and identically distributed;
- The sample size N is infinitely large.

• One can then show that the probability of error in nearest neighbor
classification is at most twice the Bayes probability of error (Cover and
Hart, 1967).

• The Bayes error is the smallest possible (optimal) one.

• It is discussed in Chapter 3.

• Thus half of the classification information in an infinitely large
training set is contained in the nearest neighbor.
K-nearest neighbor classifier

• x_test is classified to the class that is most frequently represented
among its k nearest neighbors (a majority vote).

• The k-nearest-neighbor classifier averages information, rejecting
single outliers.

• outlier = an exceptional, often erroneous observation.
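• A minimal Python sketch of the k-nearest-neighbor rule with a majority
vote (names are illustrative; k = 1 reduces to the nearest-neighbor rule):

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, d_train, x_test, k=3):
    """Classify x_test by a majority vote among its k nearest neighbors."""
    dists = np.linalg.norm(X_train - x_test, axis=1)
    nearest = np.argsort(dists)[:k]               # indices of the k nearest vectors
    votes = Counter(d_train[i] for i in nearest)  # count class labels among them
    return votes.most_common(1)[0][0]             # the most frequent class wins
```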

• k-nearest neighbor classifier (k = 3)

• Another important type of memory-based classifier:
radial-basis function networks (Chapter 5).
2.4 Hebbian Learning
• Hebb’s postulate of learning (1949) is the oldest and most famous
neural learning rule.

• A modern, expanded version of Hebb’s learning rule consists of two
parts:

1. If two neurons on either side of a synapse (connection) are activated
simultaneously (synchronously), then the strength of that synapse is
selectively increased.
2. If two neurons on either side of a synapse are activated
asynchronously, then that synapse is selectively weakened or
eliminated.

• Such a synapse is called a Hebbian synapse.
• Key properties of a Hebbian synapse:

1. Time-dependent mechanism
- Modification depends on the exact time of occurrence of the
presynaptic and postsynaptic signals.
2. Local mechanism
- A Hebbian synapse uses only local information available to
a neuron.
3. Interactive mechanism
- A change in a Hebbian synapse depends on both the presynaptic
and postsynaptic signals.
- The interaction between these signals can be either deterministic
or statistical in nature.
4. Conjunctional or correlational mechanism
- Correlation over time between the presynaptic and postsynaptic
signals is responsible for a synaptic change.
• Synaptic modifications can be classified as Hebbian, anti-Hebbian, and
non-Hebbian.

• A Hebbian synapse increases its strength for positively correlated
presynaptic and postsynaptic signals.

• It decreases its strength for uncorrelated and negatively correlated
signals.

• An anti-Hebbian synapse operates in just the reverse manner.

• A non-Hebbian synapse does not use Hebbian-type learning.

• Figure: a Hebbian synapse with presynaptic signal x_j, postsynaptic
signal y_k, and inputs from other neurons.
Mathematical Models of Hebbian Learning

• Consider a synaptic weight w_kj of neuron k.

• The respective presynaptic (input) signal is denoted by x_j.

• The postsynaptic (output) signal is denoted by y_k.

• The change (update) of w_kj at time step n has the general form

∆w_kj(n) = F(y_k(n), x_j(n))

where F(y, x) is a function of both the postsynaptic and presynaptic
signals.

• Consider two specific forms of the general Hebbian learning rule.
1. Standard Hebbian learning rule:

∆w_kj(n) = η y_k(n) x_j(n)

- Here η is again the learning-rate parameter.
- Repeated application of the input (presynaptic) signal x_j leads
to an increase in the output signal y_k.
- Eventually this leads to exponential growth and saturation of the
weight value.
2. Covariance Hebbian rule:

∆w_kj(n) = η [x_j(n) − m_x][y_k(n) − m_y]

- Here m_x and m_y are time averages of the presynaptic input
signal x_j and the postsynaptic output signal y_k, respectively.
- The covariance rule can converge to the nontrivial state
x_j = m_x, y_k = m_y (error in Haykin’s book here!?).
- The synaptic strength can both increase and decrease.
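• A minimal Python sketch of both Hebbian update rules for scalar signals
(the names eta, m_x, m_y are illustrative):

```python
def hebbian_update(w_kj, x_j, y_k, eta=0.01):
    """Standard Hebbian rule: delta_w = eta * y_k * x_j.

    With positively correlated x_j and y_k the update is always
    positive, which leads to unbounded (exponential) weight growth.
    """
    return w_kj + eta * y_k * x_j


def covariance_update(w_kj, x_j, y_k, m_x, m_y, eta=0.01):
    """Covariance rule: delta_w = eta * (x_j - m_x) * (y_k - m_y).

    m_x and m_y are time averages of the presynaptic and postsynaptic
    signals; the update can be negative, so the synaptic strength can
    both increase and decrease.
    """
    return w_kj + eta * (x_j - m_x) * (y_k - m_y)
```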
Standard Hebbian rule and the covariance rule

• In both cases, the weight update depends linearly on the output
signal y_k.

• Hebbian learning has strong physiological evidence.
2.5 Competitive Learning
• In competitive learning, the neurons in the output layer compete to
become active (fire).

• Only a single output neuron is active at any one time.

• In Hebbian learning, by contrast, several output neurons may be active
simultaneously.

• Competitive learning is highly suitable for finding relevant features for
classification tasks.
• Three basic elements of a competitive learning rule:

1. A set of neurons that are all alike except for some randomly
distributed synaptic weights.
- Therefore the neurons respond differently to input signals.
2. A limit imposed on the strength of each neuron.
3. A mechanism that lets the neurons compete.
- Only one output neuron has the right to respond to a given
input signal.
- The winner of the competition is called a winner-takes-all
neuron.

• As a result of competition, the neurons become specialized.

• They respond to certain types of inputs, becoming feature detectors
for different input classes.
• Simplest form of a competitive neural network

• Feedback connections between the competing output neurons perform
lateral inhibition.

• Each neuron tends to inhibit the neuron to which it is laterally
connected.

• A neuron k is the winning neuron if its induced local field v_k for a
given input pattern x is the largest one.

• Mathematically, the output signal

y_k = 1 if v_k > v_j for all j, j ≠ k.

For other than the winning neuron, the output signal y_k = 0.

• The local field v_k represents the combined action of all the forward
and feedback inputs to neuron k.

• Typically, all the synaptic weights w_kj are positive.
• Normalization condition giving an equal portion of synaptic weight
“mass” to each neuron:

∑_j w_kj = 1 for all k

• A neuron learns by shifting synaptic weights from its inactive input
nodes to the active ones.

• The standard competitive learning rule:

∆w_kj = η (x_j − w_kj)  if neuron k wins the competition;
∆w_kj = 0               if neuron k loses the competition.

• This learning rule moves the weight vector w_k of the winning neuron
k toward the input pattern x.
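• A minimal Python sketch of one winner-takes-all update step (the names
W, x, eta are illustrative; the weight normalization above is omitted
for brevity):

```python
import numpy as np

def competitive_step(W, x, eta=0.1):
    """One step of the standard competitive learning rule.

    W   : (num_neurons, dim) weight matrix, one weight vector w_k per row
    x   : input pattern
    eta : learning-rate parameter
    """
    v = W @ x                   # induced local fields v_k
    k = int(np.argmax(v))       # winner-takes-all neuron
    W[k] += eta * (x - W[k])    # move w_k toward x; losing neurons unchanged
    return W, k
```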
• Here both the input vectors x and the weight vectors w_k are scaled
to have unit length (Euclidean norm).

• Then they are points on the surface of an N-dimensional hypersphere
(assuming N-dimensional vectors).

• The initial state (figure a) shows three clusters of data points (black
dots) and the initial values of three weight vectors (crosses).
• Figure b shows a typical final state of the network resulting from
competitive learning.

• The weight vectors have moved to the centers of gravity of the
clusters.

• In more difficult cases, competitive learning algorithms may fail to
find stable clusters.
2.7 Credit-Assignment Problem
• Credit assignment (Minsky, 1961) is a useful concept in studying
learning algorithms for distributed systems.

• Basic problem: assign credit or blame to the internal decisions made
by a learning machine.

• The reward or punishment is based on the quality of the overall
output provided by the learning machine.

• Often the outputs of a learning machine depend directly on some
actions and only indirectly on the internal decisions.
• In these situations the credit-assignment problem may be decomposed
into two subproblems (Sutton, 1984):

1. Temporal credit-assignment problem
- The assignment of credit to actions for outcomes.
- Involves the instants of time when the actions that deserve
credit were actually taken.
2. Structural credit-assignment problem
- The assignment of credit to the internal structures of the
actions generated by the system.

• The structural credit-assignment problem is relevant when we must
determine precisely which component of the system should alter its
behavior, and by how much, in order to improve overall system
performance.

• The temporal credit-assignment problem is relevant when we must
determine which of the actions were responsible for the outcomes
obtained.

• Often both problems are encountered simultaneously.
• The credit-assignment problem arises, for example, when
error-correction learning is applied to a multilayer feedforward neural
network.

• Both the neurons in the hidden layers and those in the output layer
are responsible for the overall behavior of the network.

• As an example, consider such a multilayer network.

• It is straightforward to adjust the synaptic weights of the output
neuron using the known desired response and error-correction learning.

• Fundamental question: how should the weights of the hidden neurons
be adjusted? This will be discussed later, in Chapter 4.
2.8 Learning with a Teacher
• Also called supervised learning.

• A block diagram of a supervised learning system.

• Assumption: the teacher has knowledge about the environment.
• This knowledge is represented by the known input-output pairs
(training data).

• The environment is unknown to the neural network.

• Using error-correction learning (for example), the knowledge of the
teacher is transferred to the neural network.

• After learning, the neural network should be able to process new data
independently, without a teacher.

• The learning system is a closed-loop feedback system.

• Typical error measure: the mean-square error, as a function of the
free parameters of the system.

• This function can be described geometrically as a multidimensional
error surface.
- Coordinates: the free parameters to be optimized.

• The error surface is averaged over all possible input-output examples.
• Any operation of the system under supervised learning corresponds to
a point on the error surface.

• The optimum operating point is the global minimum of the error
surface.

• In supervised learning, this global minimum is searched for
iteratively using the gradient (derivative) of the error surface.

• The negative gradient vector shows the direction of steepest descent
at any point of the error surface.

• In practice, an instantaneous estimate of the gradient vector is often
used.

• This results in statistical fluctuations in the learning process.

• However, the correct minimum may still be reached, given enough
training data and iterations.

• Gradient-type learning may end up in a local minimum.
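• As an illustration, a minimal Python sketch of supervised learning by
gradient descent using an instantaneous gradient estimate (one randomly
drawn example per iteration); the linear model and the names (X, D, eta)
are illustrative assumptions, not from the text:

```python
import numpy as np

def supervised_gradient_descent(X, D, eta=0.05, n_iters=1000, seed=0):
    """Minimize the mean-square error of a linear model by descending
    the error surface, using one example per step as an instantaneous
    (noisy) estimate of the true gradient.

    X : (N, dim) input vectors;  D : length-N desired responses.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])       # a point on the error surface
    for _ in range(n_iters):
        i = rng.integers(len(X))   # draw one training example
        e = D[i] - X[i] @ w        # instantaneous error signal
        w += eta * e * X[i]        # step in the direction of steepest descent
    return w
```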
