Haykin Chapter 2: Learning Processes
Spring 2008

Learning

• Property of primary significance in a nnet: it can learn from its
environment, and improve its performance through learning.

• Iterative adjustment of synaptic weights.

• Learning: the process by which the free parameters of a neural network
are adapted through a process of stimulation by the environment in which
the network is embedded. The type of learning is determined by the manner
in which the parameter changes take place.
Learning

Sequence of events in nnet learning:

• nnet is stimulated by the environment.

• nnet undergoes changes in its free parameters as a result of this
stimulation.

• nnet responds in a new way to the environment because of the changes
that have occurred in its internal structure.

A prescribed set of well-defined rules for the solution of the learning
problem is called a learning algorithm.

The manner in which a nnet relates to the environment dictates the
learning paradigm, which refers to a model of the environment operated on
by the nnet.

Overview

Organization of this chapter:

1. Five basic learning rules: error-correction, Hebbian, memory-based,
competitive, and Boltzmann.

2. Learning paradigms: credit assignment problem, supervised learning,
unsupervised learning.

3. Learning tasks, memory, and adaptation.

4. Probabilistic and statistical aspects of learning.
Error-Correction Learning

• Input x(n), output y_k(n), and desired response or target output d_k(n).

• Error signal e_k(n) = d_k(n) − y_k(n).

• e_k(n) actuates a control mechanism that gradually adjusts the synaptic
weights, to minimize the cost function (or index of performance):

    E(n) = (1/2) e_k^2(n)

• When the synaptic weights reach a steady state, learning is stopped.

Error-Correction Learning: Delta Rule

• Widrow-Hoff rule, with learning rate η:

    ∆w_kj(n) = η e_k(n) x_j(n)

• With that, we can update the weights:

    w_kj(n+1) = w_kj(n) + ∆w_kj(n)

• There is a sound theoretical reason for doing this, which we will
discuss later.
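A minimal Octave/MATLAB sketch of the delta rule; the linear target,
data, and learning rate below are made up for illustration, not from the
slides:

    % Delta rule on a single linear neuron; hypothetical target d = 2*x1 - x2.
    eta = 0.05;                 % learning rate (eta)
    w = zeros(2, 1);            % synaptic weights w_kj
    for n = 1:2000
      x = rand(2, 1);           % input x(n)
      d = 2*x(1) - x(2);        % desired response d(n)
      y = w' * x;               % actual output y(n)
      e = d - y;                % error signal e(n) = d(n) - y(n)
      w = w + eta * e * x;      % delta rule: w(n+1) = w(n) + eta*e(n)*x(n)
    end
    disp(w')                    % converges toward [2 -1]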
Memory-Based Learning

• Given a new input x_test, determine its class based on the local
neighborhood of x_test:

– the criterion used for determining the neighborhood, and

– the learning rule applied to the neighborhood of the input, within the
set of training examples.

• The nearest neighbor x'_N of x_test is the stored pattern satisfying

    min_i d(x_i, x_test) = d(x'_N, x_test),

where d(·, ·) is the Euclidean distance.

• x_test is classified as the same class as x'_N.

• Cover and Hart (1967): the bound on the error is at most twice that of
the optimal (the Bayes probability of error), given that

– the classified examples are independently and identically distributed,
and

– the sample size N is infinitely large.
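A small Octave/MATLAB sketch of the nearest-neighbor rule (the data are
made up); the k-NN extension on the next slide only changes the final
vote:

    % Nearest-neighbor classification with Euclidean distance.
    X      = [0 0; 0 1; 1 0; 1 1; 5 5; 5 6; 6 5];  % stored patterns (rows)
    labels = [0; 0; 0; 0; 1; 1; 1];                % class of each pattern
    xtest  = [4.5 5.2];                            % new input
    dists = sqrt(sum((X - xtest).^2, 2));   % d(x_i, xtest) for all i
    [~, i] = min(dists);                    % index of the nearest neighbor
    disp(labels(i))                         % xtest inherits its class: 1
    % k-NN variant: [~, idx] = sort(dists); mode(labels(idx(1:k)))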
Memory-Based Learning: k−Nearest Neighbor

[Figure: 2D scatter of class-0 and class-1 training points; the test
input x lies in a region dominated by 1s, with one 0 outlier nearby.]

• Identify the k classified patterns that lie nearest to the test vector
x_test, for some integer k.

• Assign x_test to the class that is most frequently represented by the k
neighbors (use a majority vote).

• In effect, it is like averaging, so it can deal with outliers. The
input x above will be classified as 1.

Hebbian Learning

• Donald Hebb's postulate of learning appeared in his book The
Organization of Behavior (1949):

    When an axon of cell A is near enough to excite a cell B and
    repeatedly or persistently takes part in firing it, some growth
    process or metabolic changes take place in one or both cells such
    that A's efficiency, as one of the cells firing B, is increased.

• Hebbian synapse:

– If two neurons on either side of a synapse are activated
simultaneously, the synapse is strengthened.

– If they are activated asynchronously, the synapse is weakened or
eliminated. (This part was not mentioned in Hebb.)
Mathematical Models of Synaptic Plasticity

Covariance Rule (Sejnowski 1977)
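As a sketch of the rules these slides cover, assuming the standard forms
from Haykin: the activity-product (Hebbian) rule ∆w_kj = η y_k x_j, and
Sejnowski's covariance rule ∆w_kj = η (x_j − x̄)(y_k − ȳ), with x̄, ȳ the
time-averaged activities. In Octave/MATLAB, with made-up activities:

    % Hebbian (activity-product) rule vs. covariance rule.
    eta = 0.01;
    x = randn(200, 1);  y = 0.5*x + 0.1*randn(200, 1);  % correlated activities
    xbar = mean(x);  ybar = mean(y);     % time-averaged activities
    w = 0;  wc = 0;
    for n = 1:200
      w  = w  + eta * y(n) * x(n);                    % Hebb: dw = eta*y*x
      wc = wc + eta * (y(n) - ybar) * (x(n) - xbar);  % covariance rule
    end
    disp([w wc])
    % Unlike the plain Hebbian rule, the covariance rule can also weaken
    % the synapse (dw < 0) when the activities fall on opposite sides of
    % their means.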
Competitive Learning

– Competition mechanism, to choose one winner: the winner-takes-all
neuron.

• Inputs and weights can be seen as vectors: x and w_k. Note that the
weight vector belongs to a certain output neuron k, hence the index.
Competitive Learning

• Single layer, with feedforward excitatory connections and lateral
inhibitory connections.

• Winner selection:

    y_k = 1 if v_k > v_j for all j, j ≠ k
    y_k = 0 otherwise

• Limit on the total synaptic weight of each neuron:

    Σ_j w_kj = 1 for all k.

• Adaptation:

    ∆w_kj = η (x_j − w_kj) if k is the winner
    ∆w_kj = 0              otherwise

* The synaptic weight vector w_k = (w_k1, w_k2, ..., w_kn) is moved
toward the input vector.

Competitive Learning: Example

[Figure: the weight vector w(n) is moved toward the input x by the update
η (x − w(n)), giving w(n+1).]

• Interpreting the adaptation rule as a vector operation, we get the
above plot: the winner's weight vector takes a step of size η along
x − w(n).

• Weight vectors converge toward local input clusters: clustering.
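A sketch of competitive learning in Octave/MATLAB. The clusters are made
up, and the winner is chosen here as the neuron whose weight vector is
closest to the input, which matches the maximum-v_k rule when the weight
vectors are normalized:

    % Winner-takes-all competitive learning: weights find cluster centers.
    eta = 0.1;
    X = [0.1*randn(50,2) + 1; 0.1*randn(50,2) - 1];  % clusters at (1,1), (-1,-1)
    W = rand(2, 2);                                  % one weight row per neuron
    for epoch = 1:20
      for n = 1:size(X, 1)
        x = X(n, :);
        [~, k] = min(sum((W - x).^2, 2));      % winner: closest weight vector
        W(k,:) = W(k,:) + eta * (x - W(k,:));  % dw = eta*(x - w), winner only
      end
    end
    disp(W)   % each row ends up near one cluster center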
Boltzmann Machine: Learning and Operation

• Learning:

– Correlation of activity during the clamped condition: ρ+_kj.

– Correlation of activity during the free-running condition: ρ−_kj.

– The weights are adjusted using the difference of the two correlations:
∆w_kj = η (ρ+_kj − ρ−_kj).

Learning Paradigms

How neural networks relate to their environment:

• credit assignment problem
Learning without a Teacher: Unsupervised Learning

• Learn based on a task-independent measure of the quality of the
representation.

• Internal representations for encoding features of the input space.

• A competitive learning rule is needed, such as winner-takes-all.

Learning Tasks, Memory, and Adaptation

Learning tasks:

• Pattern association

• Pattern recognition

• Function approximation

• Control

• Filtering/Beamforming

Memory and adaptation
Pattern Association

Pattern Classification

• Mapping between an input pattern and a prescribed number of classes
(categories).

Function Approximation

• System identification: d = f(x).

• Inverse modeling: x = f^{−1}(d).
Control

• Control of a plant: a process or critical part of a system that is to
be maintained in a controlled condition.

• Feedback controller: adjust the plant input u so that the output of the
plant y tracks the reference signal d. Learning is in the form of ...

Filtering, Smoothing, and Prediction

• Smoothing: estimate a quantity at time n, based on measurements up to
time n + α (α > 0).

• Prediction: estimate a quantity at time n + α, based on measurements up
to time n (α > 0).
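A toy Octave/MATLAB sketch contrasting which samples each task may use;
the signal, window sizes, and one-step predictor are illustrative
assumptions:

    % Filtering uses samples up to n; smoothing may also use samples after
    % n; prediction estimates a future value from samples up to n.
    t = (1:200)';
    z = sin(2*pi*t/50) + 0.2*randn(200, 1);   % noisy measurements
    n = 100;  a = 5;                          % current time n; lag/lead alpha
    x_filt   = mean(z(n-9:n));       % filtering: estimate at n from z(1..n)
    x_smooth = mean(z(n-a:n+a));     % smoothing: also uses z up to n+a
    x_pred   = z(n) + (z(n)-z(n-1)); % crude linear extrapolation toward n+1
    disp([x_filt x_smooth x_pred])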
Blind Source Separation and Nonlinear Prediction

Linear Algebra Tip: Partitioned (or Block) Matrices
Memory

• Memory: relatively enduring neural alterations induced by an organism's
interaction with the environment.

• Memory needs to be accessible by the nervous system to influence
behavior.

• Activity patterns need to be stored through a learning process.

• Types of memory: short-term and long-term memory.

Associative Memory

[Figure: a single-layer network mapping the key vector components
x_k1, x_k2, ..., x_km to the output components y_k1, y_k2, ..., y_km
through weights w_ij(k).]

• q pattern pairs: (x_k, y_k), for k = 1, 2, ..., q.

• Input (key vector): x_k = [x_k1, x_k2, ..., x_km]^T.

• Output (memorized vector): y_k = [y_k1, y_k2, ..., y_km]^T.

• The weights can be represented as a weight matrix:

    y_k = W(k) x_k, for k = 1, 2, ..., q,

or elementwise,

    y_ki = Σ_{j=1}^{m} w_ij(k) x_kj, for i = 1, 2, ..., m.
Correlation Matrix Memory: Recall

• For convenience, let's say

    M̂ = Σ_{k=1}^{q} y_k x_k^T = Σ_{k=1}^{q} W(k).

• Will M̂ x_k give y_k? Note that W(k) x_k = y_k x_k^T x_k =
y_k ||x_k||^2 (the length of x_k squared). If all x_k's were normalized
to have length 1, W(k) x_k = y_k will hold!

• Now, back to M̂: under what condition will M̂ x_j give y_j for all j?
Let's begin by assuming x_k^T x_k = 1 (key vectors are normalized).

• We can decompose M̂ x_j as follows:

    M̂ x_j = Σ_{k=1}^{q} y_k x_k^T x_j
          = y_j x_j^T x_j + Σ_{k=1, k≠j}^{q} y_k x_k^T x_j.

• If all keys are orthogonal (perpendicular to each other), then for an
arbitrary k ≠ j,

    x_k^T x_j = ||x_k|| ||x_j|| cos(θ_kj) = 1 × 1 × 0 = 0,

so the noise term becomes 0, and hence M̂ x_j = y_j + 0 = y_j. The
example on page 41 is one such (extreme) case!
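A quick Octave/MATLAB check of the recall argument, using orthonormal
keys (the stored vectors are arbitrary made-up values):

    % Correlation matrix memory: perfect recall with orthonormal keys.
    x1 = [1 0 0]';  x2 = [0 1 0]';  x3 = [0 0 1]';  % orthonormal keys
    y1 = [1 2 3]';  y2 = [4 5 6]';  y3 = [7 8 9]';  % memorized vectors
    M = y1*x1' + y2*x2' + y3*x3';   % M = sum_k y_k x_k'
    disp(M * x2)                    % recalls y2 exactly: noise term is 0
    % With non-orthogonal keys, the cross-talk sum_{k~=j} y_k (x_k'*x_j)
    % corrupts the recall of y_j.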
Correlation Matrix Memory: Recall (cont'd)

• We can also ask how many items can be stored in M̂, i.e., its capacity.

• The capacity is closely related to the rank of the matrix M̂. The rank
is the number of linearly independent column vectors (or row vectors) in
the matrix.

• Linear independence means a linear combination of the vectors can be
zero only when the coefficients are all zero:

    c_1 x_1 + c_2 x_2 + ... + c_n x_n = 0

only when c_i = 0 for all i = 1, 2, ..., n.

• The above and the examples in the previous pages are best understood by
running simple calculations in Octave or Matlab (a small example follows
the next slide). See the src/ directory for example scripts.

Adaptation

• When the environment is stationary (the statistical characteristics do
not change over time), supervised learning can be used to obtain a
relatively stable set of parameters.

• If the environment is nonstationary, the parameters need to be adapted
over time, on an on-going basis (continuous learning or
learning-on-the-fly).

• If the signal is locally stationary (pseudostationary), then the
parameters can be repeatedly retrained based on a small window of
samples, assuming these are stationary: continual training with
time-ordered samples.
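Following the slide's suggestion, a small Octave/MATLAB calculation for
rank and linear independence (the vectors are chosen arbitrarily):

    % Rank = number of linearly independent columns.
    x1 = [1 0 0]';  x2 = [0 1 0]';  x3 = [1 1 0]';  % x3 = x1 + x2: dependent
    disp(rank([x1 x2 x3]))   % prints 2
    % A memory built from q stored pairs has rank at most min(q, m), which
    % bounds how many pairs can be recalled without error.
    M = [1 2 3]'*x1' + [4 5 6]'*x2';
    disp(rank(M))            % prints 2: two stored pairs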
Statistical Nature of Learning

• Random input vectors X ∈ {x_i}_{i=1}^N and random output scalar values
D ∈ {d_i}_{i=1}^N.

• Suppose we have a training set T = {(x_i, d_i)}_{i=1}^N. The problem is
that the target values D in the training set may only be approximate
(D ≈ f(X), i.e., D ≠ f(X)).

• So, we end up with a regressive model:

    D = f(X) + ε,

where f(·) is deterministic and ε is a random expectational error
representing our ignorance.

• In this light, f(x) can be expressed in statistical terms:
f(x) = E[D|x], since from D = f(X) + ε we can get

    E[D|x] = E[f(x) + ε | x] = f(x) + E[ε|x] = f(x).

• A property that can be derived from the above is that the expectational
error term is uncorrelated with the regressive function:

    E[ε f(X)] = 0.

This will become useful in the following.
Statistical Nature of Learning (cont'd)

• Neural network realization of the regressive model: Y = F(X, w). We
want to map the knowledge in the training data T into the weights w.

• We can now define the cost function:

    E(w) = (1/2) Σ_{i=1}^{N} (d_i − F(x_i, w))^2,

which can be written equivalently as an average over the training set,
E_T[·]:

    E(w) = (1/2) E_T[(d − F(x, T))^2].

Statistical Nature of Learning (cont'd)

• Using the regressive model,

    d − F(x, T) = d − f(x) + f(x) − F(x, T) = ε + (f(x) − F(x, T)).

With that, E(w) = (1/2) E_T[(d − F(x, T))^2] becomes

    E(w) = (1/2) E_T[(ε + (f(x) − F(x, T)))^2]
         = (1/2) E_T[ε^2 + 2ε(f(x) − F(x, T)) + (f(x) − F(x, T))^2]
         = (1/2) E_T[ε^2] + E_T[ε(f(x) − F(x, T))]
           + (1/2) E_T[(f(x) − F(x, T))^2].

The first term is the intrinsic error, the middle term reduces to 0
(since E[ε f(X)] = 0), and the last term is the one we're interested in.
• The term E_T[(f(x) − F(x, T))^2] can be rewritten, knowing
f(x) = E[D|x]:

    E_T[(E[D|x] − F(x, T))^2]
      = E_T[(E[D|x] − E_T[F(x, T)] + E_T[F(x, T)] − F(x, T))^2]
      = (E_T[F(x, T)] − E[D|x])^2 + E_T[(F(x, T) − E_T[F(x, T)])^2],

where the first term is the (squared) bias and the second is the
variance.

The last step above is obtained using E_T[E[D|x]^2] = E[D|x]^2,
E_T[E_T[F(x, T)]^2] = E_T[F(x, T)]^2, and
E_T[E[D|x] F(x, T)] = E[D|x] E_T[F(x, T)].

* Note: E[c] = c and E[cX] = cE[X] for a constant c and random variable X.

• The variance indicates the variation in F(x, T) over the entire
training set T: the estimation error.

• Typically, achieving smaller bias leads to higher variance, and smaller
variance leads to higher bias.
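A Monte-Carlo sketch of the bias/variance trade-off in Octave/MATLAB. The
target function, noise level, sample sizes, and polynomial degrees are
made-up choices: F(x, T) is refit on many independently drawn training
sets T, and bias^2 and variance are estimated at a single point x0.

    % Estimate bias^2 and variance of F(x0, T) over many training sets T.
    f = @(x) sin(2*pi*x);            % "true" regression f(x) = E[D|x]
    x0 = 0.3;  sigma = 0.3;          % evaluation point; std of epsilon
    R = 500;  N = 20;                % number of training sets; set size
    for p = [1 5]                    % low- vs. high-complexity model
      Fx0 = zeros(R, 1);
      for r = 1:R
        x = rand(N, 1);  d = f(x) + sigma*randn(N, 1);  % training set T
        c = polyfit(x, d, p);        % fit F(., T): degree-p polynomial
        Fx0(r) = polyval(c, x0);     % F(x0, T)
      end
      bias2 = (mean(Fx0) - f(x0))^2;       % (E_T[F(x0,T)] - E[D|x0])^2
      vari  = mean((Fx0 - mean(Fx0)).^2);  % E_T[(F - E_T[F])^2]
      fprintf('degree %d: bias^2 = %.4f, variance = %.4f\n', p, bias2, vari);
    end

Degree 1 typically shows larger bias^2 with smaller variance, and degree
5 the reverse, matching the dilemma stated above.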
Statistical Learning Theory

• Statistical learning theory addresses the fundamental issue of how to
control the generalization ability of a neural network in mathematical
terms.

• Certain quantities, such as the sample size and the Vapnik-Chervonenkis
dimension (VC dimension), are closely related to the bounds on the
generalization error.

Appendix on VC Dimension

• The concept of shattering

• VC dimension
The Vapnik-Chervonenkis Dimension

VC Dim. of Linear Decision Surfaces
Uses of VC Dimension