PNAL6 MLPTraining

Uploaded by

engineeringengtr

Perceptron Networks and Applications

M. Ali Akcayol
Gazi University
Department of Computer Engineering
Content
 Training
 Backpropagation algorithm
 Initialization of the weights
 Frequency of weight updates
 Choice of learning rate
 Generalizability
 Number of hidden layers and nodes
 Number of samples

Training
 By learning rule we mean a procedure (training algorithm) for
modifying the weights and biases of a network.
 The purpose of the learning rule is to train the network to
perform some task.
 There are many types of neural network learning rules.
 They fall into three broad categories:
 Supervised learning
 Unsupervised learning
 Reinforcement learning

Supervised learning
 In supervised learning, the learning rule is provided with a set
of examples (the training set) of proper network behavior:

$$\{p_1, t_1\}, \{p_2, t_2\}, \ldots, \{p_Q, t_Q\}$$

where $p_q$ is an input to the network and $t_q$ is the
corresponding correct (target) output.
 As the inputs are applied to the network, the network outputs
are compared to the targets.
 The learning rule is then used to adjust the weights and biases
of the network in order to move the network outputs closer to
the targets.
 The perceptron learning rule falls in this supervised learning
category.
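
As a minimal sketch of a supervised learning rule, the perceptron rule can be written as below. The step activation, the learning rate `lr`, the epoch count, and the AND-gate training set are assumptions chosen for illustration; they are not taken from the slides.

```python
def predict(w, b, p):
    """Step-activation perceptron: output 1 if the weighted sum is >= 0."""
    return 1 if sum(wi * pi for wi, pi in zip(w, p)) + b >= 0 else 0


def train_perceptron(samples, epochs=20, lr=1.0):
    """samples: list of (input p, target t) pairs with t in {0, 1}."""
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(epochs):
        for p, t in samples:
            e = t - predict(w, b, p)      # compare network output to target
            # move weights and bias so the output gets closer to the target
            w = [wi + lr * e * pi for wi, pi in zip(w, p)]
            b += lr * e
    return w, b


# Hypothetical training set: the logical AND function.
and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(and_gate)
outputs = [predict(w, b, p) for p, _ in and_gate]
print(outputs)
```

The weights change only when the output differs from the target, which is the sense in which the targets supervise the learning.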
Unsupervised learning
 In unsupervised learning, the weights and biases are modified
in response to network inputs only.
 There are no target outputs available.
 At first glance this might seem to be impractical.
 How can you train a network if you don’t know what it is
supposed to do?
 Most of these algorithms perform some kind of clustering
operation.
 They learn to categorize the input patterns into a finite
number of classes.
 This is especially useful in such applications as vector
quantization.
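
One simple way to illustrate clustering without targets is winner-take-all competitive learning, sketched below. This is an illustrative choice rather than a method named in the slides; the data points, initial prototypes, and learning rate are all assumed values.

```python
def competitive_learning(points, prototypes, lr=0.1, epochs=50):
    """Move the nearest prototype a little toward each input (no targets used)."""
    protos = [list(p) for p in prototypes]
    for _ in range(epochs):
        for x in points:
            # winner = prototype with the smallest squared distance to x
            win = min(
                range(len(protos)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(protos[k], x)),
            )
            protos[win] = [w + lr * (xi - w) for w, xi in zip(protos[win], x)]
    return protos


# Hypothetical inputs forming two groups, near (0, 0) and near (1, 1).
points = [(0.0, 0.1), (0.1, 0.0), (0.0, 0.0), (1.0, 0.9), (0.9, 1.0), (1.0, 1.0)]
protos = competitive_learning(points, [(0.4, 0.4), (0.6, 0.6)])
print(protos)
```

Each prototype ends up representing one group of inputs, which is the kind of code book used in vector quantization.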

Reinforcement learning
 Reinforcement learning is similar to supervised learning, except
that there are no target values.
 Instead of being provided with the correct output for each
network input, the algorithm is only given a grade.
 The grade (or score) is a measure of the network performance
over some sequence of inputs.
 This type of learning is currently much less common than
supervised learning.
 Genetic algorithms, tabu search, and simulated annealing
fall in the reinforcement learning category.

Backpropagation algorithm
 The backpropagation algorithm is a generalization of the LMS
(least mean squares) algorithm.
 The backpropagation algorithm modifies the weights to
minimize the sum of squared errors (SSE) or the mean squared
error (MSE).
 Backprop uses supervised learning in which the inputs and the
corresponding outputs are used for training.
 Once the network is trained, the weights are frozen and the
network can be used to compute output values for new input
samples.
 The feedforward process involves presenting an input pattern
to input layer neurons that pass the input values onto the first
hidden layer.
 Each of the hidden layer nodes computes a weighted sum of
its inputs, passes the sum through its activation function and
presents the result to the output layer.
Feedforward
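 The ith input node holds a value $x_{p,i}$ for the pth pattern.
 The net input to the jth node in the hidden layer, summed over
the n input nodes and including the threshold input $x_{p,0} = 1$, is

$$\mathrm{net}_j^{(1)} = \sum_{i=0}^{n} w_{j,i}^{(1,0)} x_{p,i}$$

 $w_{j,i}^{(1,0)}$ is the weight of the connection from the ith input
node to the jth hidden layer node, where (1, 0) denotes layer 1
(hidden layer) and layer 0 (input layer).

 The output of the jth hidden layer node is

$$x_{p,j}^{(1)} = S(\mathrm{net}_j^{(1)})$$

where S is a sigmoid function.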

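 The net input to the kth output layer node is

$$\mathrm{net}_k^{(2)} = \sum_{j} w_{k,j}^{(2,1)} x_{p,j}^{(1)}$$

where $x_{p,j}^{(1)}$ is the output of the jth hidden layer node.

 $w_{k,j}^{(2,1)}$ is the weight of the connection from the jth hidden
layer node to the kth output layer node, where (2, 1) denotes
layer 2 (output layer) and layer 1 (hidden layer).

 The output of the kth output layer node is

$$o_{p,k} = S(\mathrm{net}_k^{(2)})$$

where S is a sigmoid function.

 The corresponding squared error is

$$E_p = \sum_{k} (t_{p,k} - o_{p,k})^2$$

where $t_{p,k}$ is the target output of the kth output node for the pth pattern.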
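
The feedforward pass described above can be sketched in a few lines. The function names, the weight layout (`w_hidden[j][i]` with index 0 for the threshold input, `w_output[k][j]`), and the logistic form of the sigmoid are assumptions made for this illustration.

```python
import math


def sigmoid(z):
    """Logistic sigmoid, one common choice for S."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_output):
    """Return (hidden outputs, network outputs) for one input pattern x."""
    xb = [1.0] + list(x)                      # prepend threshold input x_0 = 1
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, xb))) for row in w_hidden]
    out = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_output]
    return hidden, out


def squared_error(target, out):
    """Sum over output nodes of (target - output)^2 for one pattern."""
    return sum((t - o) ** 2 for t, o in zip(target, out))


# Hypothetical 2-2-1 network with all weights zero: every node outputs 0.5.
hidden, out = forward((1.0, 0.0), [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], [[0.0, 0.0]])
print(hidden, out, squared_error([1.0], out))
```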

Backpropagation
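 For each connection from the hidden layer to the output layer,
we calculate $\partial E / \partial w_{k,j}^{(2,1)}$.

 For each connection from the input layer to the hidden layer,
we calculate $\partial E / \partial w_{j,i}^{(1,0)}$.

 The following two equations describe the weight changes, where η is the learning rate:

$$\Delta w_{k,j}^{(2,1)} = -\eta \frac{\partial E}{\partial w_{k,j}^{(2,1)}}$$

$$\Delta w_{j,i}^{(1,0)} = -\eta \frac{\partial E}{\partial w_{j,i}^{(1,0)}}$$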
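 The chain rule is used to calculate the weight changes. For a weight
between the hidden layer and the output layer (pattern index p omitted,
$t_k$ the target and $o_k$ the output of the kth output node):

$$\frac{\partial E}{\partial w_{k,j}^{(2,1)}} = \frac{\partial E}{\partial o_k} \cdot \frac{\partial o_k}{\partial \mathrm{net}_k^{(2)}} \cdot \frac{\partial \mathrm{net}_k^{(2)}}{\partial w_{k,j}^{(2,1)}}$$

 Since $E = \sum_k (t_k - o_k)^2$, $\partial E / \partial o_k = -2 (t_k - o_k)$.
 Since $o_k = S(\mathrm{net}_k^{(2)})$, $\partial o_k / \partial \mathrm{net}_k^{(2)} = S'(\mathrm{net}_k^{(2)})$.
 Since $\mathrm{net}_k^{(2)} = \sum_j w_{k,j}^{(2,1)} x_j^{(1)}$, $\partial \mathrm{net}_k^{(2)} / \partial w_{k,j}^{(2,1)} = x_j^{(1)}$.
 Therefore

$$\frac{\partial E}{\partial w_{k,j}^{(2,1)}} = -2 (t_k - o_k) \, S'(\mathrm{net}_k^{(2)}) \, x_j^{(1)}$$

 For a weight between the input layer and the hidden layer, the chain
rule passes through every output node:

$$\frac{\partial E}{\partial w_{j,i}^{(1,0)}} = \sum_k \frac{\partial E}{\partial o_k} \cdot \frac{\partial o_k}{\partial \mathrm{net}_k^{(2)}} \cdot \frac{\partial \mathrm{net}_k^{(2)}}{\partial x_j^{(1)}} \cdot \frac{\partial x_j^{(1)}}{\partial \mathrm{net}_j^{(1)}} \cdot \frac{\partial \mathrm{net}_j^{(1)}}{\partial w_{j,i}^{(1,0)}} = -2 \, S'(\mathrm{net}_j^{(1)}) \, x_i \sum_k (t_k - o_k) \, S'(\mathrm{net}_k^{(2)}) \, w_{k,j}^{(2,1)}$$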

Backpropagation algorithm
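
The algorithm listing from this slide did not survive extraction. As a hedged sketch only, one per-pattern update for a network with one hidden layer could look as follows. The logistic sigmoid (so that S' = S(1 - S)), the absence of a threshold on the output nodes, the constant factor 2 being absorbed into η, and all names and numeric values are assumptions of this example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def backprop_step(x, t, w1, w2, eta=0.5):
    """One per-pattern update. w1[j][i]: input->hidden (i = 0 is the threshold),
    w2[k][j]: hidden->output. Returns new weights and the error before the update."""
    xb = [1.0] + list(x)
    h = [sigmoid(sum(w * xi for w, xi in zip(row, xb))) for row in w1]
    o = [sigmoid(sum(w * hj for w, hj in zip(row, h))) for row in w2]
    # output-layer term: (t_k - o_k) * S'(net_k), with S' = o * (1 - o)
    delta = [(tk - ok) * ok * (1.0 - ok) for tk, ok in zip(t, o)]
    # hidden-layer term: S'(net_j) * sum_k delta_k * w2[k][j]
    mu = [hj * (1.0 - hj) * sum(delta[k] * w2[k][j] for k in range(len(w2)))
          for j, hj in enumerate(h)]
    new_w2 = [[w2[k][j] + eta * delta[k] * h[j] for j in range(len(h))]
              for k in range(len(w2))]
    new_w1 = [[w1[j][i] + eta * mu[j] * xb[i] for i in range(len(xb))]
              for j in range(len(w1))]
    error = sum((tk - ok) ** 2 for tk, ok in zip(t, o))
    return new_w1, new_w2, error


# Hypothetical 2-2-1 network and a single training pattern.
w1 = [[0.1, -0.2, 0.3], [-0.3, 0.2, 0.1]]
w2 = [[0.2, -0.1]]
errors = []
for _ in range(200):
    w1, w2, err = backprop_step((1.0, 0.0), [1.0], w1, w2)
    errors.append(err)
print(errors[0], errors[-1])
```

Repeating the step on the same pattern drives its squared error down, which is the behavior the gradient-based weight changes are meant to produce.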

Initialization of the weights
 Training is generally started with randomly chosen initial
weight values.
 Typically, the weights chosen are small (between -1.0 and 1.0,
or between -0.5 and +0.5).
 Larger weight values may drive the output nodes to
saturation.
 Random initialization may bias the network to give much greater
importance to inputs with higher values.
 In this case, the weights in the hidden layers can be taken the
same.
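
A small sketch of random initialization in the range mentioned above, together with a check of why large net inputs are a problem for a sigmoid node. The function names, the uniform distribution, and the logistic sigmoid are assumptions for illustration.

```python
import math
import random


def init_weights(n_nodes, n_inputs, limit=0.5, seed=None):
    """Draw each weight uniformly from [-limit, +limit]."""
    rng = random.Random(seed)
    return [[rng.uniform(-limit, limit) for _ in range(n_inputs)]
            for _ in range(n_nodes)]


def sigmoid_slope(z):
    """Derivative of the logistic sigmoid at net input z."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)


weights = init_weights(3, 4, seed=0)
# A saturated node (large |net input|) has an almost flat sigmoid,
# so gradient-based weight changes become very small.
print(sigmoid_slope(0.5), sigmoid_slope(10.0))
```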

 The following equation can be used to initialize the weights
between input layer and first hidden layer.

 The following equation can be used to initialize the weights
between the hidden layers and the output layer.

Frequency of weight updates
 There are two approaches to learning:
 In "per-pattern" learning, weights are changed after
every sample presentation.
 In "per-epoch" (or "batch-mode") learning, weights are
updated only after all samples are presented to the
network.
 An epoch consists of such a presentation of the entire set
of training samples.
 Calculated weight changes for each sample are
accumulated together into a single change to occur at the
end of each epoch.
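
The difference between the two update schedules can be sketched with the simplest possible model. A single linear unit y = w * x trained on squared error is an assumption made to keep the example short; the data and learning rate are also illustrative.

```python
def train_per_pattern(data, eta=0.05, epochs=50, w=0.0):
    """Change the weight immediately after every sample."""
    for _ in range(epochs):
        for x, t in data:
            w += eta * (t - w * x) * x
    return w


def train_per_epoch(data, eta=0.05, epochs=50, w=0.0):
    """Accumulate the changes and apply them once at the end of each epoch."""
    for _ in range(epochs):
        total = 0.0
        for x, t in data:
            total += eta * (t - w * x) * x   # w stays fixed within the epoch
        w += total
    return w


# Hypothetical samples generated by t = 2 * x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w_pattern = train_per_pattern(data)
w_epoch = train_per_epoch(data)
print(w_pattern, w_epoch)
```

In the per-epoch version every sample in an epoch sees the same weight, so the per-sample changes can be computed independently of each other.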

 In each case, training is continued until a reasonably low
error is achieved, or until the maximum number of
iterations allocated for training is exceeded.
 For some applications, the input-output patterns are
presented on-line, hence batch-mode learning is not
possible.
 Per-pattern training is more expensive than per-epoch
training.
 For large applications, the amount of training time is
large, requiring several days even on the fastest
processors.

 The amount of training time can be reduced by exploiting
parallelism in per-epoch training.
 Per-pattern training is not parallelizable in this manner.
 One problem in per-pattern learning is that the network
may just learn to generate an output close to the desired
output for the current pattern, without actually learning
anything about the entire training set.

Choice of learning rate
 Weight vector changes in backpropagation are
proportional to the negative gradient of the error.
 This determines the relative changes that must occur in
different weights when a training sample is presented.
 It does not fix the exact magnitudes of the desired weight
changes.
 The magnitude of the change depends on the appropriate
choice of the learning rate η.
 A large value of η will lead to rapid learning but the
weight may then oscillate.
 Low values imply slow learning.
 This is typical of all gradient descent methods.
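
The effect of η can be seen on a toy error surface. The sketch below runs gradient descent on E(w) = w², an assumed example function, with learning rates picked only to show the slow, fast, oscillating, and diverging cases.

```python
def descend(eta, w0=1.0, steps=10):
    """Gradient descent on E(w) = w**2, whose gradient is 2*w."""
    w, path = w0, [w0]
    for _ in range(steps):
        w = w - eta * 2.0 * w
        path.append(w)
    return path


slow = descend(0.05)         # small steps, slow progress toward w = 0
fast = descend(0.45)         # larger steps, rapid progress
oscillating = descend(0.95)  # overshoots: w flips sign at every step
diverging = descend(1.1)     # overshoots so far that |w| grows
print(slow[-1], fast[-1], oscillating[:4], diverging[-1])
```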

 The right value of η will depend on the application.
 Values between 0.1 and 0.9 have been used in many
applications.
 There have been several studies in the literature on the
choice of η.
 In some formulations, each weight in the network is
associated with its own learning rate.
 These learning rates are adapted separately from those of
the other weights, so each connection has its own learning rate.

 A simple heuristic is to begin with a large value for η in
the early iterations, and steadily decrease it.
 This is based on the expectation that larger changes in
error occur earlier in training, while the error decreases
more slowly in the later stages.
 In the later stages, the changes of the weight vector must be
small to reduce the likelihood of divergence or weight oscillations.
 Another heuristic is to increase η at every iteration that
improves performance by some significant amount, and to
decrease η at every iteration that worsens performance
by some significant amount.
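
Both heuristics can be sketched as small helper functions. The decay form, the factors `up` and `down`, and the 1% `threshold` used to decide what counts as a significant change are assumed values, not ones given in the slides.

```python
def decayed_eta(eta0, iteration, decay=0.01):
    """First heuristic: start large and steadily decrease the learning rate."""
    return eta0 / (1.0 + decay * iteration)


def adapted_eta(eta, prev_error, error, up=1.1, down=0.5, threshold=0.01):
    """Second heuristic: raise eta after a significant improvement,
    lower it after a significant worsening, otherwise leave it alone."""
    change = prev_error - error               # positive means improvement
    if change > threshold * prev_error:
        return eta * up
    if change < -threshold * prev_error:
        return eta * down
    return eta


print(decayed_eta(0.9, 0), decayed_eta(0.9, 100))
print(adapted_eta(0.1, 1.0, 0.5), adapted_eta(0.1, 1.0, 1.5), adapted_eta(0.1, 1.0, 0.999))
```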

 The second derivative of the error measure provides
information regarding the rate with which the first
derivative changes.
 If the second derivative is low in magnitude, it is safe to
assume a steady slope, and large steps can be taken.
 If the second derivative has high magnitude for a given
choice of w, the first derivative may be changing
significantly at w.
 Assumptions of steady slope are then incorrect, and a
smaller choice of η may be appropriate.
 The main difficulty with this method is that a large
amount of computation is required.
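
One possible reading of this idea, for a single weight, is to scale the step by the inverse of the curvature. The finite-difference estimate, the 1/|E''| rule, and the cap `eta_max` are assumptions of this sketch and are not stated in the slides.

```python
def second_derivative(E, w, h=1e-4):
    """Central finite-difference estimate of E''(w)."""
    return (E(w + h) - 2.0 * E(w) + E(w - h)) / (h * h)


def curvature_scaled_eta(E, w, eta_max=1.0):
    """Small steps where the slope changes quickly, larger steps where it is steady."""
    curvature = abs(second_derivative(E, w))
    if curvature == 0.0:
        return eta_max
    return min(eta_max, 1.0 / curvature)


steep = curvature_scaled_eta(lambda w: 5.0 * w * w, 1.0)   # E'' = 10
flat = curvature_scaled_eta(lambda w: 0.1 * w * w, 1.0)    # E'' = 0.2
print(steep, flat)
```

Even in this one-weight case the estimate costs three error evaluations per step, which hints at why the approach becomes expensive for a full network.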

Generalizability
 For a large network, it is possible that repeated training
iterations successively improve performance of the
network on training data.
 But the resulting network may perform poorly on test
data.
 This phenomenon is called overtraining.
 One solution is to constantly monitor the performance of
the network on the test data.
 The weights should be adjusted only on the basis of the
training set, but the error should be monitored on the
test set.

 Training continues as long as the error on the test set
continues to decrease.
 The training process is terminated if the error on the test set
increases.
 Training may thus be terminated even if the network
performance on the training set continues to improve.

 To eliminate random fluctuations, performance over the
test set is monitored over several iterations.
 This method does not suggest using the test data for
training: weight changes are computed solely on the
basis of the network's performance on training data.
 With this stopping criterion, final weights do depend on
the test data in an indirect manner.
 Since the weights are not obtained from the current test
data, it is expected that the network will continue to
perform well on future test data.
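
The stopping criterion can be sketched as a scan over the monitored error, stopping once it has failed to improve for several consecutive iterations. The `patience` parameter and the error values are illustrative assumptions; the held-out data that the slides call the test set is often called a validation set elsewhere.

```python
def early_stopping(monitored_errors, patience=3):
    """Return (best_epoch, stop_epoch) for a sequence of held-out errors.

    Training stops when the error has not improved for `patience`
    consecutive epochs; the weights from best_epoch would be kept."""
    best_epoch, best_error, waited = 0, float("inf"), 0
    for epoch, err in enumerate(monitored_errors):
        if err < best_error:
            best_epoch, best_error, waited = epoch, err, 0
        else:
            waited += 1
            if waited >= patience:
                return best_epoch, epoch
    return best_epoch, len(monitored_errors) - 1


# Hypothetical error curve: a brief fluctuation at epoch 3, then a real rise.
errors = [0.90, 0.70, 0.60, 0.62, 0.58, 0.59, 0.60, 0.61, 0.50]
best, stop = early_stopping(errors, patience=3)
print(best, stop)
```

Waiting for several non-improving epochs is what keeps the single fluctuation at epoch 3 from ending training too early.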

 A network with a large number of nodes is capable of
memorizing the training set but may not generalize well.
 For this reason, networks of smaller sizes are preferred
over larger networks.
 Thus, overtraining can be avoided by using networks with
a small number of parameters.
 Injecting noise into the training set has been found to be
a useful technique.
 This is especially the case when the size of the training
set is small.
 Each training data point $(x_1, \ldots, x_n)$ is modified to a point
$(x_1 + \alpha_1, \ldots, x_n + \alpha_n)$, where each $\alpha_i$ is a small
randomly generated displacement.
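
A minimal sketch of noise injection, assuming uniformly distributed displacements bounded by `scale`. The distribution, the bound, and the number of noisy copies per point are assumptions made for this example.

```python
import random


def jitter(points, scale=0.05, copies=1, seed=None):
    """Return noisy copies of each training point, displaced by at most `scale`
    in every coordinate."""
    rng = random.Random(seed)
    return [
        tuple(x + rng.uniform(-scale, scale) for x in p)
        for p in points
        for _ in range(copies)
    ]


# Hypothetical small training set, expanded to three noisy copies per point.
original = [(0.0, 1.0), (2.0, 3.0)]
noisy = jitter(original, scale=0.05, copies=3, seed=1)
print(noisy)
```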
Number of hidden layers and nodes
 How many training samples are required for successful learning
is determined in practice by trial and error.
 How large a neural network is required for a specific task
is also determined in practice by trial and error.
 The answers depend strongly on the problem at hand.
 With too few nodes, the network may not be powerful enough
for a given learning task.
 With a large number of nodes, computation is too expensive.
 A network with too many nodes also tends to perform poorly on
new test samples, and is then not considered to have accomplished
learning successfully.
 Neural learning is considered successful only if the system can
perform well on test data.
 The emphasis is on the capability of a neural network to
generalize from the input training samples.
 Adaptive algorithms have been devised to obtain an
optimized number of neurons.
 One approach begins with a large network and repeatedly
removes some nodes and links until network performance
degrades to an unacceptable level.
 Alternatively, new nodes and weights can be added, starting
from a very small network, until the performance is
satisfactory.
 The network is retrained at each intermediate state.

 For classification tasks with d input nodes, first hidden
layer nodes often function as hyperplanes.
 These hyperplanes effectively partition d-dimensional
space into various regions.
 Each node in the next layer represents a cluster of points
that belong to the same class.
 All members in a set are assumed to belong to the same
class, and instances of different classes are assigned to
different sets.

 Network with a single node using step function.

 One hidden layer network with convex region.

 Each node realizes one of the lines bounding the region.

 Network with two hidden layers that realizes the union of
three convex regions.
 Each box represents one hidden layer network.
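
The figures for these slides were lost in extraction. As a hedged illustration of the same idea, the sketch below builds a convex region from step-function nodes (one line per hidden node, combined by an AND-like node) and then a union of convex regions (combined by an OR-like node). The triangle and square regions, the weights, and the thresholds are hypothetical values chosen for the example.

```python
def step(z):
    return 1 if z >= 0 else 0


def in_convex_region(point, half_planes):
    """Each hidden node (w0, w1, w2) fires on one side of the line
    w0 + w1*x + w2*y = 0; the next node fires only if all of them fire."""
    x, y = point
    hidden = [step(w0 + w1 * x + w2 * y) for (w0, w1, w2) in half_planes]
    return step(sum(hidden) - (len(hidden) - 0.5))   # AND of the hidden nodes


def in_union(point, regions):
    """Second layer: fires if the point lies in at least one convex region."""
    inside = [in_convex_region(point, hp) for hp in regions]
    return step(sum(inside) - 0.5)                   # OR of the region nodes


# Triangle: x >= 0, y >= 0, x + y <= 1.  Square: 2 <= x <= 3, 2 <= y <= 3.
triangle = [(0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (1.0, -1.0, -1.0)]
square = [(-2.0, 1.0, 0.0), (3.0, -1.0, 0.0), (-2.0, 0.0, 1.0), (3.0, 0.0, -1.0)]
print(in_convex_region((0.2, 0.2), triangle), in_union((2.5, 2.5), [triangle, square]))
```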

Number of samples
 How many samples are needed for good training?
 A rule of thumb is to use at least five to ten times as many
training samples as the number of weights to be trained.
 The following bound is suggested on the basis of the desired
accuracy on the test set:

$$P > \frac{|W|}{1 - a}$$

where P denotes the number of patterns,
|W| denotes the number of weights to be trained, and
a denotes the expected accuracy on the test set.

 Suppose a network contains 27 weights and the desired test
set accuracy is 95% (a = 0.95).
 The analysis suggests that the size of the training set
should satisfy P > 27/(1 - 0.95) = 27/0.05 = 540.
 The above is a necessary condition.
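
The arithmetic of the necessary condition, next to the five-to-ten-times rule of thumb, can be checked with a short sketch. The function names are made up for this example.

```python
def necessary_samples(num_weights, accuracy):
    """Lower bound |W| / (1 - a) on the number of training patterns."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in [0, 1)")
    return num_weights / (1.0 - accuracy)


def rule_of_thumb(num_weights):
    """Five to ten times as many samples as weights."""
    return 5 * num_weights, 10 * num_weights


bound = necessary_samples(27, 0.95)
low, high = rule_of_thumb(27)
print(round(bound), low, high)
```

For the 27-weight example the accuracy-based bound (about 540) is twice the upper end of the rule of thumb (270), so the bound is the more demanding of the two at 95% accuracy.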
 A sufficient condition that ensures the desired
performance is:

$$P > \frac{|W|}{1 - a} \log \frac{n}{1 - a}$$

where n is the number of nodes.
