Module 6: Neural networks

Instructional Hours: 6

Module Overview
In this module, you will learn about neural networks. Neural networks are
machine learning mechanisms that seek to emulate the working of the nervous
system. You will learn about the structure of neural networks, how neural
networks are trained and used for classification, and how to practically
implement a neural network for a learning task.

Introduction to Neural Networks


Introduction
Gurney (2004) describes neural networks as an assembly of interconnected units,
known as nodes, that functionally emulates the human brain. The connection
between any two units has a value known as a weight, which determines how
strongly the two units are connected. Figure 6.1 shows the structure of a very
simple neural network.

Figure 6.1: Neural network connections
Each input is multiplied by the weight of the connection through which it passes.
The weighted inputs are then summed. The output node transforms the sum
in some way to generate the output y.
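As a minimal sketch of this computation (with made-up input and weight values, and a simple threshold assumed for the output transformation), the calculation in Figure 6.1 can be written in Python as follows:

# made-up inputs and connection weights for a network like the one in Figure 6.1
inputs = [0.5, 2.0, 1.0]
weights = [0.4, 0.7, 0.1]

# each input is multiplied by the weight of its connection, and the results are summed
weighted_sum = sum(x * w for x, w in zip(inputs, weights))

# the output node transforms the sum in some way; here a simple threshold is assumed
y = 1 if weighted_sum > 0 else 0
print(weighted_sum, y)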
Neural networks have been used in many different types of classification and
regression tasks such as hand written character recognition, speech recognition,
face recognition and image recognition.
The following are characteristics of problems for which neural networks are
well suited:
i. Many training examples are available

ii. The target function output can be a real value, a discrete value, a vector of
real values or a vector of discrete values.
iii. The training examples may contain errors
iv. Long training times are acceptable
v. The ability of humans to understand the learned target function is not very
necessary

Neural Network Structure
Figure 6.3 illustrates the structure of a typical neural network.

Figure 6.3: Two-layer feed-forward neural network


In Figure 6.3, the first layer of nodes is known as the input layer. It does not
modify the inputs; it simply passes the attribute values on to the nodes of the
next layer. The first input node is known as the bias and its value is always 1.
The next layer is known as the hidden layer. Each hidden layer node computes the
sum of its inputs, where each input is multiplied by its corresponding weight, as
shown in Equation 6.1. In practice, there can be more than one hidden layer.

sum_hi = Σ_{j=0}^{n} w_hij I_j    [Equation 6.1]

The actual output of the hidden node is g(sum_hi), where g is a non-linear
differentiable function known as the activation function.

H_i = g(sum_hi)    [Equation 6.2]


The third layer is known as the output layer. Each node in this layer computes
the sum of the hidden layer node outputs multiplied by their connecting weights.
It then applies an activation function to generate its output, as shown in
Equations 6.3 and 6.4.

sum_oi = Σ_{j=0}^{k} w_oij H_j    [Equation 6.3]

O_i = g(sum_oi)    [Equation 6.4]


O_i is the final output of output node i.
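The sketch below shows how Equations 6.1 to 6.4 can be implemented for a small two-layer network; the layer sizes, weight values and the choice of the sigmoid as the activation function g are illustrative assumptions, not values prescribed by the module.

import numpy as np

def sigmoid(z):
    # logistic activation, used here as the function g
    return 1.0 / (1.0 + np.exp(-z))

# illustrative input vector; I[0] = 1 plays the role of the bias input
I = np.array([1.0, 0.5, 2.0])

# illustrative weight matrices (rows = receiving nodes, columns = sending nodes)
W_h = np.array([[0.5, 0.4, 0.7],      # weights into hidden node 1
                [0.2, 0.1, 0.3]])     # weights into hidden node 2
W_o = np.array([[0.3, 0.3]])          # weights into the single output node

sum_h = W_h @ I          # Equation 6.1: weighted sums at the hidden nodes
H = sigmoid(sum_h)       # Equation 6.2: hidden node outputs
sum_o = W_o @ H          # Equation 6.3: weighted sum at the output node
O = sigmoid(sum_o)       # Equation 6.4: final output
print(H, O)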

Activation Functions
As shown in the equations above, an activation function transforms the sum at a
node to generate a non-linear output. There are many activation functions. We
will describe four of them next.
i. Sigmoid Function
This function takes the form shown in Equation 6.5.

g(y) = 1 / (1 + e^{-y})    [Equation 6.5]

The output is between 0 and 1 no matter what the input is. It is therefore
also referred to as the squashing function. It is also referred to as the logistic
function. Sometimes, the input to the function (y) is multiplied by a constant σ
known as the gain, so that the function appears as shown in Equation 6.6.

g(σy) = 1 / (1 + e^{-σy})    [Equation 6.6]
ii. Hyperbolic Tangent Function
Another activation function is the hyperbolic tangent function, which is
shown in Equation 6.7. Its outputs lie in the range -1 to 1.

g(y) = tanh(σy)    [Equation 6.7]


iii. Rectified Linear Unit (ReLU)

Given an input x, this function outputs x when x is greater than zero;
otherwise it outputs zero. This means it outputs zero for all negative inputs.
As an example, ReLU(5) = 5 and ReLU(-5) = 0. The function appears as shown
in Equation 6.8.

g(x) = max(0, x)    [Equation 6.8]


iv. Softmax function
Given an input x_i, the output of the softmax function is given by Equation 6.9.

g(x_i) = e^{x_i} / Σ_{j=1}^{m} e^{x_j}    [Equation 6.9]
The softmax function is normally used only in the output layer and generates a
probability for each possible output class. The class with the greatest output is
then selected as the prediction of the neural network.
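As a quick sketch, the four activation functions above can be written in Python with NumPy as shown below; these implementations are illustrative, and libraries such as TensorFlow provide their own versions.

import numpy as np

def sigmoid(y, gain=1.0):
    # Equations 6.5 and 6.6: squashes any input into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-gain * y))

def hyperbolic_tangent(y, gain=1.0):
    # Equation 6.7: outputs lie in the range (-1, 1)
    return np.tanh(gain * y)

def relu(x):
    # Equation 6.8: passes positive inputs through and outputs zero otherwise
    return np.maximum(0, x)

def softmax(x):
    # Equation 6.9: converts a vector of scores into class probabilities
    e = np.exp(x - np.max(x))   # subtracting the maximum improves numerical stability
    return e / np.sum(e)

print(sigmoid(2.1), relu(-5), softmax(np.array([1.0, 2.0, 3.0])))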

Weight Updates
When a neural network is initially created, its weights are initialized with
random values. At that point, the performance of the neural network on the
classification or regression task is equivalent to a random guess. To improve the
performance of the neural network, you update its weights during the training
phase. The updates to the weights are based on the error obtained in the
classification or regression task for a given input.
Each training example x_i has the format (x_i1, x_i2, ..., x_im, T_i), where
x_i1, x_i2, ..., x_im are the attribute values of the example (the inputs) and
T_i is the expected output. The expected output is also known as the target.
The actual output of output node i of the neural network is represented as O_i.
The error of a specific output node for a given example is therefore T_i - O_i.
Since we may have more than one output node, the errors of all the output nodes
are summed using Equation 6.10, where k is the number of output nodes.

E_net = (1/2) Σ_{i=1}^{k} (O_i - T_i)^2    [Equation 6.10]
The weights between the nodes are then adjusted to reduce the training error.
Algorithms such as backpropagation, Adaptive Gradient (AdaGrad), Root Mean
Square Propagation (RMSProp) and Adaptive Moment Estimation (Adam) are used to
update the weights.
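The sketch below illustrates the idea of error-driven weight updates for a single-layer network with a sigmoid output: it computes E_net from Equation 6.10 and applies a plain gradient-descent update. It is not the full backpropagation algorithm, and the example values and learning rate are made up.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# made-up training example: three inputs (the first acting as the bias) and two targets
x = np.array([1.0, 0.5, 2.0])
T = np.array([1.0, 0.0])

# made-up initial weights: rows = output nodes, columns = inputs
W = np.random.default_rng(0).normal(size=(2, 3))

learning_rate = 0.1
for step in range(5):
    O = sigmoid(W @ x)                       # forward pass
    E_net = 0.5 * np.sum((O - T) ** 2)       # Equation 6.10
    # gradient of E_net with respect to W for a sigmoid output layer
    delta = (O - T) * O * (1.0 - O)
    W -= learning_rate * np.outer(delta, x)  # gradient-descent weight update
    print(step, E_net)                       # the error typically decreases over the steps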
Example 1. Given the network below, what is the output of the output node?
Assume that the logistic function is used as the activation function.

Figure 6.4: Neural network with inputs and weights

Inputs: 1 (bias), 0.5, 2

Hidden node H1: weights 0.5, 0.4, 0.7; inputs × weights = 0.5, 0.2, 1.4; sum = 2.1; output = 1/(1 + e^{-2.1}) ≈ 0.8909
Hidden node H2: weights 0.5, 0.4, 0.7; inputs × weights = 0.5, 0.2, 1.4; sum = 2.1; output = 1/(1 + e^{-2.1}) ≈ 0.8909
Output node O: weights 0.3, 0.3; hidden outputs × weights = 0.2673, 0.2673; sum ≈ 0.5345; output = 1/(1 + e^{-0.5345}) ≈ 0.6305

The output of the output node is therefore approximately 0.63.
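The following short sketch reproduces this worked example in Python, using the inputs and weights from Figure 6.4 (the numerical results are approximate):

import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

I = np.array([1.0, 0.5, 2.0])            # bias input plus the two attribute values
W_hidden = np.array([[0.5, 0.4, 0.7],    # weights into H1
                     [0.5, 0.4, 0.7]])   # weights into H2
W_output = np.array([0.3, 0.3])          # weights into the output node

H = logistic(W_hidden @ I)               # hidden node outputs, each logistic(2.1)
O = logistic(W_output @ H)               # output node value
print(H, O)                              # approximately [0.8909 0.8909] and 0.6305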

Reading Material
To learn more about the backpropagation algorithm, read the following material:

1. Mitchell T. (1997). Machine Learning, McGraw-Hill, pages 95-105



Implementation of Neural Networks using Python


Example 2. You can implement a neural network using Python as shown below.

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn import metrics

# You will use the iris dataset that comes with the sklearn package
data = load_iris()
# Get the target values
y = data.target
# Get the names of the target classes
labels = data.target_names
# Get the data (input values)
x = data.data
# Scale the data using the standard scaler
scaler = StandardScaler()
x = scaler.fit_transform(x)
# Split the data - first into a training set and a test set
x_train, x_test, y_train, y_test = train_test_split(
    x, y, random_state=42, test_size=0.2, shuffle=True)
y_test1 = y_test
# Then split the training data into a training set and a validation set
x_train, x_validation, y_train, y_validation = train_test_split(
    x_train, y_train, random_state=42, test_size=0.3, shuffle=True)
# One hot encoding of the targets
oneHotEncoding = layers.CategoryEncoding(num_tokens=3, output_mode="one_hot")
y_train = oneHotEncoding(y_train)
y_validation = oneHotEncoding(y_validation)
y_test = oneHotEncoding(y_test)
# Create the neural network
inputLayer = keras.Input(shape=(4,))
model = keras.Sequential()
model.add(inputLayer)
model.add(layers.Dense(60, activation='relu', name='layer1'))   # first hidden layer
model.add(layers.Dense(3, activation='softmax', name='layer3')) # output layer
keras.utils.plot_model(model, show_shapes=True)  # requires pydot and graphviz
model.summary()

# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['categorical_accuracy'])
# Create the stopping criterion
earlyStop = keras.callbacks.EarlyStopping(monitor='loss', patience=3,
                                          restore_best_weights=True)
# Train the model
history = model.fit(x_train, y_train, batch_size=64, epochs=150,
                    validation_data=(x_validation, y_validation), verbose=1,
                    callbacks=[earlyStop])
# Plot the training trend
from matplotlib import pyplot as plt
plt.plot(history.history['val_categorical_accuracy'])
# Evaluate the model
test_loss, test_acc = model.evaluate(x_test, y_test)
print(test_loss)
print(test_acc)
# Print the confusion matrix
predicted = model.predict(x_test)
pred = np.argmax(predicted, axis=1)
cm = metrics.confusion_matrix(y_test1, pred)
plt.rcParams['figure.figsize'] = (5, 5)
displayCM = metrics.ConfusionMatrixDisplay(cm, display_labels=labels)
displayCM.plot()
plt.rcParams['figure.figsize'] = plt.rcParamsDefault['figure.figsize']

Reference List and Further Reading


• Mitchell T. (1997). Machine Learning, McGraw-Hill

• Gurney K. (2004). An Introduction to Neural Networks, Taylor & Francis e-Library

• Venkataramanan Krishnan (2020). IRIS Data-Tensorflow Neural Network, URL:
[Link]. Last accessed 20th July 2024.

• To learn more about neural network implementation, review the following Kaggle
notebook: Venkataramanan Krishnan (2020). IRIS Data-Tensorflow Neural Network,
URL: [Link]. Last accessed 20th July 2024.

Take Home Message/Self Reflection


• Why would neural networks be your preferred learning algorithm?

WHAT’S NEXT?
• In the next module, you will learn about ensemble learning methods, which are
known to improve on the performance of the methods we have encountered so far.
