CHAPTER 3
ADAPTIVE NEURO-FUZZY INFERENCE SYSTEM
The objective of an ANFIS (Jang 1993) is to integrate the best
features of fuzzy systems and neural networks. ANFIS offers one of the best
trade-offs between neural and fuzzy systems, providing smoothness, due to the
Fuzzy Control (FC) interpolation, and adaptability, due to the neural network
back propagation.
3.1 INTRODUCTION TO FUZZY LOGIC
Two distinct forms of problem knowledge exist for many problems:
Objective knowledge, which is used in all engineering problem formulations
(e.g. mathematical models), and Subjective knowledge, which represents
linguistic information that is usually impossible to quantify using traditional
mathematics (e.g. rules, expert information, design requirements) (Mendel
1995).
Solving most real-world problems requires both types of knowledge.
The two forms of knowledge can be coordinated in a logical
way using fuzzy logic (FL). A fuzzy logic system is unique in that it is able to
handle numerical data and linguistic knowledge simultaneously (Ross 2005).
The founding father of the entire field of FL is Dr. Lotfi Zadeh. In his paper,
Zadeh (1965) states, “As the complexity of a system increases, our ability to
make precise and yet significant statements about its behavior diminishes
until a threshold is reached beyond which precision and significance (or
relevance) become almost mutually exclusive characteristics” – or, “The
closer one looks at a real world problem, the fuzzier becomes its solution”.
3.2 FUZZY LOGIC SYSTEM (FLS)
In general, a FLS is a nonlinear mapping of an input data (feature)
vector into a scalar output. The richness of FL lies in the enormous number of
possible mappings it admits. This richness, however, requires a careful
understanding of FL and of the elements that comprise a FLS.
A FLS contains four components: fuzzifier, rules, inference engine,
and defuzzifier. Once the rules have been established, a FLS can be viewed as
a mapping from inputs to outputs, and this mapping can be expressed
quantitatively as y = f(x). Figure 3.1 depicts a FLS of the kind widely used in
fuzzy logic controllers.
[Block diagram: crisp inputs enter the fuzzifier, which produces fuzzy input sets; the inference engine, driven by the rule base, maps these to fuzzy output sets; the defuzzifier converts them to crisp outputs.]
Figure 3.1 Schematic Diagram of a Fuzzy Inference System
Fuzzy inference is the process of mapping a given input to an
output using fuzzy logic. Any fuzzy inference system can be simply
represented by four interacting blocks:
1) Fuzzification: The process of transforming any crisp value to
the corresponding linguistic variable (fuzzy value) based on
the appropriate membership function.
2) Knowledge base: Contains the membership function definitions
and the necessary IF-THEN rules.
3) Inference engine: Simulates human decision making
using implication and aggregation processes.
4) Defuzzification: The process of transforming the fuzzy output
into a crisp numerical value.
Rules may be provided by experts or can be extracted from
numerical data. In either case, engineering rules are expressed as a collection
of IF–THEN statements, e.g. “IF u1 is very warm and u2 is quite low, THEN
turn v somewhat to the right”. Using such a rule requires an understanding of:
1) Linguistic variables versus numerical values of a variable
(e.g. very warm versus 40 °C);
2) Quantifying linguistic variables (e.g., u1 may have a finite
number of linguistic terms associated with it, ranging from
extremely hot to extremely cold), which is done using fuzzy
membership functions;
3) Logical connections for linguistic variables (e.g., “and”, “or”,
etc.); and
4) Implications, i.e., “IF A THEN B”. Additionally, an
understanding of how to combine more than one rule is required.
The fuzzifier maps crisp numbers into fuzzy sets. It is needed in
order to activate rules, which are expressed in terms of linguistic variables that have
fuzzy sets associated with them. The inference engine of the FLS maps input
fuzzy sets into output fuzzy sets. It handles the way in which rules are
combined, just as humans use many different types of inferential procedures
to help them understand things or to make decisions. In many applications, a crisp
number must be obtained at the output of a FLS. The defuzzifier maps output fuzzy
sets into crisp numbers.
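As a minimal sketch of this crisp-in/crisp-out mapping y = f(x), the Python fragment below passes a crisp input through fuzzification, min-implication inference, max aggregation and centroid defuzzification. The membership functions, the two rules and the output universe are arbitrary illustrative choices, not taken from this work.

import numpy as np

def tri(x, a, b, c):
    # Triangular membership function with feet at a and c and peak at b (a < b < c).
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def fls(x_crisp):
    # Fuzzification: membership grades of the crisp input in the input sets.
    mu_low  = tri(x_crisp, -0.5, 0.0, 0.5)
    mu_high = tri(x_crisp,  0.5, 1.0, 1.5)

    # Inference: IF x is LOW THEN y is SMALL; IF x is HIGH THEN y is LARGE.
    y = np.linspace(0.0, 1.0, 101)                        # output universe of discourse
    small = np.minimum(mu_low,  tri(y, -0.5, 0.0, 0.5))   # min implication
    large = np.minimum(mu_high, tri(y,  0.5, 1.0, 1.5))
    aggregated = np.maximum(small, large)                 # max aggregation

    # Defuzzification: centroid of the aggregated output set.
    return float(np.sum(y * aggregated) / (np.sum(aggregated) + 1e-12))

print(fls(0.7))   # crisp output produced for the crisp input 0.7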
3.3 FUZZY SET THEORY
3.3.1 Crisp Sets
A crisp set A in a universe of discourse U (which provides the set of
allowable values for a variable) can be defined by listing all of its members or
by identifying the elements x ∈ A. One way to do the latter is to specify a
condition by which x ∈ A; thus A can be defined as A = {x | x meets
some condition}. Alternatively, we can introduce a zero-one membership
function for A, denoted μ_A(x), such that μ_A(x) = 1 if x ∈ A and μ_A(x) = 0
if x ∉ A. The subset A is mathematically equivalent to its membership
function μ_A(x) in the sense that knowing μ_A(x) is the same as knowing A itself.
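For example, the zero-one membership function of a crisp set can be written directly as an indicator function; a small Python sketch with an arbitrarily chosen condition:

def crisp_membership(x, condition):
    # mu_A(x) = 1 if x belongs to A, 0 otherwise.
    return 1 if condition(x) else 0

A = lambda x: x > 10             # A = {x | x > 10}, an arbitrary example condition
print(crisp_membership(12, A))   # 1, since 12 is in A
print(crisp_membership(7, A))    # 0, since 7 is not in A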
3.3.2 Fuzzy Sets
A fuzzy set F defined on a universe of discourse U is characterized
by a membership function μ_F(x) which takes on values in the interval [0, 1].
A fuzzy set is a generalization of an ordinary subset (i.e. a crisp subset) whose
membership function takes on only two values, zero or unity. A membership
function provides a measure of the degree of similarity of an element in U to
the fuzzy subset. In FL an element can reside in more than one set to different
degrees of similarity. This cannot occur in crisp set theory. A fuzzy set F in U
may be represented as a set of ordered pairs of a generic element x and its grade
of membership: F = {(x, μ_F(x)) | x ∈ U}. When U is continuous, F is
commonly written as F = ∫_U μ_F(x)/x. In this expression the integral sign does
not denote integration; it denotes the collection of all points x ∈ U with
associated membership function μ_F(x). When U is discrete, F is commonly
written as F = Σ_{x ∈ U} μ_F(x)/x. In this expression the summation sign denotes the
collection of all points x ∈ U with associated membership function μ_F(x);
hence it denotes the set-theoretic operation of union. The slash in these
expressions associates the elements of U with their membership grades, where
μ_F(x) > 0.
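A minimal Python sketch of the discrete case, with an arbitrary universe and arbitrary membership grades, represents F as the collection of (x, μ_F(x)) pairs with μ_F(x) > 0:

# Discrete universe of discourse and an arbitrary membership function mu_F.
U = [1, 2, 3, 4, 5]
mu_F = {1: 0.0, 2: 0.3, 3: 1.0, 4: 0.6, 5: 0.0}

# F = { (x, mu_F(x)) | x in U, mu_F(x) > 0 }
F = [(x, mu_F[x]) for x in U if mu_F[x] > 0]
print(F)   # [(2, 0.3), (3, 1.0), (4, 0.6)]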
3.3.3 Linguistic Variables
Linguistic variables are variables whose values are not numbers but
words or sentences in a natural or artificial language. In general, linguistic
variables are less specific than numerical ones. Let u denote the name of a
linguistic variable; numerical values of the linguistic variable u are denoted x,
where x ∈ U. Sometimes x and u are used interchangeably. A linguistic
variable is usually decomposed into a set of terms, T(u), which covers its
universe of discourse.
3.3.4 Membership Functions
Membership functions μ_F(x) are, for the most part, associated with
terms that appear in the antecedents or consequents of rules, or in phrases.
The most commonly used shapes for membership functions are triangular,
trapezoidal, piecewise linear, and Gaussian. Usually, membership functions
are chosen by the user arbitrarily, based on the user’s experience; hence, the
membership functions of two users could be quite different depending upon
their experiences, perspectives, cultures, etc. Figure 3.2 shows sample
membership functions for two cases.
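For illustration, the most common shapes can be sketched in Python as follows; the parameter values, chosen here for a hypothetical “Medium height” set, are arbitrary:

import math

def triangular(x, a, b, c):
    # Peaks at b, zero outside [a, c].
    return max(min((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def trapezoidal(x, a, b, c, d):
    # Flat top between b and c, zero outside [a, d].
    return max(min((x - a) / (b - a), 1.0, (d - x) / (d - c)), 0.0)

def gaussian(x, mean, sigma):
    # Bell-shaped, centred at mean with spread sigma.
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2)

# Membership of a height of 5.8 ft in a hypothetical "Medium" set:
print(triangular(5.8, 5.0, 5.75, 6.5))
print(trapezoidal(5.8, 5.0, 5.5, 6.0, 6.5))
print(gaussian(5.8, 5.75, 0.3))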
Fuzzy logic was introduced as a superset of standard Boolean logic
by allowing membership values that range from 0 to 1, instead of only the two
values true and false, while applying the same logical operators
such as AND, OR, NOT, etc. The concept is thus extended from two-valued
logic to multi-valued logic, which has many applications (Babulal 2006,
Babulal 2008, Behera 2009, Bonatto 1998, Boris 2006, Chilukuri 2004, Dash
2000, Elmitwally 2000, Farghal 2002, Grey 2005, Ibrahim 2001, Ibrahim
2002, Jain 2000, Ko 2004, Ko 2007, Kochukuttan 1997, Liang 2002, Masoum
2004, Morsi 2008, Morsi 2008a, Morsi 2008b, Morsi 2009, Nawi 2003, Saroj
2010, Zhang 2005, Zhu 2004).
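A minimal sketch of this extension, using the common min/max/complement realisation of AND, OR and NOT (one standard choice among several t-norms and t-conorms):

def fuzzy_and(mu_a, mu_b):
    return min(mu_a, mu_b)        # t-norm (minimum)

def fuzzy_or(mu_a, mu_b):
    return max(mu_a, mu_b)        # t-conorm (maximum)

def fuzzy_not(mu_a):
    return 1.0 - mu_a             # complement

# With grades restricted to {0, 1} these reduce to Boolean AND, OR and NOT.
print(fuzzy_and(0.7, 0.4), fuzzy_or(0.7, 0.4), fuzzy_not(0.7))   # 0.4 0.7 0.3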
[Two panels of membership functions H(h) for Short, Medium and Tall plotted against height (4–7 ft): (a) for most people, (b) for professional basketball players.]
Figure 3.2 Membership Functions for T(Height) = {Short Men, Medium
Men, Tall Men}. (a) Most People’s Membership Functions
and (b) Professional Basketball Players’ Membership
Functions
Conditional statements, commonly known as IF-THEN rules, can
be easily formulated using fuzzy logic. A rule consists of two parts: the
antecedent, or IF part, and the consequent, or THEN part. An IF-THEN
rule can take the following form:
IF x is A and y is B THEN z is C
where A, B and C are linguistic values defined by fuzzy sets on the universes
of discourse of x, y and z, respectively.
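As a hedged sketch, such a rule can be evaluated by combining the antecedent membership grades with the fuzzy AND to obtain a firing strength, which then clips the consequent set (min implication); all membership grades below are arbitrary illustrative values:

# Membership grades of the crisp inputs in the antecedent sets (arbitrary values).
mu_A_of_x = 0.8      # degree to which "x is A"
mu_B_of_y = 0.5      # degree to which "y is B"

firing_strength = min(mu_A_of_x, mu_B_of_y)          # AND of the antecedent parts

# Min implication: clip the consequent set C at the firing strength.
mu_C = {0: 0.0, 1: 0.4, 2: 1.0, 3: 0.4, 4: 0.0}      # arbitrary consequent set
implied_C = {z: min(firing_strength, m) for z, m in mu_C.items()}
print(implied_C)   # every grade of C is capped at the firing strength 0.5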
The main disadvantage of a fuzzy classifier is that its response time
slows down as the number of rules increases. If the system does
not perform satisfactorily, the rules must be reset to obtain efficient
results, i.e. the system does not adapt to variations in the data. The accuracy
of the system depends on the knowledge and experience of the human
experts. The rules should be updated and the weighting factors in the fuzzy sets
should be refined over time. Neural networks, genetic algorithms, swarm
optimization techniques, etc., can be used for fine-tuning of fuzzy logic
control systems.
3.4 NEURAL NETWORKS
A neural network is a powerful data modeling tool that is able to
capture and represent complex input/output relationships. The motivation for
the development of neural network technology stemmed from the desire to
develop an artificial system that could perform "intelligent" tasks similar to
those performed by the human brain. Neural networks resemble the human
brain in the following two ways:
1. A neural network acquires knowledge through learning.
2. A neural network's knowledge is stored within inter-neuron
connection strengths known as synaptic weights.
The true power and advantage of neural networks lie in their
ability to represent both linear and non-linear relationships and in their ability
to learn these relationships directly from the data being modeled. Traditional
linear models are simply inadequate when it comes to modeling data that
contains non-linear characteristics.
Figure 3.3 Multi-Layer Perceptron Neural Network
The most common neural network model is the multi-layer
perceptron (MLP). This type of neural network is known as a supervised
network because it requires a desired output in order to learn. The goal of this
type of network is to create a model that correctly maps the input to the output
using historical data so that the model can then be used to produce the output
when the desired output is unknown. A graphical representation of an MLP is
shown in Figure 3.3.
In a two-hidden-layer MLP, the inputs are fed into the input layer
and multiplied by interconnection weights as they are passed from the
input layer to the first hidden layer. Within the first hidden layer, they are
summed and then processed by a nonlinear function (usually the
hyperbolic tangent). As the processed data leaves the first hidden layer, it is
again multiplied by interconnection weights, then summed and processed by
the second hidden layer. Finally, the data is multiplied by interconnection
weights and processed one last time within the output layer to produce the
neural network output.
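A minimal Python sketch of this two-hidden-layer forward pass; the layer sizes and random placeholder weights are arbitrary, and tanh is used as the nonlinearity, as stated above:

import numpy as np

rng = np.random.default_rng(0)

# Placeholder weights for a 3-input MLP with hidden layers of 4 and 3 units and 1 output.
W1, b1 = rng.standard_normal((4, 3)), np.zeros(4)
W2, b2 = rng.standard_normal((3, 4)), np.zeros(3)
W3, b3 = rng.standard_normal((1, 3)), np.zeros(1)

def forward(x):
    h1 = np.tanh(W1 @ x + b1)      # first hidden layer: weighted sum + tanh
    h2 = np.tanh(W2 @ h1 + b2)     # second hidden layer
    return W3 @ h2 + b3            # output layer

print(forward(np.array([0.2, -0.1, 0.7])))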
The MLP and many other neural networks learn using an algorithm
called back-propagation. With back-propagation, the input data is repeatedly
presented to the neural network. With each presentation the output of the
neural network is compared to the desired output and an error is computed.
This error is then fed back (back-propagated) to the neural network and used
to adjust the weights such that the error decreases with each iteration and the
neural model gets closer and closer to producing the desired output. This
process is known as "training".
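A hedged sketch of this training loop, for a single-hidden-layer network with a squared-error cost; the data, layer size and learning rate are arbitrary placeholders, not values used in this work:

import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, (100, 2))                 # placeholder input data
T = np.sin(X[:, :1] + X[:, 1:])                  # placeholder desired output

W1, b1 = rng.standard_normal((8, 2)) * 0.5, np.zeros((8, 1))
W2, b2 = rng.standard_normal((1, 8)) * 0.5, np.zeros((1, 1))
lr = 0.05

for epoch in range(500):
    # Forward pass: compute the network output for every presented pattern.
    H = np.tanh(W1 @ X.T + b1)                   # hidden activations
    Y = W2 @ H + b2                              # network output
    E = Y - T.T                                  # error w.r.t. the desired output

    # Backward pass: propagate the error and adjust the weights.
    dW2 = E @ H.T / len(X)
    db2 = E.mean(axis=1, keepdims=True)
    dH = (W2.T @ E) * (1 - H ** 2)               # derivative of tanh
    dW1 = dH @ X / len(X)
    db1 = dH.mean(axis=1, keepdims=True)

    W2 -= lr * dW2
    b2 -= lr * db2
    W1 -= lr * dW1
    b1 -= lr * db1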
Neural networks have been successfully applied to a broad
spectrum of data-intensive applications. Artificial Neural Networks (ANNs) are
among the oldest Artificial Intelligence techniques and have been used in
power systems research for quite some time. ANNs mimic the neural structure
of the human brain, which consists of simple arithmetic units
connected in a highly complex layered architecture. ANNs are capable of
representing complex (nonlinear) functions, and they learn these functions
through example. Neural networks have been applied extensively in Power
Quality research. Major applications include
Identifying power quality events from poor power quality ones
Modeling the patterns of harmonic production from individual fluorescent lighting systems
Estimating harmonic distortions and power quality in power networks
Identifying and recognizing power quality events using the wavelet transform in conjunction with neural networks
Identifying high-impedance fault, fault-like load, and normal load current patterns
Analyzing harmonic distortion while avoiding the effects of noise and sub-harmonics
Developing screening tools for power system engineers to address power quality issues
3.5 ANFIS ARCHITECTURE
ANFIS is a hybrid system that incorporates the learning ability of an
ANN with the excellent knowledge representation and inference capabilities of
fuzzy logic (Jang 1993), and it has the ability to self-modify its membership
functions to achieve a desired performance. An adaptive network, which
subsumes almost all kinds of neural network paradigms, can be adopted to
interpret the fuzzy inference system. ANFIS utilizes the hybrid learning rule
and can manage complex decision-making or diagnosis systems. ANFIS has been
proven to be an effective tool for tuning the membership functions of fuzzy
inference systems. Ibrahim (2001) proposed an ANFIS-based system to learn
power quality signature waveforms and showed that adaptive fuzzy systems
are very successful in learning power quality waveforms. Rasli (2009), Rathina
(2009) and Rathina (2010) have proposed ANFIS-based systems for power
quality assessment.
ANFIS is a simple data-learning technique that uses a fuzzy
inference system model to transform a given input into a target output. This
prediction involves membership functions, fuzzy logic operators and IF-THEN
rules. There are two types of fuzzy system, commonly known as the Mamdani
and Sugeno models. There are five main processing stages in the ANFIS
operation: input fuzzification, application of the fuzzy operators,
application of the implication method, output aggregation, and defuzzification.
ANFIS utilizes, from fuzzy systems, the “representation of prior knowledge as a set of
constraints (network topology) to reduce the optimization search space”, and,
from neural networks, the “adaptation of back propagation to a structured network to
automate FC parametric tuning”, in order to improve
performance. The design objective of the fuzzy controller is to learn and
achieve good performance in the presence of disturbances and uncertainties.
The design of the membership functions is done by the ANFIS batch learning
technique, which amounts to tuning a FIS with the back propagation algorithm
based on a collection of input–output data pairs.
Generally, ANFIS is a multilayer feed forward network in which
each node performs a particular function (node function) on incoming signals.
For simplicity, we consider two inputs 'x' and 'y' and one output 'z'. Suppose
that the rule base contains two fuzzy if-then rules of Takagi and Sugeno type
(Jang 1993):
Rule 1: IF x is A_1 and y is B_1 THEN f_1 = P_1 x + Q_1 y + R_1
Rule 2: IF x is A_2 and y is B_2 THEN f_2 = P_2 x + Q_2 y + R_2        (3.1)
Figure 3.4 ANFIS Architecture
The ANFIS architecture is a five-layer feed-forward network, as
shown in Figure 3.4. An adaptive network (Jang 1993) is a multilayer feed-forward
network in which each node performs a particular function (node
function) on incoming signals, using a set of parameters pertaining to that
node. The form of the node functions may vary from node to node, and
the choice of each node function depends on the overall input-output function
which the adaptive network is required to carry out. Note that the links in an
adaptive network only indicate the flow direction of signals between nodes;
no weights are associated with the links.
To reflect different adaptive capabilities, we use both circle and
square nodes in an adaptive network. A square node (adaptive node) has
parameters while a circle node (fixed node) has none. The parameter set of an
adaptive network is the union of the parameter sets of each adaptive node. In
order to achieve a desired input-output mapping, these parameters are updated
according to given training data and a gradient-based learning procedure is
used.
Layer 1: Every node in this layer is a square node with a node
function (the membership value of the premise part)

O_i^1 = μ_{A_i}(x)        (3.2)

where x is the input to node i, and A_i is the linguistic label associated
with this node function.
Layer 2: Every node in this layer is a circle node labelled Π, which
multiplies the incoming signals. Each node output represents the firing
strength of a rule:

O_i^2 = w_i = μ_{A_i}(x) · μ_{B_i}(y),   i = 1, 2        (3.3)
Layer 3: Every node in this layer is a circle node labelled N
(normalization). The i-th node calculates the ratio of the i-th rule’s firing strength
to the sum of all firing strengths:

O_i^3 = w̄_i = w_i / (w_1 + w_2),   i = 1, 2        (3.4)
Layer 4: Every node in this layer is a square node with a node
function

O_i^4 = w̄_i f_i = w̄_i (P_i x + Q_i y + R_i),   i = 1, 2        (3.5)
Layer 5: The single node in this layer is a circle node labelled Σ
that computes the overall output as the summation of all incoming signals:

O^5 = Σ_i w̄_i f_i = (Σ_i w_i f_i) / (Σ_i w_i) = system output        (3.6)

Equation (3.6) represents the overall output of the ANFIS, which is
functionally equivalent to the fuzzy system in (Morsi 2008a).
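Under the two-rule Sugeno model of Equation (3.1), the layer-by-layer computation of Equations (3.2)–(3.6) can be sketched in Python as follows; the Gaussian premise membership functions and all parameter values are arbitrary placeholders:

import math

def gaussmf(x, c, sigma):
    # Premise membership function (layer 1).
    return math.exp(-0.5 * ((x - c) / sigma) ** 2)

# Arbitrary premise parameters for A1, A2 (on x) and B1, B2 (on y).
A = [(0.0, 1.0), (2.0, 1.0)]
B = [(0.0, 1.0), (2.0, 1.0)]
# Arbitrary consequent parameters (P_i, Q_i, R_i) of the two Sugeno rules.
P, Q, R = [1.0, -0.5], [0.5, 1.5], [0.0, 1.0]

def anfis_output(x, y):
    # Layer 1: membership grades; Layer 2: firing strengths w_i = mu_Ai(x) * mu_Bi(y).
    w = [gaussmf(x, *A[i]) * gaussmf(y, *B[i]) for i in range(2)]
    # Layer 3: normalised firing strengths.
    w_bar = [wi / sum(w) for wi in w]
    # Layer 4 and 5: weighted rule outputs and their sum, the overall output.
    return sum(w_bar[i] * (P[i] * x + Q[i] * y + R[i]) for i in range(2))

print(anfis_output(1.0, 0.5))

This is the same node-by-node computation that Figure 3.4 depicts.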
3.6 ANFIS LEARNING ALGORITHM
In this subsection, the hybrid learning algorithm is explained
briefly. The ANFIS Learning Algorithm uses a two-pass learning cycle. In the
forward pass, S1 is unmodified and S2 is computed using a Least Squared
Error (LSE) algorithm (Off-line Learning). In the Backward pass, S2 is
unmodified and S1 is computed using a gradient descent algorithm (usually
Back Propagation).
Figure 3.5 ANFIS Structure
From the ANFIS structure shown in Figure 3.5, it can be
observed that when the values of the premise parameters are fixed, the overall
output can be expressed as a linear combination of the consequent parameters.
The hybrid learning algorithm is a combination of the back propagation and
least-squares algorithms. Each epoch of the hybrid learning algorithm
consists of two passes, namely the forward pass and the backward pass. In the
forward pass of the hybrid learning algorithm, functional signals go forward
up to layer 4 and the consequent parameters are identified by the least-squares
estimate. Back propagation is used to identify the nonlinear parameters
(premise parameters), while the least-squares method is used for the linear
parameters in the consequent parts.