Module 5

Uploaded by Advith A J

Bioinspired algorithms and applications

Algorithm

• An algorithm is a set of instructions to be followed in a problem-solving operation or calculation. An algorithm takes input and produces the desired output.

Genetic algorithm

• A genetic algorithm is a probabilistic search algorithm based on the principles of genetics and evolution.
• Charles Darwin proposed the theory of natural selection, which is the driving force of evolution.
• The theory of natural selection proposes that features/characteristics that improve the survival of an organism are preserved and multiplied over generations. Natural selection acts on variation, and variation in organisms may be brought about by:
1. Crossing over: a process that occurs during meiosis, in which chromosome segments are exchanged between a pair of homologous chromosomes.

2. Mutation: any sudden change in the genetic information of an organism that can be transmitted from generation to generation. Types of mutation include:
a) Deletion: a part of the genetic material is removed.
b) Duplication: a part of the genetic material is repeated.
c) Insertion: a new base/sequence is added.
d) Inversion: a segment of the genetic material is reversed end to end.
e) Translocation: a part of a chromosome is moved to another (non-homologous) chromosome.
• Genetic algorithms, also termed global search heuristics, are inspired by processes from evolutionary biology such as inheritance, mutation, selection and crossover. They provide a technique for a program to automatically improve its parameters.
• Genetic algorithms were introduced by John Holland in the early 1970s.
• Genetic algorithms are search methods used in computing to find exact or approximate solutions to optimization and search problems.
Terminology of genetic algorithms
a) Population: a subset of all possible (candidate) solutions to the given problem.

b) Fitness function: determines the fitness level of each individual in the population, i.e., its ability to compete with other individuals. In every round, individuals are evaluated using the fitness function.
c) Genetic operators: in a genetic algorithm, the best individuals mate to produce offspring that may be better than their parents. Genetic operators (crossover and mutation) change the genetic composition of the next generation.
d) Selection: after the fitness of every individual in the population has been calculated, a selection process determines which individuals will reproduce and produce the offspring that form the next generation.

Components of a genetic algorithm

• Genotype: the coded form of the solutions to be optimized (the uncoded form of this information is the phenotype).
• Fitness function: evaluates the fitness of the solutions to be optimized.
• Selection function: selects the solutions with the best fitness. Only the selected solutions are carried to the next generation.
• Crossover and mutation functions: create variation in the coded information.

Overview of genetic algorithm

• In GAs, we have a pool or a population of possible solutions to the given problem.
• These solutions then undergo recombination and mutation (like in natural genetics), producing new
children, and the process is repeated over various generations.
• Each individual (or candidate solution) is assigned a fitness value (based on its objective function
value) and the fitter individuals are given a higher chance to mate and yield more “fitter” individuals.
This is in line with the Darwinian Theory of “Survival of the Fittest”.
Working of a genetic algorithm
1. Initialization: The first step in a genetic algorithm is to generate a set of individuals (known as the population) on which operations can be carried out. Usually, the information is coded in the form of binary strings.
2. Fitness assignment: Based on a predefined fitness function, a fitness value is assigned to each individual. This score determines the probability of being selected for reproduction: the higher the fitness score, the greater the chance of being selected.
3. Selection: The selection phase chooses individuals for the reproduction of offspring. The selected individuals are arranged in pairs for reproduction, and these pairs then transfer their information to the next generation.
4. Reproduction: carried out using the crossover and mutation functions.
a) Crossover: Crossover plays the most significant role in the reproduction phase of the genetic algorithm. A crossover point is selected at random within the binary string, and the crossover operator then swaps the genetic information of two parents from the current generation beyond that point to produce new individuals representing the offspring.
b) Mutation: The mutation operator inserts random genes into the offspring (new child) to maintain diversity in the population, for example by flipping some bits in the chromosome. Mutation helps to avoid premature convergence and enhances diversification. Common forms include random bit flip and random bit swap.

• In the figure, a population of four individuals is shown.

• Each is assigned a fitness value by the function $F(x, y) = yx^2 - x^4$.
• On the basis of fitness, the selection phase assigns the first individual (0000001101) one copy, the second (0101010010) two copies, the third (1111111000) one copy, and the fourth (1010100111) zero copies.
• After selection, the genetic operators are applied probabilistically: the first individual has its first bit mutated from 0 to 1, and crossover combines the last two individuals into two new ones. The resulting population is shown in the last box of the figure.
5. Termination: Steps 2 to 4 are repeated until a termination condition is reached (for example, a maximum number of generations or a satisfactory fitness value).
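The five steps can be sketched in Python as a minimal illustration. This is not the method behind the figure. It assumes a hypothetical "OneMax" fitness function (the number of 1 bits in the string), roulette-wheel selection, single-point crossover and bit-flip mutation. All function names and parameter values are illustrative.

```python
import random


def fitness(chromosome):
    """Hypothetical OneMax fitness: the number of 1 bits in the string."""
    return sum(chromosome)


def initialize(pop_size, length, rng):
    """Step 1: a population of random binary strings."""
    return [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]


def roulette_select(population, scores, rng):
    """Step 3: fitness-proportional (roulette-wheel) selection of one parent."""
    if sum(scores) == 0:  # all weights zero: fall back to a uniform choice
        return rng.choice(population)[:]
    return rng.choices(population, weights=scores, k=1)[0][:]


def crossover(parent1, parent2, rng):
    """Step 4a: single-point crossover at a random cut point."""
    point = rng.randint(1, len(parent1) - 1)
    return parent1[:point] + parent2[point:], parent2[:point] + parent1[point:]


def mutate(chromosome, rate, rng):
    """Step 4b: flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in chromosome]


def genetic_algorithm(length=20, pop_size=30, generations=100,
                      mutation_rate=0.01, seed=0):
    rng = random.Random(seed)
    population = initialize(pop_size, length, rng)
    best = max(population, key=fitness)  # best individual seen so far
    for _ in range(generations):
        scores = [fitness(c) for c in population]  # Step 2: fitness assignment
        best = max(best, max(population, key=fitness), key=fitness)
        if fitness(best) == length:  # Step 5: stop once the optimum is found
            break
        offspring = []
        while len(offspring) < pop_size:
            p1 = roulette_select(population, scores, rng)
            p2 = roulette_select(population, scores, rng)
            c1, c2 = crossover(p1, p2, rng)
            offspring += [mutate(c1, mutation_rate, rng),
                          mutate(c2, mutation_rate, rng)]
        population = offspring[:pop_size]
    return max(best, max(population, key=fitness), key=fitness)
```

Calling `genetic_algorithm()` returns the best bit string found. For OneMax, the optimum is the all-ones string.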
Applications of genetic algorithms

• Sensor-based robot path planning: after obstacles are detected, the planning module generates a short and safe path to the goal that avoids them.
• Image processing: producing better images from a group of images with disparity or apparent motion.
• Gaming: a genetic algorithm can come up with the best strategies for a given amount of resources. For example, a player goes through a level of the game. At the end, the program picks the non-player characters that fared best against the player and uses them in the next generation. After a lot of playing, reasonable characteristics gradually evolve.
• Real-time systems: assigning task priorities under time constraints.
• Dynamic scheduling: scheduling allocates a set of machines to perform a set of jobs within a certain time period. The goal is to find an allocation (a schedule) that maximizes a given performance measure. In a typical GA implementation, solutions are encoded with a natural representation, the order crossover operator is used, and inversion is used as the mutation operator.
• Automobile design: automated design of automobiles subject to constraints such as fuel consumption.
• University timetabling: genetic algorithms can derive optimal schedules given the courses, the number of students, and the lecture rooms that a university has.
• Product design: genetic algorithms can derive the combinations of product specifications and attributes that will attract the most customers.
Gene expression modelling

• Gene expression defines the activity and behavior of a gene inside a cell.
• Alteration of gene expression by regulators (promoters or inhibitors) may have different consequences
on cells, including promoting the development of specific diseases such as cancer.
• Not all genes are expressed at the same time. The study of the regulation of gene expression is fundamental to understanding the molecular basis of health and disease and to identifying therapeutic targets.
• To understand how genetic material controls cell growth and development, we need to know not only the protein-coding sequences but also the non-coding sequences that regulate gene expression.
• This can be done by first examining the elementary constituents (such as genes and factors) individually and then studying how they are connected. The many components of such a system and their interactions are best characterized as networks. These are mainly represented as graphs in which thousands of nodes are connected by thousands of edges.
• A variety of mathematical methods for describing regulatory networks have been proposed.
• Directed graphs, Bayesian networks, Boolean networks, differential equations, etc. can be used to model and predict gene expression.
1. Directed graphs
• A static model in which each node represents a gene and the arrows represent the relationships between genes.
• A directed graph may be annotated for better understanding.
• Directed graphs are best suited to representing schemas of biological pathways or procedures that show the sequential interaction of elements. These are mainly metabolic, signal transduction or regulatory networks.
2. Bayesian model
• Bayesian statistics uses probability to quantify how likely various events are.
• A Bayesian network (BN) is a probabilistic graphical model that represents a joint probability distribution in an interpretable form.
• Bayesian networks have been used to model gene expression data and gene regulatory networks.
• A BN consists of a directed acyclic graph and a set of corresponding conditional probability density functions.
• It is defined by two sets: the set of nodes (vertices), which represent random variables, and the set of directed edges.
• The directed edges in a BN model the dependencies between the variables (nodes) in the form of joint/conditional probabilities.
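To illustrate how a BN factorizes a joint distribution, the sketch below encodes a tiny hypothetical network. Two regulators, A and B, both point to a target gene X. The conditional probability values are invented purely for illustration, and the function names are our own. The joint probability follows the chain rule for this graph, P(A, B, X) = P(A) P(B) P(X | A, B). Marginal and posterior probabilities are then obtained by summing over the unobserved variables (exact inference by enumeration).

```python
from itertools import product

# Hypothetical network A -> X <- B; all probabilities are made-up values.
P_A = {1: 0.3, 0: 0.7}
P_B = {1: 0.6, 0: 0.4}
P_X1_GIVEN_AB = {  # P(X = 1 | A, B)
    (0, 0): 0.05,
    (0, 1): 0.50,
    (1, 0): 0.20,
    (1, 1): 0.90,
}


def joint(a, b, x):
    """Chain rule for this graph: P(A, B, X) = P(A) * P(B) * P(X | A, B)."""
    px1 = P_X1_GIVEN_AB[(a, b)]
    return P_A[a] * P_B[b] * (px1 if x == 1 else 1.0 - px1)


def prob_x_active():
    """Marginal P(X = 1): sum the joint over all states of A and B."""
    return sum(joint(a, b, 1) for a, b in product((0, 1), repeat=2))


def posterior_a_given_x_active():
    """P(A = 1 | X = 1) by Bayes' rule."""
    return sum(joint(1, b, 1) for b in (0, 1)) / prob_x_active()
```

With these made-up numbers, P(X = 1) works out to 0.41. Observing X = 1 raises the probability that A is active from 0.3 to about 0.45.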

3. Boolean network
• Each node has an associated Boolean function whose inputs are the states of the nodes connected to it. The function can take only two values: 0/1, No/Yes, False/True.
• A Boolean network consists of a set of Boolean variables whose states are determined by other variables in the network.
• A Boolean network can be considered a directed graph in which the nodes represent the expression status of genes and the directed edges represent the actions of genes on other genes (positive effect: 1/Yes/True; negative effect: 0/No/False).
• The Boolean function associated with a variable represents the influence of other genes on the activation state of the gene described by that variable.
• The image below represents a regulatory network of three genes A, B and X, showing how the expression levels of A and B affect X. For example, when A is active (1) and B is inactive (0), gene X is inactive.
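A Boolean network can be simulated by applying every node's Boolean function to the current state, synchronously, at each time step. The original figure's rules are not reproduced here, so the update rules below are hypothetical. A is an external input held constant, and B is repressed by X. X is active only when both A and B are active, which is consistent with the example above, where A = 1 and B = 0 give X = 0.

```python
def update(state):
    """One synchronous update of the hypothetical 3-gene network (A, B, X)."""
    a, b, x = state
    return (
        a,              # A: external input, held constant
        int(not x),     # B: repressed by X (negative feedback)
        int(a and b),   # X: active only when both A and B are active
    )


def simulate(state, steps):
    """Return the list of states visited, including the initial one."""
    trajectory = [state]
    for _ in range(steps):
        state = update(state)
        trajectory.append(state)
    return trajectory
```

Starting from A = 1, B = 0, X = 0, the trajectory visits (1, 1, 0), (1, 1, 1) and (1, 0, 1) before returning to (1, 0, 0). This hypothetical network therefore settles into a cyclic attractor of length 4.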
4. Differential equation
• The two main processes that make up gene expression are transcription and translation.
• The amounts of mRNA and protein in the cell are governed by transcription, translation and degradation. Gene expression can therefore be represented as a system of first-order differential equations for the rate of change of mRNA and protein concentrations, with terms for transcription, translation and degradation.
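This sketch assumes a common minimal form of such a model, not one taken from a specific source. It uses one equation for the mRNA concentration $m$ and one for the protein concentration $p$: $dm/dt = k_m - \gamma_m m$ and $dp/dt = k_p m - \gamma_p p$. Here $k_m$ is the transcription rate, $k_p$ the translation rate, and $\gamma_m$, $\gamma_p$ the degradation rates. The code integrates these equations with the simple forward Euler method, and the parameter values are arbitrary.

```python
def simulate_expression(k_m=2.0, gamma_m=0.5, k_p=1.0, gamma_p=0.1,
                        dt=0.01, t_end=100.0):
    """Forward-Euler integration of a hypothetical two-variable expression model."""
    m, p = 0.0, 0.0  # start with no mRNA and no protein
    for _ in range(int(round(t_end / dt))):
        dm_dt = k_m - gamma_m * m        # transcription minus mRNA degradation
        dp_dt = k_p * m - gamma_p * p    # translation minus protein degradation
        m += dm_dt * dt
        p += dp_dt * dt
    return m, p
```

Both concentrations approach the steady state $m^* = k_m/\gamma_m$ and $p^* = k_p m^*/\gamma_p$. For the default values, these are 4 and 40.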

Methodology of gene expression modelling

• Experiments are carried out on the gene regulatory system and data are obtained.

• Mathematical models are built from these data.
• Mathematical models can also be built from prior biological knowledge.
• The models are then compared with the experimental data. If they disagree, the experimental results are taken as authoritative and the models are corrected accordingly.
• The corrected equations may then be used to predict gene expression.
• Quantitative analysis is done with a set of genes relevant to a phenotype of interest (e.g., a tumor), investigating their regulatory systems.
• The objective is to identify the most relevant features explaining the regulation of each target gene.
• Data on the functions of different genes and the transcription factors involved may be obtained from public repositories such as GENCODE and ENCODE.
Parallel genetic algorithm
• A parallel genetic algorithm uses multiple genetic algorithms to solve a single task.
• All these algorithms try to solve the same task. After they have finished, the best individual of each algorithm is selected, and the best of these is taken as the solution to the problem.
• These genetic algorithms do not depend on each other, so they can run in parallel.
• Each algorithm has its own set of individuals. These may differ from the individuals of another algorithm because they have a different mutation/crossover history.
• Moreover, some individuals with the highest fitness are allowed to ‘migrate’ from one algorithm to another.
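The multiple-population idea described above (often called an island model) can be sketched as follows. For simplicity, the islands are evolved one after another in a single process rather than on separate processors. Because they share nothing except the periodic migration step, each island's loop could run in parallel. The OneMax fitness function, tournament selection, ring migration topology and all parameter values are assumptions made for illustration.

```python
import random


def fitness(bits):
    """Hypothetical OneMax objective: count of 1 bits."""
    return sum(bits)


def evolve_one_generation(pop, rng, mutation_rate=0.02):
    """One ordinary GA generation on a single island."""
    def pick():
        # binary tournament selection
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    new_pop = []
    while len(new_pop) < len(pop):
        p1, p2 = pick(), pick()
        cut = rng.randint(1, len(p1) - 1)  # single-point crossover
        child = p1[:cut] + p2[cut:]
        child = [1 - g if rng.random() < mutation_rate else g for g in child]
        new_pop.append(child)
    return new_pop


def island_ga(n_islands=4, island_size=20, length=30, generations=60,
              migrate_every=10, seed=1):
    rng = random.Random(seed)
    islands = [[[rng.randint(0, 1) for _ in range(length)]
                for _ in range(island_size)] for _ in range(n_islands)]
    for gen in range(1, generations + 1):
        # Islands evolve independently (sequentially here; could be parallel).
        islands = [evolve_one_generation(pop, rng) for pop in islands]
        if gen % migrate_every == 0:
            # Ring migration: a copy of island i-1's best replaces island i's worst.
            bests = [max(pop, key=fitness) for pop in islands]
            for i, pop in enumerate(islands):
                worst = min(range(len(pop)), key=lambda j: fitness(pop[j]))
                pop[worst] = bests[i - 1][:]
    # Best individual of every island, then the best of those.
    return max((max(pop, key=fitness) for pop in islands), key=fitness)
```

The final line mirrors the selection described above: take each island's best, then the best of those.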
History of parallel genetic algorithms
• Genetic algorithms and the idea of parallelism in computing were proposed by John Holland.
• The name “Genetic Programming” (GP) was coined by David Goldberg, a PhD student of John Holland.
• In 1988, John Koza (also a PhD student of John Holland) patented his invention of a GA for program evolution.
• An early study of how to parallelize genetic algorithms was conducted by Bethke. His analysis showed that global parallel GAs may have an efficiency close to 100%.

• Another early study on how to map genetic algorithms to existing computer architectures was made
by Grefenstette. He proposed four prototypes for parallel GAs; the first three are variations of the
master-slave scheme, and the fourth is a multiple-population GA.
1. A master processor carries out all genetic operations on the population, and individuals are sent to slave processors for evaluation.
2. Similar to the first, but there is no clear distinction between generations. When any slave processor finishes evaluating an individual, it returns it to the master and receives another individual.
3. In the third prototype, the population is stored in shared memory. This memory can be accessed by the slaves, which evaluate the individuals and apply genetic operators independently of each other.
4. Grefenstette’s fourth prototype is a multiple-population GA in which the best individuals of each processor are broadcast to all the others every generation.
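Grefenstette's first prototype (master-slave) can be illustrated with Python's standard `concurrent.futures` module. The master keeps the population and applies all genetic operators, while a pool of workers plays the role of the slave processors that evaluate fitness. A thread pool is used here only so that the sketch runs anywhere. For a CPU-bound fitness function in CPython, a `ProcessPoolExecutor` would be needed for a real parallel speed-up. The objective function, truncation selection, arithmetic crossover and Gaussian mutation are hypothetical choices.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def fitness(x):
    """Hypothetical objective to maximise, with its peak at x = 3."""
    return -(x - 3.0) ** 2


def master_slave_ga(pop_size=20, generations=50, workers=4, seed=0):
    rng = random.Random(seed)
    population = [rng.uniform(-10.0, 10.0) for _ in range(pop_size)]
    with ThreadPoolExecutor(max_workers=workers) as slaves:
        for _ in range(generations):
            # Master sends every individual to the slave pool for evaluation.
            scores = list(slaves.map(fitness, population))
            # Master applies the genetic operators: truncation selection...
            ranked = [x for _, x in sorted(zip(scores, population), reverse=True)]
            parents = ranked[: pop_size // 2]
            # ...then arithmetic crossover plus Gaussian mutation.
            children = []
            for _ in range(pop_size - len(parents)):
                p1, p2 = rng.sample(parents, 2)
                children.append(0.5 * (p1 + p2) + rng.gauss(0.0, 0.5))
            population = parents + children
        scores = list(slaves.map(fitness, population))
    return max(zip(scores, population))[1]
```

Only the evaluation step is distributed. Selection and variation stay on the master, which is what distinguishes this prototype from the multiple-population scheme.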
Methodology of parallel genetic algorithm
Step 0: The problem to be optimized must be represented in a way that the computer can process.
Step 1: The individuals of the initial population are defined.
Step 2: Each individual is assessed for fitness using a defined fitness function. Unfit individuals are deleted from the population.
Step 3: Partners are selected among the fit individuals and allowed to mate, i.e., crossover and mutation are carried out.
Step 4: Offspring are generated, and these form the next generation.
Step 5: The offspring are assessed for fitness.
Step 6: If the termination criterion is not met, the process is repeated until the fitness satisfies the termination criterion.
Real life applications of parallel genetic algorithm
• Many diverse fields ultimately rely on optimizing a complex system. Examples include designing new drugs (optimizing the energy state of chemicals, such as their angles) and deploying better 5G networks (optimizing the covered area, the packet protocol parameters and the number of potential users). Others are investing in the stock market (maximizing the revenue of a complex portfolio) and helping citizens move around a city (minimizing pollution, fuel consumption and travel time while maximizing comfort and the drivers' experience). Filling your fridge with healthy food at low cost, fighting a fire, programming a computer with a minimum number of bugs, and thousands of other applications are now possible using GAs.
• The transit route network design (TRND) problem for urban bus operation involves determining a set of transit routes and associated frequencies that achieve the desired objective. It can be formulated as an optimization problem that minimizes the total system cost. GAs and parallel GAs are well suited to such combinatorial optimization problems.
• In machine learning
• Artificial neural networks
• Genetic algorithms are used in many design procedures for mechanical components. For instance, aircraft wing design is a design problem that takes multiple disciplines into consideration. It requires improving the lift-to-drag ratio of a complex wing.
• Optimise parameters to minimize manufacturing cost
• The areas of predictive analysis include protein prediction, RNA structure prediction, operon
prediction, and others.
• Other processes that use genetic optimization include protein folding, gene expression profiling analysis and multiple sequence alignment in bioinformatics.
DNA computing algorithms

• A computation may be thought of as the execution of an algorithm, which itself may be defined as a
step-by-step list of well-defined instructions that takes some input, processes it, and produces a
result.
• In DNA computing, information is represented using the four-character genetic alphabet (A
[adenine], G [guanine], C [cytosine], and T [thymine]), rather than the binary alphabet (1 and 0) used
by traditional computers.
• An algorithm’s input is therefore represented (in the simplest case) by DNA molecules with specific
sequences, the instructions are carried out by laboratory operations on the molecules (such as
sorting them according to length or chopping strands containing a certain subsequence), and the
result is defined as some property of the final set of molecules (such as the presence or absence of a
specific sequence).
Operations on DNA
The types of operations available are determined by the capabilities of molecular biology rather than the wishes of algorithm designers. Note also that each of these operations is performed in constant time on a test tube.
1. Merge - This is the simple operation of combining the contents of two test tubes in a third tube.
2. Anneal - This is the process by which complementary strands of DNA are paired to form the famous double-helix structure of Watson and Crick. Annealing is achieved by cooling a DNA solution, which encourages pairing. Adleman uses this in step 1 of his experiment to generate all legal paths through the graph.
3. Melt - Melting is the inverse operation of annealing. By heating the contents of a tube, double-stranded DNA sequences are denatured, or separated into their two single-stranded parts.
4. Separation by length - The contents of a test tube can be separated by increasing length. This is achieved by gel electrophoresis, in which longer strands travel more slowly through the gel. This operation was used by Adleman in step 3 of his solution to the Hamiltonian path (HP) problem.
5. Separation by sequence - This operation allows one to remove from solution all DNA strands that
contain a desired sequence. This is performed by generating the strand whose complement is the
desired sequence. This newly generated strand is attached to a magnetic substance which is used to
extract the sequences after annealing. This operation is the crux of Adleman's step 4.
6. Copying/Amplification - Copies are made of DNA strands in a test tube. The strands to be copied must
have known sequences at both the beginning and end in order for this operation to be performed.
7. Append - This process makes a DNA strand longer by adding a character or strand to the end of each
sequence.
8. Detect - It is also possible to analyze a test tube in order to determine whether or not it contains at
least one strand of DNA.
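Several of these operations can be mimicked in software by treating a test tube as a list of strings over {A, C, G, T}. The sketch below is a simplified in-silico model, not a description of laboratory practice. It ignores strand orientation and double strands. It covers merge, separation by length, separation by sequence, amplification, append and detect, plus a Watson-Crick `complement` helper (the complement is what a separation-by-sequence probe is built from). All function names are hypothetical.

```python
# A "test tube" is modelled as a plain list of single-stranded DNA strings.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(strand):
    """Watson-Crick complement (A<->T, C<->G); strand orientation is ignored."""
    return strand.translate(COMPLEMENT)


def merge(tube1, tube2):
    """Merge: pour two tubes into a third one."""
    return tube1 + tube2


def separate_by_length(tube, length):
    """Separation by length (gel electrophoresis), keeping one length only."""
    return [s for s in tube if len(s) == length]


def separate_by_sequence(tube, subseq):
    """Separation by sequence: strands containing `subseq`, and the rest."""
    with_seq = [s for s in tube if subseq in s]
    without_seq = [s for s in tube if subseq not in s]
    return with_seq, without_seq


def amplify(tube, copies=2):
    """Copying/amplification: every strand is copied `copies` times."""
    return [s for s in tube for _ in range(copies)]


def append(tube, suffix):
    """Append: extend every strand with the same suffix."""
    return [s + suffix for s in tube]


def detect(tube):
    """Detect: does the tube contain at least one strand?"""
    return len(tube) > 0
```

For example, merging `["ACGT", "GGA"]` with `["TTAC"]` and then separating by length 4 leaves `["ACGT", "TTAC"]`.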
Mechanism of DNA computing algorithms

• Defining the problem: the problem to be solved is formulated.

• Constructing a weighted graph: a directed graph containing vertices and edges is designed based on the problem.
• A unique DNA sequence is synthesized to represent each vertex and each edge of the graph.
• Laboratory biochemical operations (merge, anneal, melt, separation, etc.) are carried out on the DNA sequences.
• The product is amplified by PCR, i.e., the amount of available DNA strands is increased exponentially by artificial synthesis.
• The strands are separated according to length.
• The sequences obtained are used to read out the result of the computation.
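The steps above can be sketched as an in-silico simulation of Adleman's approach to the Hamiltonian path problem. This is a simplified model, not Adleman's laboratory protocol. Each vertex gets a hypothetical random 8-nucleotide code, annealing and ligation of edge strands are modelled as random walks along the graph's edges, and list filters stand in for the separation steps. Function and parameter names are illustrative.

```python
import random


def adleman_hamiltonian_path(edges, n_vertices, start, end,
                             n_molecules=20000, word_len=8, seed=0):
    rng = random.Random(seed)
    # Encode each vertex as a distinct random DNA word (hypothetical encoding).
    codes = {}
    while len(codes) < n_vertices:
        word = "".join(rng.choice("ACGT") for _ in range(word_len))
        if word not in codes.values():
            codes[len(codes)] = word
    decode = {word: v for v, word in codes.items()}
    successors = {v: [b for a, b in edges if a == v] for v in range(n_vertices)}

    def vertices_of(strand):
        # Read a strand back word by word.
        return [decode[strand[i:i + word_len]] for i in range(0, len(strand), word_len)]

    # Step 1: annealing + ligation, modelled as many random walks along edges.
    tube = []
    for _ in range(n_molecules):
        v = rng.randrange(n_vertices)
        path = [v]
        while successors[v] and len(path) < 2 * n_vertices and rng.random() < 0.9:
            v = rng.choice(successors[v])
            path.append(v)
        tube.append("".join(codes[u] for u in path))
    # Step 2: keep strands that begin with `start` and end with `end`.
    tube = [s for s in tube if s.startswith(codes[start]) and s.endswith(codes[end])]
    # Step 3: separation by length, keeping strands of exactly n_vertices words.
    tube = [s for s in tube if len(s) == n_vertices * word_len]
    # Step 4: separation by sequence, keeping strands that contain every vertex.
    tube = [s for s in tube if set(vertices_of(s)) == set(range(n_vertices))]
    # Step 5: detect; any surviving strand encodes a Hamiltonian path.
    return vertices_of(tube[0]) if tube else None
```

For a graph with no Hamiltonian path from `start` to `end`, every strand is filtered out and the function returns `None`. This mirrors a negative result in the detect step.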

Example for DNA computing

• To encode, for example, English text using DNA, one can choose an encoding scheme that maps the Latin alphabet onto strings over {A, C, G, T}. The resulting information-encoding strings are then synthesized as single DNA strands.
• For example, the English alphabet can be encoded as A → ACA, B → ACCA, C → ACCCA, D → $AC^4A$, etc., where the i-th letter of the alphabet is represented by $AC^iA$. This is a single DNA strand consisting of i repetitions of C, flanked by one A at the beginning and another A at the end.
• The text “To be or not to be” is then encoded by concatenating the codes of its letters.
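The $AC^iA$ scheme is simple to implement. Because the scheme above only covers letters, the sketch assumes that spaces and other non-letter characters are simply skipped, and it treats letters case-insensitively. Every code starts and ends with A and contains only C in between. The concatenated strand can therefore be split back into letters unambiguously at each "AA" boundary.

```python
import string


def encode_letter(letter):
    """i-th letter of the Latin alphabet (A = 1) -> 'A' + 'C' * i + 'A'."""
    i = string.ascii_uppercase.index(letter.upper()) + 1
    return "A" + "C" * i + "A"


def encode_text(text):
    """Concatenate letter codes; non-letters are skipped (an assumption)."""
    return "".join(encode_letter(ch) for ch in text if ch in string.ascii_letters)
```

For example, `encode_text("be")` gives `"ACCAACCCCCA"` (B → ACCA, E → ACCCCCA). The 13 letters of “To be or not to be” give a 192-nucleotide strand.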
Honey bee based routing algorithms
• Honey bees live in well-organized colonies.
• Honey bees are generally active during spring, when they search for plants from which to collect pollen and nectar.
• A colony consists of a queen, thousands of workers and a few male drones. Workers build the nest from wax, which they secrete from their abdominal glands. Within each cell, young workers place pollen and nectar as food for the developing larvae. Male drones are required only for reproduction.
• Honey bees forage for food in groups, and this foraging behaviour is the inspiration for bee-based routing algorithms.
• Foraging behaviour is the link between the honey bee colony and the surrounding environment.
• Forager bees can be classified into two categories: scout bees, which search for the best food source, and onlooker bees. Onlooker bees wait in the beehive until the scout bees return and give them information about the food source by dancing.
• A routing algorithm is a procedure that determines the route or path for transferring data packets from a source to a destination.
• The Artificial Bee Colony (ABC) algorithm was introduced by Karaboga based on honey bee foraging behaviour.
• The ABC algorithm works with three kinds of agents (bees):
• (i) Employed bees: bees already associated with a food source, which represents a solution.
• (ii) Onlooker bees: bees that observe the dance of the employed bees and follow them to the neighborhood of their food sources.
• (iii) Scout bees: bees that wander around the search space to find new food sources; this can be a random or guided process.
Steps in an ABC algorithm
1. Initialisation: The initial food sources are discovered by scout bees. The number of food sources is
the same as the number of employed bees in the hive. Each food source is then associated with an
employed bee.
2. Employed bee phase: During this phase, an employed bee will search for better food sources in its
neighborhood. Then, based on some neighborhood function, a greedy selection is applied between
the previous and the new food source. If a better food source is found within this neighborhood,
then this becomes the current food source associated with this employed bee.
3. Onlooker bee phase: During this phase, employed bees advertise their food sources to the onlooker bees by means of the waggle dance. Onlooker bees observe the dances and probabilistically choose one of the food sources to follow, with better sources more likely to be chosen. They then search its neighborhood in the same way as the employed bees.
4. Scout bee phase: During this phase, an employed bee whose food source has not improved after a predetermined number of trials (the limit parameter) abandons that source. It becomes a scout bee and looks for a new food source.
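The ABC phases can be sketched as follows for continuous minimization. The sphere objective, the fitness transformation 1/(1 + f), the neighbour rule and all parameter values are assumptions for this illustration. The neighbour rule perturbs one coordinate relative to a randomly chosen other source. The fitness transformation and neighbour rule follow common descriptions of Karaboga's algorithm. This simplified version also lets every exhausted source become a scout in the same cycle.

```python
import random


def sphere(x):
    """Hypothetical objective to minimize: sum of squares (minimum 0 at the origin)."""
    return sum(v * v for v in x)


def abc_minimize(f, dim=2, n_sources=10, limit=20, cycles=200,
                 bounds=(-5.0, 5.0), seed=0):
    rng = random.Random(seed)
    lo, hi = bounds

    def random_source():
        return [rng.uniform(lo, hi) for _ in range(dim)]

    # Initialisation: scouts discover one food source per employed bee.
    sources = [random_source() for _ in range(n_sources)]
    values = [f(s) for s in sources]
    trials = [0] * n_sources
    best = min(zip(values, sources))

    def try_neighbour(i):
        # Perturb one coordinate relative to another randomly chosen source.
        k = rng.choice([j for j in range(n_sources) if j != i])
        d = rng.randrange(dim)
        candidate = sources[i][:]
        candidate[d] += rng.uniform(-1.0, 1.0) * (sources[i][d] - sources[k][d])
        candidate[d] = min(max(candidate[d], lo), hi)
        value = f(candidate)
        if value < values[i]:  # greedy selection: keep the better source
            sources[i], values[i], trials[i] = candidate, value, 0
        else:
            trials[i] += 1

    for _ in range(cycles):
        for i in range(n_sources):  # employed bee phase
            try_neighbour(i)
        # Onlooker bee phase: fitness 1/(1 + f) (for f >= 0) weights the choice.
        weights = [1.0 / (1.0 + v) for v in values]
        for _ in range(n_sources):
            try_neighbour(rng.choices(range(n_sources), weights=weights, k=1)[0])
        best = min(best, min(zip(values, sources)))  # memorize the best so far
        for i in range(n_sources):  # scout bee phase
            if trials[i] > limit:
                sources[i] = random_source()
                values[i] = f(sources[i])
                trials[i] = 0
    return best
```

Usage: `value, x = abc_minimize(sphere)` returns the best objective value found and the corresponding position.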
Bee-inspired algorithms can produce efficient, good-quality solutions to the travelling salesman problem. However, no known algorithm is guaranteed to find the optimal tour efficiently (in polynomial time) for large instances.

(The travelling salesman problem is a graph problem in which a salesman must visit every city in a list (represented by the nodes of a graph) exactly once, and the distances between all pairs of cities (represented by the edges of the graph) are known. The goal is to find the shortest possible route that visits every city and returns to the origin city.)
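The problem statement can be made concrete with a small brute-force solver. Exhaustive search does find an optimal tour, but it must examine (n - 1)! orderings of the cities, which quickly becomes infeasible as n grows. This is why heuristics such as bee-inspired algorithms are used for larger instances. The function names below are illustrative.

```python
from itertools import permutations


def tour_length(tour, dist):
    """Total length of a closed tour that returns to its first city."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


def brute_force_tsp(dist):
    """Exact solution by trying every tour that starts at city 0."""
    n = len(dist)
    best_tour, best_len = None, float("inf")
    for rest in permutations(range(1, n)):
        tour = (0,) + rest
        length = tour_length(tour, dist)
        if length < best_len:
            best_tour, best_len = tour, length
    return best_tour, best_len
```

Take a made-up 4-city instance laid out like a square, with side 1 and "diagonal" 2: `dist = [[0, 1, 2, 1], [1, 0, 1, 2], [2, 1, 0, 1], [1, 2, 1, 0]]`. Here `brute_force_tsp(dist)` returns the tour `(0, 1, 2, 3)` with length 4.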