Module 5
Genetic Algorithm
• A genetic algorithm is a probabilistic search algorithm based on the principles of genetics and evolution.
• Charles Darwin proposed the theory of natural selection, which is the driving force of evolution.
• The theory of natural selection proposes that features/characteristics that ensure better
survivability of the organism are preserved and multiplied over generations. Natural selection
depends on variation among organisms. Variation in organisms may be brought about by:
1. Crossing over: A process that occurs during meiosis in which chromosome segments are
exchanged between the two chromosomes of a homologous pair.
2. Mutation: A mutation is any sudden change in the genetic information of an organism that
can be passed on from generation to generation. Mutations may be:
a) Deletion- A part of genetic material is removed
b) Duplication- A part of genetic material is repeated
c) Insertion- A new base/sequence is added
d) Inversion- A segment of a chromosome is reversed end to end
e) Translocation- A part of a chromosome is transferred to another chromosome.
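To make the structural mutation types concrete, the minimal sketch below applies each one to a DNA sequence written as a Python string. It is illustrative only: the function names, positions and example sequences are hypothetical, and the inversion is shown from a single strand (in double-stranded DNA the flipped segment reads as its reverse complement).

```python
def deletion(seq: str, start: int, end: int) -> str:
    """Deletion: remove the segment seq[start:end]."""
    return seq[:start] + seq[end:]

def duplication(seq: str, start: int, end: int) -> str:
    """Duplication: repeat the segment seq[start:end] right after itself."""
    return seq[:end] + seq[start:end] + seq[end:]

def insertion(seq: str, pos: int, new: str) -> str:
    """Insertion: add a new base/sequence at position pos."""
    return seq[:pos] + new + seq[pos:]

def inversion(seq: str, start: int, end: int) -> str:
    """Inversion: reverse the order of seq[start:end] (simplified single-strand view)."""
    return seq[:start] + seq[start:end][::-1] + seq[end:]

def translocation(src: str, dst: str, start: int, end: int, pos: int) -> tuple[str, str]:
    """Translocation: move src[start:end] into another chromosome dst at position pos."""
    segment = src[start:end]
    return src[:start] + src[end:], dst[:pos] + segment + dst[pos:]

if __name__ == "__main__":
    chrom = "AATTGGCC"  # hypothetical chromosome
    print(deletion(chrom, 2, 4))                  # AAGGCC
    print(duplication(chrom, 2, 4))               # AATTTTGGCC
    print(insertion(chrom, 4, "CAT"))             # AATTCATGGCC
    print(inversion(chrom, 2, 6))                 # AAGGTTCC
    print(translocation(chrom, "TTTT", 0, 2, 2))  # ('TTGGCC', 'TTAATT')
```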
• Genetic algorithms, also termed global search heuristics, are inspired by concepts from evolutionary
biology such as inheritance, mutation, selection and crossover. These algorithms provide a
technique for programs to automatically improve their parameters.
• Genetic algorithms were introduced by John Holland in the early 1970s.
• Genetic algorithms are a search technique used in computing to find exact or approximate
solutions to optimization and search problems.
Terminology in Genetic Algorithms
a) Population: The population is a subset of all possible (candidate) solutions to the given
problem.
b) Fitness Function: The fitness function determines the fitness level of each individual in the
population, i.e., its ability to compete with other individuals. In every generation, individuals are
evaluated using the fitness function.
c) Genetic Operators: In a genetic algorithm, the best individuals mate to produce offspring that are
better than their parents. Genetic operators change the genetic composition of the next
generation.
d) Selection: After the fitness of every individual in the population has been calculated, a selection
process determines which individuals will reproduce and create the offspring that form the next
generation.
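The selection step can be illustrated with fitness-proportionate ("roulette-wheel") selection, one common scheme among several. This is a minimal sketch assuming non-negative fitness values that are not all zero; the fitness function `count_ones` and the tiny population are hypothetical.

```python
import random

def roulette_select(population, fitness, rng):
    """Fitness-proportionate (roulette-wheel) selection of one individual.

    Each individual owns a slice of the wheel proportional to its fitness,
    so fitter individuals are more likely, but not certain, to be chosen.
    Assumes all fitness values are non-negative and not all zero.
    """
    scores = [fitness(ind) for ind in population]
    total = sum(scores)
    spin = rng.random() * total          # where the wheel stops, in [0, total)
    running = 0.0
    for ind, score in zip(population, scores):
        running += score
        if spin < running:
            return ind
    return population[-1]                # guard against floating-point round-off

def count_ones(bits):
    """Hypothetical fitness function: number of 1s in a bit string."""
    return bits.count("1")

if __name__ == "__main__":
    rng = random.Random(0)
    population = ["0000", "0011", "0111", "1111"]
    picks = [roulette_select(population, count_ones, rng) for _ in range(1000)]
    for ind in population:
        print(ind, picks.count(ind))     # "0000" (fitness 0) is never picked
```

Individuals with fitness 0 get an empty slice and are never chosen, while "1111" is expected to be picked about twice as often as "0011".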
• Genotype – the coded form of the candidate solutions to be optimized (the decoded form of this
information is the phenotype).
• Fitness function – evaluates the fitness of the candidate solutions.
• Selection function – selects the solutions with the best fitness; only the selected solutions are carried
to the next generation.
• Crossover and mutation functions – create variation in the coded information.
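The four components above fit together as in the minimal sketch below. It assumes the classic textbook toy problem of maximizing f(x) = x² over integers 0 to 31, a 5-bit genotype, tournament selection (a swapped-in alternative to roulette-wheel selection), one-point crossover and bit-flip mutation; all parameter values are hypothetical.

```python
import random

N_BITS = 5  # genotype length: 5 bits encode integers 0..31

def decode(genotype: str) -> int:
    """Genotype (bit string) -> phenotype (integer x)."""
    return int(genotype, 2)

def fitness(genotype: str) -> int:
    """Fitness of a candidate: f(x) = x**2, to be maximized."""
    x = decode(genotype)
    return x * x

def tournament(pop, rng, k=2):
    """Selection: pick k random individuals and keep the fittest."""
    return max(rng.sample(pop, k), key=fitness)

def crossover(a: str, b: str, rng) -> tuple[str, str]:
    """One-point crossover: swap the tails of two parents after a random cut."""
    cut = rng.randint(1, N_BITS - 1)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def mutate(g: str, rng, rate=0.05) -> str:
    """Bit-flip mutation: each bit flips independently with probability `rate`."""
    return "".join(("1" if c == "0" else "0") if rng.random() < rate else c for c in g)

def run_ga(pop_size=8, generations=30, seed=1):
    rng = random.Random(seed)
    pop = ["".join(rng.choice("01") for _ in range(N_BITS)) for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            p1, p2 = tournament(pop, rng), tournament(pop, rng)
            c1, c2 = crossover(p1, p2, rng)
            children += [mutate(c1, rng), mutate(c2, rng)]
        pop = children[:pop_size]
        best = max(pop + [best], key=fitness)   # remember the best seen so far
    return best

if __name__ == "__main__":
    best = run_ga()
    print(best, decode(best), fitness(best))
```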
• Sensor-based robot path planning: After detecting obstacles, the planning module generates a short and
safe path to the goal that avoids the obstacles.
• Image processing: Producing better images from a group of images with disparity or apparent motion.
• Gaming: Genetic algorithms can come up with the best strategies given a number of resources. For
example, a player goes through a level of the game, and at the end the program picks the non-player
characters that fared best against the player and uses them in the next generation. Slowly, after a lot of
playing, reasonable characteristics evolve.
• Real-time systems: Assigning task priorities under time constraints.
• Dynamic scheduling: Scheduling allocates a set of machines to perform a set of jobs within a certain
time period; the goal is to find an appropriate allocation (a schedule) that maximizes a certain
performance measure. In one implementation, solutions are encoded using a natural (permutation)
representation, the order crossover operator is used, and inversion is used as the mutation
operator.
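The two permutation operators mentioned for scheduling can be sketched as follows, assuming a schedule is simply an ordering of job numbers 1 to 9 (hypothetical data). Both operators always produce a valid permutation, which is why they suit order-based encodings; in a real GA the cut points would be random, as `random_cuts` shows.

```python
import random

def order_crossover(p1, p2, i, j):
    """Order crossover (OX) on permutations.

    The child keeps p1[i:j] in place. The remaining positions, starting just
    after the segment and wrapping around, are filled with p2's genes in the
    order they appear in p2 (also read from position j, wrapping), skipping
    genes already copied from p1.
    """
    n = len(p1)
    child = [None] * n
    child[i:j] = p1[i:j]
    kept = set(p1[i:j])
    fill = [p2[(j + k) % n] for k in range(n) if p2[(j + k) % n] not in kept]
    positions = [(j + k) % n for k in range(n - (j - i))]
    for pos, gene in zip(positions, fill):
        child[pos] = gene
    return child

def inversion_mutation(perm, i, j):
    """Inversion mutation: reverse perm[i:j]; the result is still a valid permutation."""
    return perm[:i] + perm[i:j][::-1] + perm[j:]

def random_cuts(n, rng):
    """Two distinct cut points 0 <= i < j <= n."""
    i, j = sorted(rng.sample(range(n + 1), 2))
    return i, j

if __name__ == "__main__":
    P1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # hypothetical job order of parent 1
    P2 = [9, 3, 7, 8, 2, 6, 5, 1, 4]   # hypothetical job order of parent 2
    print(order_crossover(P1, P2, 3, 7))   # [3, 8, 2, 4, 5, 6, 7, 1, 9]
    print(inversion_mutation(P1, 2, 6))    # [1, 2, 6, 5, 4, 3, 7, 8, 9]
```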
• Automated design of automobiles, considering various constraints such as fuel consumption.
• Timetabling: Genetic algorithms can be used to derive optimal schedules considering the courses, number
of students, and lecture rooms that a university has.
• Product design: Genetic algorithms are used to derive the best combinations of product specifications
and attributes that will attract the most customers.
Gene expression modelling
• Gene expression defines the activity and behavior of a gene inside a cell.
• Alteration of gene expression by regulators (promoters or inhibitors) may have different consequences
on cells, including promoting the development of specific diseases such as cancer.
• Not all genes are expressed at the same time. The study of the regulation of gene expression is
fundamental for understanding the molecular basis of health and disease, and for identifying
therapeutic targets.
• To understand how genetic material controls cell growth and development, we need to know not
only about the protein-coding sequences but also about the non-coding sequences that regulate gene
expression.
• This can be done by examining the elementary constituents (such as genes and factors) individually and
then how they are connected. The myriad components of a system and their interactions are best
characterized as networks, which are mainly represented as graphs in which thousands of nodes are
connected by thousands of edges.
• A variety of mathematical methods for describing regulatory networks have been proposed.
• Directed graphs, Bayesian networks, Boolean networks, differential equations, etc. can be used to model
and predict gene expression.
1. Directed graphs
• A directed graph is a static model in which each node represents a gene and each arrow represents a
relationship between two genes.
• A directed graph may be annotated for better understanding.
• Directed graphs are mostly suitable for the representation of schemas describing biological
pathways or procedures which show the sequential interaction of elements. These are mainly
metabolic, signal transduction or regulatory networks.
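A directed, annotated gene graph can be stored as an adjacency list. The minimal sketch below assumes a hypothetical four-gene network whose arrows are annotated "+" (activates) or "-" (represses), and uses breadth-first search to list every gene downstream of a given gene.

```python
from collections import deque

# Hypothetical regulatory graph: gene -> list of (target gene, effect).
# "+" = activates, "-" = represses (the annotation on each arrow).
GRAPH = {
    "A": [("B", "+"), ("C", "-")],
    "B": [("D", "+")],
    "C": [("D", "-")],
    "D": [],
}

def downstream(graph, gene):
    """Return every gene reachable from `gene` by following arrows (BFS)."""
    seen, queue = set(), deque([gene])
    while queue:
        for target, _effect in graph[queue.popleft()]:
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

if __name__ == "__main__":
    print(sorted(downstream(GRAPH, "A")))  # ['B', 'C', 'D']
    print(sorted(downstream(GRAPH, "B")))  # ['D']
```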
2. Bayesian model
• Bayesian statistics uses probability to quantify how likely various events are to occur.
• A Bayesian network is a probabilistic graphical model that represents a joint probability
distribution in an interpretable form.
• Bayesian networks have been used to model gene expression data and gene regulatory
networks.
• A BN consists of a directed acyclic graph and a set of corresponding conditional probability
distributions (or density functions).
• The graph is defined by two sets: the set of nodes (vertices), which represent random variables,
and the set of directed edges.
• The directed edges in a BN structure model the dependencies between the variables
(nodes) in the form of conditional probabilities, which together define the joint probability.
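How a BN turns local conditional probabilities into a joint distribution can be shown with a tiny example. The sketch below assumes a hypothetical network A → X ← B with binary expression states and made-up probabilities; the joint factorizes as P(A, B, X) = P(A) · P(B) · P(X | A, B), from which marginals and posteriors follow by summation and Bayes' rule.

```python
from itertools import product

# Hypothetical BN: regulators A and B both point to target gene X (A -> X <- B).
# Each variable is 1 (expressed) or 0 (not expressed); all numbers are made up.
P_A = {1: 0.6, 0: 0.4}
P_B = {1: 0.3, 0: 0.7}
# Conditional probability table P(X=1 | A, B)
P_X1_GIVEN = {(1, 1): 0.9, (1, 0): 0.2, (0, 1): 0.7, (0, 0): 0.05}

def joint(a, b, x):
    """Chain rule for this DAG: P(A,B,X) = P(A) * P(B) * P(X | A,B)."""
    px1 = P_X1_GIVEN[(a, b)]
    return P_A[a] * P_B[b] * (px1 if x == 1 else 1 - px1)

# Marginal P(X=1): sum the joint over both parents
p_x1 = sum(joint(a, b, 1) for a, b in product((0, 1), repeat=2))
# Posterior P(A=1 | X=1) by Bayes' rule
p_a1_given_x1 = sum(joint(1, b, 1) for b in (0, 1)) / p_x1

print(f"P(X=1) = {p_x1:.3f}")                  # P(X=1) = 0.344
print(f"P(A=1 | X=1) = {p_a1_given_x1:.3f}")   # P(A=1 | X=1) = 0.715
```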
3. Boolean network
• Each node has an associated function whose inputs are the states of the nodes connected to it.
The function can take only two values: 0/1, yes/no, true/false.
• A Boolean network consists of a set of Boolean variables whose state is determined by other
variables in the network.
• A Boolean network can be considered as a directed graph where the nodes represent the
expression status of genes (1 = expressed, 0 = not expressed) and directed edges represent the actions
of genes on other genes (a positive, activating effect or a negative, inhibiting effect).
• The Boolean function associated to a variable represents the influence of other genes on the
activation state of the gene described by such a variable.
• The image below represents a regulatory network of three genes A, B and X, showing how the
expression levels of A and B affect gene X. For example, when A is active (1) and B is inactive (0),
gene X is inactive.
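Since the figure is not reproduced here, the sketch below assumes a hypothetical rule X = A AND B, chosen only because it agrees with the stated example (A = 1, B = 0 gives X = 0). It also adds made-up rules for A and B to show how a Boolean network evolves under synchronous updating, where all genes switch at the same time.

```python
from itertools import product

def x_rule(a, b):
    """Hypothetical Boolean function for gene X: X = A AND B."""
    return int(a and b)

# Hypothetical update rules for a 3-gene feedback loop (not taken from the figure)
RULES = {
    "A": lambda s: 1 - s["X"],              # X represses A (NOT X)
    "B": lambda s: s["A"],                  # A activates B
    "X": lambda s: x_rule(s["A"], s["B"]),  # X needs both A and B
}

def step(state):
    """Synchronous update: every gene reads the current state, then all change at once."""
    return {gene: rule(state) for gene, rule in RULES.items()}

if __name__ == "__main__":
    for a, b in product((0, 1), repeat=2):
        print(f"A={a} B={b} -> X={x_rule(a, b)}")
    state = {"A": 1, "B": 0, "X": 0}
    for t in range(6):
        print(t, state)
        state = step(state)
```

Under these assumed rules, the state A=1, B=0, X=0 passes through five states and returns to itself, i.e. the network settles into a cyclic attractor.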
4. Differential equation
• The two main functions that make up the process of gene expression are transcription and
translation.
• The rates of transcription and translation depend on the amounts of mRNA and protein in the cell.
Gene expression can therefore be represented as a system of first-order differential equations for the
rate of change of mRNA and protein concentrations. These equations involve transcription, translation
and degradation terms.
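A minimal instance of such a system, assuming constant transcription and first-order degradation (a common textbook simplification, not necessarily the model intended here), is dm/dt = k_m − γ_m·m and dp/dt = k_p·m − γ_p·p. The sketch integrates it with the forward Euler method; all rate constants are hypothetical and in arbitrary units.

```python
def simulate(k_m=2.0, g_m=0.5, k_p=4.0, g_p=0.1, dt=0.01, t_end=100.0):
    """Forward-Euler integration of a simple gene-expression model:
        dm/dt = k_m - g_m * m        (transcription - mRNA degradation)
        dp/dt = k_p * m - g_p * p    (translation   - protein degradation)
    Returns (mRNA, protein) at time t_end, starting from m = p = 0."""
    m, p = 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        dm = k_m - g_m * m
        dp = k_p * m - g_p * p
        m, p = m + dt * dm, p + dt * dp
    return m, p

if __name__ == "__main__":
    m, p = simulate()
    # Analytical steady state: m* = k_m / g_m = 4.0, p* = k_p * m* / g_p = 160.0
    print(f"mRNA = {m:.3f}, protein = {p:.2f}")
```

Setting both derivatives to zero gives the steady state directly, which is a quick check that the simulation has converged.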
• Another early study on how to map genetic algorithms to existing computer architectures was made
by Grefenstette. He proposed four prototypes for parallel GAs; the first three are variations of the
master-slave scheme, and the fourth is a multiple-population GA.
1. The master processor carries out all operations on the population; individuals are sent to slave
processors only for evaluation.
2. Similar to the first, but there is no clear distinction between generations: when any slave processor
finishes evaluating an individual, it returns it to the master and receives another individual.
3. In the third prototype, the population is stored in shared memory, which can be accessed by the
slaves; they evaluate the individuals and apply genetic operators independently of each other.
4. Grefenstette's fourth prototype is a multiple-population GA in which the best individuals of each
processor are broadcast every generation to all the others.
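The first (synchronous master-slave) prototype can be sketched with Python's `concurrent.futures`: the master keeps the population and applies all genetic operators, and only fitness evaluation is farmed out to a pool of "slave" workers. This is a minimal illustration assuming a hypothetical OneMax fitness function and made-up parameters. Threads are used for simplicity; because of CPython's global interpreter lock, CPU-bound fitness functions would normally use `ProcessPoolExecutor`, which has the same `map` interface.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def fitness(bits):
    """Hypothetical fitness function (OneMax): number of 1 bits."""
    return sum(bits)

def evaluate_on_slaves(population, slaves):
    """Master -> slaves: individuals are sent out only for fitness evaluation.
    executor.map returns the scores in the same order as the population."""
    return list(slaves.map(fitness, population))

def master(pop_size=20, n_bits=16, generations=40, workers=4, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    with ThreadPoolExecutor(max_workers=workers) as slaves:
        for _ in range(generations):
            scores = evaluate_on_slaves(pop, slaves)        # parallel part
            # Everything below runs on the master only (prototype 1).
            ranked = [ind for _, ind in sorted(zip(scores, pop), key=lambda t: t[0], reverse=True)]
            parents = ranked[: pop_size // 2]               # selection
            children = []
            while len(children) < pop_size:                 # crossover + mutation
                a, b = rng.sample(parents, 2)
                cut = rng.randrange(1, n_bits)
                child = a[:cut] + b[cut:]
                if rng.random() < 0.5:
                    child[rng.randrange(n_bits)] ^= 1
                children.append(child)
            pop = children
        final_scores = evaluate_on_slaves(pop, slaves)
    return max(final_scores)

if __name__ == "__main__":
    print("best fitness:", master())
```

The second (asynchronous) prototype would instead `submit` individuals one at a time and hand out new work as each result completes, so there is no generation barrier.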
Methodology of parallel genetic algorithm
Step 0: Any problem that needs to be optimized must be represented in a way that can be understood by
the computer.
Step 1: The individuals of the initial population are defined.
Step 2: Each individual is assessed for fitness using a defined fitness function. Unfit individuals are
deleted from the population.
Step 3: Partners are selected from among the fit individuals and allowed to mate, i.e., crossover and
mutation are carried out.
Step 4: Offspring are generated, and they form the next generation.
Step 5: The offspring are assessed for fitness.
Step 6: If the termination criterion is not met, the process is repeated until the fitness satisfies the
termination criterion.
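The steps above can be combined with the multiple-population idea (Grefenstette's fourth prototype) in the minimal sketch below. It assumes a hypothetical OneMax problem (20 bits, fitness = number of 1s, target 20), deletes the less fit half in Step 2, and after every generation broadcasts each island's best individual to all islands, replacing their worst members. The islands run one after another here; on a parallel machine each would run on its own processor.

```python
import random

N_BITS, TARGET = 20, 20   # Step 0: a solution is a list of 20 bits; fitness = number of 1s

def fitness(ind):
    return sum(ind)

def new_individual(rng):
    return [rng.randint(0, 1) for _ in range(N_BITS)]

def evolve(island, rng, keep=0.5, p_mut=0.05):
    """One generation on one island (Steps 2-4)."""
    island.sort(key=fitness, reverse=True)                    # Step 2/5: assess fitness
    survivors = island[: max(2, int(len(island) * keep))]     # Step 2: delete unfit individuals
    offspring = []
    while len(offspring) < len(island):                       # Steps 3-4: mate, produce offspring
        a, b = rng.sample(survivors, 2)
        cut = rng.randrange(1, N_BITS)
        child = [bit ^ (rng.random() < p_mut) for bit in a[:cut] + b[cut:]]
        offspring.append(child)
    return offspring

def island_ga(n_islands=4, size=10, max_gens=200, seed=3):
    rng = random.Random(seed)
    islands = [[new_individual(rng) for _ in range(size)] for _ in range(n_islands)]  # Step 1
    best = max((ind for isl in islands for ind in isl), key=fitness)
    for gen in range(max_gens):
        islands = [evolve(isl, rng) for isl in islands]
        # Migration: each island's best is broadcast to all islands
        bests = [max(isl, key=fitness) for isl in islands]
        for isl in islands:
            isl.sort(key=fitness)
            isl[: len(bests)] = [b[:] for b in bests]          # replace the worst individuals
        best = max(bests, key=fitness)
        if fitness(best) == TARGET:                           # Step 6: termination criterion
            return gen + 1, best
    return max_gens, best

if __name__ == "__main__":
    generations, best = island_ga()
    print(generations, fitness(best))
```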
Real-life applications of parallel genetic algorithms
• Diverse fields ultimately rely on optimizing a complex system: from building new drugs (optimizing the
energy state of chemicals, such as their angles), deploying better 5G networks (optimizing the covered area,
the packet protocol parameters, and the number of potential users), and investing in the stock market
(maximizing the revenue of a complex portfolio), to helping citizens move around a city (minimizing
pollution, fuel consumption and travel time, and maximizing comfort and the drivers' experience), to name
just a few. Filling your fridge with healthy food at low cost, fighting a fire, programming your computer with
a minimum number of bugs, and thousands of other applications are now possible using GAs.
• A transit route network design (TRND) problem for urban bus operation involves determining
a set of transit routes and the associated frequencies that achieve the desired objective. This can
be formulated as an optimization problem of minimizing the total system cost. GAs/parallel GAs are
well suited to solving such combinatorial optimization problems.
• In machine learning
• Artificial neural networks
• Genetic algorithms find application in many design procedures for mechanical components. For
instance, aircraft wing design is a design problem that takes multiple disciplines into consideration;
it requires improving the lift-to-drag ratio of a complex wing.
• Optimise parameters to minimize manufacturing cost
• The areas of predictive analysis include protein structure prediction, RNA structure prediction, operon
prediction, and others.
• Other processes that use genetic optimisation include protein folding, gene expression profiling analysis,
and multiple sequence alignment in bioinformatics.
DNA computing algorithms
• A computation may be thought of as the execution of an algorithm, which itself may be defined as a
step-by-step list of well-defined instructions that takes some input, processes it, and produces a
result.
• In DNA computing, information is represented using the four-character genetic alphabet (A
[adenine], G [guanine], C [cytosine], and T [thymine]), rather than the binary alphabet (1 and 0) used
by traditional computers.
• An algorithm’s input is therefore represented (in the simplest case) by DNA molecules with specific
sequences, the instructions are carried out by laboratory operations on the molecules (such as
sorting them according to length or chopping strands containing a certain subsequence), and the
result is defined as some property of the final set of molecules (such as the presence or absence of a
specific sequence).
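Because the genetic alphabet has four symbols, each nucleotide can carry two bits of binary information. The minimal sketch below assumes one arbitrary, hypothetical mapping (00→A, 01→C, 10→G, 11→T); real DNA-computing and DNA-storage schemes choose their encodings to avoid unwanted pairing and other chemical problems.

```python
# Hypothetical mapping: each nucleotide carries 2 bits of information.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}

def encode(bits: str) -> str:
    """Binary string (even length) -> DNA sequence."""
    if len(bits) % 2:
        raise ValueError("bit string length must be even")
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def decode(dna: str) -> str:
    """DNA sequence -> binary string."""
    return "".join(BASE_TO_BITS[base] for base in dna)

if __name__ == "__main__":
    print(encode("01001011"))  # CAGT
    print(decode("CAGT"))      # 01001011
```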
Operations on DNA
The types of operations available are a result of the capabilities of molecular biology rather than the wishes
of algorithm designers. Also note that these operations are performed in constant time on test tubes.
1. Merge - This is the simple operation of combining the contents of two test tubes in a third tube.
2. Anneal - This is the process by which complementary strands of DNA are paired to form the famous
double-helix structure of Watson and Crick. Annealing is achieved by cooling a DNA solution, which
encourages pairing. Adleman uses this in step 1 to generate all legal paths through the graph.
3. Melt - Melting is the inverse operation of annealing. By heating the contents of a tube, double-stranded
DNA sequences are denatured, or separated into their two single-stranded parts.
4. Separation by length - The contents of a test tube can be separated by increasing length. This is
achieved by gel electrophoresis, whereby longer strands travel more slowly through the gel. This
operation was used by Adleman in step 3 of his solution to the Hamiltonian path problem (HP).
5. Separation by sequence - This operation allows one to remove from solution all DNA strands that
contain a desired sequence. This is performed by generating the strand whose complement is the
desired sequence. This newly generated strand is attached to a magnetic substance which is used to
extract the sequences after annealing. This operation is the crux of Adleman's step 4.
6. Copying/Amplification - Copies are made of DNA strands in a test tube. The strands to be copied must
have known sequences at both the beginning and end in order for this operation to be performed.
7. Append - This process makes a DNA strand longer by adding a character or strand to the end of each
sequence.
8. Detect - It is also possible to analyze a test tube in order to determine whether or not it contains at
least one strand of DNA.
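The eight operations can be mimicked in software to see what each one does to the contents of a tube. This is a minimal, idealized sketch: a tube is assumed to be a Python list of single strands, pairing is perfect Watson-Crick complementarity with the complement written base by base under the original, and all function names are hypothetical. Real laboratory operations are error-prone and act on huge numbers of molecules at once.

```python
from collections import Counter

PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand):
    """Watson-Crick complement, written base by base under the original strand."""
    return "".join(PAIR[base] for base in strand)

def merge(tube1, tube2):
    """1. Merge: pour the contents of two tubes into a third."""
    return tube1 + tube2

def anneal(tube):
    """2. Anneal: pair each single strand with a complementary partner, if present.
    Returns (double strands as (top, bottom) tuples, strands left unpaired)."""
    pool = Counter(tube)
    doubles, singles = [], []
    for strand in tube:
        if pool[strand] == 0:          # already used as another strand's partner
            continue
        pool[strand] -= 1
        partner = complement(strand)
        if pool[partner] > 0:
            pool[partner] -= 1
            doubles.append((strand, partner))
        else:
            singles.append(strand)
    return doubles, singles

def melt(doubles):
    """3. Melt: separate double strands back into single strands."""
    return [s for pair in doubles for s in pair]

def separate_by_length(tube, length):
    """4. Keep only strands of the given length (what one gel band isolates)."""
    return [s for s in tube if len(s) == length]

def separate_by_sequence(tube, subseq):
    """5. Split the tube into strands that contain `subseq` and those that do not."""
    return [s for s in tube if subseq in s], [s for s in tube if subseq not in s]

def amplify(tube, start, end, copies=2):
    """6. Copy only strands whose known beginning and end match the primer sites."""
    return [s for s in tube
            for _ in range(copies if s.startswith(start) and s.endswith(end) else 1)]

def append(tube, suffix):
    """7. Append: lengthen every strand by adding `suffix` to its end."""
    return [s + suffix for s in tube]

def detect(tube):
    """8. Detect: does the tube contain at least one strand?"""
    return len(tube) > 0

if __name__ == "__main__":
    tube = merge(["AACG", "GGT"], ["TTGC"])
    doubles, singles = anneal(tube)
    print(doubles, singles)   # [('AACG', 'TTGC')] ['GGT']
```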
Mechanism of DNA computing algorithms
(The travelling salesman problem is a graph problem in which a salesman needs to visit each city in a list
(represented as nodes of a graph) exactly once, and the distances between all the cities (represented as
edges of the graph) are known. The solution to be found is the shortest possible route that visits every city
and returns to the origin city.)
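The operations list refers to the steps of Adleman's experiment, which solved the closely related directed Hamiltonian path problem by "generate and filter". The sketch below simulates those steps in software on a small hypothetical 5-vertex graph (Adleman's actual graph had 7 vertices), with random walks standing in for randomly annealed strands. A TSP variant would additionally have to pick the shortest of the surviving routes, which this sketch does not do.

```python
import random

# Hypothetical directed graph; a Hamiltonian path must go from V_IN to V_OUT
# and visit each of the N vertices exactly once.
EDGES = {0: [1, 3], 1: [2, 3], 2: [3, 4], 3: [4, 2], 4: []}
V_IN, V_OUT, N = 0, 4, 5

def random_path(rng, max_len):
    """Step 1 (annealing of vertex and edge strands): a random walk along edges."""
    path = [rng.choice(list(EDGES))]
    while len(path) < max_len and EDGES[path[-1]]:
        path.append(rng.choice(EDGES[path[-1]]))
    return path

def adleman(n_strands=20000, seed=0):
    rng = random.Random(seed)
    tube = [random_path(rng, N + 2) for _ in range(n_strands)]   # Step 1: generate many paths
    tube = [p for p in tube if p[0] == V_IN and p[-1] == V_OUT]   # Step 2: amplify with end primers
    tube = [p for p in tube if len(p) == N]                      # Step 3: separation by length
    for v in range(N):                                           # Step 4: separation by sequence,
        tube = [p for p in tube if v in p]                       #   once per vertex
    return tube                                                  # Step 5: detect what is left

if __name__ == "__main__":
    survivors = adleman()
    print(len(survivors) > 0, sorted(set(map(tuple, survivors))))
```

For this graph the only Hamiltonian paths are 0→1→2→3→4 and 0→1→3→2→4, so those are the only routes that can survive all the filters.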