Bioinformatics Course SBT 410 Outline

This document outlines the course details for Bioinformatics (SBT 410) including the aim, course description, objectives, contents, methodology, assessment, attendance policy, and references. The course focuses on biological databases, sequence analysis, genome analysis, and gene mapping using computational tools. Students will learn fundamental concepts in bioinformatics and how to analyze biological data. The course will be examined through assignments, practicals, tests, and a final exam weighing 60% of the overall grade. Attendance of at least 80% of lectures is recommended.

School of Industrial Sciences & Technology

Department: Biotechnology
SBT 410 : Bioinformatics (SBT 410)
Lecturer : Mr. C. Mawere/ Mr N. Ncube
Email ID : cmawere@[Link]/nqncube@[Link]

COURSE OUTLINE

Aim

The course focuses on information search, data retrieval, genome analysis and gene mapping. Students are
introduced to biological databases and their management, including SQL (Structured Query Language);
searching databases for similar sequences; the NCBI; publicly available tools; resources at EBI; resources
on the web; and database mining tools. Pairwise and multiple sequence alignment, scoring matrices and
secondary structure prediction are also included. Finally, the course covers genome analysis and gene
mapping using analysis tools for sequence data banks, sequence homology searching using BLAST and FASTA,
and a comparison of the FASTA and BLAST algorithms. Much of the work will require use of the internet to
deduce gene sequences and structures of proteins under study.

Course Description

In this course, students learn fundamental concepts and methods in bioinformatics, a field at the intersection
of biology and computing. It surveys a wide range of topics including biological database searching,
computational sequence analysis, sequence homology searching and motif finding, gene finding and genome
annotation, protein structure analysis and modeling, and biological knowledge discovery.

Course objectives

By the end of the course, students should be able to:

• Demonstrate familiarity with some of the basic computational problems in bioinformatics.
• Demonstrate familiarity with basic methods and tools for solving computational problems in bioinformatics.
• Analyze biological data sets with available computational tools and methods.
• Understand and explain the basic biochemical pathways of importance to biotechnologists.
• Demonstrate an understanding of how to perform, interpret and report on in silico analysis.

Course Contents

Topic: Introduction to Bioinformatics
Week 1: Definition and History of Bioinformatics, Internet and Bioinformatics, Data Management Analysis, Introduction to Data Mining: string mining, text mining, KDD for bioinformatics.
Week 2: Applications of Data Mining to Bioinformatics Problems, Applications of Bioinformatics.

Topic: Biological Data Resources
Week 3: Major bioinformatics resources: NCBI, EBI, ExPASy, UniProt. Knowledge of the various databases and bioinformatics tools available at these resources; organization of databases: data contents and formats, purpose and utilities in bioinformatics.
Week 4: Access to molecular biology databases through Entrez and the Sequence Retrieval System (SRS). Macromolecular structural databases: PDB, NDB, MMDB. Protein structural classification systems: CATH, SCOP. Introduction to pathway databases: KEGG, BRENDA.

Topic: Sequence Analysis
Week 5: Concept of homology and sequence evolution (substitutions, conservation and indels). Concept of sequence alignment; different measures of sequence similarity (% identity, % similarity); pairwise sequence comparisons. Dynamic programming as applicable to global (Needleman-Wunsch) and local (Smith-Waterman) sequence alignments.
Week 6: Pairwise substitution scoring matrices (PAM and BLOSUM); gap penalties. Heuristic methods for homology detection: FASTA, BLAST and their variants (BLASTN, BLASTP, BLASTX etc.); PSSM and PSI-BLAST.

Topic: Multiple Sequence Analysis
Week 7: Multiple sequence alignments; multi-dimensional dynamic programming for multiple sequence alignment (MSA). Heuristic approaches for MSA: progressive and iterative alignment methods; Clustal W/X; concept of the dendrogram and its interpretation.
Week 8: Detection of motifs; construction of sequence profiles; Block Maker, MEME, MACAW, LOGOS and MAST. Introduction to homology modeling and protein model analysis: PROCHECK, RAMPAGE, ERRAT, ProSA, VERIFY3D.

Topic: Molecular Phylogenetics
Week 9: Concept of biological clock, concept of phylogenetic trees, comparison of phylogenetic trees and MSA.
Week 10: Methods of evaluation for phylogenies; character-based methods and distance-based methods for phylogenetics; bootstrap method. Packages for phylogenetic studies such as PHYLIP, PAUP, TreeView etc.

Topic: Genomics
Week 11: Introduction to the genome, large-scale genome sequencing strategies, genome assembly and annotation. Gene identification: introduction, methods of gene prediction. Gene prediction tools: GRAIL, GENSCAN, FGENES, GenLang, GeneParser, Procrustes.
Week 12: DNA/RNA structure and function analysis: poly-A site prediction, TATA signaling, promoter and transcription factor binding site prediction, ORF prediction, splice site prediction, repetitive DNA and CpG island analysis, tRNA gene prediction.

Methodologies/Approaches

• Lectures
• Tutorials
• Lab sessions
• Group work

Course Assessment

The course will be taken over one semester and will be examined by a written examination and
assessment of assignments, practicals and tests as follows:

Final Examination Theory Paper: 60%
Continuous Assessment Theory (Assignments and Tests): 15%
Continuous Assessment Practical: 25%

a) Tests (3)
b) Written assignments
c) Term exam paper
d) Practical lab assignments, based on each chapter
e) Group presentations

Attendance
It is recommended that you attend all lectures. Students may not be allowed to sit for the exam if they
fail to attend at least 80% of the lecture sessions. Students are responsible for all material presented in
class or during practical sessions including course procedures. The course syllabus is defined by the
lecture content. However, this can only lay out the essentials of the subject. You are therefore
encouraged to explore topics further by reading a number of reference texts including those listed in this
outline.

References

Baxevanis, A. D. and Ouellette, B. F. 2003. Bioinformatics: A Practical Guide to the Analysis of
Genes and Proteins, 3rd ed. John Wiley & Sons, Inc, New Delhi.

Leach, A. R. 1996. Molecular Modeling: Principles & Applications. Addison Wesley Longman,
Singapore.

Lesk, A. M. 2002. Introduction to Bioinformatics. Oxford University Press.

Mount, D. W. 2004. Bioinformatics: Sequence and Genome Analysis. Cold Spring Harbor
Laboratory Press.

Primrose, S. B. and Twyman, R. M. 2007. Principles of Genome Analysis and Genomics. Blackwell
Publishing Company, Oxford, UK.

Rastogi, S. C., Mendiratta, N. and Rastogi, P. 2004. Bioinformatics: Concepts, Skills &
Applications. CBS Publishers & Distributors, New Delhi.

Xiong, J. 2006. Essential Bioinformatics. Cambridge University Press, Cambridge, UK.

Common questions

The bootstrap method evaluates phylogenies by resampling the columns of the alignment with replacement to create multiple pseudo-replicate datasets, constructing a tree for each, and assessing tree stability by calculating the percentage of trees in which a specific grouping appears. Factors to consider include the number of replicates and the underlying model assumptions. Packages like PHYLIP and PAUP automate this process: they execute the resampling, build the trees, and produce consensus trees from the bootstrap analyses, which supports more reliable phylogenetic inference.
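
The resampling idea can be sketched in a few lines of Python. This is a minimal illustration only, not how PHYLIP or PAUP implement it: the toy alignment is hypothetical, the function names are invented for this example, and full tree building is replaced by a stand-in that simply reports the pair of taxa with the fewest mismatches.

```python
import random

# Hypothetical toy alignment: four taxa, ten sites.
ALIGNMENT = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAT",
    "C": "TTGTACCTGA",
    "D": "TTGAACCAGA",
}

def resample_columns(alignment, rng):
    """Build one bootstrap replicate by drawing columns with replacement."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}

def closest_pair(alignment):
    """Stand-in for tree building: the pair of taxa with the fewest mismatches."""
    names = sorted(alignment)
    best = None
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            d = sum(x != y for x, y in zip(alignment[a], alignment[b]))
            if best is None or d < best[0]:
                best = (d, (a, b))
    return best[1]

def bootstrap_support(alignment, replicates=200, seed=0):
    """Percentage of replicates that recover the grouping found in the original data."""
    rng = random.Random(seed)
    reference = closest_pair(alignment)
    hits = sum(
        closest_pair(resample_columns(alignment, rng)) == reference
        for _ in range(replicates)
    )
    return reference, 100.0 * hits / replicates

if __name__ == "__main__":
    pair, support = bootstrap_support(ALIGNMENT)
    print(pair, f"{support:.1f}%")
```

In a real analysis each replicate would be passed to a tree-building method, and support would be reported for every clade on the consensus tree rather than for a single pair.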

Large-scale genome sequencing involves challenges such as handling vast amounts of data, ensuring sequence accuracy, and assembling sequences into complete genomes. Methodologies include shotgun sequencing and newer technologies like next-generation sequencing. Post-sequencing, assembly involves piecing together fragments, while annotation involves identifying gene regions and functional elements. Tools like GENSCAN and GRAIL assist in gene prediction by using statistical models to identify coding regions within the sequence based on known gene structures, significantly reducing manual annotation effort.

Multi-dimensional dynamic programming improves MSA by optimally aligning multiple sequences simultaneously, maintaining consistent alignment across all sequences. However, its cost grows exponentially with the number of sequences, so it is computationally intensive and impractical for large datasets. Heuristic approaches like Clustal W/X offer computational efficiency by using progressive alignment methods, but they can miss optimal solutions due to their reliance on initial pairwise alignments and guide trees, which may not accurately represent evolutionary relationships in all datasets.
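
The pairwise alignments that progressive methods start from rest on two-sequence dynamic programming. The sketch below computes a Needleman-Wunsch global alignment score, assuming a simple scoring scheme (match +1, mismatch -1, linear gap penalty -2) chosen for illustration; it does not reproduce Clustal's actual scoring. Filling the table for two sequences of length n takes on the order of n² steps, whereas aligning k sequences simultaneously needs on the order of nᵏ cells, which is why the exact multi-dimensional method does not scale.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of strings a and b.

    Only the previous row of the dynamic programming table is kept,
    which is enough when the alignment itself is not needed.
    """
    # Row 0: aligning the empty prefix of a against prefixes of b costs only gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of a aligned against nothing
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap inserted in b
            left = curr[j - 1] + gap  # gap inserted in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]

if __name__ == "__main__":
    print(needleman_wunsch_score("ACGT", "AGT"))
```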

FASTA and BLAST are both heuristic tools for sequence homology searching, but they differ in approach and speed. FASTA, the older tool, first locates short exact word (k-tuple) matches and then refines the best-scoring regions with a banded Smith-Waterman alignment; it is generally slower than BLAST. BLAST finds short word hits and extends them into high-scoring local alignments, which makes it much faster. BLAST variants such as BLASTN and BLASTP tailor the search to nucleotide or protein sequences, and PSI-BLAST improves detection of distant homologs by iteratively searching with a profile (PSSM).
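
The word-matching step that both tools begin with can be illustrated as follows. This is a simplified sketch under stated assumptions: the word size of 3, the function names, and the exact-match, gap-free extension are choices made for this example, and real BLAST and FASTA add scoring matrices, neighborhood words, drop-off thresholds and statistical evaluation that are not shown here.

```python
from collections import defaultdict

def index_words(seq, k):
    """Map every length-k word in seq to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index

def find_seeds(query, subject, k=3):
    """Return (query_pos, subject_pos) pairs where a k-letter word matches exactly."""
    index = index_words(subject, k)
    seeds = []
    for q in range(len(query) - k + 1):
        for s in index.get(query[q:q + k], []):
            seeds.append((q, s))
    return seeds

def extend_seed(query, subject, q, s, k):
    """Extend a seed to the right, without gaps, while the letters keep matching."""
    end = k
    while (q + end < len(query) and s + end < len(subject)
           and query[q + end] == subject[s + end]):
        end += 1
    return query[q:q + end]

if __name__ == "__main__":
    seeds = find_seeds("GATTACA", "TTGATTAC")
    print(seeds)
    print(extend_seed("GATTACA", "TTGATTAC", *seeds[0], 3))
```

Because only positions sharing an exact word are examined further, most of the search space is skipped, which is the source of the speed advantage over full dynamic programming.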

Data mining in bioinformatics involves extracting useful patterns and knowledge from large biological datasets, which goes beyond simple data retrieval that focuses on accessing and organizing specific data. It can address problems such as identifying gene variants, predicting protein functions, and discovering potential drug targets. Data mining techniques like string mining and knowledge discovery in databases (KDD) are used to analyze complex biological relationships and structures.

The concept of the biological clock (usually called the molecular clock in phylogenetics) is the assumption that specific genes or proteins accumulate changes at a roughly constant rate over time, which allows the time of divergence between species to be estimated. In molecular phylogenetics, this concept helps calibrate evolutionary trees: the amount of molecular change is treated as proportional to elapsed time, aiding in reconstructing the evolutionary relationships and lineage diversifications among species using phylogenetic trees.
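
Under a strict clock the arithmetic is simple: if two lineages each accumulate substitutions at rate r per site per year, a pairwise distance d (substitutions per site) corresponds to a divergence time T = d / (2r). The sketch below encodes that relation; the function names and all numbers are hypothetical and serve only to show the calculation.

```python
def calibrate_rate(distance, known_divergence_years):
    """Estimate the per-lineage rate r from a pair with a known divergence time."""
    if known_divergence_years <= 0:
        raise ValueError("divergence time must be positive")
    return distance / (2.0 * known_divergence_years)

def divergence_time(distance, rate_per_lineage):
    """Estimate divergence time T = d / (2r) under a strict clock assumption."""
    if rate_per_lineage <= 0:
        raise ValueError("rate must be positive")
    return distance / (2.0 * rate_per_lineage)

if __name__ == "__main__":
    # Hypothetical calibration pair: distance 0.02, split 10 million years ago.
    rate = calibrate_rate(0.02, 10e6)
    # Hypothetical pair of interest with distance 0.10.
    print(divergence_time(0.10, rate))
```

The factor of 2 reflects that changes accumulate along both lineages after the split. Real datasets often violate the constant-rate assumption, so estimates of this kind are a first approximation.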

Pairwise substitution scoring matrices like PAM and BLOSUM are critical for sequence alignment because they provide the scores used to evaluate how likely each character substitution in an alignment is. PAM matrices are derived from alignments of closely related proteins and extrapolated to greater evolutionary distances, while BLOSUM matrices are built from substitutions observed in conserved blocks of more divergent sequences. The choice of matrix affects alignment outcomes: low-numbered PAM matrices and high-numbered BLOSUM matrices (such as BLOSUM80) suit closely related sequences, whereas high-numbered PAM matrices and low-numbered BLOSUM matrices (such as BLOSUM45) suit distantly related sequences. BLOSUM62 is a common general-purpose default.
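
Each entry in such a matrix is a log-odds score: the logarithm of how often a residue pair is observed in trusted alignments relative to how often it would be expected by chance. The sketch below shows that calculation, assuming half-bit scaling (a factor of 2 with a base-2 logarithm, as commonly described for BLOSUM). The function name and the frequencies are made up for illustration and are not taken from any published matrix.

```python
import math

def log_odds_score(observed_pair_freq, expected_pair_freq, scale=2.0):
    """Rounded log-odds score for one residue pair.

    Positive: the pair occurs more often than chance (conservative substitution).
    Negative: the pair occurs less often than chance (disfavoured substitution).
    """
    if observed_pair_freq <= 0 or expected_pair_freq <= 0:
        raise ValueError("frequencies must be positive")
    return round(scale * math.log2(observed_pair_freq / expected_pair_freq))

if __name__ == "__main__":
    # Hypothetical frequencies: a pair seen four times more often than chance,
    # and a pair seen four times less often than chance.
    print(log_odds_score(0.020, 0.005))
    print(log_odds_score(0.001, 0.004))
```

An alignment score is then the sum of these entries over all aligned positions, minus any gap penalties.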

Homology modeling is based on predicting a protein's structure using the known structure of a homologous protein as a template. The accuracy of the modeled structure largely depends on the sequence identity between the target and template proteins. Validation tools such as PROCHECK, RAMPAGE, and VERIFY3D play a crucial role by assessing the quality of protein models. PROCHECK evaluates stereochemical properties, RAMPAGE assesses Ramachandran plots, and VERIFY3D checks the compatibility of the 3D structure with its sequence, thereby ensuring reliable models for further functional analysis.

Retrieval systems such as Entrez and the Sequence Retrieval System (SRS) enhance database accessibility by providing user-friendly interfaces for querying and retrieving relevant biological data across multiple databases. Entrez integrates diverse datasets, offering powerful search capabilities and cross-linking between different types of biological information, while SRS allows customized queries and data retrieval from various molecular biology repositories. These systems improve the usability of databases, facilitating efficient data management and analysis for researchers.

Pathway databases like KEGG and BRENDA provide comprehensive data on various biochemical pathways, allowing researchers to understand interactions and functions within a biological system. KEGG integrates genomic, chemical, and systemic functional data to map pathways, while BRENDA offers enzyme-specific information. Researchers can access these databases through various interfaces and tools that enable them to trace metabolic pathways, simulate biochemical reactions, and explore enzymatic functions and regulations within cellular processes.
