Hidden Markov Processes: Theory and Applications to Biology

Series: Princeton Series in Applied Mathematics

By: M Vidyasagar (Author)

304 pages, 50 b/w illustrations

Princeton University Press

Hardback | Jul 2014 | #211376 | ISBN-13: 9780691133157
Availability: Usually dispatched within 4 days
NHBS Price: £41.95 $52/€50 approx

About this book

Hidden Markov Processes: Theory and Applications to Biology explores important aspects of Markov and hidden Markov processes and the applications of these ideas to various problems in computational biology. The book starts from first principles, so that no previous knowledge of probability is necessary. However, the work is rigorous and mathematical, making it useful to engineers and mathematicians, even those not interested in biological applications. A range of exercises is provided, including drills to familiarize the reader with concepts and more advanced problems that require deep thinking about the theory. Biological applications are taken mostly from post-genomic biology, especially genomics and proteomics. The topics discussed include standard material such as the Perron-Frobenius theorem, transient and recurrent states, stopping times, maximum likelihood estimation, and the Baum-Welch algorithm.

Hidden Markov Processes: Theory and Applications to Biology contains extremely useful topics not usually seen at the basic level, such as mixing coefficients between random variables, ergodicity of Markov processes, Markov Chain Monte Carlo (MCMC) methods, information theory, and introductory large-deviation theory. In the area of realization theory for hidden Markov models, Hidden Markov Processes: Theory and Applications to Biology presents contemporary research. Among biological applications, it presents an in-depth look at the BLAST (Basic Local Alignment Search Tool) algorithm, including a comprehensive explanation of the underlying theory.

"This book provides a terrific introduction to an important and widely studied field – Markov processes (including hidden Markov processes) – with a particular view toward applications to problems in biology. With a wonderful balance of rigor, intuition, and choice of topics, the book gives a unique treatment of the subject for those interested in both fundamental theory and important applications."
– Sanjeev Kulkarni, Princeton University

"Vidyasagar uses sound scholarship to address hidden Markov processes and their application to problems in computational biology, in particular to genomics and proteomics. The well-organized book examines topics not often covered, such as realization theory and order determination for hidden Markov processes, and also looks at significant properties such as ergodicity and mixing. This work will be useful to systems researchers as well as computational biologists."
– Steve Marcus, University of Maryland


Preface xi


Chapter 1. Introduction to Probability and Random Variables 3
1.1 Introduction to Random Variables 3
1.1.1 Motivation 3
1.1.2 Definition of a Random Variable and Probability 4
1.1.3 Function of a Random Variable, Expected Value 8
1.1.4 Total Variation Distance 12
1.2 Multiple Random Variables 17
1.2.1 Joint and Marginal Distributions 17
1.2.2 Independence and Conditional Distributions 18
1.2.3 Bayes' Rule 27
1.2.4 MAP and Maximum Likelihood Estimates 29
1.3 Random Variables Assuming Infinitely Many Values 32
1.3.1 Some Preliminaries 32
1.3.2 Markov and Chebycheff Inequalities 35
1.3.3 Hoeffding's Inequality 38
1.3.4 Monte Carlo Simulation 41
1.3.5 Introduction to Cramér's Theorem 43

Chapter 2. Introduction to Information Theory 45
2.1 Convex and Concave Functions 45
2.2 Entropy 52
2.2.1 Definition of Entropy 52
2.2.2 Properties of the Entropy Function 53
2.2.3 Conditional Entropy 54
2.2.4 Uniqueness of the Entropy Function 58
2.3 Relative Entropy and the Kullback-Leibler Divergence 61

Chapter 3. Nonnegative Matrices 71
3.1 Canonical Form for Nonnegative Matrices 71
3.1.1 Basic Version of the Canonical Form 71
3.1.2 Irreducible Matrices 76
3.1.3 Final Version of Canonical Form 78
3.1.4 Irreducibility, Aperiodicity, and Primitivity 80
3.1.5 Canonical Form for Periodic Irreducible Matrices 86
3.2 Perron-Frobenius Theory 89
3.2.1 Perron-Frobenius Theorem for Primitive Matrices 90
3.2.2 Perron-Frobenius Theorem for Irreducible Matrices 95


Chapter 4. Markov Processes 101
4.1 Basic Definitions 101
4.1.1 The Markov Property and the State Transition Matrix 101
4.1.2 Estimating the State Transition Matrix 107
4.2 Dynamics of Stationary Markov Chains 111
4.2.1 Recurrent and Transient States 111
4.2.2 Hitting Probabilities and Mean Hitting Times 114
4.3 Ergodicity of Markov Chains 122

Chapter 5. Introduction to Large Deviation Theory 129
5.1 Problem Formulation 129
5.2 Large Deviation Property for I.I.D. Samples: Sanov's Theorem 134
5.3 Large Deviation Property for Markov Chains 140
5.3.1 Stationary Distributions 141
5.3.2 Entropy and Relative Entropy Rates 143
5.3.3 The Rate Function for Doubleton Frequencies 148
5.3.4 The Rate Function for Singleton Frequencies 158

Chapter 6. Hidden Markov Processes: Basic Properties 164
6.1 Equivalence of Various Hidden Markov Models 164
6.1.1 Three Different-Looking Models 164
6.1.2 Equivalence between the Three Models 166
6.2 Computation of Likelihoods 169
6.2.1 Computation of Likelihoods of Output Sequences 170
6.2.2 The Viterbi Algorithm 172
6.2.3 The Baum-Welch Algorithm 174

Chapter 7. Hidden Markov Processes: The Complete Realization Problem 177
7.1 Finite Hankel Rank: A Universal Necessary Condition 178
7.2 Nonsufficiency of the Finite Hankel Rank Condition 180
7.3 An Abstract Necessary and Sufficient Condition 190
7.4 Existence of Regular Quasi-Realizations 195
7.5 Spectral Properties of Alpha-Mixing Processes 205
7.6 Ultra-Mixing Processes 207
7.7 A Sufficient Condition for the Existence of HMMs 211


Chapter 8. Some Applications to Computational Biology 225
8.1 Some Basic Biology 226
8.1.1 The Genome 226
8.1.2 The Genetic Code 232
8.2 Optimal Gapped Sequence Alignment 235
8.2.1 Problem Formulation 236
8.2.2 Solution via Dynamic Programming 237
8.3 Gene Finding 240
8.3.1 Genes and the Gene-Finding Problem 240
8.3.2 The GLIMMER Family of Algorithms 243
8.3.3 The GENSCAN Algorithm 246
8.4 Protein Classification 247
8.4.1 Proteins and the Protein Classification Problem 247
8.4.2 Protein Classification Using Profile Hidden Markov Models 249

Chapter 9. BLAST Theory 255
9.1 BLAST Theory: Statements of Main Results 255
9.1.1 Problem Formulations 255
9.1.2 The Moment Generating Function 257
9.1.3 Statement of Main Results 259
9.1.4 Application of Main Results 263
9.2 BLAST Theory: Proofs of Main Results 264

Bibliography 273
Index 285


M. Vidyasagar is the Cecil and Ida Green Chair in Systems Biology Science at the University of Texas, Dallas. His many books include Computational Cancer Biology: An Interaction Network Approach and Control System Synthesis: A Factorization Approach.
