An Introduction to Statistics and Data Analysis for Bioinformatics Using R

506 pages, 100 b/w illustrations, includes CD-ROM

Chapman & Hall (CRC Press)

Hardback | Jul 2017 | #200640 | ISBN-13: 9781439892367

From the very basics to linear models, this book provides a complete introduction to statistics, data analysis, and R for bioinformatics research and applications. It covers linear models, ANOVA, cluster analysis, visualization tools, and machine learning techniques. Suitable for self-study and courses in computational biology, bioinformatics, statistics, and the life sciences, the text also presents examples of microarrays and bioinformatics applications. R code illustrates all of the essential concepts and is available on an accompanying CD-ROM.


Bioinformatics — an emerging discipline

Introduction to R
Introduction to R
The basic concepts
Data structures and functions
Other capabilities
The R environment
Installing Bioconductor
Control structures in R
Programming in R vs C/C++/Java

Bioconductor: Principles and Illustrations
The portal
Some explorations and analyses

Elements of Statistics
Some basic concepts
Elementary statistics
Degrees of freedom
Bayes’ theorem
Testing for (or predicting) a disease

Probability Distributions
Probability distributions
Central limit theorem
Are replicates useful?

Basic Statistics in R
Descriptive statistics in R
Probabilities and distributions in R
Central limit theorem

Statistical Hypothesis Testing
The framework
Hypothesis testing and significance
"I do not believe God does not exist"
An algorithm for hypothesis testing
Errors in hypothesis testing

Classical Approaches to Data Analysis
Tests involving a single sample
Tests involving two samples

Analysis of Variance (ANOVA)
One-way ANOVA
Two-way ANOVA
Quality control

Linear Models in R
Introduction and model formulation
Fitting linear models in R
Extracting information from a fitted model: testing hypotheses and making predictions
Some limitations of the linear models
Dealing with multiple predictors and interactions in the linear models, and interpreting model coefficients

Experiment Design
The concept of experiment design
Comparing varieties
Improving the production process
Principles of experimental design
Guidelines for experimental design
A short synthesis of statistical experiment designs
Some microarray specific experiment designs

Multiple Comparisons
The problem of multiple comparisons
A more precise argument
Corrections for multiple comparisons
Corrections for multiple comparisons in R

Analysis and Visualization Tools
Box plots
Gene pies
Scatter plots
Volcano plots
Time series
Time series plots in R
Principal component analysis (PCA)
Independent component analysis (ICA)

Cluster Analysis
Distance metric
Clustering algorithms
Partitioning around medoids (PAM)
Clustering in R

Machine Learning Techniques
Main concepts and definitions
Supervised learning
Practicalities using R

The Road Ahead

Sorin Draghici is the Robert J. Sokol MD Endowed Chair in Systems Biology in the Department of Obstetrics and Gynecology, professor in the Department of Clinical and Translational Science and Department of Computer Science, and head of the Intelligent Systems and Bioinformatics Laboratory at Wayne State University. He is also the chief of the Bioinformatics and Data Analysis Section in the Perinatology Research Branch of the National Institute for Child Health and Development. A senior member of IEEE, Dr. Draghici is an editor of IEEE/ACM Transactions on Computational Biology and Bioinformatics, Journal of Biomedicine and Biotechnology, and International Journal of Functional Informatics and Personalized Medicine. He earned a Ph.D. in computer science from the University of St. Andrews.

