This book aims to present state-of-the-art analytical methods from statistics and data mining for the analysis of high-throughput data from genomics and proteomics. Research and development in genomics and proteomics depend on the analysis and interpretation of large amounts of data generated by high-throughput techniques. To exploit data obtained from experimental and observational studies, life scientists need to understand the analytical techniques and methods from statistics and data mining. These techniques are not easily accessible to life scientists working on genomics and proteomics problems, as the available material is presented from a highly mathematical perspective, favoring formal rigor over conceptual clarity and assessment of practical relevance. This book addresses these issues by adopting an approach focusing on concepts and applications. It presents key analytical techniques for the analysis of genomics and proteomics data by detailing their underlying principles, merits and limitations.
Preface.- List of Contributors.- Introduction to Genomic and Proteomic Data Analysis.- Design Principles for Microarray Investigations.- Pre-Processing DNA Microarray Data.- Pre-Processing Mass Spectrometry Data.- Visualization in Genomics and Proteomics.- Clustering - Class Discovery in the Post-Genomic Era.- Feature Selection and Dimensionality Reduction in Genomics and Proteomics.- Resampling Strategies for Model Assessment and Selection.- Classification of Genomic and Proteomic Data Using Support Vector Machines.- Networks in Cell Biology.- Identifying Important Explanatory Variables for Time-Varying Outcomes.- Text Mining in Genomics and Proteomics.- Index.