Recent advances in experimental methods have resulted in the generation of enormous volumes of data across the life sciences. Hence clustering and classification techniques that were once predominantly the domain of ecologists are now being used more widely. This book provides an overview of these important data analysis methods, from long-established statistical methods to more recent machine learning techniques. It aims to provide a framework that will enable the reader to recognise the assumptions and constraints that are implicit in all such techniques. Important generic issues are discussed first, and then the major families of algorithms are described. Throughout, the focus is on explanation and understanding, and readers are directed to other resources that provide additional mathematical rigour when it is required. Examples taken from across the whole of biology, including bioinformatics, are provided throughout the book to illustrate the key concepts and each technique's potential.
* Equations are kept to a minimum to ensure accessibility of the material to a wide readership, particularly those without a strong mathematical background
* All worked examples in the book, taken from across the life sciences, use accessible data files, allowing the reader to understand the details of each analysis and repeat it themselves
* A specific chapter is devoted to the measurement of accuracy, something that is lacking in most biological and statistical texts
1. Introduction; 2. Exploratory data analysis; 3. Cluster analysis; 4. Introduction to classification; 5. Classification algorithms I; 6. Other classification methods; 7. Classification accuracy; Appendices; References.
ALAN H. FIELDING is Senior Lecturer in the Division of Biology at Manchester Metropolitan University.
...the book contains quite a lot of useful material for those embarking in this area... - Morven Leese, Biometrics