This accessible introductory textbook provides a straightforward, practical explanation of how statistical analysis and error measurements should be applied in biological research. Assuming no prior knowledge of statistics, Understanding Statistical Error covers the central topics needed for efficient data analysis, ranging from probability distributions, statistical estimators, confidence intervals, error propagation and uncertainties in linear regression, to advice on how to use error bars in graphs properly. Using simple mathematics, all these topics are carefully explained and illustrated with figures and worked examples. The emphasis throughout is on visual representation and on helping the reader to approach the analysis of experimental data with confidence.
This useful guide explains how to evaluate uncertainties of key parameters, such as the mean, median, proportion and correlation coefficient. Crucially, the reader will also learn why confidence intervals are important and how they compare with other measures of uncertainty.
Understanding Statistical Error: A Primer for Biologists can be used by both students and researchers to deepen their knowledge and find practical formulae for error analysis calculations. It is a valuable guide for students, experimental biologists and professional researchers in biology, biostatistics, computational biology, cell and molecular biology, ecology, biological chemistry, drug discovery, biophysics and wider subjects within the life sciences, as well as any field where error analysis is required.
Introduction
Why would you read an introduction?
What is this book about?
Who is this book for?
About maths
Acknowledgements
Chapter 1 Why do we need to evaluate errors?
Chapter 2 Probability distributions
2.1 Random variables
2.2 What is a probability distribution?
Probability distribution of a discrete variable
Probability distribution of a continuous variable
Cumulative probability distribution
2.3 Mean, median, variance and standard deviation
2.4 Gaussian distribution
Example: estimate an outlier
2.5 Central limit theorem
2.6 Log-normal distribution
2.7 Binomial distribution
2.8 Poisson distribution
Classic example: horse kicks
Interarrival times
2.9 Student's t-distribution
2.10 Exercises
Chapter 3 Measurement errors
3.1 Where do errors come from?
Systematic errors
Random errors
3.2 Simple model of random measurement errors
3.3 Intrinsic variability
3.4 Sampling error
Sampling in time
3.5 Simple measurement errors
Reading error
Counting error
3.6 Exercises
Chapter 4 Statistical estimators
4.1 Population and sample
4.2 What is a statistical estimator?
4.3 Estimator bias
4.4 Commonly used statistical estimators
Mean
Weighted mean
Geometric mean
Median
Standard deviation
Unbiased estimator of standard deviation
Mean deviation
Pearson's correlation coefficient
Proportion
4.5 Standard error
4.6 Standard error of the weighted mean
4.7 Error in the error
4.8 Degrees of freedom
4.9 Exercises
Chapter 5 Confidence intervals
5.1 Sampling distribution
5.2 Confidence interval: what does it really mean?
5.3 Why 95%?
5.4 Confidence interval of the mean
Example
5.5 Standard error vs. confidence interval
How many standard errors are in a confidence interval?
What is the confidence of the standard error?
5.6 Confidence interval of the median
Simple approximation
Example
5.7 Confidence interval of the correlation coefficient
Significance of correlation
5.8 Confidence interval of a proportion
5.9 Confidence interval for count data
Simple approximation
Errors on count data are not integers
5.10 Bootstrapping
5.11 Replicates
Sample size to find the mean
5.12 Exercises
Chapter 6 Error bars
6.1 Designing a good plot
Elements of a good plot
Lines in plots
A digression on plot labels
Logarithmic plots
6.2 Error bars in plots
Various types of errors
How to draw error bars
Box plots
Bar plots
Pie charts
Overlapping error bars
6.3 When can you get away without error bars?
On a categorical variable
When presenting raw data
Large groups of data points
Where errors are small and negligible
Where errors are not known
6.4 Quoting numbers and errors
Significant figures
Writing significant figures
Errors and significant figures
Error with no error
Computer generated numbers
Summary
6.5 Exercises
Chapter 7 Propagation of errors
7.1 What is propagation of errors?
7.2 Single variable
Scaling
Logarithm
7.3 Multiple variables
Sum or difference
Ratio or product
7.4 Correlated variables
7.5 To use error propagation or not?
7.6 Example: distance between two dots
7.7 Derivation of the error propagation formula for one variable
7.8 Derivation of the error propagation formula for multiple variables
7.9 Exercises
Chapter 8 Errors in simple linear regression
8.1 Linear relation between two variables
Mean response
True response and noise
Data linearization
8.2 Straight line fit
8.3 Confidence intervals of linear fit parameters
Example
8.4 Linear fit prediction errors
8.5 Regression through the origin
Example
8.6 General curve fitting
8.7 Derivation of errors on fit parameters
8.8 Exercises
Chapter 9 Worked example
9.1 The experiment
9.2 Results
Sasha
Lyosha
Masha
9.3 Discussion
9.4 The final paragraph
Solutions to exercises
Bibliography
Index