This book brings together the two disparate worlds of computational text analysis and biology, presenting some of the latest methods and their applications to proteomics, sequence analysis, and gene expression data. Modern genomics generates large, comprehensive data sets, but interpreting them requires an understanding of a vast number of genes, their complex functions, and their interactions. Keeping up with the literature on a single gene is a challenge in itself; for thousands of genes it is simply impossible! Here, Soumya Raychaudhuri presents the techniques and algorithms needed to access and use the vast body of scientific text, i.e. methods that automatically "read" the literature on all the genes.
With background chapters on the necessary biology, statistics, and genomics, as well as practical examples of interpreting many different types of modern experiments, this book is ideal for students and researchers in computational biology, bioinformatics, genomics, statistics, and computer science.
1. An Introduction to Text Analysis in Genomics; 2. Functional Genomics; 3. Textual Profile of Genes; 4. Using Text in Sequence Analysis; 5. Using Text in the Analysis of a Gene Expression Experiment; 6. Analyzing Groups of Genes; 7. Analyzing Large Gene Expression Data Sets; 8. Using Text Classification for Gene Function Annotation; 9. Finding Gene Names; 10. Protein Interaction Networks; 11. Conclusion