An excellent introduction, 10 Sep 2004
This is a very lucid introduction to chemoinformatics. It covers all the main topics: 2D and 3D representations of chemical structure, pharmacophore searching, descriptor generation, chemometric techniques (Principal Component Analysis and Partial Least Squares), QSAR model building, similarity searching, diversity analysis and molecule selection, and combinatorial library design.
Additionally, there are sections on virtual screening and on data-mining of high-throughput screening, which are slightly weaker than the other material. In the former case, the sections on filters, drug-likeness, docking and ADMET prediction hang together rather uneasily. The docking section, while communicating the basics, is perhaps a little too light on detail.
The HTS chapter considers classification techniques as separate from regression (which is covered in the QSAR chapter), covering neural networks and decision trees, as well as linear discriminant analysis and briefly mentioning Support Vector Machines. I would question whether LDA is suitable for analysis of HTS data (the example given is from 1974 and only considers 20 molecules!), likewise for SVMs (due to speed issues). And certainly, SVMs and NNs can be used for regression.
One other thing - the jacket design sports a horrendously naff typeface. Please change this for the next edition, Kluwer!
However, these are minor quibbles. The authors have packed a large amount of information (with plenty of references) into a small volume, without sacrificing readability. An admirable achievement, and highly recommended for anyone seeking a good overview or entry point into the field.
A comprehensive introduction, 18 Aug 2003
Chemical structures are a symbolic "language" that developed about 150 years ago. For the specialist, the structures not only encode the connectivity of atoms but also provide information, via the recognition of functional groups, on synthetic accessibility, chemical reactivity, and various other molecular properties. For a computer, however, this symbolic language has to be translated. This is one application of chemoinformatics: storing and retrieving structural information in various ways. Others, becoming more and more important because of the vast number of compounds being synthesised and tested in drug research, are the calculation of different molecular properties, the comparison of molecules by their mutual similarity, the selection of sets of compounds with the highest dissimilarity, and various strategies for the enrichment of compound libraries, especially combinatorial libraries, with promising candidates for biological testing.
The book by Leach and Gillet is a comprehensive introduction to the field, well balanced in its presentation of the underlying theories and its critical discussion of the scope and limitations of the individual approaches.
In an introductory chapter, the representation and manipulation of 2D molecular structure (i.e., by the computer) is described, including structure searching and substructure searching. The next chapter deals with the generation, representation and manipulation of 3D structures; pharmacophore generation and pharmacophore searches are treated as well as the flexibility of molecules. 2D and 3D descriptors are described in an extra chapter, which is the basis of the discussion of computational models, like QSAR and molecular field analyses. Two more chapters discuss similarity methods and the selection of diverse sets of compounds; a small section of each chapter is dedicated to a comparison and evaluation of the individual methods. A chapter on the analysis of high-throughput screening data discusses data visualisation and data mining methods. Virtual screening is becoming more and more important, due to the vast number of compounds that could be synthesised and tested. Correspondingly, a chapter on this topic discusses the concept of "drug-likeness", different computational filters, ligand docking and scoring, and the prediction of ADMET (absorption, distribution, metabolism, elimination, and toxicity) properties. The final chapter deals with library design in combinatorial chemistry. Two short appendices, one on matrices, eigenvectors and eigenvalues, and another one on conformation, energy calculations and energy surfaces, present details which could not be included in the text. Very helpful is a short section on recommendations for further reading (sorted by chapters), followed by the list of references for all chapters (25 pages), and the keyword index (9 pages).
Leach and Gillet have succeeded in compiling an introductory text which is easy to read and understand, and is of special value for the newcomer as well as for the practitioner. The individual chapters treat all important aspects of chemoinformatics in a well-balanced manner, in scientific detail but without too much theory. To me, the comments on limitations and various pitfalls, which reflect the long practical experience of these two outstanding scientists, are the most important aspect of the book. Too many people use modelling in a "blind" manner, without being aware of the problems behind the individual methods. This can be avoided by studying this book.
Accordingly, this chemoinformatics book is highly recommended as an introductory text for students and all scientists who deal with molecular modelling, combinatorial chemistry, high-throughput screening, structure-activity relationships, and drug research in general.