A recent publication has highlighted some of the limitations of commercially available docking programs in correctly predicting ligand binding affinities (1). The authors conclude that while docking programs can generate ligand poses similar to crystallographically determined protein/ligand complex structures for some targets, no single program performs well on all targets. Additionally, scoring functions usually fail to distinguish the crystallographic conformation from the set of docked poses. Lastly, while docking programs can identify active compounds from a pharmaceutically relevant pool of decoy compounds, no single program has performed well on all targets. These limitations undermine the credibility of docking programs as a virtual-screening tool.
On Tuesday, 14 October 2008, we will hold an eCheminfo Community of Practice conference session on Docking and Scoring at Bryn Mawr College, chaired by Chaya Duraiswami of GlaxoSmithKline. It will be preceded on Monday, 13 October, by a one-day, wiki-supported workshop on virtual screening best practices, continuing our work and discussions from last year.
The “docking and scoring” session will highlight some advances in this field that address the limitations stated above. John Irwin will address the need for appropriate experimental design principles when conducting both retrospective and prospective docking studies and analyzing their results. The talk by Georgia McGaughey will compare the utility of docking studies with that of ligand-based approaches. This should help us understand when docking is a worthwhile virtual-screening tool and when other methods might be more appropriate. It should also help clarify whether, in a virtual screening exercise, it is important for the docking program to generate ligand conformations similar to crystallographically determined protein/ligand complex structures. Johannes Voigt will present an application of cross docking, with CDK2 inhibitors as the test case, to determine whether one is obtaining the right answers for the right reasons rather than through a chance correlation. Talks by Lance Westerhoff and Zsolt Zsoldos will highlight advances in scoring functions aimed at correctly predicting binding affinity and rank ordering ligands by their relative potency.
Reference
(1) Gregory L. Warren et al., A Critical Assessment of Docking Programs and Scoring Functions, J. Med. Chem., 49(20), 5912-5931 (2006).
A description of the session with presentation abstracts follows:
Docking & Scoring
http://echeminfo.com/COMTY_confprog08docking
Abstracts
Retrospective and Prospective Investigation of Docking Performance
John J. Irwin, Department of Pharmaceutical Chemistry, University of California San Francisco, Byers Hall, 1700 4th St, Mailbox 2550, San Francisco CA 94158-2330, USA
Virtual screening is the most practical method to leverage structure for ligand discovery, and as a consequence is widely used by both pharma and academic groups. Unfortunately, the technique retains important weaknesses and continues to be challenging to use effectively, even for experts. In principle, one would like to evaluate docking by its ability to predict ligand binding affinities, but this is currently beyond the capabilities of the field. In practice, docking is judged by its separation of likely ligands from the vast majority of database molecules (decoys) that are unlikely to bind, typically in retrospective calculations. For the resulting enrichment factors to be meaningful, the ligands and decoys must be property matched, so that enrichment is not simply due to trivial separation by physical properties such as molecular weight or hydrophobicity. We have addressed this problem by creating a Directory of Useful Decoys (DUD), containing 2950 ligands and 95,316 property-matched decoys for 40 targets. Another way to evaluate docking is to compare it to high throughput screening (HTS). We prospectively docked 70,563 compounds against AmpC beta-lactamase, and compared our predictions with HTS followed by careful secondary assays. Surprisingly, although there were 1274 “hits”, HTS failed to identify a single non-covalent inhibitor at 30 µM. Reviewing the top scoring docked ligands, we selected 16 compounds to test in the lab at higher concentration. We found two inhibitors, the better of which had a Ki of 37 µM and could be progressed through routine medicinal chemistry to a Ki of 8 µM.
References
1. Babaoglu K, Simeonov A, Irwin JJ, Nelson ME, Feng B, Thomas CJ, Cancian L, Costi MP, Maltby DA, Jadhav A, Inglese J, Austin CP, Shoichet BK, A comprehensive mechanistic analysis of hits from high throughput and docking screens against beta lactamase, J. Med. Chem. (2008), in press.
2. Huang N, Shoichet BK*, Irwin JJ*, Benchmarking sets for Molecular Docking, J. Med. Chem., 49(23), 6789-6801 (2006).
3. Irwin JJ, Community benchmarks for virtual screening, J. Comput. Aided Mol. Des. (2008), available online.
4. DUD is available free at http://dud.docking.org/.
5. ZINC is available free at http://zinc.docking.org.
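The enrichment factor that the abstract above uses to judge retrospective docking can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name, the 1% default cutoff, and the convention that a lower docking score is better are all assumptions made for the example.

```python
def enrichment_factor(scores, is_active, fraction=0.01):
    """Enrichment factor at the top `fraction` of a score-ranked list.

    EF = (actives in top slice / size of top slice)
         / (total actives / total compounds)

    Assumption: a lower (more negative) score means a better docking result.
    """
    if len(scores) != len(is_active) or not scores:
        raise ValueError("scores and is_active must be non-empty and equal in length")
    n_total = len(scores)
    n_actives = sum(1 for flag in is_active if flag)
    if n_actives == 0:
        raise ValueError("at least one active compound is required")

    # Size of the top slice, never smaller than one compound.
    n_top = max(1, int(round(fraction * n_total)))

    # Rank compounds from best (lowest) to worst (highest) score.
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0])
    hits_in_top = sum(1 for _, flag in ranked[:n_top] if flag)

    return (hits_in_top / n_top) / (n_actives / n_total)
```

An EF of 1 corresponds to random selection. As the abstract notes, a high EF is only meaningful if the decoys are property matched to the ligands; otherwise a trivial property such as molecular weight can produce the same separation.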
Ligand- and Structure-based Methods: Comparison of Methods
Georgia McGaughey, Merck Research Laboratories
In an extensive study published by our laboratories (McGaughey et al., J Chem Inf Model (2007) 47:1504-1519) we compared 2D ligand-based, 3D ligand-based, and structure-based methods for use in virtual screening. Enrichment factors (EF) were used as the measure for ranking the virtual screening methods. Each protein target or chemical probe was represented by one crystal structure or one molecule. Although eleven targets were examined using two diverse databases, the results of any one virtual screening method could be skewed by the particular crystal structure or molecule (i.e. chemical probe) used. We have since extended the first study so that each protein is represented by up to five crystal structures or five ligands (Sheridan et al., J Comput Aided Mol Des (2008) 22:257-265). In addition, we will report a further measure of enrichment (BEDROC) for 47 targets.
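For readers unfamiliar with BEDROC, the sketch below shows one way to compute it from a ranked list. It follows the RIE-based formulation of Truchon and Bayly, rescaling the robust initial enhancement (RIE) between its minimum and maximum possible values. It is an illustration only, not the code used in the studies above; the function name and the default alpha of 20 are assumptions.

```python
import math


def bedroc(ranked_is_active, alpha=20.0):
    """BEDROC for a list of active/inactive flags ordered from best to worst rank.

    `alpha` controls how strongly early ranks are weighted; 20.0 is an
    assumed default, not a value taken from the abstract.
    """
    n_total = len(ranked_is_active)
    n_actives = sum(1 for flag in ranked_is_active if flag)
    if n_actives == 0 or n_actives == n_total:
        raise ValueError("need at least one active and one inactive compound")

    ratio = n_actives / n_total

    # Sum of exponentially decaying weights over the ranks (1-based) of the actives.
    sum_exp = sum(
        math.exp(-alpha * rank / n_total)
        for rank, flag in enumerate(ranked_is_active, start=1)
        if flag
    )

    # Expected per-active weight if the actives were uniformly distributed.
    random_weight = (1.0 / n_total) * (1.0 - math.exp(-alpha)) / (
        math.exp(alpha / n_total) - 1.0
    )
    rie = sum_exp / (n_actives * random_weight)

    # RIE for the best and worst possible rankings of the actives.
    rie_max = (1.0 - math.exp(-alpha * ratio)) / (ratio * (1.0 - math.exp(-alpha)))
    rie_min = (1.0 - math.exp(alpha * ratio)) / (ratio * (1.0 - math.exp(alpha)))

    return (rie - rie_min) / (rie_max - rie_min)
```

Unlike an enrichment factor at a fixed cutoff, BEDROC is bounded between 0 and 1 and uses the whole ranked list, while still emphasizing early recognition.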
Cross-docking to CDK2: a Virtual Screening Study
Johannes H. Voigt, Vincent S. Madison, Jose S. Duca; Department of Drug Design, Schering-Plough Research Institute, 2015 Galloping Hill Road, K15-1-1800, Kenilworth, New Jersey 07033, USA
We recently published a comprehensive cross docking study on CDK2 covering the analysis of docking accuracy and score/affinity correlations for a uniform set of 150 CDK2 crystal structures (Duca, Voigt, J. Chem. Inf. Model. 2008, 48, 659-668 and 669-678). In agreement with previous docking/scoring evaluations, the docking accuracy of Gold and Glide was good, while the score/affinity correlations were not satisfactory.
In this study, virtual screening for this unique data set was investigated. The following questions were addressed: A) Does virtual screening for this data set work? B) If yes, does it work for the correct reasons? Here it is valuable to have the experimentally determined binding modes of the active ligands. C) What is the best choice of decoy set, what really is a decoy, and is a decoy set only valid in the context of the active molecules? For comparison with our in-house ligands, the DUD CDK2 data set was used (“Benchmarking Sets for Molecular Docking”, Huang, N., Shoichet, B.K., Irwin, J.J., J. Med. Chem. 2006, 49(23), 6789-6801). D) Does combining docking results from multiple protein structures enhance the performance? E) How do Gold and Glide compare?
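Question D can be made concrete with a small sketch. One common way to combine results from several protein structures is to keep, for each ligand, the best score obtained against any structure. The abstract does not say which combination scheme the authors used, so the scheme, the function name, and the lower-is-better score convention below are assumptions for illustration only.

```python
def best_score_per_ligand(scores_by_structure):
    """Combine docking runs against several structures of the same target.

    `scores_by_structure` maps a structure identifier to a dict of
    ligand identifier -> docking score. For each ligand, the best score
    over all structures is kept together with the structure that gave it.

    Assumption: a lower score is better. For a scoring scheme where higher
    is better, negate the scores before calling this function.
    """
    combined = {}
    for structure_id, ligand_scores in scores_by_structure.items():
        for ligand_id, score in ligand_scores.items():
            current = combined.get(ligand_id)
            if current is None or score < current[0]:
                combined[ligand_id] = (score, structure_id)
    return combined


def rank_ligands(combined):
    """Return ligand identifiers ordered from best to worst combined score."""
    return sorted(combined, key=lambda ligand_id: combined[ligand_id][0])
```

The combined ranking can then be evaluated with the same enrichment measures as a single-structure run, which allows a direct comparison of the two.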
Application of Quantum Mechanical Pairwise Energy Decomposition to Structure-based Drug Design
Lance Westerhoff, QuantumBio
The current state of the art of in silico drug discovery relies almost exclusively on molecular mechanics force fields and empirical potentials. It is well known that while these approaches are excellent for certain applications, they have thus far proven less than satisfactory for a thorough understanding of the interactions of enzyme-inhibitor systems. To address these issues, our linear scaling, quantum mechanics (QM) algorithm is being applied to in silico drug discovery problems to characterize pairwise energy decomposition (QM-PWD) between a set of targets and a population of inhibitors. The QM-PWD method and associated SE-COMBINE application have been successfully developed and validated against two conventional methods (COMBINE and MM-PB(GB)SA) by comparing the methods’ abilities to elucidate binding affinities of a series of trypsin inhibitors. In order to measure the statistical robustness of the various methods, a partial least squares (PLS) analysis was performed for the results from SE-COMBINE and COMBINE calculations. The present study not only shows that the QM-PWD/SE-COMBINE method possesses much greater flexibility in its use of atom-by-atom, pairwise energy decomposition and ligand fragmentation compared to other methods, but also that QM-PWD/SE-COMBINE yields results that are significant improvements over those generated by conventional methods. Further, unlike these conventional methods, SE-COMBINE provides both QM and molecular mechanics (MM) energy terms, from which many scoring functions can be constructed on the fly and with little or no additional CPU cost. This ability allows QM-PWD/SE-COMBINE to be applied to a wide range of receptor-ligand systems that may emphasize or require different energy terms. In the trypsin-ligand system, twelve different scoring functions with several combinations of various energy terms and ligand-fragmentation schemes were developed and validated, and robust PLS models were derived.
The QM energy terms, such as QM-PWD in vacuum and solvent, have been demonstrated to be important for describing activation variation in the trypsin-ligand system, whereas the scoring functions that include mainly the MM energy terms yield less descriptive prediction sets. Thus, it has been shown that the QM energy terms are essential for accurate characterization of the trypsin-ligand system.
The eHiTS Scoring Function
Zsolt Zsoldos and Danni Harris, SimBioSys Inc.
The fragment-based exhaustive flexible ligand docking engine of eHiTS has been published previously [1]. Recently, our efforts have focused on developing an innovative scoring function for eHiTS, one which departs from the traditional atom-based interaction scoring that is typical of most empirical, force-field based and statistical scoring methods. We have introduced a novel concept of scoring interactions based on Interacting Surface Points (ISP) that are represented by their 3D positions, normal vectors and 23 chemical feature types including H-bond donor/acceptor, aromatic Pi electrons, and hydrophobic groups. A statistically derived empirical scoring function is constructed using a 4-parameter geometric description of the relationship between ISP pairs. The parameters include the distance between the pairs of ISPs, and the angles between the normal vectors. The energy associated with each possible ISP pair is deduced from statistics based on an inverse application of the Boltzmann distribution function. During the statistics collection, temperature factors were considered, with the corresponding Gaussian functions applied to the atom positions to account for the variable uncertainty of the atom positions in the Protein Data Bank (PDB) X-ray structures. More accurate geometric statistics have been collected from the Cambridge Structural Database and recently combined with the PDB data. Certain atoms, for example the nitrogen atom in the imidazole ring, may participate in very different types of interactions at the same time (H-bonding and aromatic Pi-stacking). The ISP representation can describe these interactions better than the atom-based approach by having multiple ISPs associated with the same atom but pointing in different directions.
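The inverse Boltzmann step mentioned above can be illustrated with a generic, one-dimensional sketch. It converts observed and reference histogram counts into energies via E = -kT ln(p_obs / p_ref). This is not the eHiTS implementation: the abstract describes a 4-parameter geometry and temperature-factor weighting that are not reproduced here, and the function name, the single-variable binning, the reference state, and the pseudocount are all assumptions.

```python
import math


def inverse_boltzmann_energies(observed_counts, reference_counts, kT=1.0, pseudocount=1.0):
    """Turn histogram counts into statistical energies, one per bin.

    E(bin) = -kT * ln(p_observed(bin) / p_reference(bin))

    `observed_counts` would hold, for example, how often a given pair of
    interaction types is seen in each distance bin in crystal structures;
    `reference_counts` holds the counts expected with no specific
    interaction (the choice of reference state is an assumption).
    `pseudocount` is added to every bin to avoid taking the log of zero.
    """
    if len(observed_counts) != len(reference_counts) or not observed_counts:
        raise ValueError("count lists must be non-empty and equal in length")

    n_bins = len(observed_counts)
    observed_total = sum(observed_counts) + pseudocount * n_bins
    reference_total = sum(reference_counts) + pseudocount * n_bins
    if observed_total <= 0 or reference_total <= 0:
        raise ValueError("counts must sum to a positive value")

    energies = []
    for observed, reference in zip(observed_counts, reference_counts):
        p_observed = (observed + pseudocount) / observed_total
        p_reference = (reference + pseudocount) / reference_total
        energies.append(-kT * math.log(p_observed / p_reference))
    return energies
```

Bins that are populated more often than the reference receive negative (favorable) energies, and under-populated bins receive positive ones.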
The advantage of the statistically driven ISP scoring function is demonstrated in a case study using the Acetylcholine Binding Protein (AChBP), which has a key cation-Pi interaction observed crystallographically for several substrates (e.g. CCE, Nicotine, Lobeline, Epibatidine) [2]. Empirical and force-field based scoring functions fail to rank the correct binding pose highest even when using DFT B3LYP/6-31G** charges. In contrast, eHiTS produces the correct pose with the best score even when using the default statistical table and weighting scheme, for which no example from this protein family was included. When the automated training script is run to include the family in the knowledge base, the energy separation between the correct pose and other generated poses improves and yields very cleanly distinguished clusters. Furthermore, the eHiTS score correlates well with the experimentally measured log(Kd) values for the series, correctly rank ordering the actives.
eHiTS flexible docking has proved to be among the most accurate pose prediction tools [4], and combined with the LASSO [3] ligand-based filter it provides one of the highest enrichment factors in comparative evaluation studies [5]. While LASSO can rapidly and efficiently reduce the number of candidates to be docked to a few percent of the total database, accurate flexible docking with eHiTS used to take several minutes of CPU time per ligand on traditional hardware architectures. The algorithm has recently been redesigned and recoded to take advantage of the Cell B/E accelerator architecture, providing a 30- to 100-fold speed-up [6] and bringing the runtime down to a few seconds per ligand on a Sony PlayStation 3 gaming machine, or even less on an IBM Cell Blade, while still producing the most accurate flexible docking.
The revolutionary hardware technology requires new computational methods, replacing approximate pre-computed grids with proximity look-up and explicit pair-wise interaction computation. As a result, the calculation is not only orders of magnitude faster, but it also provides more accurate energy predictions. The emerging technologies presented could also be applied to speed up other molecular modeling related problems, e.g. QM or MD simulations and protein folding, by multiple orders of magnitude.
References
[1] Z. Zsoldos, D. Reid, A. Simon, S.B. Sadjad, A.P. Johnson: eHiTS a new fast, exhaustive flexible ligand docking system; J. Mol. Graph. Modeling (26), 1, 2007, 198-212; doi:10.1016/j.jmgm.2006.06.002
[2] S.B. Hansen, G. Sulzenbacher, T. Huxford, P. Marchot, P. Taylor, Y. Bourne: Structures of Aplysia AChBP complexes with nicotinic agonists and antagonists reveal distinctive binding interfaces and conformations. The EMBO Journal (2005) 24, 3635-3646. doi:10.1038/sj.emboj.7600828
[3] D. Reid, B.S. Sadjad, Z. Zsoldos, A. Simon: LASSO - ligand activity by surface similarity order: a new tool for ligand based virtual screening. Journal of Computer-Aided Molecular Design; doi:10.1007/s10822-007-9164-5
[4] M. Kontoyianni, L.M. McClellan, G.S. Sokol: Evaluation of Docking Performance: Comparative Data on Docking Algorithms, J. Med. Chem., 2004; 47(3); 558-565. eHiTS results for the same test case added by Fedor Zhuravlev, Assist. Prof., Technical University of Denmark: http://www.simbiosys.ca/ehits/ehits_validation.html
[5] G.B. McGaughey, R.P. Sheridan, C.I. Bayly, C. Culberson, C. Kreatsoulas, S. Lindsley, V. Maiorov, J. Truchon, W.D. Cornell: Comparison of Topological, Shape, and Docking Methods in Virtual Screening. J. Chem. Inf. Model. 2007; 47(4), 1504-19. DOI: 10.1021/ci700052x eHiTS results added by Merck: http://www.simbiosys.ca/ehits/ehits_enrichment.html
[6] http://www.bio-itworld.com/inside-it/2008/05/gta4-and-life-sciences.html
Barry Hardy
eCheminfo Community of Practice
The SimBioSys news blog today posted more information relevant to this meeting. SBS's eCheminfo presentation and poster abstracts, as well as the full presentations, are available online. See:
http://www.simbiosys.ca/blog/2008/10/05/simbiosys-will-be-at-the-echeminfo-meeting-oct-13-17-in-bryn-mawr-pa-usa/
Posted by: SimBioSys | October 06, 2008 at 04:08 PM