John Irwin at UCSF, I, and others working in drug discovery have had several discussions and email exchanges on the performance and comparison of different virtual screening and docking methods across different targets and problems.
The eCheminfo network supports community-of-practice activities: it is intended to support a group of people who band together to share knowledge of good, better, and best practices, learn skills from each other, share experiences, and engage in collective learning. It could therefore serve as a neutral environment for coordinating practice activities, including the difficult area of comparative studies in screening and docking.
We thought it would be useful to summarise some ideas on supporting greater collaboration on such work, which we have done below, and to invite comments and discussion.
Please contribute to the discussion and add your comments on the Cheminfostream Blog or John’s Docking.org blog. (We can also be reached via email at barry.hardy [at] douglasconnect.com and jji [at] cgl.ucsf.edu.) We look forward to your input.
Barry Hardy
Could we take a Community Approach to Comparing Virtual Screening Methods?
After twenty years of undeniable progress, molecular docking seems to have plateaued. A recent paper by Tirado-Rives and Jorgensen [1] dashes some of the few hopes we had left by showing that conformational energetics alone make it impossible to rank-order diverse compounds in high-throughput virtual screening. In a Perspective in the same issue [2], Leach, Shoichet and Peishoff summarize the stagnating state of the art that is docking and suggest a pragmatic way forward, through measurement and benchmarking. Again in the same issue, a laborious evaluation of 10 docking programs and 37 scoring functions was applied to seven protein types for three tasks: binding mode prediction, virtual screening for lead identification, and rank-ordering by affinity for lead optimization [3]. Among some encouraging results and upbeat analysis, the paper makes a number of worrying observations, including that "high fidelity in the reproduction of observed binding poses did not automatically impart success in virtual screening". Moreover, for eight diverse systems, "no statistically significant relationship existed between docking scores and ligand affinity."
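To make the rank-ordering task and the "no statistically significant relationship" finding concrete, here is a minimal sketch, not taken from [3] and using invented scores and affinities, of how one might test for a rank correlation between docking scores and measured affinities:

```python
# Minimal sketch with invented numbers: do docking scores rank-order
# ligands by measured affinity?
from scipy.stats import spearmanr

# Hypothetical docking scores (more negative = better predicted binding)
# and hypothetical measured affinities (pKi, higher = tighter binder).
docking_scores = [-9.1, -8.4, -10.2, -7.5, -8.8, -9.7, -7.9, -8.1]
measured_pki   = [ 6.2,  7.8,   5.9,  6.5,  8.1,  6.0,  7.2,  5.5]

rho, p_value = spearmanr(docking_scores, measured_pki)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
# A p-value well above 0.05 would mirror the paper's observation of no
# statistically significant relationship between scores and affinities.
```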
The physics of protein-ligand binding is clearly both important and challenging. The NIH-sponsored workshop described by Leach, Shoichet and Peishoff [4] called for more high-quality data to be made available for benchmarking, and for "well developed testing sets to be evaluated with all available technology, without barriers, if we are to see forward rather than lateral growth in the field."
Most efforts to compare docking methods in "apples-to-apples" comparisons have been plagued by one methodological weakness or another. For example, a common criticism is that the "experts" running the programs are more familiar with one program than another. Criticisms of unfair bias due to past or ongoing association with a particular software group are frequent, particularly from the developers whose software performed worst. Numerous criticisms are also levelled at how success is judged and how the test sets are compiled; in fact, nearly everything about docking comparison studies can be criticised.
In the spirit of collaboration, and in an effort to move the field forward as advocated by NIGMS, we are suggesting here an "open source" initiative to compare docking methodologies. (Our use of "open source" here refers to the methods used to carry out comparisons of methods, not to whether the source code involved is open source.) We propose that a form of peer review, hosted on a wiki and supplemented by workshop activity or virtual conference-based discussion, be applied at all stages of a fair "competition", including the design of the experiment, collection of the data, running of the programs, and analysis of the results. The goal is not to show up one program or another as a winner or loser, but to honestly and fairly compare methods, allowing all reasonable criticisms to be raised during the process, so that the entire field can move forward.
The UCSF group are now offering a dataset they recently compiled from the literature: a database of actives and challenging decoys for 40 diverse targets [5]. They are also actively soliciting experimental test data from pharma. They know it is a challenge to get such data released, even for projects that are no longer active, but they are asking for it nonetheless, for the benefit of the field.
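As an illustration of how such an actives-and-decoys benchmark is typically used, here is a minimal sketch of computing an enrichment factor on a ranked hit list; the function and the toy identifiers are ours, not part of the dataset in [5]:

```python
# Minimal sketch: enrichment factor at a given fraction of a ranked
# docking hit list, given known actives and decoys (all labels invented).
def enrichment_factor(ranked_ids, active_ids, fraction=0.01):
    """EF = (active hit rate in the top fraction) / (overall active rate)."""
    n_top = max(1, int(round(fraction * len(ranked_ids))))
    top_hits = ranked_ids[:n_top]
    actives_in_top = sum(1 for mol_id in top_hits if mol_id in active_ids)
    hit_rate_top = actives_in_top / n_top
    hit_rate_overall = len(active_ids) / len(ranked_ids)
    return hit_rate_top / hit_rate_overall if hit_rate_overall else 0.0

# Toy example: 10 compounds ranked by docking score, 2 known actives.
ranked = ["act1", "dec1", "dec2", "act2", "dec3",
          "dec4", "dec5", "dec6", "dec7", "dec8"]
actives = {"act1", "act2"}
print(enrichment_factor(ranked, actives, fraction=0.2))  # EF at top 20% -> 2.5
```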
In the upcoming eCheminfo Community of Practice meeting in Bryn Mawr we have scheduled a forum (16:00, Tuesday 17 October) to discuss whether such an "open source" project to benchmark docking programs is of interest and, if so, how best to move forward. We think this is an auspicious time for such a project, and we hope you (and your company or organization) do too. The world has benefited enormously from other "open source" projects, such as Linux, MySQL, Wikipedia, and so on. We think this is docking's time. What do you think?
As certainly not everyone interested in this topic can be present at the meeting in Bryn Mawr, and time there is limited, it would be good to have some exchange of ideas virtually in the run-up to the meeting and beyond.
Barry Hardy (eCheminfo Community of Practice) and John Irwin (UCSF, Docking.org)
References
[1] Tirado-Rives, Jorgensen, Contribution of Conformer Focusing to the Uncertainty in Predicting Free Energies for Protein-Ligand Binding, J. Med. Chem., 2006, 49, 5880-5884.
[2] Leach, Shoichet, Peishoff, Prediction of Protein-Ligand Interactions. Docking and Scoring: Successes and Gaps, J. Med. Chem., 2006, 49, 5851-5855.
[3] Warren et al., A Critical Assessment of Docking Programs and Scoring Functions, J. Med. Chem., 2006, 49, 5912-5931.
[4] http://www.nigms.nih.gov/News/Reports/DockingMeeting022406.htm
[5] Huang, Shoichet, Irwin, Benchmarking Sets for Molecular Docking, J. Med. Chem., 2006, in press.
Barry, we're already all way too busy, but this is a noble cause that would surely benefit us all. Running a community-based docking comparison is a sociological problem as much as a scientific one: to nucleate a community-supported effort that will set the ground rules, run the tests, and interpret the results in a way that everyone can agree is fair. A wiki-based forum would allow for peer review of every aspect of the comparison, from compiling the data, to how the programs are run, to how the results are judged. Everyone can be heard and all reasonable objections aired, offering an outcome that will be useful to all of us. One downside: unmoderated, such an exercise could quickly degenerate into chaos, so firm moderation is a must. An interesting experiment!
Posted by: John Irwin | October 04, 2006 at 07:42 PM
Dear Barry,
1. There is absolutely no question about the need for proper benchmarking. For the comparison to be really competitive, I agree that it is necessary to make the data, and perhaps clear technical workflow descriptions, available.
2. I would even go a step further than just opening a 'docking' challenge. The missing 'statistical significance' mentioned above is my biggest worry, and the question is how we can most efficiently move forward in optimizing things. Docking is a summation of several steps, and maybe too many steps at the same time. So I would rather recommend splitting the docking process into its parts, then identifying those parts with the highest failure risk and focusing on them. The process can be chopped at least into:
2.1. atom typing
2.2. ligand preparation (ionic forms, tautomers, ...)
2.3. ligand conformer generation
2.4. protein preparation (protonation, residue orientation, ...)
2.5. ligand placement (top-down, bottom-up, fragment based, group based, ...)
2.6. energy calculation (force field type, grid type, algorithm, ...)
2.7. constraint handling (global and local optimization strategy? process to escape local minima?)
2.8. scoring (single-objective, multi-objective, consensus, ...)
3. I would like to identify the best program on the market for each of those steps, since I do not believe that there is one single program that is equally good on all 'targets'; that would contradict the no-free-lunch theorem.
http://en.wikipedia.org/wiki/No-free-lunch_theorem
4. The next question for me is: even if I know that, do software suppliers support pipelining single expert modules (see the sketch after this list)? If not, why not? What can be done to change that?
5. Finally, if such pipelining of expert modules did hypothetically exist, would there be a method to predict which modules should be combined for which target? If not, what is needed to get this kind of prediction method?
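To make the pipelining of single expert modules in points 2-5 concrete, here is a minimal sketch of a pluggable pipeline interface; every class and function name is hypothetical and no vendor API is implied:

```python
# Minimal sketch of a pluggable docking pipeline following the step
# decomposition in point 2; all names here are hypothetical placeholders.
from typing import List, Protocol


class PipelineStep(Protocol):
    """Any 'expert module' that transforms the shared pipeline state."""
    def run(self, state: dict) -> dict: ...


class LigandPreparation:
    def run(self, state: dict) -> dict:
        # Enumerate ionic forms and tautomers here (placeholder only).
        state["ligands_prepared"] = True
        return state


class ConformerGeneration:
    def run(self, state: dict) -> dict:
        # Generate ligand conformers here (placeholder only).
        state["conformers"] = []
        return state


class Scoring:
    def run(self, state: dict) -> dict:
        # Apply a single-objective, multi-objective, or consensus score here.
        state["scores"] = {}
        return state


def run_pipeline(steps: List[PipelineStep], state: dict) -> dict:
    """Chain best-of-breed modules, possibly from different suppliers."""
    for step in steps:
        state = step.run(state)
    return state


# A different module could be swapped in at each step for each target.
result = run_pipeline([LigandPreparation(), ConformerGeneration(), Scoring()],
                      {"target": "example_protein"})
print(result)
```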
Very kind regards, Joerg Kurt Wegner
Posted by: Joerg Kurt Wegner | October 05, 2006 at 08:50 PM
Dear Barry,
I think that the availability of high-quality public datasets is crucial for the comparison of existing algorithms/programs and for the identification of methodological problems (which may lead to new developments that provide real improvements). Facilities to comment on and discuss models and (maybe even more important) individual predictions would certainly help with the analysis of current shortcomings and the development of new ideas and algorithms.
Speaking from a (Q)SAR perspective (I am not a docking expert), I would keep in mind that there is always an (intentional or unintentional) temptation to overfit a particular test set by tuning parameters until the model gives good results just by chance. Keeping activity values secret would help in this respect, but it prevents the analysis of poor predictions. A pragmatic solution could be to provide multiple test sets (having test data for 40 targets goes exactly in this direction) and to think carefully about procedures that ensure that none of the test set information has been used for model development (maybe more important for (Q)SAR than for docking techniques).
It is also important to remember that validation results are only valid for the validation dataset and cannot be generalized to real-world applications (e.g. "our in-house library", "drug-like molecules", the "chemical universe") unless you have drawn a representative sample. What can be generalized are the results within the applicability domain (AD) of the model (a forthcoming paper will provide some empirical evidence). It is therefore important to provide correct AD definitions for the algorithms involved (I suspect that most of the algorithms for the docking steps mentioned by Joerg have limited applicability domains) and to consider only predictions within the applicability domain for validation purposes. If you are interested in some concrete examples, you can visit the validation pages at http://www.predictive-toxicology.org/lazar/ (comments are very welcome).
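As a toy illustration of the applicability-domain idea (not the lazar implementation), here is a minimal sketch in which a prediction only counts for validation if the query compound is sufficiently similar to at least one training compound; the feature sets and threshold are invented:

```python
# Minimal sketch (invented data): a simple similarity-based
# applicability-domain (AD) check. A query is "in domain" only if it is
# sufficiently similar to at least one training compound.
def tanimoto(a: set, b: set) -> float:
    """Tanimoto similarity between two sets of structural features."""
    return len(a & b) / len(a | b) if a | b else 0.0

def in_applicability_domain(query_features, training_set, threshold=0.4):
    return any(tanimoto(query_features, t) >= threshold for t in training_set)

# Toy feature sets standing in for fingerprints of training compounds.
training = [{"ring", "amide", "Cl"}, {"ring", "amine"}, {"sulfonamide", "ring"}]
query = {"ring", "amide"}

if in_applicability_domain(query, training):
    print("Prediction counted for validation (inside AD).")
else:
    print("Prediction flagged as outside the applicability domain.")
```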
Finally, I would suggest not reinventing the wheel, but using and/or collaborating with existing resources like PubChem, DSSTox, ChemDB, ...
Best regards,
Christoph
Posted by: Christoph Helma | October 10, 2006 at 12:44 PM
Your post on community approaches to docking is of great interest to us. We are carrying out an open source/open notebook science project involving the synthesis of diketopiperazines as new anti-malarial agents. We have started to use docking software to plan our next synthetic targets. However, because our expertise does not lie in docking, we would appreciate feedback from the docking community as we make this work public. Here is where we stand:
http://usefulchem.wikispaces.com/First+100+Targets
Posted by: Jean-Claude Bradley | October 16, 2006 at 09:30 PM
Hi all,
The topic is indeed of great relevance.
Although everybody reports great enrichment factors for their tools, it is clear that without standardized test datasets it is impossible to make an objective comparison between various VS approaches.
A community-based project for compiling standardized benchmark datasets would be of great value, and I would be happy to contribute.
However, with all the focus on docking, I want to remind you that there have been major successes in VS using ligand-based approaches. So let's not forget ligand-based VS in these efforts!
Sebastian
Posted by: Sebastian Rohrer | October 19, 2006 at 04:43 PM
Dear Barry,
I have no doubt that such a community-wide comparison would facilitate the development of the field by highlighting approaches of general applicability. I base this statement on my experience of a domain where the shared test-bed approach has really spurred R&D: text search engines, where the annual Text Retrieval Conference (TREC) organised by NIST has long played a central role in the development of the subject (see http://trec.nist.gov/). Each year, TREC provides a large dataset on which participants in the competition can carry out searches for pre-defined queries whose relevant documents (i.e., the true positives) are withheld from the participants. The searches are then evaluated using common performance metrics, and there's an annual conference to discuss the results. A similar common dataset/evaluation procedure, the MUC conferences, was used for several years in the natural-language community (see http://www.cs.mu.oz.au/acl/C/C96/C96-1079.pdf), and common datasets play an important role in QSAR and ligand-based virtual screening (the steroid and MDDR datasets, although in these cases the positives are known).
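For readers unfamiliar with the TREC protocol, here is a minimal sketch of the blind-evaluation idea Peter describes, with the relevance judgements held only by the organiser; all identifiers and rankings are invented:

```python
# Minimal sketch of a TREC-style blind evaluation: participants submit
# ranked result lists, and only the organiser holds the relevance
# judgements (true positives). All data here is invented.
def precision_at_k(ranked_ids, relevant_ids, k=5):
    top = ranked_ids[:k]
    return sum(1 for doc in top if doc in relevant_ids) / k

# Hidden relevance judgements, known only to the organiser.
hidden_relevant = {"d3", "d7", "d9"}

# Two hypothetical participants' submissions for the same query.
submissions = {
    "group_A": ["d3", "d1", "d7", "d2", "d4", "d9"],
    "group_B": ["d1", "d2", "d4", "d5", "d3", "d7"],
}

for group, ranking in submissions.items():
    print(group, precision_at_k(ranking, hidden_relevant, k=5))
```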
Peter Willett
Posted by: Peter Willett | October 24, 2006 at 09:52 AM