Pharmaceutical research is under pressure to improve the choice, quality and safety of lead candidates. There is a clear need for open discussion and awareness of the requirements for much more complex knowledge management and knowledge transfer between academic, government and commercial interests. The semantic web has the potential to make significant contributions to the drug discovery of the future, but it is currently at an early stage of development and there are only a few public tools for mining and sharing chemical information.
Just a few years ago, the only imaginable way of doing in silico drug design - or, indeed, any cheminformatics research - was to use in-house and commercial software and databases. New developments in Web services, however, are offering today’s researchers additional resources. Although cheminformatics admittedly lags far behind bioinformatics (where an enormous wealth of data and software is literally a click away), we are beginning to see some chemical resources in open access.
A goal for this eCheminfo program on "Web-based Services in Drug Design" is to present some of the possibilities of web-based tools and data and to lead into discussions on the following questions. How can web services work for both the academic world and industry while addressing commercial, IP and security concerns? What potential impact could they have on discovery productivity? What are the best sustainable business models that can be applied to such services? How significant are the benefits of increased upstream and downstream knowledge flow due to services based on ontology frameworks? What are the key current hindrances to be overcome for the integration of web services into drug discovery in the chemical information area?
We are convening the following community of practice meeting sessions to both present the latest research advances in this area and to discuss the above issues. (As with all our meetings, if you cannot make the meetings you can access the seminars and discussions through the eCheminfo website by signing up for community membership.)
Applications of Web-based Services in Drug Discovery
eCheminfo InterAction Meeting Session, Philadelphia, 11 October 2005
chaired by Marc Nicklaus, (National Institutes of Health)
eCheminfo 2005 InterAction Meeting, 11-12 October 2005, Philadelphia, USA
Presenters & Discussion Leaders:
A Web-based Chemoinformatics System for Drug Discovery, Brett Tounge (Johnson & Johnson)
Web enabling technology for the design, enumeration, optimization and tracking of compound libraries, Brad Feuston (Merck)
ZINC web services - providing 3D structures of purchasable compounds for virtual screening to humans and machines, John Irwin (UCSF)
PubChem, Steve Bryant (NCBI)
Search-and-query Information System for the Study and Discovery of Novel Agents in the Treatment of Cancer, David Covell (NCI)
Title TBA, Dmitrii Rassokhin (Johnson & Johnson)
Applications of Web-based Services in Drug Discovery
eCheminfo InterAction Meeting Session, Basel, 10 November 2005
chaired by Kim Henrick (European Bioinformatics Institute)
eCheminfo 2005 InterAction Meeting, 9-10 November 2005, Basel, Switzerland
Presenters & Discussion Leaders:
Investigating chemical trends in the context of ligand-protein complexes by using on-line data analysis directly on the web, Dimitris Dimitropoulos (European Bioinformatics Institute)
The Representation of Chemical Structures and its Application to Property Prediction, Johann Gasteiger (Universitaet Erlangen-Nuernberg)
Open Archives as a Route for the Capture, Dissemination and Access to Chemical Information, Simon Coles (University of Southampton)
Identification of biological units in protein crystals, Eugene Krissinel (European Bioinformatics Institute)
SWISS-MODEL Server and Repository: Web based resources for comparative protein structure modeling and their application in drug discovery, Torsten Schwede (University of Basel)
ABSTRACTS
PHILADELPHIA SESSION
A Web-based Chemoinformatics System for Drug Discovery
Brett A. Tounge, Johnson & Johnson Pharmaceutical Research and Development, L.L.C., Welsh and McKean Roads, PO Box 776, Spring House, PA 19477, USA
One of the key questions that must be addressed when implementing a chemoinformatics system is whether the tools will be designed for use by the expert user or by the “bench scientist”. This decision can impact not only the style of tools that are rolled out, but is also a factor in terms of how these tools are delivered to the end users. The system that we outline here was designed for use by the non-expert user. As such, the tools that we discuss are in many cases simplified versions of some common algorithms used in chemoinformatics. In addition, the focus is on how to distribute these tools using a web services interface, which greatly simplifies delivering new protocols to the end user.
Web enabling technology for the design, enumeration, optimization and tracking of compound libraries
Bradley P. Feuston(1)*, Subhas J. Chakravorty(1), John F. Conway(3), J. Christopher Culberson(1), Joseph Forbes(1), Bryan Kraker(2), Patricia A. Lennon(3), Craig Lindsley(5), Georgia B. McGaughey(1), Ralph Mosley(2), Robert P. Sheridan(2), Mario Valenciano(4) and Simon K. Kearsley(2)
(1) Molecular Systems Department, P.O. BOX 4, West Point, PA 19486
(2) Molecular Systems Department, P.O. BOX 2000, Rahway, NJ 07065
(3) Basic Research Systems, P.O. BOX 4, West Point, PA 19486
(4) Basic Research Systems, P.O. BOX 2000, Rahway, NJ 07065
(5) Medicinal Chemistry Department, P.O. BOX 4, West Point, PA 19486
Motivated by the need to augment Merck’s in-house small molecule collection, web-based tools for designing, enumerating, optimizing and tracking compound libraries have been developed. The path leading to the current version of this virtual library tool kit (VLTK) is discussed in the context of the (then) available commercial offerings and the constraints and requirements imposed by the end users. Though the effort was initiated to simplify the tasks of designing novel, drug-like and diverse compound libraries containing between 2K and 10K unique entities, it has also evolved into a powerful tool for outsourcing syntheses as well as for lead identification and optimization. The web tool includes components that select reagents, analyze synthons, identify backup reagents, enumerate libraries, calculate properties, optimize libraries and finally track the synthesized compounds through biological assays. In addition to accommodating project-specific designs and virtual 3D library scanning, the application includes tools for parallel synthesis, laboratory automation and compound registration.
* To whom correspondence should be addressed.
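The "enumerate libraries" step mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the VLTK implementation, which the abstract does not describe in detail; it only shows the general idea of combinatorial enumeration. The reagent lists, the amide-coupling example and the function name `enumerate_amides` are assumptions chosen for illustration, and products are built by naive concatenation of SMILES fragments rather than by a real reaction engine.

```python
from itertools import product

# Hypothetical reagent lists written as SMILES fragments (illustrative only).
ACYL_FRAGMENTS = ["CC(=O)", "c1ccccc1C(=O)"]   # acyl parts of two carboxylic acids
AMINE_FRAGMENTS = ["NC", "NCC", "N1CCCCC1"]    # three amines, nitrogen written first


def enumerate_amides(acyl_fragments, amine_fragments):
    """Return every acyl/amine combination as a product SMILES string."""
    return [acyl + amine for acyl, amine in product(acyl_fragments, amine_fragments)]


library = enumerate_amides(ACYL_FRAGMENTS, AMINE_FRAGMENTS)
for smiles in library:
    print(smiles)
```

The library size is the product of the reagent list sizes, which is why reagent selection and property-based optimization matter before a real library of thousands of compounds is enumerated.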
Search-and-query Information System for the Study and Discovery of Novel Agents in the Treatment of Cancer
David Covell, National Cancer Institute, Building 1052, Rm 236, Frederick, MD 21702-1201, USA
The National Cancer Institute has maintained a panel of immortalized tumor cell lines since 1990 for the purpose of screening chemical agents as candidates in the search for more effective cancer treatments. During these past 25 years nearly 80,000 small synthetic compounds and an equal number of natural product extracts have been assayed in the NCI’s tumor screen. In parallel with this screening effort, the NCI has maintained a follow-up protocol consisting of secondary testing of active compounds in hollow-fiber and xenograft models. These data have been complemented by parallel measurements of gene expression within the NCI’s tumor cell panel and by the development of nearly a dozen molecular target assays screened against a panel of ~200,000 small molecules. The data generated in these screens have been the subject of efforts to devise informatics-based methods for data mining. The product of this effort is embodied in the publicly accessible web tool at the URL ‘spheroid.ncifcrf.gov’. The utilities within this web site represent a search-and-query information system for the study and discovery of novel agents in the treatment of cancer. They allow interaction with a wide range of public databases and the NCI’s screening data. These utilities will be presented and discussed in the context of recent drug discovery explorations.
PubChem
Steve Bryant, NCBI
PubChem is a new online information resource from NCBI. The system provides information on the biological properties and activities of chemical substances. Following the sequence-deposition model used by GenBank, PubChem's content is derived from user depositions of chemical structure and bioassay data, including data from NIH's Molecular Libraries initiative. The retrieval system supports searches based on chemical names and chemical structure, as well as searches based on bioassay descriptions and activity information. It also provides links to depositor sites for further information, as well as links to other NCBI resources such as the PubMed literature database and Entrez's protein 3D structure database.
ZINC web services - providing 3D structures of purchasable compounds for virtual screening to humans and machines
John Irwin, Pharmaceutical Chemistry, UCSF, 1700 4th St, Suite 501D, San Francisco CA 94143-2550 USA
Despite the successes of virtual screening, and its growing use, there remain many barriers to entry for non-specialists wishing to use this technology. To lower one of these barriers, we created the ZINC database, a free collection of commercially available compounds for virtual screening. We made ZINC more flexible and adaptive by creating web services. Molecules matching specific criteria, including chemical structures, may be searched, often in less than a minute. Results of database searches may be reviewed in a web browser, the 3D structure of molecules may be viewed in a Jmol applet, and structures may be downloaded individually or en masse in popular formats. Because there will always be molecules that are not in ZINC, we allow users to upload and process their own molecules using the same protocols we use to prepare the database. The ZINC webserver can also handle queries in machine-readable form using a well-defined ontology.
BASEL SESSION
Open Archives as a Route for the Capture, Dissemination and Access to Chemical Information
Simon J. Coles, School of Chemistry, University of Southampton, Highfield, Southampton, Hampshire, SO17 1BJ, UK
Modern advances in high throughput synthesis, scientific analytical instrumentation and data analysis and mining techniques are presenting increasingly big challenges for chemical information management and discovery. Consequently, the conventional process of peer review of journal articles as the primary route for the dissemination of scientific data is unable to keep pace with these high rates of generation and is hindering the passage of this data to the public domain. The architecture and philosophy of the Open Archive presents a solution to both the data management and publication problems.
Recent work undertaken by the eBank-UK project (http://www.ukoln.ac.uk/projects/ebank-uk/) has been addressing the issue of dissemination of scientific data and uses the philosophy of the Open Archive Initiative (OAI) to solve this problem, whilst the R4L project (http://r4l.eprints.org) uses the same approach for laboratory data management. The UK National Crystallography Service (NCS) (http://www.ncs.chem.soton.ac.uk/) has developed an Open Archive infrastructure for crystal structure data (http://ecrystals.chem.soton.ac.uk) as an exemplar of this methodology.
All the data generated during the course of the crystal structure determination experiment is seamlessly or automatically captured, time-stamped for priority assertion purposes and deposited in a laboratory management repository. A report generation tool is then employed to collate all experimental information in the laboratory repository, based on a particular compound. This report is utilised to prepare a journal article, based on the experimental data, and both write-ups are subsequently deposited in an Institutional Repository. The Institutional Repository publicises its data content to the internet through Open Archive Initiative (OAI) protocols, which allow aggregator services to harvest pertinent metadata. The aggregator search and discovery tools then provide seamless and unhindered access to the scientific reports and their underlying data, thus maximising efficient sharing of experimental chemical information.
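The harvesting step described above can be sketched in Python as follows. This is a minimal illustration of how an aggregator might form an OAI-PMH `ListRecords` request and read Dublin Core titles from the response; it is not the eBank-UK or eCrystals code. The base URL `http://repository.example.org/oai`, the sample record and the function names are assumptions, and the response is an embedded string so that the example runs without network access.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL for a repository endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def extract_titles(response_xml):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(response_xml)
    pairs = []
    for record in root.iter(OAI_NS + "record"):
        identifier = record.findtext(f"{OAI_NS}header/{OAI_NS}identifier")
        title = record.findtext(f".//{DC_NS}title")
        pairs.append((identifier, title))
    return pairs


# Hypothetical response, shaped like an OAI-PMH reply carrying one Dublin Core record.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example crystal structure report</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("http://repository.example.org/oai"))
print(extract_titles(SAMPLE_RESPONSE))
```

A real harvester would also need to follow resumption tokens and handle protocol errors, which are omitted here for brevity.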
Investigating chemical trends in the context of ligand-protein complexes by using on-line data analysis directly on the web
Dimitris Dimitropoulos, EMBL Outstation - Hinxton, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
A prebuilt relational data warehouse with a rich set of relationships for the PDB and the PDB ligands is the perfect environment for evaluating interesting research questions that may potentially reveal links between ligand chemistry and the characteristics of the protein environment. In the MSD search database several common functional groups have been identified for ligands and associated at the atom level with the occurrences of the bound molecules in PDB entries using a consistent nomenclature. These are used as a starting point to explore the relationships of chemistry to binding site information, secondary structure, and protein classification. As an example we will demonstrate how the MSD-mine tool for on-line data analysis and mining over the web can be used to examine the potential preference of functional fragment distribution towards particular SCOP domains and, in a separate example, the contribution of different fragment areas to the binding site activity. The MSD-mine tool and the example scenarios are accessible from http://www.ebi.ac.uk/msdmine.
Identification of biological units in protein crystals
Eugene Krissinel & Kim Henrick, European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD, UK
PDB entries of protein structures solved by means of X-ray diffraction on protein crystals represent asymmetric units (ASU) of the crystals. In most instances, ASU may be chosen in many different ways and they do not necessarily coincide with the biological units, or stable protein assemblies that perform certain physiological functions. It is reasonable to expect that protein assemblies merge, rather than transform, during the crystallisation, therefore protein crystals do carry rather valuable data on the composition and geometry of biological units. Given that nearly 80% of PDB entries are obtained by means of protein crystallography and that direct experimental identification of assembly structure is difficult, detection of biological units in protein crystals is of considerable practical interest.
We propose a new approach to the problem from general principles of chemical thermodynamics, which is different from previous attempts [1,2] based on scoring of individual protein interfaces in the crystal. We perform an exhaustive graph-theoretical search of all assemblies that are possible in a given crystal, and keep only those that appear to be thermodynamically stable. The stability estimate is based on the consideration of protein affinity and entropy change upon dissociation. Applied to PDB entries with oligomeric states known from the literature, our method gives 89% correct predictions, which is higher than previously reported [2].
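The idea of enumerating candidate assemblies and keeping only the stable ones can be illustrated with a heavily simplified Python sketch. It is not the authors' algorithm: the chain names, the interface free energies, the constant entropy term `T_DELTA_S` and the stability rule (every split of an assembly into two parts must cost free energy) are all assumptions made for this toy example.

```python
from itertools import combinations

# Hypothetical interface free energies in kcal/mol (negative = favourable contact).
INTERFACE_ENERGY = {
    frozenset("AB"): -12.0,
    frozenset("CD"): -12.0,
    frozenset("AC"): -1.0,
}
T_DELTA_S = 5.0  # hypothetical entropy term gained when an assembly splits in two


def dissociation_energy(part_one, part_two):
    """Free energy needed to separate two groups of chains (simplified model)."""
    broken = sum(
        INTERFACE_ENERGY.get(frozenset((a, b)), 0.0)
        for a in part_one
        for b in part_two
    )
    return -broken - T_DELTA_S


def is_stable(assembly):
    """Keep an assembly only if every split into two parts costs free energy."""
    chains = sorted(assembly)
    first, rest = chains[0], chains[1:]
    # The first chain is fixed on one side so each split is tested only once.
    for size in range(len(rest)):
        for companions in combinations(rest, size):
            part_one = {first, *companions}
            part_two = set(chains) - part_one
            if dissociation_energy(part_one, part_two) <= 0:
                return False
    return True


def stable_assemblies(chains):
    """Exhaustively test every subset of two or more chains."""
    found = []
    for size in range(2, len(chains) + 1):
        for assembly in combinations(sorted(chains), size):
            if is_stable(assembly):
                found.append(assembly)
    return found


print(stable_assemblies("ABCD"))
```

With these made-up numbers the weak A-C contact is not enough to hold the two dimers together, so only the A-B and C-D pairs survive the filter. Exhaustive subset enumeration grows exponentially with the number of chains, so this sketch is only practical for very small examples.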
The method is implemented as a publicly available web server (http://www.ebi.ac.uk/msd-srv/prot_int/cgi-bin/piserver), which provides the assembly-related data for all PDB entries of structures solved by X-ray diffraction. The server can take PDB and mmCIF-formatted coordinate files for upload and calculate protein assemblies in real time (a few minutes in most instances). Detailed, residue-level data on protein interactions, solvation energies, surface areas, hydrogen bonds and salt bridges are provided as output. Probable dissociation patterns of stable assemblies are also calculated. The calculated assemblies and individual crystal contacts (interfaces) may be visualised using the Rasmol software. The server includes a search facility for the identification of structurally equivalent protein interfaces in the PDB archive.
[1] Henrick, K.; Thornton, J. Trends Biochem. Sci., 1998, 23, 358.
[2] Ponstingl, H.; Kabir, T.; Thornton, J. J. Appl. Cryst. 2003, 36, 1116.
SWISS-MODEL Server and Repository: Web based resources for comparative protein structure modeling and their application in drug discovery
Jürgen Kopp, Lorenza Bordoli, Konstantin Arnold, Markus Meuwly, Vincent Zoete, Hólmfríður B. Þorsteinsdóttir, and Torsten Schwede, University of Basel (Switzerland) & Swiss Institute of Bioinformatics
One of the bottlenecks of structure-based drug design is the availability of experimentally determined protein structures. Today, the number of structurally characterized proteins is about two orders of magnitude smaller than the number of known protein sequences, i.e. no direct experimental structural information is available for the vast majority of protein sequences. Theoretical methods for protein structure prediction aim to bridge this structure knowledge gap. As shown during the biennial CASP experiments, homology modeling is the only computational approach that can generate accurate three-dimensional models of a protein for successful application in structure-based drug development. The SWISS-MODEL Server and Repository have been developed to provide instant web-based access to annotated models generated by automated homology modeling, bridging the gap between sequence and structure databases.
Validation of homology models for drug discovery applications is crucial, and one important question is how errors and inaccuracies of the homology models affect the molecular modeling of protein-ligand interactions. We used an MM-GBSA approach to compute the binding free energy of interaction of 16 HIV-1 protease inhibitor complexes in experimental and model structures. Using this system, we can introduce systematic errors in the protein model to simulate the typical inaccuracies that occur during homology modeling and to quantify their effect on ligand binding affinity and ranking of inhibitors.
References:
Schwede T, Kopp J, Guex N, and Peitsch MC (2003) SWISS-MODEL: an automated protein homology-modeling server. Nucleic Acids Research 31, 3381-338.
Kopp J, and Schwede T (2004) The SWISS-MODEL Repository of annotated three-dimensional protein structure homology models. Nucleic Acids Research 32, D230-D234.
Contact: Torsten Schwede
The Representation of Chemical Structures and its Application to Property Prediction
Johann Gasteiger, Computer-Chemie-Centrum, Universität Erlangen-Nürnberg
D-91052 Erlangen, Germany
The relationships between the structure of a chemical compound and many of its physical, chemical, or biological properties are too complex to be calculated from first principles. This is particularly true for the biological activity of a compound. In this situation an indirect approach has to be taken.[1] First, chemical descriptors have to be calculated for the molecular structure. Then, a relationship between these structure descriptors and the property to be predicted has to be established by inductive learning methods such as statistical or pattern recognition methods or neural networks. Methods for calculating structure descriptors available on the internet will be presented. Furthermore, program packages for data analysis and data mining that can be accessed on the web will be indicated. Of crucial importance for property modeling is the availability of data. Fortunately, data are increasingly becoming available on the internet. Some applications in modeling properties such as solubility or toxicity will be presented.
[1] Chemoinformatics – A Textbook, J. Gasteiger, T. Engel (Editors), Wiley-VCH, Weinheim, 2003.
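The two-step indirect approach described in this abstract (calculate descriptors, then learn a relationship to the property) can be sketched in Python as follows. This is a toy illustration under stated assumptions, not one of the methods to be presented: the descriptor is a naive heavy-atom count that only works for simple SMILES without aromatic lowercase atoms or bracketed hydrogens, the learning method is an ordinary least-squares line, and the training property values are synthetic numbers generated as 10 × count + 5, not experimental data.

```python
def heavy_atom_count(smiles):
    """Toy descriptor: count uppercase letters, one per element symbol in simple SMILES."""
    return sum(1 for ch in smiles if ch.isupper())


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Step 1: descriptors for a small training set of linear alkanes.
TRAINING_SMILES = ["C", "CC", "CCC", "CCCC"]
# Synthetic property values (10 * heavy atoms + 5), NOT measured data.
TRAINING_PROPERTY = [15.0, 25.0, 35.0, 45.0]
descriptors = [heavy_atom_count(s) for s in TRAINING_SMILES]

# Step 2: learn the descriptor-property relationship from the training set.
slope, intercept = fit_line(descriptors, TRAINING_PROPERTY)


def predict(smiles):
    """Predict the property of a new structure from its descriptor."""
    return slope * heavy_atom_count(smiles) + intercept


print(predict("CCCCC"))
```

Real applications would use many descriptors and a validated learning method, but the division of labour between descriptor calculation and model fitting stays the same.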
Barry Hardy
eCheminfo Community of Practice Manager
Douglas Connect, Switzerland