MSDsite: The technology used in database searching and retrieval for the analysis and viewing of bound ligands and active sites
Kim Henrick, the Macromolecular Structure Database (MSD) group leader at the European Bioinformatics Institute, will present the following seminar at the eCheminfo 2004 Web conference (http://echeminfo.com), 8-19 November 2004:
MSDsite behind the scenes: the technology used in database searching and retrieval for the analysis and viewing of bound ligands and active sites
Adel Golovin, Dimitris Dimitropoulos, Tom Oldfield, and Kim Henrick, EMBL Outstation, The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
The three-dimensional environments of ligand binding sites have been derived by parsing PDB entries and loading them into a relational database. We will introduce the web-based query system MSDsite (http://www.ebi.ac.uk/msd-srv/msdsite) and demonstrate the technologies it uses. Non-trivial textual queries are made easier by a graphical query interface in which search attributes are specified in dialog boxes to build complex queries. The interface was built with biological content in mind, and ligand searching involves tasks that cannot be expressed with basic SQL relational operators such as join or merge.

To generate queries that execute quickly in Oracle, we have developed a web application server containing Java classes that compose complex SQL queries and perform calculations on the dataset. These classes also construct the SQL so that the correct indexes are used, and they apply query hints. The SQL queries are further designed to use the relational algebra operation 'INTERSECT', which lets the Oracle RDBMS execute the sub-queries in parallel without nesting and is faster than self-joins when the result set is many times smaller than the table queried. The approach is based on a star-shaped query architecture in which the ligand is central and its interactions with the environment residues fan out. This design yields an algorithm of order N in the number of environment residues and therefore scales to complex active sites.

We will describe the levels of optimisation developed, in which the table hierarchy is reflected in the query design. We will also describe how MSDsite performs pattern searching and short sequence alignment with SQL queries, using the Oracle hints 'LEADING' and 'USE_NL' to force access to nested tables by primary key.
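The star-shaped INTERSECT query described in the abstract can be illustrated with a small Python sketch. This is a hypothetical illustration, not MSDsite's Java/Oracle code: the table "contact" and its columns are invented for the example, SQLite stands in for Oracle so the sketch runs anywhere, and the Oracle hints 'LEADING' and 'USE_NL' are omitted because SQLite does not act on them. Each required residue type contributes one independent SELECT branch around the central ligand, so the generated SQL grows linearly with the number of environment residues.

```python
import sqlite3

# Hypothetical table: one row per ligand-residue contact. The names
# "contact", "ligand_id", "residue_name" and "distance" are assumptions
# made for this sketch; they are not the MSD schema.
SCHEMA = """
CREATE TABLE contact (
    ligand_id    TEXT NOT NULL,
    residue_name TEXT NOT NULL,
    distance     REAL NOT NULL
)
"""


def build_site_query(residue_names, max_distance):
    """Return (sql, params) selecting ligands whose environment contains
    every residue type in residue_names within max_distance angstroms.

    Each residue type becomes one branch around the central ligand (the
    "star"); branches are combined with INTERSECT instead of chained
    self-joins, so the query grows by exactly one branch per residue.
    """
    if not residue_names:
        raise ValueError("at least one residue type is required")
    branch = ("SELECT ligand_id FROM contact "
              "WHERE residue_name = ? AND distance <= ?")
    sql = "\nINTERSECT\n".join(branch for _ in residue_names)
    params = []
    for name in residue_names:
        params.extend([name, max_distance])
    # ORDER BY 1 sorts the compound result by its first (only) column.
    return sql + "\nORDER BY 1", params


def demo():
    """Load toy contacts into an in-memory database and run one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO contact VALUES (?, ?, ?)",
        [
            ("ATP_1", "LYS", 3.1), ("ATP_1", "ASP", 2.9), ("ATP_1", "GLY", 3.6),
            ("HEM_1", "HIS", 2.1), ("HEM_1", "LYS", 3.8),
            ("ATP_2", "LYS", 3.0), ("ATP_2", "ASP", 5.2),
        ],
    )
    sql, params = build_site_query(["LYS", "ASP"], max_distance=4.0)
    rows = [row[0] for row in conn.execute(sql, params)]
    conn.close()
    return rows


if __name__ == "__main__":
    print(demo())  # ['ATP_1']: only ATP_1 has both LYS and ASP within 4.0
```

Because INTERSECT removes duplicates, this sketch only tests for the presence of each residue type; a pattern that requires, for example, two histidines would need a count-based branch instead.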
Barry Hardy
Douglas Connect
www.douglasconnect.com
Cheminformatics & Chemical Modelling in Drug Discovery: http://echeminfo.com/
Cheminfostream Blog: http://barryhardy.blogs.com/cheminfostream/