Biology
Essay by review • November 4, 2010 • Essay • 296 Words (2 Pages) • 1,208 Views
CS5238 Combinatorial methods in bioinformatics 2004/2005 Semester 1
Lecture 8: Finding structural similarities among proteins (II)
Lecturer: Prof Jean-Claude Latombe
Scribe: Cheng Chi Kan, Lee Pern Chern and Moritz Buck
1 Voting scheme with hash table
Many-to-many comparisons are evaluated when we align protein structures. In order to avoid repetition, a better
organization of computation is necessary. This could be achieved by pre-computing the indexes of proteins and
arranging them in a hash table. Then, queries are evaluated based on a voting scheme using the hash table. This
voting scheme replaces the seed generation process.
In this lecture, we look into the voting scheme used in 3dSEARCH [2]. The algorithm is based on the concept of
geometric hashing [1] developed in the eld of computer vision. The basic idea is to represent all secondary structure
elements (SSEs) from all target proteins with a large, highly redundant hash (or index) table. Once the table has been
constructed, every SSE from a given query structure can be compared simultaneously to the entire set of SSEs of the
target structures, by indexing the SSE into the table.
The hash table that consists of 3-dimensional regular grid of cube bins ( 2A) is constructed as follows. For each
target structure, we compute a coordinate system P for every pair of vectors (i; j) (Figure 1). Then, we transform all
y
z j
i
x
Figure 1: The coordinate systems for the vectors (i; j). The z-axis is coincident with i and the y-axis is parallel to j.
remaining vectors in the protein to the coordinate system P. At last, for each remaining vector k, we place an entry
(with the orientation,
...
...