Sequence Families

This BioCompanion copy is a demo version. This section covers the use of multiple sequence tools. In contrast to patterns and pairwise alignments, these tools work on all sequences simultaneously.


Principle of Multiple Sequence Alignment

Once a sequence search is completed, the question arises whether the sequences found are also similar to one another. This can be answered in either automatic or manual fashion by using programs that align the sequences of interest.

Prerequisites

Usually, sequences of different origin share similarity only in parts, as may have become clear in a previous exercise. Because the similar regions may lie at different positions within each database or sequence entry, the ends or overhanging parts of the sequences will be badly aligned due to low similarity. Therefore, before alignments are attempted, it is good practice to create sequence fragments of approximately the same length, which allows the programs to operate more easily.

NOTE: If sequences are not specifically tailored for multiple sequence alignment, programs might fail or produce unreliable alignments.
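The tailoring step can be illustrated with a minimal Python sketch. It is not a GCG program. The function name tailor, the region coordinates (assumed here to come from an earlier pairwise search) and the 20 % length tolerance are all hypothetical choices made for this example.

```python
def tailor(seqs, regions, tolerance=0.2):
    """Cut each sequence to its 1-based, inclusive (start, end) region.

    Raises ValueError if a region lies outside its sequence or if the
    resulting fragments differ in length by more than `tolerance`
    (an arbitrary threshold assumed for this sketch).
    """
    fragments = {}
    for name, seq in seqs.items():
        start, end = regions[name]
        if not 1 <= start <= end <= len(seq):
            raise ValueError(f"region {start}..{end} is outside {name}")
        fragments[name] = seq[start - 1:end]

    lengths = [len(fragment) for fragment in fragments.values()]
    if max(lengths) > (1 + tolerance) * min(lengths):
        raise ValueError("fragment lengths differ by more than the tolerance")
    return fragments


# Toy sequences; the similar part sits at a different position in each entry.
seqs = {"x": "GGGGACGTACGTCC", "y": "ACGTTCGT", "z": "TTACGAACGTAAAA"}
regions = {"x": (5, 12), "y": (1, 8), "z": (3, 10)}
fragments = tailor(seqs, regions)
```

After this step all three fragments are eight residues long, so no overhanging ends remain to distort the scores.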

To benefit most from the multiple sequence alignment capabilities, you should have the SeqLab environment available. An earlier section of this BioCompanion describes the prerequisites for using SeqLab.

Finding the Best

The approach used for automatic sequence alignment can be described as "clustering" of the most similar sequences. In a first step, the program needs to find the sequence pair(s) sharing the most obvious similarity. To achieve this, each sequence is compared with every other one, which results in n*(n-1)/2 comparisons (roughly (n*n)/2 for large n) if we have n sequences to compare. As in rigorous sequence searching, each comparison uses sequence comparison tables to compute the best possible alignment and score it appropriately. Note that the scores will be biased if the sequences have not been tailored as mentioned above, and an alignment attempt may then yield unexpected results.
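The all-against-all comparison can be sketched in a few lines of Python. This is an illustration only, not the method of any GCG program: it assumes a plain Needleman-Wunsch global score with a simple match/mismatch scheme and a linear gap penalty in place of a real comparison table, and the function names and toy sequences are invented for the example.

```python
from itertools import combinations


def global_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    The match/mismatch/gap values are placeholders standing in for a
    real sequence comparison table.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def pairwise_scores(seqs):
    """Score every unordered pair once: n*(n-1)/2 comparisons."""
    return {
        (x, y): global_score(seqs[x], seqs[y])
        for x, y in combinations(sorted(seqs), 2)
    }


seqs = {"s1": "ACGTACGT", "s2": "ACGTTCGT", "s3": "TTTTGGGG", "s4": "ACGAACGA"}
scores = pairwise_scores(seqs)
best_pair = max(scores, key=scores.get)  # the most similar pair
```

With four sequences the dictionary holds 4*3/2 = 6 scores, and the highest-scoring pair is the candidate for the first cluster.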

A more visual approach to finding similar sequences is available in SeqLab if you use the feature coloring method. It should be kept in mind, however, that SeqLab is only a visualization aid, and any automatic multiple sequence alignment will still need the other steps as described.

Grouping

Once the comparison for each possible sequence pair has been completed, the "best" candidates serve as nuclei, and additional sequences are aligned to the already existing alignment. This works well with similar proteins, but if too many gaps are required, in particular at the DNA level, it will most probably not yield the desired result. The largest errors occur if regions of low similarity are used as the "closest" set, as these make it difficult to match additional sequences.
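The order in which sequences join the growing alignment can be sketched as follows. This is a simplified greedy scheme assumed for illustration, not necessarily the clustering used by the GCG programs: the best-scoring pair forms the nucleus, and the remaining sequence with the highest average score against the current group is added next. The function name and the scores are invented.

```python
def join_order(scores):
    """Return sequence names in the order a greedy grouping would add them.

    `scores` maps unordered name pairs to similarity scores, for example
    the result of an all-against-all comparison.
    """
    def score(x, y):
        return scores[(x, y)] if (x, y) in scores else scores[(y, x)]

    names = sorted({name for pair in scores for name in pair})
    order = list(max(scores, key=scores.get))  # nucleus: the best pair
    remaining = [name for name in names if name not in order]

    while remaining:
        # Pick the sequence most similar, on average, to the current group.
        nxt = max(
            remaining,
            key=lambda n: sum(score(n, m) for m in order) / len(order),
        )
        order.append(nxt)
        remaining.remove(nxt)
    return order


scores = {
    ("a", "b"): 9, ("a", "c"): 4, ("a", "d"): 1,
    ("b", "c"): 5, ("b", "d"): 2, ("c", "d"): 3,
}
order = join_order(scores)
```

Here "a" and "b" form the nucleus, "c" joins next, and the most remote sequence "d" is added last, which is where alignment errors are most likely to show.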

If problems are encountered because similarity cannot be determined well enough automatically, either manual alignment is required or the selection of sequences must be improved by tailoring or by omitting very remotely related fragments.

The major enhancement of GCG Version 9 (1996/7) is the edit mode of the Wisconsin Package Interface (WPI), which changed its name to SeqLab. One of the strengths of SeqLab is the sophisticated way in which it facilitates the grouping of sequences in semi-automatic fashion; see below for details.

Result Evaluation

The result of a multiple sequence alignment is a block of sequences displayed one on top of the other. Programs exist that plot the degree of similarity along the sequence coordinate. Other programs print or draw the output in a presentable form. The GCG programs also produce a figure that schematically displays the level of similarity as a dendrogram. As outlined below, the dendrogram produced by an alignment program illustrates sequence similarity and must not be mistaken for a phylogenetic tree; it can, however, be used to verify that the alignment proceeded as expected. It is possible to apply heuristic methods to such an alignment to obtain an approximation of a phylogenetic tree, as described below.
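The idea of plotting similarity along the sequence coordinate can be sketched in Python. This is a minimal illustration, not the algorithm of any particular GCG program: it assumes that conservation of a column is measured as the fraction of rows carrying the most common non-gap residue, and that the curve is smoothed by a simple sliding-window average. The function names and the toy alignment are invented.

```python
from collections import Counter


def column_conservation(alignment):
    """Per column: fraction of rows sharing the most common non-gap residue."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have the same length")

    profile = []
    for column in zip(*alignment):
        residues = [c for c in column if c != "-"]
        top = Counter(residues).most_common(1)[0][1] if residues else 0
        profile.append(top / len(column))
    return profile


def smooth(values, window=3):
    """Sliding-window average; yields len(values) - window + 1 points."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


alignment = ["ACGT-A", "ACGTTA", "ACCT-A", "TCGTTA"]
profile = column_conservation(alignment)
smoothed = smooth(profile, window=3)
```

Columns that contain gaps or mixed residues score lower, so dips in the smoothed curve mark the regions where the alignment deserves a closer look.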

Limitations

Multiple sequence alignment is NOT the tool for you if you are working on fragment assembly or shotgun sequencing. In order to align multiple sequences reliably, the similarity among the members of the alignment should extend along their entire length rather than be confined to overlapping fragments.

