This BioCompanion copy is a demo version.
This section is dedicated to the use of multiple
sequence alignment tools. In contrast to pattern searches and pairwise alignments, these tools work on all sequences
simultaneously.
Once a sequence search is completed, the question arises whether the sequences found also share
similarity with one another. This can be determined either automatically or manually by
using programs which align the sequences
of interest.
Usually, sequences of different origin share similarity only in parts. This may have become
clear in a previous exercise. Because the similar regions may lie at different
positions within each database or sequence entry, the ends or overhanging parts of the sequences
have low similarity and will be aligned badly. Therefore, before an alignment is attempted, it is good
practice to create sequence fragments of approximately the same length, which makes it easier
for the programs to operate.
NOTE: If sequences are not specifically tailored for multiple sequence alignment, programs
might fail or report unreliable alignments.
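The tailoring step can be illustrated with a minimal Python sketch. It is not part of GCG or SeqLab; the function name tailor, the toy sequences, and the region coordinates are all hypothetical. In practice the coordinates would come from the hits of the preceding sequence search.

```python
def tailor(sequences, regions):
    """Cut each sequence down to the region expected to be similar.

    sequences: dict mapping a name to a sequence string
    regions:   dict mapping a name to (start, end), 1-based and inclusive
    """
    fragments = {}
    for name, seq in sequences.items():
        start, end = regions[name]
        if not (1 <= start <= end <= len(seq)):
            raise ValueError(f"region {start}-{end} lies outside {name}")
        # Convert 1-based inclusive coordinates to a Python slice.
        fragments[name] = seq[start - 1:end]
    return fragments


# Hypothetical toy data: the similar region sits at a different
# position in each entry, and the overhanging ends are unrelated.
sequences = {
    "seqA": "MKTAYIAKQRQISFVKSHFSRQ",
    "seqB": "GGSAYIAKQRQLSFVKAH",
    "seqC": "AYVAKQRQISFIKSHWWWWWW",
}
regions = {"seqA": (4, 17), "seqB": (4, 17), "seqC": (1, 14)}

fragments = tailor(sequences, regions)
for name, frag in fragments.items():
    print(name, len(frag), frag)
```

After tailoring, all three fragments have the same length and contain only the region of interest, so no badly matching overhangs remain to disturb the alignment.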
To benefit most from the multiple sequence alignment capabilities, you should have the
SeqLab environment available to you. An earlier section of this BioCompanion
describes the prerequisites for using SeqLab.
The approach used for automatic sequence alignment
can be described as "clustering" of the most similar sequences. In a first step, the program
needs to find the sequence pair(s) sharing the most obvious similarity. To achieve
this, each sequence is compared with every other, which results in
n*(n-1)/2 comparisons if we have n sequences to compare. As in
rigorous sequence searching, each comparison uses a sequence comparison table to compute
the best possible alignment and to score it appropriately. Note that the scores will be biased
if the sequences have not been tailored as mentioned above, and the alignment may then
yield unexpected results.
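The all-against-all comparison can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the sequences are toy data, and a fixed match/mismatch/gap scheme stands in for a real sequence comparison table. The scoring uses a plain global (Needleman-Wunsch style) dynamic program, which is not necessarily what the GCG programs do internally.

```python
from itertools import combinations


def global_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of a and b (score only, no traceback).

    The match/mismatch/gap values are assumed stand-ins for a real
    comparison table and gap penalties.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def all_pair_scores(seqs):
    """Score every unordered pair once: n*(n-1)/2 comparisons."""
    return {
        (x, y): global_score(seqs[x], seqs[y])
        for x, y in combinations(sorted(seqs), 2)
    }


# Hypothetical, already tailored toy sequences.
seqs = {
    "s1": "ACGTACGTAC",
    "s2": "ACGTACGTAA",
    "s3": "ACGTTCGTCC",
    "s4": "TTGTACCTAG",
}

scores = all_pair_scores(seqs)
best_pair = max(scores, key=scores.get)
print(len(scores), "comparisons for", len(seqs), "sequences")
print("most similar pair:", best_pair)
```

For four sequences this performs six comparisons, and the pair with the highest score becomes the starting point for the clustering.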
A more visual approach to finding similar sequences is available in
SeqLab if you use the feature coloring method. It should be kept in mind,
however, that SeqLab is only a visualization aid, and any automatic multiple sequence alignment
will still need the other steps as described.
Once the comparison for each possible sequence pair has been completed, the "best" candidates
serve as nuclei, and additional sequences are aligned to the already existing alignment. This
works well for similar proteins, but alignments requiring too many
gaps, in particular at the DNA level, will most probably not yield the desired result. The largest
errors occur if regions of low similarity are used as the "closest" set, as these make it
difficult to match additional sequences.
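The order in which sequences join the growing alignment can be sketched with a simplified greedy rule: start from the best-scoring pair, then repeatedly add the sequence with the highest average score against the sequences already included. This is an assumed simplification for illustration, not the clustering procedure of any particular GCG program; the function name joining_order and the score values are hypothetical.

```python
def joining_order(names, scores):
    """Return the order in which sequences would join the alignment.

    scores: dict mapping an unordered pair (x, y) to a similarity score.
    """
    def score(x, y):
        # Pairs are stored once, so look them up in either orientation.
        return scores[(x, y)] if (x, y) in scores else scores[(y, x)]

    # The best-scoring pair is the nucleus of the alignment.
    nucleus = max(scores, key=scores.get)
    order = list(nucleus)
    remaining = [n for n in names if n not in order]

    while remaining:
        # Pick the sequence most similar, on average, to the current members.
        nxt = max(
            remaining,
            key=lambda n: sum(score(n, m) for m in order) / len(order),
        )
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Hypothetical pairwise scores for four sequences.
pair_scores = {
    ("s1", "s2"): 8, ("s1", "s3"): 6, ("s1", "s4"): 2,
    ("s2", "s3"): 4, ("s2", "s4"): 0, ("s3", "s4"): -2,
}

order = joining_order(["s1", "s2", "s3", "s4"], pair_scores)
print(order)
```

If the nucleus were chosen from a poorly matching pair, every later sequence would be judged against that unreliable starting point, which is why low-similarity regions in the "closest" set cause the largest errors.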
If problems are encountered because similarity cannot be determined well enough automatically,
either manual alignment is required or the selection of sequences must be improved by tailoring
or by omitting very remotely related fragments.
The major enhancement of GCG Version 9
(1996/7) is the edit mode of the Wisconsin Package Interface (WPI), which changed
its name to SeqLab. One of the strengths of SeqLab is the sophisticated way in which it
facilitates the grouping of sequences in semi-automatic fashion; see
below for details.
The result of a multiple sequence alignment is a block of sequences neatly stacked
on top of each other. Programs exist which plot
the degree of similarity along the sequence coordinate. Other programs print or draw
the output in presentable form. The GCG programs also produce a figure which schematically displays the level of similarity
as a dendrogram. As outlined below, the dendrogram produced by an alignment program
illustrates sequence similarity and must not be mistaken for a phylogenetic tree;
it can, however, be used to verify that the alignment proceeded as expected. It is possible
to apply heuristic methods to such an alignment which allow a phylogenetic tree to be approximated,
as described below.
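The idea of plotting similarity along the sequence coordinate can be sketched in a few lines. The measure used here, the fraction of rows sharing the most common residue in each column, smoothed over a small window, is an assumption chosen for simplicity and is not the scoring used by the GCG plotting programs. The function names and the toy alignment are hypothetical.

```python
from collections import Counter


def column_conservation(alignment):
    """Per-column fraction of rows carrying the most common residue.

    alignment: list of equal-length strings, with '-' marking gaps.
    """
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all rows of an alignment must have equal length")
    profile = []
    for column in zip(*alignment):
        residues = [c for c in column if c != "-"]
        if not residues:
            profile.append(0.0)  # column consists of gaps only
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        # Gaps count against conservation: divide by the number of rows.
        profile.append(top_count / len(column))
    return profile


def smooth(profile, window=3):
    """Average the profile over a sliding window, truncated at the ends."""
    half = window // 2
    smoothed = []
    for i in range(len(profile)):
        chunk = profile[max(0, i - half):i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical toy alignment of three sequences.
alignment = ["ACGT-CGT", "ACGTACGT", "ACCTACGA"]

profile = column_conservation(alignment)
smoothed = smooth(profile)
for position, value in enumerate(smoothed, start=1):
    print(position, round(value, 2))
```

Low values in such a profile point to regions where the alignment is weak and may need manual inspection.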
Multiple Sequence Alignment is NOT the tool for you if you are working on
fragment assembly or shotgun sequencing. In order to align multiple sequences reliably,
the similarity amongst the members of the alignment should be extensive along the entire length
rather than confined to overlapping fragments.