PrettyPlot displays multiple sequence alignments and calculates a consensus sequence. It does not create the alignment; it simply displays it.
PrettyPlot is an improved version of the GCG program Pretty which produces a boxed sequence alignment as graphics output in addition to the standard text file. PrettyPlot was originally written to produce publication-quality sequence alignment output. There are also several enhancements in the way the consensus sequence is calculated and in the options for sequence display.
PrettyPlot prints and plots sequences with their columns aligned. This utility is used after a number of sequences have had gaps added to make them all align. PrettyPlot's output allows you to look at relationships among several sequences. You should use a file of sequence names to define the sequences you want PrettyPlot to display (see Appendix VI). Although a specification such as "*.pep" is also accepted, you would not be able to use the various weighting and naming options described below.
You can change the alignments displayed by PrettyPlot with a text editor. The output from PrettyPlot can then be separated into individual sequence files by running PrettyPlot with the command line option -UGLy.
This program was written by Peter Rice (E-mail: pmr@sanger.ac.uk Post: Informatics Division, The Sanger Centre, Hinxton Hall, Cambridge, CB10 1RQ, UK).
All EGCG programs are supported by the EGCG Support Team, who can be contacted by E-mail (egcg@embnet.org). Additional code and suggestions were provided by David Mathog at Caltech, and by Jaakko Hattula of Tampere University of Technology, Finland.
The original Pretty program was designed with the help of Ann Palmenberg of the UW Biophysics lab. The sequences in the example were aligned for Dr. Palmenberg's work, and were used in the GCG manual.
The original suggestions for the PrettyPlot program were from Denis Duboule and Sigfried Labeit at EMBL. Gert Vriend added the star marking. Rita Grandori suggested the -NOCOLLISION option.
By repeatedly using the program Gap with the command line option -OUT, gaps were added to a group of picorna virus capsid proteins in the antigenic region to make them align with each other and with a growing consensus sequence.
This procedure can be replaced by any multiple sequence alignment procedure such as Mali (Martin Vingron), Clustal (Des Higgins), MSE (Will Gilbert) or the new PileUp program in GCG version 7. The resulting sequence files must be converted to GCG format before use in PrettyPlot, typically by extracting the individual sequences with an editor, creating individual sequence files beginning with ".." on the first line, and converting to GCG format with the Reformat program.
% prettyplot -Consensus -LineSize=90

 PRETTYPLOT uses any sequences

 PRETTYPLOT of what sequence(s) ?  @pretty.list

                  Start (* 1 *) ?
                  End (* 349 *) ?

 Fa10.Ugly, len: 349
 Fa12.Ugly, len: 349
 ///////////////////
 R14.Ugly, len: 349
 R2.Ugly, len: 349

 Find consensus to what minimum plurality (* 3.3 *) ?

%
Here is part of the text output file:
Plurality: 2.00  Threshold: 1.00  AveWeight 0.55  AveMatch 0.54  AvMisMatch -0.40

PRETTY of: @Pretty.Fil  February 6, 1989 19:25  ..

           1                                                   50
Fa10.Ugly  .......... .......... .......... ..TTttGESA D.PvtTtVE.
Fa12.Ugly  .......... .......... .......... ..TTatGESA D.PvtTtVE.
Fo1k.Ugly  .......... .......... .......... ..TTsaGESA D.PvtTtVE.
E.Ugly     Gvenae.kgV tEnTna.Tad fvaqpvyLPE .nqT...... kV.AFfynrs
P1m.Ugly   GlgqmlEsmI .DnTvreTvg AatsrdaLPn teasGPthSk EIPALTAVET
P1s.Ugly   GlgqmlEsmI .DnTvreTvg AatsrdaLPn teasGPahSk EIPALTAVET
P2s.Ugly   GigdmIEgaV .Egitknalv pptstnsLPg hkpsGPahSk EIPALTAVET
P3s.Ugly   GiedlIseva .qgal..Tls lpkqqdsLPD tkasGPahSk EVPALTAVET
Cb3.Ugly   ...gpVEdaI .......T.. Aaigr..vaD tvgTGPtnSe aIPALTAaET
R14.Ugly   GlgdelEevI vEkT.kqTv. Asi....... ..ssGPkhtq kVPiLTAnET
R2.Ugly    ...npVEnyI dEvlnevlv. .......vPn inssnPttSn saPALdAaET
Consensus  G----VE--I -E-T---T-- A------LPD --TTGPGESA D-PALTAVET

/////////////////////////////////////////////////////////////////
The graphics version of the output is shown below:
LineUp is a screen editor for editing multiple sequence alignments. You can edit up to 30 sequences simultaneously. New sequences can be typed in by hand or added from existing sequence files. A consensus sequence identifies places where the sequences are in conflict. Pretty displays multiple sequence alignments and calculates a consensus sequence. It does not create the alignment; it simply displays it. PileUp creates a multiple sequence alignment from a group of related sequences using progressive, pairwise alignments. It can also plot a tree showing the clustering relationships used to create the alignment.
Mali is Martin Vingron's Multiple Alignment program. Clustal is Des Higgins' multiple alignment program. MSE is Will Gilbert's multiple alignment editor. All these programs are available from the EBI Network File Server (NETSERV@ebi.ac.uk).
If you run Gap with the command line options for sequence output, it will write sequence files with the sequences expanded by the addition of gaps. Only two sequences can be aligned at once.
PrettyPlot displays sequences which have already been aligned. You can use up to 500 sequences with up to 2,000,000 symbols in total unless your site has increased the limits for GCG.
If you use one of the command line options -CONsensus, -DIFferences, or -CASe, PrettyPlot calculates a consensus for the column using a symbol comparison table called PrettyPep.Cmp for peptide or PrettyDNA.Cmp for nucleic acids. The consensus is found by finding the symbol in the column for which its comparison to all of the symbols in the column (including itself) yields the greatest number of votes. A vote is cast for each symbol comparison that is over some set threshold value. The votes can be either 1.0 or some "vote weight" assigned to the sequence from which the vote comes.
If no coalition of votes is larger than all of the other coalitions, or if the largest coalition is below the minimum plurality, then the column has no single consensus. By default, no consensus is then displayed. The -NOCOLLision option makes PrettyPlot box all possible consensus matches and choose the first one found for use in the consensus sequence.
The weights for each sequence, the threshold, and the minimum plurality are all real numbers.
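The voting procedure described above can be sketched in Python. This is a minimal illustration that follows the description literally; it is not PrettyPlot's own code. The `score` function is a hypothetical stand-in for PrettyPep.Cmp: only the exact-match value of 1.5 and the D/E, W/Y and L/F pairs come from this document, while the value 1.2 for similar pairs and 0.0 for everything else are assumptions. Treating gaps as non-voting, and letting a comparison equal to the threshold vote, are also assumptions.

```python
# Hypothetical stand-in for the PrettyPep.Cmp comparison table.
SIMILAR_PAIRS = {frozenset("DE"), frozenset("WY"), frozenset("LF")}


def score(a, b):
    """Return a comparison value for two residues (illustrative values only)."""
    a, b = a.upper(), b.upper()
    if a == b:
        return 1.5  # exact-match score quoted in this document
    if frozenset((a, b)) in SIMILAR_PAIRS:
        return 1.2  # assumed value above the default threshold of 1.0
    return 0.0      # assumed value for all other pairs


def column_votes(column, weights, threshold=1.0):
    """Total vote weight collected by each distinct symbol in one column."""
    votes = {}
    for candidate in column:
        if candidate in ".-" or candidate in votes:
            continue  # gaps are not candidates (assumption)
        votes[candidate] = sum(
            w for sym, w in zip(column, weights)
            if sym not in ".-" and score(candidate, sym) >= threshold
        )
    return votes


def consensus_symbol(column, weights, threshold=1.0, plurality=2.0):
    """Return the winning symbol, or None when there is no unique winner."""
    votes = column_votes(column, weights, threshold)
    if not votes:
        return None
    best = max(votes.values())
    winners = [sym for sym, total in votes.items() if total == best]
    if best < plurality or len(winners) > 1:
        return None
    return winners[0]
```

For a column "GGGGA" with five votes of 1.0, G collects 4.0 votes and becomes the consensus. A column of two sequences that disagree never reaches the plurality of 2.0, so the sketch reports no consensus.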
If you use -CASe, PrettyPlot will show the members of the winning coalition in upper case and others in lower case in the text output file. The graphics output will always be in upper case.
If you use -DIFferences, PrettyPlot will suppress the members of the winning coalition and show all the other positions in lower case.
If you use -CONsensus, PrettyPlot will add a line to your alignments with a "consensus" sequence. The consensus is the symbol that had the largest number of votes (vote weights) in the column. The consensus is included in both the text and graphics versions of the output.
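The effect of the -CASe and -DIFferences display modes on the text output can be sketched as follows. This is an illustration only: the function names are hypothetical, and the `coalition_mask` argument (one True/False flag per position, saying whether that position voted with the winning coalition of its column) is assumed to have been computed elsewhere. The `agree_char` argument models the optional character shown as `-DIFferences[="-"]` in the command-line summary.

```python
def render_case(symbol, in_coalition):
    """-CASe: coalition members in upper case, all others in lower case."""
    return symbol.upper() if in_coalition else symbol.lower()


def render_differences(symbol, in_coalition, agree_char=" "):
    """-DIFferences: hide coalition members, show the rest in lower case."""
    return agree_char if in_coalition else symbol.lower()


def render_row(row, coalition_mask, mode, agree_char=" "):
    """Apply one display mode to a whole sequence row."""
    if mode == "case":
        return "".join(
            render_case(sym, flag) for sym, flag in zip(row, coalition_mask)
        )
    if mode == "differences":
        return "".join(
            render_differences(sym, flag, agree_char)
            for sym, flag in zip(row, coalition_mask)
        )
    raise ValueError(f"unknown mode: {mode}")
```

With the row "GAVT" and the mask `[True, False, True, False]`, the "case" mode gives "GaVt" and the "differences" mode with `agree_char="-"` gives "-a-t".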
Since different symbols could contribute to a consensus for either -CASe or -DIFferences, such a consensus will not necessarily define a consensus symbol for the consensus sequence row.
For example, in the default comparison matrix, aspartate (D) and glutamate (E) have a score of over 1.0 so they are considered to match. If an alignment has five D and five E residues at position 15, they are all considered to match for the -CASe and -DIFferences options, but neither is in the majority for defining a consensus.
To resolve these conflicts, PrettyPlot (but not Pretty) has a command line option -NOCOLLision which simply uses the first residue it finds when there is an equal choice.
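The five-D, five-E case can be worked through with a small sketch. It assumes, for illustration only, that the consensus symbol is picked by unweighted identity counts and that "the first residue it finds" means the first tied residue met when reading the column from the top sequence down. Neither detail is confirmed by this document, and `pick_consensus` is a hypothetical name.

```python
from collections import Counter


def pick_consensus(column, no_collision=False):
    """Choose a consensus residue by simple identity counts (unweighted sketch)."""
    counts = Counter(column)  # keeps residues in first-seen order
    best = max(counts.values())
    tied = [sym for sym in counts if counts[sym] == best]
    if len(tied) == 1:
        return tied[0]
    # Equal choice: report nothing unless the first tied residue is accepted.
    return tied[0] if no_collision else None


column = "DEDEDEDEDE"  # five D and five E residues at one position
print(pick_consensus(column))                     # None
print(pick_consensus(column, no_collision=True))  # D
```

Without the option the tie leaves the consensus position empty; with it, the residue of the topmost tied sequence is used.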
The threshold determines the symbol comparison value below which a symbol may not vote for a coalition. Please note that in the default comparison table an exact match between two amino acid residues scores 1.5, and that some other pairs (D and E, W and Y, L and F for example) are also, by default, considered to match. You should specify -THReshold=1.5 to force exact matches only.
The plurality defines the number of votes (vote weights) below which there will be no consensus. The default value is just over half the total weight. By default, each sequence has a vote of 1.0 (see threshold) in creating the consensus.
If several of your sequences are very similar, you may not want their votes to dominate the consensus for the column. If your input file specification to PrettyPlot is a file of file names, you can assign each sequence a vote weight by adding a number to the line after the sequence name. The vote weight is the vote that each row casts for the consensus. Here is the file of file names used to run the example above. Note how each kind of sequence is assigned a vote weight so that their combined impact on the election is never more than one vote.
Multiple sequence alignments are best represented with files of sequence names. For PrettyPlot these files may include a vote weight as a column of numbers. Here is the input file (pretty.list) from the example session:
A multiple sequence alignment represented as a list file for input to
the programs PRETTY, PROFILEMAKE and LINEUP. 7/30/94 ..

GenDocData:fa10.ugly  wgt: 0.5
GenDocData:fa12.ugly  wgt: 0.5
GenDocData:fo1k.ugly  wgt: 1.0
GenDocData:e.ugly     wgt: 1.0
GenDocData:p1m.ugly   wgt: 0.25
GenDocData:p1s.ugly   wgt: 0.25
GenDocData:p2s.ugly   wgt: 0.25
GenDocData:p3s.ugly   wgt: 0.25
GenDocData:cb3.ugly   wgt: 1.0
GenDocData:r14.ugly   wgt: 0.5
GenDocData:r2.ugly    wgt: 0.5
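Reading the vote weights from such a list file can be sketched as below. This is not GCG's parser: `parse_list_file` is a hypothetical helper, and it assumes that the heading ends at the first line whose last characters are "..", that each later non-blank line starts with a sequence name, and that a line without a "wgt:" field gets the default vote of 1.0.

```python
def parse_list_file(text):
    """Return (name, weight) pairs from a list file with optional wgt: fields."""
    entries = []
    in_body = False
    for line in text.splitlines():
        if not in_body:
            # Everything up to and including the ".." line is heading text.
            in_body = line.rstrip().endswith("..")
            continue
        fields = line.split()
        if not fields:
            continue
        name, weight = fields[0], 1.0
        if "wgt:" in fields:
            weight = float(fields[fields.index("wgt:") + 1])
        entries.append((name, weight))
    return entries


sample = """A list file heading. 7/30/94 ..

GenDocData:p1m.ugly  wgt: 0.25
GenDocData:p1s.ugly  wgt: 0.25
GenDocData:p2s.ugly  wgt: 0.25
GenDocData:p3s.ugly  wgt: 0.25
GenDocData:cb3.ugly
"""
entries = parse_list_file(sample)
print(sum(weight for name, weight in entries if ":p" in name))  # 1.0
```

The four closely related P sequences together carry one vote, the same as the single Cb3 sequence.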
All parameters for this program may be put on the command line. Use the option -CHEck to see the summary below and to have a chance to add things to the command line before the program executes. In the summary below, the capitalized letters in the qualifier names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose qualifiers or parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the GCG User's Guide.
Minimum Syntax: % prettyplot [-INfile=]@Pretty.Fil -Default

Prompted Parameters:

-BEGin=1 -END=349         range of interest
[-OUTfile=]pretty.pretty  output file
[-OUTfile2=]pretty.fil    ugly format output file

Local Data Files:

[-DATa=]prettydna.cmp  consensus comparison table for Nucleotides
[-DATa=]prettypep.cmp  consensus comparison table for Proteins
-STAR=pretty.star      file of positions to be marked with asterisk

Optional Parameters:

-CONsensus            generates (displays) a consensus sequence
-DIFferences[="-"]    only shows positions disagreeing with the consensus
-CASe                 shows positions agreeing with consensus in upper case
-THReshold=1.0        sets min value for symbol to vote in consensus
-PLUrality=2.0        defines the minimum number of votes for a consensus
-LINESize=50          sets the number of residues per line
-DENSity=50           same as LINESize
-BLOcksize=10         sets the number of residues per block
-UGLy                 writes the individual sequences into new files
-VOTes=matrix.cmp     alternative local data file (can also use -DATa as above)
-NOTEXT               no text output file
-NOPLOT               no graphics output file
-NOBOX                no boxes drawn (use with color modes)
-NOSEQNUMber          no sequence numbering on right of plot
-NONAME               no sequence name on left of plot
-NOTITLE              no title at top of plot
-TOPNUMber=Consensus  number every 10th position in named sequence or consensus
-STARSEQ=Consensus    sequence positions used for asterisk
-STAR=Pretty.Star     file of sequence positions to be marked with "*"
-NOCOLLisions         allows more than one alternative consensus residue
-NOSHORTname          full filename or MSF file and entry name shown

Coloring of residues, in order of priority:

-DOCOLors          highlight residues in color
-BLACKaa=X         residues to color black
-GREENaa=FLMWYIV   residues to color green
-BLUEaa=RKH        residues to color blue
-REDAa=DE          residues to color red (-RED is too short)
-CYANaa=X          residues to color cyan
-YELLOWaa=AG       residues to color yellow
-VIOLETaa=P        residues to color violet

Alternative coloring of residues, in order of priority:

-CCOLors             highlight quality of consensus match
-CONSCOLor           highlight quality of consensus match
-CCONsensus=RED      colour for residues on consensus line
-CIDentity=RED       colour for identity to consensus
-CSImilarity=GREEN   colour for similarity to consensus
-COThers=BLACK       colour for other residues
The files described below supply auxiliary data to this program. The program automatically reads them from a public data directory unless you either 1) have a data file with exactly the same name in your current working directory; or 2) name a file on the command line with an expression like -DATa1=myfile.dat. For more information see Chapter 4, Using Data Files in the User's Guide.
If you use one of the command line options -CONsensus, -DIFferences, or -CASe, PrettyPlot calculates a consensus for each column using a symbol comparison table (Appendix II). You can provide your own table called either PrettyPep.Cmp for peptides or PrettyDNA.Cmp for nucleic acids. You can define some other table with the command line specification -VOTes=FileName.
The parameters and switches listed below can be set from the command line. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the GCG User's Guide.
-LINESize=50 specifies the number of sequence symbols to display on each line. Typically a linesize of 50 is used for text output, but for graphics, values of 100 or more can be more useful.
-BLOcksize=10 specifies the number of sequence symbols to put into each block in the text output file. The graphics output is never split into blocks.
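How the line size and block size interact in the text output can be sketched as follows. The `layout` function is a hypothetical helper, and the single space between blocks is taken from the appearance of the example text output rather than from any stated rule.

```python
def layout(sequence, linesize=50, blocksize=10):
    """Split one aligned sequence into lines of space-separated blocks."""
    lines = []
    for start in range(0, len(sequence), linesize):
        chunk = sequence[start:start + linesize]
        blocks = [
            chunk[i:i + blocksize] for i in range(0, len(chunk), blocksize)
        ]
        lines.append(" ".join(blocks))
    return lines


for line in layout("GlgqmlEsmI.DnTvreTvgAatsr", linesize=20, blocksize=10):
    print(line)
# GlgqmlEsmI .DnTvreTvg
# Aatsr
```

A 25-symbol sequence with a line size of 20 gives one full line of two blocks and a second line holding the remaining five symbols.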
-CONsensus causes PrettyPlot to show a consensus sequence for the set of sequences you are displaying. Read how PrettyPlot finds the consensus above.
-DIFferences[="-"] causes PrettyPlot to print only the positions that did not vote with the winning consensus and to print blanks at all other positions. If an optional character is added, PrettyPlot will use that character at all of the positions that agree with the consensus. The '-' character has to be enclosed in quotes if it is the last character in a command, or the VMS command interpreter will think you are starting a new command line.
-CASe causes PrettyPlot to print all of the positions in each column that voted with the winning coalition in upper case and to print all other positions in lower case. This option overrides -DIFferences if both are used, and only applies to the text output file. In the graphics output, all positions are in upper case.
-THReshold=1.0 determines the symbol comparison value below which a symbol may not vote for a coalition. See the topic called CALCULATING A CONSENSUS.
-PLUrality=2.0 defines the number of votes (vote weights) below which there will be no consensus. See the topic called CALCULATING A CONSENSUS.
-UGLy rewrites the sequences in a PrettyPlot text output file into individual sequence files in GCG format. The PrettyPlot output file must have a line with two periods ("..") separating the text in the heading from the sequences. -UGLy also causes PrettyPlot to write a file of file names to go with the new sequence files.
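The first step of such a split, collecting each named row of the text output back into one continuous sequence, can be sketched as below. This is an illustration under several assumptions, not the -UGLy implementation: `split_alignment` is a hypothetical name, the heading is taken to end at the first line ending in "..", ruler lines are recognised by a leading number, and the output is assumed not to contain the blanks produced by -DIFferences. Writing the results out in GCG format is not attempted.

```python
def split_alignment(text):
    """Collect each named row of a Pretty-style text alignment into one string."""
    sequences = {}
    in_body = False
    for line in text.splitlines():
        if not in_body:
            # Heading text runs up to the line that ends with "..".
            in_body = line.rstrip().endswith("..")
            continue
        fields = line.split()
        # Skip blank lines, ruler lines of position numbers and "////" markers.
        if len(fields) < 2 or fields[0].isdigit() or fields[0].startswith("/"):
            continue
        name, blocks = fields[0], fields[1:]
        sequences[name] = sequences.get(name, "") + "".join(blocks)
    return sequences


sample = """PRETTY of: @Pretty.Fil  February 6, 1989 19:25  ..

           1                  20
Fa10.Ugly  ..TTttGESA D.PvtTtVE.
P1m.Ugly   teasGPthSk EIPALTAVET

           21      30
Fa10.Ugly  GESADPVTTT
P1m.Ugly   EIPALTAVET
"""
for name, seq in split_alignment(sample).items():
    print(name, len(seq))
```

Each name ends up with one 30-symbol string that joins its blocks from both sections of the sample.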
writes a text output file (the same as Pretty) as well as graphics.
-NOPLOT cancels the graphics output.
-NOSEQNUMber removes the sequence numbering from the graphics output.
-NONAME removes the sequence names from the graphics output.
-NOTITLE removes the title lines from the graphics output.
-TOPNUMber=Consensus numbers every 10th position in the alignment, or every 10th position in the consensus sequence.
-STAR=Pretty.Star reads a file of sequence positions to be marked with an asterisk in the graphics output. The default file name is the same as the input file with the extension ".Star". The Star file format is a heading of free text ending with "..", then every number on the remaining lines is used as a sequence position to be marked.
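The Star file layout described above is simple enough to read with a few lines of Python. This is a sketch, not the program's reader: `read_star_positions` is a hypothetical name, and it assumes the heading ends at the first ".." found anywhere in the file and that positions are written as plain unsigned integers.

```python
import re


def read_star_positions(text):
    """Return every integer found after the '..' divider of a Star file."""
    heading, divider, body = text.partition("..")
    if not divider:
        raise ValueError("no '..' divider found in Star file")
    return [int(number) for number in re.findall(r"\d+", body)]


star_file = """Positions of interest, chosen on 7/30/94
..
12 45
101
"""
print(read_star_positions(star_file))  # [12, 45, 101]
```

Numbers in the heading, such as the date here, are ignored because only the text after the divider is scanned.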
-STARSEQ=Consensus marks each position listed in the Star file with an asterisk, using either the consensus sequence ("=Consensus") or one of the sequence fragments as a base for the sequence position numbering.
-NOCOLLisions allows positions where there are alternative consensus residues to have all the possible consensus residues boxed, in preference to the default behaviour of boxing none. This is only important where the consensus plurality is less than half of the total sequence voting weights, but this is by default often the case, as the plurality is 2.0 and each sequence has a vote of 1.0 towards the consensus.
-NOSHORTname uses the full filename (or the MSF file name and entry name).
-DOCOLors tells PrettyPlot to highlight selected residues in color, according to the qualifier values below. The colors are searched in the order: Black, Green, Blue, Red, Cyan, Yellow, Violet. Setting any residue to one of the earlier colors overrides any later setting.
-BLACKaa=X specifies residues to be black (the default) on a color plot.
-GREENaa=FLMWYIV specifies residues to be green on a color plot.
-BLUEaa=RKH specifies residues to be blue on a color plot.
-REDAa=DE specifies residues to be red on a color plot. For RED, but not for any other colour, the "aa" part of the qualifier is required. This is because GCG has a qualifier, -REDuce, used by its graphics library, which would clash with the EGCG qualifier if the name were shorter.
-CYANaa=X specifies residues to be cyan (by default none are) on a color plot.
-YELLOWaa=AG specifies residues to be yellow on a color plot.
-VIOLETaa=P specifies residues to be violet on a color plot.
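The priority rule for residue colors can be sketched as a first-match lookup. The search order and the default residue sets below are copied from the command-line summary; the function name and the data structure are hypothetical, and falling back to black for unlisted residues follows the statement that black is the default.

```python
# Search order and default residue sets, taken from the command-line summary.
COLOR_ORDER = [
    ("black", "X"),
    ("green", "FLMWYIV"),
    ("blue", "RKH"),
    ("red", "DE"),
    ("cyan", "X"),
    ("yellow", "AG"),
    ("violet", "P"),
]


def residue_color(residue, color_sets=COLOR_ORDER):
    """Return the first color, in search order, whose set holds the residue."""
    residue = residue.upper()
    for color, residues in color_sets:
        if residue in residues:
            return color
    return "black"  # residues not listed anywhere stay black (the default)


print(residue_color("K"))  # blue
print(residue_color("X"))  # black (black is searched before cyan)
print(residue_color("S"))  # black (not listed in any set)
```

Because the lists are searched in a fixed order, adding A to the green set would make alanine green even though it still appears in the yellow set.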
-CCOLors and -CONSCOLor tell PrettyPlot to highlight residues according to how well they match the consensus sequence. The colours used are red for a perfect match, green for residues similar to the consensus and blue for other residues.
-CIDentity specifies residues identical to the calculated consensus to be plotted in red.
-CSImilarity specifies residues similar to the calculated consensus to be plotted in green. Similarity is determined by a comparison matrix value higher than the threshold set by -THReshold (default 1.0).
-COThers specifies residues conflicting with the calculated consensus to be plotted in blue.
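The three-way classification behind consensus coloring can be sketched as below. The sketch returns a category name rather than a color, so that the caller can map "identity", "similarity" and "other" to whatever the corresponding qualifiers are set to. Both function names are hypothetical, `toy_score` is an invented stand-in for the comparison table, and counting a value exactly equal to the threshold as similar is an assumption.

```python
def match_category(residue, consensus, score, threshold=1.0):
    """Classify a residue as 'identity', 'similarity' or 'other'."""
    residue, consensus = residue.upper(), consensus.upper()
    if residue == consensus:
        return "identity"
    if score(residue, consensus) >= threshold:
        return "similarity"
    return "other"


def toy_score(a, b):
    """Hypothetical comparison values; not the PrettyPep.Cmp table."""
    return 1.2 if {a, b} == {"D", "E"} else 0.0


for residue in "DEK":
    print(residue, match_category(residue, "D", toy_score))
# D identity
# E similarity
# K other
```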
Printed: April 22, 1996 15:55 (1162)