PrettyBox displays multiple sequence alignments as shaded boxes in PostScript format (i.e., the output file must be printed and/or displayed on a PostScript-compatible device). PrettyBox will optionally calculate a consensus sequence. The program does not create the alignment; it simply displays it.
PrettyBox is an improved version of the GCG program Pretty. It produces a boxed sequence alignment as graphics output in addition to the standard text file. PrettyBox was originally written to produce publication-quality sequence alignment output. There are also several enhancements in the way the consensus sequence is calculated and in the options for sequence display.
PrettyBox prints and plots sequences with their columns aligned. This utility is used after a number of sequences have had gaps added to make them all align. PrettyBox's output allows you to look at relationships among several sequences.
This program was written by Rick Westerman (E-mail: westerm@aclcb.purdue.edu Post: Ag Campus Laboratory for Computational Biology, BCHM Building, Purdue University, West Lafayette, IN 47907, USA), and modified for EGCG by Peter Rice (E-mail: pmr@sanger.ac.uk Post: Informatics Division, The Sanger Centre, Hinxton Hall, Cambridge, CB10 1RQ, UK).
All EGCG programs are supported by the EGCG Support Team, who can be contacted by E-mail (egcg@embnet.org).
The original Pretty program was designed with the help of Ann Palmenberg of the UW Biophysics lab. The sequences in the example were aligned for Dr. Palmenberg's work, and were used in the GCG manual.
By repeatedly using the program Gap with the command line option -OUT, gaps were added to a group of picorna virus capsid proteins in the antigenic region to make them align with each other and with a growing consensus sequence.
This procedure can be replaced by any multiple sequence alignment procedure such as Mali (Martin Vingron), Clustal (Des Higgins), MSE (Will Gilbert) or the new PileUp program in GCG version 7. The resulting sequence files must be converted to GCG format before use in PrettyBox, typically by extracting the individual sequences with an editor, creating individual sequence files beginning with ".." on the first line, and converting them to GCG format with the Reformat program.
% prettybox -SeqName=Partial

 PRETTYBOX uses any sequences

 PRETTYBOX of what sequence(s) ?  @pretty.list

     Fa10.Ugly, len: 349
     Fa12.Ugly, len: 349
     ///////////////////
     R14.Ugly, len: 349
     R2.Ugly, len: 349

 Start (* 1 *) ?
 End (* 349 *) ?

 Orient output as:

     L) Landscape
     P) Portrait

 Please choose one (* L *) ?

 Display a consensus (* No *) ?

 Find consensus to what plurality (* 5.6 *) ?

 Do numbering on:

     R) Right side
     T) Top side
     N) None

 Please choose one (* R *) ?

%
This is the plot from the example session
LineUp is a screen editor for editing multiple sequence alignments. You can edit up to 30 sequences simultaneously. New sequences can be typed in by hand or added from existing sequence files. A consensus sequence identifies places where the sequences are in conflict.

Pretty displays multiple sequence alignments and calculates a consensus sequence. It does not create the alignment; it simply displays it.

PileUp creates a multiple sequence alignment from a group of related sequences using progressive, pairwise alignments. It can also plot a tree showing the clustering relationships used to create the alignment.

PrettyPlot displays multiple sequence alignments and calculates a consensus sequence. It does not create the alignment; it simply displays it.
Mali is Martin Vingron's Multiple Alignment program. Clustal is Des Higgins' multiple alignment program. MSE is Will Gilbert's multiple alignment editor. All these programs are available from the EBI Network File Server (NETSERV@ebi.ac.uk).
If you run Gap with the command line options for sequence output, it will write sequence files with the sequences expanded by the addition of gaps. Only two sequences can be aligned at once.
PrettyBox displays sequences which have already been aligned. You can use up to 500 sequences with up to 2,000,000 symbols in total unless your site has increased the limits for GCG.
The graphics output must be PostScript for PrettyBox to work. No other graphics drivers are supported.
If you use the command line option -CONsensus, PrettyBox calculates a consensus for each column using a symbol comparison table called PrettyPep.Cmp for peptides or PrettyDNA.Cmp for nucleic acids. The consensus is the symbol in the column whose comparison to all of the symbols in the column (including itself) yields the greatest number of votes. A vote is cast for each symbol comparison that reaches a set threshold value. Each vote counts either 1.0 or the "vote weight" assigned to the sequence from which the vote comes.
If there is no coalition of votes that is larger than all of the other coalitions, or if the largest coalition is below the minimum plurality, then there is no clear choice of consensus for the column. By default, no consensus is then displayed.
The weights for each sequence, the threshold, and the minimum plurality are all real numbers.
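The voting scheme described above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not PrettyBox's actual source: the names compare and column_consensus are hypothetical, the toy comparison values stand in for the real PrettyPep.Cmp table, a vote is assumed to count when the comparison value is at or above the threshold, and gap symbols receive no special treatment.

```python
from typing import Dict, Optional, Sequence


def compare(a: str, b: str) -> float:
    """Toy comparison table (hypothetical values, NOT the real PrettyPep.Cmp).

    Exact matches score 1.5; D and E are treated as similar (1.2);
    every other pair scores 0.0.
    """
    if a == b:
        return 1.5
    if {a, b} == {"D", "E"}:
        return 1.2
    return 0.0


def column_consensus(
    column: Sequence[str],
    weights: Sequence[float],
    threshold: float,
    plurality: float,
) -> Optional[str]:
    """Return the consensus symbol for one alignment column, or None.

    Each candidate symbol collects the vote weight of every row whose
    symbol compares to it at or above the threshold (assumed rule).
    """
    if not column:
        return None

    tallies: Dict[str, float] = {}
    for candidate in sorted(set(column)):
        tallies[candidate] = sum(
            weight
            for symbol, weight in zip(column, weights)
            if compare(candidate, symbol) >= threshold
        )

    best = max(tallies.values())
    winners = [symbol for symbol, votes in tallies.items() if votes == best]

    # No consensus if the largest coalition is tied with another one
    # or falls below the minimum plurality.
    if len(winners) != 1 or best < plurality:
        return None
    return winners[0]
```

With this toy table, a column of three D and one E at a threshold of 1.5 yields D, because only exact matches vote. At a threshold of 1.0 the same column yields no consensus, because D and E vote for each other and the two coalitions tie.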
If you use -CONsensus, PrettyBox will add a line to your alignments with a "consensus" sequence. The consensus is the symbol that had the largest number of votes (vote weights) in the column. The consensus is included in both the text and graphics versions of the output.
Since different symbols could contribute to a consensus for either -CASe or -DIFferences, such a consensus will not necessarily define a consensus symbol for the consensus sequence row.
For example, in the default comparison matrix, aspartate (D) and glutamate (E) have a score of over 1.0 so they are considered to match. If an alignment has five D and five E residues at position 15, they are all considered to match for the -CASe and -DIFferences options, but neither is in the majority for defining a consensus.
-THReshold=1.0
determines the symbol comparison value below which a symbol may not vote for a coalition. Please note that in the default comparison table an exact match between two amino acid residues scores 1.5, and that some other pairs (D and E, W and Y, L and F for example) are also, by default, considered to match. You should specify -THReshold=1.5 to force exact matches only.
-PLUrality=2.0
defines the number of votes (vote weights) below which there will be no consensus. The default value is just over half the total weight. By default, each sequence has a vote of 1.0 (see threshold) in creating the consensus.
If several of your sequences are very similar, you may not want their votes to dominate the consensus for the column. If your input file specification to PrettyBox is a file of file names, you can assign each sequence a vote weight by adding a number to the line after the sequence name. The vote weight is the vote that each row casts for the consensus. Here is the file of file names used to run the example above. Note how each kind of sequence is assigned a vote weight so that their combined impact on the election is never more than one vote.
Multiple sequence alignments are best represented with files of sequence names. For PrettyBox these files may include a vote weight as a column of numbers. Here is the input file (pretty.list) from the example session:
A multiple sequence alignment represented as a list file for input to the programs PRETTY, PROFILEMAKE and LINEUP. 7/30/94 ..

GenDocData:fa10.ugly   wgt: 0.5
GenDocData:fa12.ugly   wgt: 0.5
GenDocData:fo1k.ugly   wgt: 1.0
GenDocData:e.ugly      wgt: 1.0
GenDocData:p1m.ugly    wgt: 0.25
GenDocData:p1s.ugly    wgt: 0.25
GenDocData:p2s.ugly    wgt: 0.25
GenDocData:p3s.ugly    wgt: 0.25
GenDocData:cb3.ugly    wgt: 1.0
GenDocData:r14.ugly    wgt: 0.5
GenDocData:r2.ugly     wgt: 0.5
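To show how the vote weights in a list file map onto sequences, here is a rough Python sketch of a reader for this layout. It is an illustration only, not how PrettyBox itself reads the file. It assumes that everything up to and including the first line containing ".." is heading text, that each later non-blank line names one sequence, and that a missing "wgt:" field means the default vote of 1.0. The function name parse_list_file and the shortened sample are hypothetical.

```python
from typing import List, Tuple


def parse_list_file(text: str, default_weight: float = 1.0) -> List[Tuple[str, float]]:
    """Return (sequence name, vote weight) pairs from list-file text.

    Assumptions (not taken from the GCG source): the heading ends at the
    first line containing "..", and the weight follows a "wgt:" token.
    """
    entries: List[Tuple[str, float]] = []
    in_body = False
    for line in text.splitlines():
        if not in_body:
            # Skip heading lines; the ".." line itself closes the heading.
            if ".." in line:
                in_body = True
            continue

        tokens = line.split()
        if not tokens:
            continue  # blank line

        name = tokens[0]
        weight = default_weight
        if "wgt:" in tokens:
            idx = tokens.index("wgt:")
            if idx + 1 < len(tokens):
                weight = float(tokens[idx + 1])
        entries.append((name, weight))
    return entries


# Shortened, partly hypothetical sample: the last entry has no weight.
SAMPLE = """\
A multiple sequence alignment represented as a list file. 7/30/94 ..

GenDocData:fa10.ugly   wgt: 0.5
GenDocData:p1m.ugly    wgt: 0.25
GenDocData:cb3.ugly
"""

if __name__ == "__main__":
    for name, weight in parse_list_file(SAMPLE):
        print(name, weight)
```

Summing the weights in pretty.list by group shows the intent of the weighting: the four p*.ugly entries at 0.25 each contribute one vote in total, as do the two fa*.ugly entries at 0.5 each.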
All parameters for this program may be put on the command line. Use the option -CHEck to see the summary below and to have a chance to add things to the command line before the program executes. In the summary below, the capitalized letters in the qualifier names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose qualifiers or parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the GCG User's Guide.
Minimum syntax: % prettybox [-INfile=]@pretty.list -Default

Prompted Parameters:

-BEGin=1 -END=349     Range of interest
-ORIentation=land     Direction of printing, Landscape or Portrait
-NUMbering=right      Print sequence numbering on the Right side, the Top side, or None
-[NO]CONsensus        Displays a consensus sequence

Local Data Files:

-DATa=prettydna.cmp   Consensus comparison table for nucleotides
-DATa=prettypep.cmp   Consensus comparison table for proteins

Optional Parameters:

-OUTput=file.name     Normally output goes to the PlotPort (PostScript-compatible)
-PROtein              Insists that your sequences are proteins, not nucleic acids
-SIMilar              Considers similarity in generating a consensus; most useful with proteins
-[No]OFFset           Whether to offset the consensus line from the other sequences
-IDEntity             Boxes only positions of unanimous agreement
-CASe                 Shows positions agreeing with the consensus in upper case; other positions are lower case
-THReshold=1.0        Sets the minimum comparison value for a symbol to vote in the consensus
-PLUrality=2.0        Defines the minimum number of votes for a consensus to exist
-LINesize=50          Sets the number of residues per line
-BLOcksize=10         Sets the number of residues per block
-FONtsize=12          Font size in PostScript units
-XMArgin=20           Left/right margins in PostScript units
-YMArgin=20           Top/bottom margins in PostScript units
-[No]HEAder           Whether to print the header
-BLAnklines=2         Blank lines between each set of sequences
-SEQName=partial      Whether the names of the sequences should be Partial (don't include the file name, good for PileUp files), Full, or None
-PAIr=1.5,1.0,0.5     Thresholds for identical, similar, and somewhat-similar pair-wise matching. Protein defaults are: 1.5, 1.0, 0.5. Nucleic acid defaults are: 1.0, 1.0, 1.0.
-COLor=B,L,P,W        The colors for identical, similar, somewhat-similar, and non-matching residues. The colors are: Black, Dark, Light, Pale, White.
-DENsity=fine         Density of printing, either Rough or Fine; Rough may xerox better. This works with Dark, Light, and Pale colors only.
The files described below supply auxiliary data to this program. The program automatically reads them from a public data directory unless you either 1) have a data file with exactly the same name in your current working directory; or 2) name a file on the command line with an expression like -DATa1=myfile.dat. For more information see Chapter 4, Using Data Files in the User's Guide.
If you use the command line option -CONsensus, PrettyBox calculates a consensus for each column using a symbol comparison table (Appendix II). You can provide your own table called either PrettyPep.Cmp for peptides or PrettyDNA.Cmp for nucleic acids. You can define some other table with the command line specification -VOTes=FileName.
The parameters and switches listed below can be set from the command line. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the GCG User's Guide.
-LINesize=50
specifies the number of sequence symbols to display on each line. Typically a linesize of 50 is used for text output, but for graphics, values of 100 or more can be more useful.
-BLOcksize=10
specifies the number of sequence symbols to put into each block in the graphics output.
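As a rough illustration of how the line size and block size interact, the Python sketch below splits one aligned sequence into lines and blocks. The function name layout is hypothetical, and separating blocks with a single space is an assumption made for plain-text display; how PrettyBox spaces blocks in its PostScript output is not described here.

```python
from typing import List


def layout(sequence: str, linesize: int = 50, blocksize: int = 10) -> List[str]:
    """Split an aligned sequence into lines of `linesize` symbols,
    each line divided into blocks of `blocksize` symbols.

    Blocks are joined with one space (an assumption for text display).
    """
    if linesize <= 0 or blocksize <= 0:
        raise ValueError("linesize and blocksize must be positive")

    lines: List[str] = []
    for start in range(0, len(sequence), linesize):
        chunk = sequence[start:start + linesize]
        blocks = [chunk[i:i + blocksize] for i in range(0, len(chunk), blocksize)]
        lines.append(" ".join(blocks))
    return lines


if __name__ == "__main__":
    # A 25-symbol toy sequence shown 20 per line in blocks of 10.
    for line in layout("ACDEFGHIKLMNPQRSTVWYACDEF", linesize=20, blocksize=10):
        print(line)
```

A 349-symbol alignment such as the one in the example session would occupy seven lines at the default line size of 50, the last one holding the remaining 49 symbols.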
-CONsensus
causes PrettyBox to show a consensus sequence for the set of sequences you are displaying. Read how PrettyBox finds the consensus above.
-THReshold=1.0
determines the symbol comparison value below which a symbol may not vote for a coalition. See the topic called CALCULATING A CONSENSUS.
-PLUrality=2.0
defines the number of votes (vote weights) below which there will be no consensus. See the topic called CALCULATING A CONSENSUS.
-SEQName=partial
uses only the filename (or the MSF file entry name).
Printed: April 22, 1996 15:55 (1162)