Econsensus

ECONSENSUS


FUNCTION

EConsensus calculates a consensus sequence for a set of pre-aligned short nucleic acid sequences by tabulating the percent of G, A, T, and C for each position in the set. GCG's FitConsensus uses the EConsensus output table as a probe to search for the best examples of the derived consensus in other nucleotide sequences.


DESCRIPTION

EConsensus reads a file of aligned sequences for which you want to know the consensus pattern. EConsensus constructs a consensus table with the percent of each nucleotide at each position. The table also reports the total number of nucleotides contributing to each position. Below the table, EConsensus writes the least ambiguous expression of the consensus sequence for a confidence level that you request.


AUTHOR

This GCG program was modified by Jaakko Hattula (Tampere University of Technology, Finland) and Peter Rice (E-mail: pmr@sanger.ac.uk; Post: Informatics Division, The Sanger Centre, Hinxton Hall, Cambridge, CB10 1RQ, UK).

All EGCG programs are supported by the EGCG Support Team, who can be contacted by E-mail (egcg@embnet.org).


EXAMPLE

Here is a session using EConsensus to find the consensus of the intervening sequence acceptor splice sites from the file acceptor.dat:

  
  
  % econsensus
  
    ECONSENSUS of what file ?  acceptor.dat
  
    Find consensus to what percent certainty (* 75.0 *) ?
  
    What should I call the output file (* acceptor.csn *) ?
  
   ................
  
  %
  


OUTPUT

Here is the output file, which is a legal GCG sequence file:

  
  
   ECONSENSUS of: acceptor.dat
  
  IVS Acceptor Splice Site Sequences
  from Stephen Mount NAR 10(2); 459-472 figure 1 page 460
  Acceptor
  
                                          *****
  
   %G      15   22   10   10   10    6    7    9    7    5    5   24    1    0
   %A      15   10   10   15    6   15   11   19   12    3   10   25    4  100
   %T      52   44   50   54   60   49   48   45   45   57   58   30   31    0
   %C      18   25   30   21   24   30   34   28   36   35   27   21   64    0
  
  Total   114  114  115  127  127  127  128  128  128  130  131  131  131  131
  
   %G     100   52   24   19
   %A       0   22   17   20
   %T       0    8   37   29
   %C       0   18   22   32
  
  Total   131  131  131  131
  
                                          *****
  
   ECONSENSUS sequence to a certainty level of 75.0 percent at each position:
  
   Length: 18  December 12, 1995 16:15  Type: N  Check: 3343  ..
  
    1  BBYHYYYHYY YDYAGVBH
  
  


RELATED PROGRAMS

FitConsensus uses the file written by EConsensus to search for the best places in a nucleotide sequence where the consensus table fits. The mapping programs can be run with the command line option -ALL to search for all potential restriction sites in an ambiguous sequence.

ProfileMake creates a position-specific scoring table, called a profile, that quantitatively represents the information from a group of aligned sequences. The profile can then be used for database searching (ProfileSearch) or sequence alignment (ProfileGap).


CONSIDERATIONS

EConsensus makes no attempt to align the sequences in the input file, so you should be sure that they are optimally aligned before running the program. The input file structure is described below. If two or more nucleotides are observed equally often at a position, the ambiguity code chosen for that position may be arbitrary.


STATISTICS USED

EConsensus counts the number of G's, A's, T's, and C's in each position of the prealigned sequences. G, A, T, and C each have a value of one. An ambiguous nucleotide code is divided among the nucleotides it represents: R, for instance, represents A or G and therefore contributes 0.5 to G and 0.5 to A. Periods (gaps) have no value. When the count is complete, the counts of each nucleotide at each position are totaled, normalized to 100, and rounded to the nearest integer. The normalized integers are reported as the %G, %A, etc., at each position. The total number of observations used to generate the percent figures is also shown. An observation is any IUB code (see Appendix III); periods do not count as observations.
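
The counting scheme described above can be sketched in Python. This is an illustration only, not the program's source: the function name tabulate, the IUB lookup table, the choice to ignore unrecognized symbols, and the round-half-up rule are all assumptions made for the example.

```python
# IUB codes mapped to the bases they stand for.
IUB = {
    "G": "G", "A": "A", "T": "T", "C": "C",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT", "N": "ACGT",
}


def tabulate(sequences):
    """Return (percents, totals) for a list of equal-length aligned sequences.

    percents is a list with one dict per position, keyed by G, A, T, C.
    totals is the number of observations at each position.
    """
    length = len(sequences[0])
    counts = [dict.fromkeys("GATC", 0.0) for _ in range(length)]
    totals = [0] * length
    for seq in sequences:
        for pos, symbol in enumerate(seq.upper()):
            bases = IUB.get(symbol)
            if bases is None:
                # Periods (gaps) have no value. Skipping any other
                # unrecognized symbol is an assumption of this sketch.
                continue
            totals[pos] += 1
            for base in bases:
                # An ambiguity code is shared among its bases (R: 0.5 + 0.5).
                counts[pos][base] += 1.0 / len(bases)
    percents = []
    for column, total in zip(counts, totals):
        if total == 0:
            percents.append(dict.fromkeys("GATC", 0))
            continue
        # Normalize to 100 and round; rounding halves upward is assumed.
        percents.append(
            {base: int(100 * column[base] / total + 0.5) for base in "GATC"}
        )
    return percents, totals
```

For example, tabulate(["AR.", "AG.", "TN."]) reports 3, 3, and 0 observations for the three positions, because the periods in the last column are not counted.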

For the certainty level that you set, EConsensus writes the least ambiguous expression of the sequence in the table using the IUB ambiguity codes. For each column (position) in the table, the program starts with the largest member (G, A, T, or C) and adds successively smaller members until the sum is equal to or greater than the certainty level. If two nucleotides have the same score, EConsensus arbitrarily picks one to add to the consensus, so the ambiguity code at that position may be somewhat misleading.
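
The selection rule for one column can be sketched as follows. This is a minimal illustration in Python, not the program's own code: the names consensus_symbol and CODE_FOR are hypothetical, and ties are broken in G, A, T, C order, which is only one possible reading of "arbitrarily".

```python
# IUB code for each set of bases.
CODE_FOR = {
    frozenset("G"): "G", frozenset("A"): "A",
    frozenset("T"): "T", frozenset("C"): "C",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("AC"): "M",
    frozenset("GT"): "K", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("ACT"): "H", frozenset("CGT"): "B",
    frozenset("ACG"): "V", frozenset("AGT"): "D",
    frozenset("ACGT"): "N",
}


def consensus_symbol(percent, certainty=75.0):
    """Return the IUB code for one column.

    percent maps each of G, A, T, C to its percentage in the column.
    """
    # sorted() is stable, so equal scores keep the G, A, T, C order.
    # That tie-break is an assumption of this sketch.
    ranked = sorted("GATC", key=lambda base: percent[base], reverse=True)
    chosen = set()
    running = 0.0
    for base in ranked:
        chosen.add(base)
        running += percent[base]
        if running >= certainty:
            break
    return CODE_FOR[frozenset(chosen)]
```

Applied to the third column of the example table (%G 10, %A 10, %T 50, %C 30) at 75 percent, T and C together reach 80, so the sketch returns Y, as in the example output. The table shows rounded percentages, so applying the sketch to printed values can differ from the program's result when a sum lands exactly on the certainty level, as in the fourth column, presumably because the program works from unrounded values.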


INPUT FILE STRUCTURE

The input file has a heading of indefinite length, followed by a line containing two adjacent periods (..). The sequences follow with one sequence per line. Every sequence must be the same length. The maximum size for the sequences is 130 bases. EConsensus assumes optimal alignment of the sequences. Here is part of the input file for the example above:

  
  
  IVS Acceptor Splice Site Sequences
  compiled by Stephen Mount NAR 10(2); 459-472 figure 1 page 460
  
             /       ..
   .........AAATAGGAT
   .........TTGTAGGTG
   ..........TGTAGGTG
   TTTATTTATTTCAAGATT
  
   //////////////////
  
   GTCACTTGTCACTAGGTA
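
A reader for this layout might look like the following Python sketch. It is not part of EConsensus: the name read_aligned is hypothetical, blank lines are skipped by assumption, and the 130-base maximum is read here as a limit on the length of each sequence.

```python
MAX_LENGTH = 130  # maximum sequence size stated above, read as a length limit


def read_aligned(lines):
    """Split the lines of an input file into (heading, sequences)."""
    heading = []
    sequences = []
    in_heading = True
    for line in lines:
        if in_heading:
            # The heading ends at the first line with two adjacent periods.
            if ".." in line:
                in_heading = False
            else:
                heading.append(line.rstrip())
            continue
        seq = line.strip()
        if seq:  # skipping blank lines is an assumption of this sketch
            sequences.append(seq)
    if in_heading:
        raise ValueError("no line containing '..' was found")
    if len({len(seq) for seq in sequences}) > 1:
        raise ValueError("sequences are not all the same length")
    if sequences and len(sequences[0]) > MAX_LENGTH:
        raise ValueError("sequences are longer than 130 bases")
    return heading, sequences
```

With an open file, the call would be read_aligned(handle), since iterating over a file yields its lines.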
  


COMMAND-LINE SUMMARY

All parameters for this program may be put on the command line. Use the option -CHEck to see the summary below and to have a chance to add things to the command line before the program executes. In the summary below, the capitalized letters in the qualifier names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose qualifiers or parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the GCG User's Guide.

  
  
  Minimal Syntax: % econsensus [-INfile=]acceptor.dat -Default
  
  Prompted Parameters:
  
  [-OUTfile=]acceptor.csn  output file name
  -CERtainty=75.0          percent certainty at which to find consensus
  
  Local Data Files:     None
  
  Optional Parameters:  None
  
  -NOMONitor               turn off monitoring of progress
  


LOCAL DATA FILES

None.


OPTIONAL PARAMETERS

None.

Printed: April 22, 1996 15:52 (1162)