Ediverge

EDIVERGE


FUNCTION

EDiverge is a version of Diverge with command line control. Diverge measures the percent divergence of two protein coding sequences using the method of Perler and Efstratiadis.


DESCRIPTION

EDiverge makes a codon-by-codon comparison of two aligned protein coding sequences using the method of Perler and Efstratiadis (Cell 20, 555-566 (1980); methods on pp. 564-565). For each nucleotide difference between sequence one and sequence two, EDiverge scores whether it is a type 1, 2, or 3 silent or replacement change (see Perler and Efstratiadis). Each score is divided by the number of possible silent or replacement changes in its category, giving six percent divergence figures. All of the counts are reported so that the values can be assembled into a weighted average.
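
As a rough illustration of the silent-versus-replacement distinction, the Python sketch below compares two aligned coding sequences codon by codon under the standard genetic code. It is not EDiverge's implementation: it does not assign the type 1, 2, or 3 categories of Perler and Efstratiadis, and it simply sets aside codons that differ at more than one position. The function name classify_codon_changes and the table layout are illustrative assumptions.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_codon_changes(seq1, seq2):
    """Count silent, replacement, and multi-difference codons.

    Hypothetical helper: assumes ungapped, codon-aligned DNA (A, C, G, T)
    of equal length, read in frame from the first base.
    """
    if len(seq1) != len(seq2) or len(seq1) % 3 != 0:
        raise ValueError("sequences must be equal length and a multiple of 3")
    counts = {"silent": 0, "replacement": 0, "multiple": 0}
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        ndiff = sum(a != b for a, b in zip(c1, c2))
        if ndiff == 0:
            continue
        if ndiff > 1:
            # Several differences in one codon need the paper's own rules.
            counts["multiple"] += 1
        elif CODON_TABLE[c1] == CODON_TABLE[c2]:
            counts["silent"] += 1
        else:
            counts["replacement"] += 1
    return counts


# GGT -> GGC keeps glycine (silent); AAA -> AGA changes Lys to Arg.
print(classify_codon_changes("ATGGGTAAA", "ATGGGCAGA"))
```

In this toy input the second codon differs silently and the third is a replacement, so the sketch reports one change of each kind.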


AUTHOR

This GCG program was modified by Jaakko Hattula (Tampere University of Technology, Finland) and Peter Rice (E-mail: pmr@sanger.ac.uk Post: Informatics Division, The Sanger Centre, Hinxton Hall, Cambridge, CB10 1RQ, UK).

All EGCG programs are supported by the EGCG Support Team, who can be contacted by E-mail (egcg@embnet.org).


EXAMPLE

Below is a session using EDiverge to measure the percent divergence between the human G gamma and A gamma globin coding sequences. SeqEd and Assemble were used to create files of the coding sequences with intervening sequences removed. If the coding sequences had not been perfectly aligned, Gap with the command line option -OUT would have been used to align them first.

  
  
  % ediverge
  
   EDIVERGE uses nucleotide sequence data
  
   EDIVERGE of what sequence ?  agammacod.seq
  
                Start (* 1 *) ?
                End (* 444 *) ?
            Reverse (* No *) ?
  
   What sequence (* agammacod.seq *) ?  ggammacod.seq
  
                Start (* 1 *) ?
                End (* 444 *) ?
            Reverse (* No *) ?
  
   What should I call the output file (* agammacod.diverge *) ?
  
  
  %
  
  


OUTPUT

Here is the complete output file:

  
  
   DIVERGE between: agammacod.seq  check: 2862  from: 1  to: 444
  
   ASSEMBLE    July 27, 1994 11:40
  Symbols:     1 to: 92    from: gamma.seq  ck: 6474,  7114 to: 7205
  Symbols:    93 to: 315   from: gamma.seq  ck: 6474,  7328 to: 7550
  Symbols:   316 to: 444   from: gamma.seq  ck: 6474,  8417 to: 8545
  Human fetal beta globins G and A gamma
  from Shen, Slightom and Smithies,  Cell 26; 191-203. . . .
  
   and: ggammacod.seq  check: 2906  from: 1  to: 444
  
   ASSEMBLE    July 27, 1994 11:40
  Symbols:     1 to: 92    from: gamma.seq  ck: 6474,  2179 to: 2270
  Symbols:    93 to: 315   from: gamma.seq  ck: 6474,  2393 to: 2615
  Symbols:   316 to: 444   from: gamma.seq  ck: 6474,  3502 to: 3630
  Human fetal beta globins G and A gamma
  from Shen, Slightom and Smithies,  Cell 26; 191-203. . . .
  
                             July 27, 1994 11:44   ..
  
   Possible Silent       Actual Silent        Percent Silent
      1      2      3      1      2      3      1      2      3
   82.0    4.0   73.0    0.0    0.0    0.0    0.0    0.0    0.0
                                corrected:    0.0    0.0    0.0
  
 Possible Replacement  Actual Replacement  Percent Replacement
      1      2      3      1      2      3      1      2      3
    4.0   82.0  285.0    0.0    0.0    1.0    0.0    0.0    0.4
                                corrected:    0.0    0.0    0.4
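
For downstream work it can be handy to read the numeric rows back out of a report like this one. The Python sketch below assumes the layout of this example (a row of nine decimal numbers for silent changes, then one for replacement changes, each followed by a corrected: line). The helper parse_diverge_report is hypothetical and not part of EGCG.

```python
def parse_diverge_report(text):
    """Return {'silent': {...}, 'replacement': {...}} from report text.

    Assumes the first nine-number row is the silent table and the second
    is the replacement table, as in the example output.
    """
    def decimals(tokens):
        # Data values carry a decimal point; the "1 2 3" headers do not.
        if not all("." in t for t in tokens):
            return None
        try:
            return [float(t) for t in tokens]
        except ValueError:
            return None

    rows, corrected = [], []
    for line in text.splitlines():
        tokens = line.split()
        if tokens and tokens[0] == "corrected:":
            values = decimals(tokens[1:])
            if values is not None and len(values) == 3:
                corrected.append(values)
            continue
        values = decimals(tokens)
        if values is not None and len(values) == 9:
            rows.append(values)

    result = {}
    for name, row, corr in zip(("silent", "replacement"), rows, corrected):
        result[name] = {
            "possible": row[0:3],
            "actual": row[3:6],
            "percent": row[6:9],
            "corrected": corr,
        }
    return result


SAMPLE = """\
  Possible Silent       Actual Silent        Percent Silent
  1      2      3      1      2      3      1      2      3
82.0    4.0   73.0    0.0    0.0    0.0    0.0    0.0    0.0
                             corrected:    0.0    0.0    0.0

  Possible Replacement  Actual Replacement   Percent Replacement
  1      2      3      1      2      3      1      2      3
 4.0   82.0  285.0    0.0    0.0    1.0    0.0    0.0    0.4
                             corrected:    0.0    0.0    0.4
"""

report = parse_diverge_report(SAMPLE)
print(report["replacement"]["possible"])   # [4.0, 82.0, 285.0]
```

Splitting on whitespace rather than fixed columns keeps the sketch tolerant of small differences in spacing, at the cost of relying on the decimal point to tell data rows from the category headers.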
  


RELATED PROGRAMS

SeqEd and Assemble create new sequence files from ranges within existing sequence files. When run with the command line option -OUT, Gap creates aligned sequences in files. LineUp allows you to edit multiple sequence alignments. Distances makes a table of the pair-wise distances between the sequences in a multiple sequence alignment.


CONSIDERATIONS

Sequences one and two must be aligned codon by codon. Gap may place gaps that split codons, which breaks the codon-by-codon correspondence. You may want to align the sequences at the peptide level to make sure the nucleic acid alignment makes sense. LineUp lets you adjust alignments manually.
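
One way to catch such problems before running EDiverge is to check that no gap run cuts through a codon. The Python sketch below is a hypothetical pre-check, not an EGCG tool. It assumes gaps are written as . (the usual GCG gap symbol) or -, and that the reading frame starts at the first column of the alignment. How EDiverge itself treats whole-codon gaps is not stated here, so the check flags only partially gapped codons.

```python
GAP_CHARS = ".-"  # assumption: '.' as in GCG alignments, '-' as elsewhere


def codon_alignment_problems(seq1, seq2):
    """Return a list of messages describing codon-level alignment problems."""
    problems = []
    if len(seq1) != len(seq2):
        problems.append(f"lengths differ: {len(seq1)} vs {len(seq2)}")
        return problems
    if len(seq1) % 3 != 0:
        problems.append(f"length {len(seq1)} is not a multiple of 3")
    for name, seq in (("sequence 1", seq1), ("sequence 2", seq2)):
        # Walk the complete codons only; a trailing partial codon is
        # already reported by the length check above.
        for start in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[start:start + 3]
            gaps = sum(ch in GAP_CHARS for ch in codon)
            if 0 < gaps < 3:
                problems.append(
                    f"{name}: gap splits the codon at positions "
                    f"{start + 1}-{start + 3}"
                )
    return problems


print(codon_alignment_problems("ATGGGTAAA", "ATG...AAA"))  # whole-codon gap
print(codon_alignment_problems("ATGGGTAAA", "ATGG..TAA"))  # gap inside a codon
```

The first call returns an empty list; the second reports the codon at positions 4-6 of sequence 2.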


WHAT EDIVERGE DOES

EDiverge calculates the percent and corrected percent divergence for each category of silent or replacement change in coding sequences, exactly as described by Perler and Efstratiadis (Cell 20, 555-566 (1980); methods on pp. 564-565).
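
The uncorrected figures can be reproduced from the counts in the report: each percent value is the actual number of changes divided by the possible number, times 100. The Python sketch below does this for the replacement row of the example output. It also forms one possible weighted average, weighting each category by its possible changes; that weighting is an assumption of this sketch, not something the program reports. The corrected values are not recomputed here.

```python
def percent_divergence(possible, actual):
    """Percent divergence per category; 0.0 where nothing is possible."""
    return [100.0 * a / p if p else 0.0 for p, a in zip(possible, actual)]


def pooled_percent(possible, actual):
    """Average of the category percentages weighted by possible changes.

    Equivalent to total actual changes over total possible changes.
    """
    total = sum(possible)
    return 100.0 * sum(actual) / total if total else 0.0


# Replacement counts for categories 1, 2, 3 from the example output.
possible = [4.0, 82.0, 285.0]
actual = [0.0, 0.0, 1.0]

print([round(x, 1) for x in percent_divergence(possible, actual)])  # [0.0, 0.0, 0.4]
print(round(pooled_percent(possible, actual), 2))                   # 0.27
```

The first line matches the Percent Replacement row of the example report.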


COMMAND-LINE SUMMARY

All parameters for this program may be put on the command line. Use the option -CHEck to see the summary below and to have a chance to add things to the command line before the program executes. In the summary below, the capitalized letters in the qualifier names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose qualifiers or parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the GCG User's Guide.

  
  
  Minimum syntax: % ediverge [-INfile1=]agammacod.seq -Default
  
  Prompted Parameters:
  
  -BEGin1=1 -END1=444          Range of interest
  [-INfile2=]ggammacod.seq     Sequence file
  -BEGin2=1 -END2=444          Range of interest
  -NOREV1 -NOREV2              Strand of each sequence
  [-OUTfile=]agammacod.diverge Output file
  
  Optional Parameters:
  
  -TRANSlate=translate.txt     Translation table file
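
When EDiverge is driven from a script, the qualifiers in this summary can be assembled programmatically. The Python sketch below only builds and prints the argument list; it does not run the program. The helper name ediverge_command is hypothetical, and the sketch assumes the executable is called ediverge, as in the minimum syntax line.

```python
import shlex


def ediverge_command(infile1, infile2, outfile=None, begin=None, end=None):
    """Build an argument list from the qualifiers in the command-line summary.

    Hypothetical helper: applies the same begin/end range to both sequences.
    """
    args = ["ediverge", f"-INfile1={infile1}", f"-INfile2={infile2}"]
    if begin is not None:
        args += [f"-BEGin1={begin}", f"-BEGin2={begin}"]
    if end is not None:
        args += [f"-END1={end}", f"-END2={end}"]
    if outfile is not None:
        args.append(f"-OUTfile={outfile}")
    # -Default is taken from the minimum syntax line; it is assumed here
    # to accept the default answer for any parameter not given.
    args.append("-Default")
    return args


cmd = ediverge_command("agammacod.seq", "ggammacod.seq",
                       outfile="agammacod.diverge", begin=1, end=444)
print(shlex.join(cmd))
```

Returning a list rather than a single string means the result can be handed to subprocess.run without any shell quoting concerns, should the program be installed.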
  


LOCAL DATA FILES

The files described below supply auxiliary data to this program. The program automatically reads them from a public data directory unless you either 1) have a data file with exactly the same name in your current working directory; or 2) name a file on the command line with an expression like -DATa1=myfile.dat. For more information see Chapter 4, Using Data Files in the User's Guide.

The translation of codons to amino acids, the identification of potential start codons and stop codons, and the mappings of one-letter to three-letter amino acid codes are all defined in a translation table in the file translate.txt. If the standard genetic code does not apply to your sequence, you can provide a modified version of this file in your working directory or name an alternative file on the command line with an expression like -TRANSlate=mycode.txt. Translation tables are discussed in more detail in the Data Files manual.


OPTIONAL PARAMETERS

The parameters and switches listed below can be set from the command line. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the GCG User's Guide.

-TRANSlate=filename.txt

Usually, translation is based on the translation table in a default or local data file called translate.txt. This option allows you to use a translation table in a different file. (See the Data Files manual for information about translation tables.)


REFERENCES

Perler, F., Efstratiadis, A., Lomedico, P., Gilbert, W., Kolodner, R., and Dodgson, J. (1980). The Evolution of Genes: The Chicken Preproinsulin Gene. Cell 20, 555-566.

Printed: April 22, 1996 15:52 (1162)