Ecorrespond


ECORRESPOND


FUNCTION

ECorrespond looks for similar patterns of codon usage by comparing codon frequency tables.


DESCRIPTION

The frequencies compared are, for each codon, the number of occurrences of that codon divided by the total number of codons specifying the same amino acid (or terminator) in the same table. The statistic, D squared, gets smaller as the patterns of codon usage become more similar (see Grantham et al., Nucl. Acids Res. 9(1); r43-r74 (1981)). If an amino acid is not used at all in one of the tables, its codons contribute nothing to the sum of squares; these ignored codons are counted and reported.

ECorrespond requires codon frequency tables generated by CodonFrequency as input. You can save the results of a session with ECorrespond to a file or display them only on the screen. You may use ambiguous file names (for example, *.cod) or indirect file specifications (files of filenames) for the input files, and ECorrespond makes all of the implied comparisons.
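With ambiguous file names, "all of the implied comparisons" appears to mean that every table in the first set is compared with every table in the second set: the example output below lists all nine ordered pairs of three files, including each file against itself. The Python sketch below illustrates that pairing. The helper names `implied_comparisons` and `expand` are hypothetical (not part of EGCG), and files of filenames are not handled here.

```python
import glob
from itertools import product
from typing import List, Tuple


def expand(pattern: str) -> List[str]:
    """Expand a wildcard such as '*.cod' into file names (sorted for a stable order)."""
    return sorted(glob.glob(pattern))


def implied_comparisons(files1: List[str], files2: List[str]) -> List[Tuple[str, str]]:
    """Pair every file in the first set with every file in the second set.

    If either set is empty there is nothing to compare, so the result is empty.
    """
    return list(product(files1, files2))
```

For example, `implied_comparisons(expand("*.cod"), expand("*.cod"))` would yield nine pairs for a directory holding three .cod files.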


AUTHOR

This GCG program was modified by Jaakko Hattula (Tampere University of Technology, Finland) and Peter Rice (E-mail: pmr@sanger.ac.uk Post: Informatics Division, The Sanger Centre, Hinxton Hall, Cambridge, CB10 1RQ, UK).

All EGCG programs are supported by the EGCG Support Team, who can be contacted by E-mail (egcg@embnet.org).


EXAMPLE

Here is a session using ECorrespond to find the correspondence among all of the files ending in .cod that are provided in the Wisconsin Package(TM):

  
  
  % ecorrespond
  
   ECORRESPOND of what frequency file(s) ?  *.cod
  
   to what other frequency file(s) (* *.cod *) ?
  
   Do you want to file the results (* No *) ?  Y
  
   What should I call the output file (* drosophilahigh.corr *) ?
  
   /////////////////////////////////////////////
  
  %
  


OUTPUT

ECorrespond always writes its output to your screen. You can also choose to save the results to a file. Here is part of the output file from the example session:

  
  
   ECORRESPOND  December 12, 1995 16:21
  
  Between                           and        D-Squared   D   Not-Counted ..
  
  drosophila_high.cod   drosophila_high.cod    0.000000    0.000000   0
  drosophila_high.cod           ecohigh.cod    4.678955    2.163089   3
  drosophila_high.cod            ecolow.cod    4.938438    2.222260   3
  ecohigh.cod           drosophila_high.cod    4.678955    2.163089   3
  ecohigh.cod                   ecohigh.cod    0.000000    0.000000   3
  ecohigh.cod                    ecolow.cod    3.389803    1.841142   3
  ecolow.cod            drosophila_high.cod    4.938438    2.222260   3
  ecolow.cod                    ecohigh.cod    3.389803    1.841142   3
  ecolow.cod                     ecolow.cod    0.000000    0.000000   3
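In the rows shown, the D column matches the square root of D-Squared to the printed precision (for instance, the square root of 4.678955 is about 2.163089), and each pair of tables reports the same D-Squared in both orders. The Python sketch below checks those two observations against the example rows; `inconsistencies` is a hypothetical helper, not part of EGCG.

```python
import math
from typing import List, Tuple

# Rows copied from the example output: (Between, and, D-Squared, D).
ROWS: List[Tuple[str, str, float, float]] = [
    ("drosophila_high.cod", "drosophila_high.cod", 0.000000, 0.000000),
    ("drosophila_high.cod", "ecohigh.cod", 4.678955, 2.163089),
    ("drosophila_high.cod", "ecolow.cod", 4.938438, 2.222260),
    ("ecohigh.cod", "drosophila_high.cod", 4.678955, 2.163089),
    ("ecohigh.cod", "ecohigh.cod", 0.000000, 0.000000),
    ("ecohigh.cod", "ecolow.cod", 3.389803, 1.841142),
    ("ecolow.cod", "drosophila_high.cod", 4.938438, 2.222260),
    ("ecolow.cod", "ecohigh.cod", 3.389803, 1.841142),
    ("ecolow.cod", "ecolow.cod", 0.000000, 0.000000),
]


def inconsistencies(rows: List[Tuple[str, str, float, float]]) -> List[str]:
    """Report rows where D differs from sqrt(D-Squared) beyond rounding,
    or where the reverse comparison gives a different D-Squared."""
    d2_of = {(a, b): d2 for a, b, d2, _ in rows}
    problems: List[str] = []
    for a, b, d2, d in rows:
        if not math.isclose(math.sqrt(d2), d, abs_tol=1e-5):
            problems.append(f"{a} vs {b}: D is not sqrt(D-Squared)")
        reverse = d2_of.get((b, a))
        if reverse is not None and reverse != d2:
            problems.append(f"{a} vs {b}: not symmetric")
    return problems
```

An empty list from `inconsistencies(ROWS)` means both observations hold for every row shown.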
  


RELATED PROGRAMS

CodonFrequency generates codon frequency tables. CodonPreference finds regions of sequences that show a preference for a pattern of codon choices in a codon frequency table.


RESTRICTIONS

If you use ambiguous file names, every file in the set implied by your file name or file of file names must be a valid codon frequency table like the ones written by ECodonFrequency. If either file specification does not match any files, ECorrespond simply does nothing.


STATISTIC USED

ECorrespond reads the normalized (/1000) data from the fourth column of the codon frequency table. It then totals these figures for each synonymous family. If the total for a family in either table is 0.0, then none of the codons from that family contribute anything to the value of D squared.

  
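  Number(c, t) is the fourth-column value for codon c in table t, and family(c) is the set of codons specifying the same amino acid (or terminator) as c:
  
  $$\mathrm{Frequency}(c,t) = \frac{\mathrm{Number}(c,t)}{\sum_{c' \in \mathrm{family}(c)} \mathrm{Number}(c',t)}$$
  
  $$D^2 = \sum_{c=1}^{64} \bigl( \mathrm{Frequency}(c,1) - \mathrm{Frequency}(c,2) \bigr)^2, \qquad D = \sqrt{D^2}$$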
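The D-squared calculation described in this section can be sketched in Python as follows. This is an illustration, not the EGCG source: it assumes the synonymous families follow the standard genetic code (NCBI translation table 1, with the three stop codons grouped as one "*" family), whereas the real program may take the grouping from the amino acid column of each table. `FAMILY`, `family_frequencies` and `d_squared` are hypothetical names.

```python
from itertools import product
from typing import Dict, Tuple

# Assumption: standard genetic code, codons ordered TCAG x TCAG x TCAG;
# "*" marks the terminator family (TAA, TAG, TGA).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
FAMILY: Dict[str, str] = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def family_frequencies(per_thousand: Dict[str, float]) -> Dict[str, float]:
    """Frequency(codon) = column-4 value / total of its synonymous family.

    Codons whose family total is 0.0 are left out of the result."""
    totals: Dict[str, float] = {}
    for codon, aa in FAMILY.items():
        totals[aa] = totals.get(aa, 0.0) + per_thousand.get(codon, 0.0)
    return {
        codon: per_thousand.get(codon, 0.0) / totals[aa]
        for codon, aa in FAMILY.items()
        if totals[aa] > 0.0
    }


def d_squared(table1: Dict[str, float], table2: Dict[str, float]) -> Tuple[float, int]:
    """Return (D squared, number of codons not counted)."""
    f1 = family_frequencies(table1)
    f2 = family_frequencies(table2)
    d2 = 0.0
    not_counted = 0
    for codon in FAMILY:
        if codon in f1 and codon in f2:
            d2 += (f1[codon] - f2[codon]) ** 2
        else:
            # The family total was 0.0 in at least one table: skip and count it.
            not_counted += 1
    return d2, not_counted
```

Comparing a table with itself gives 0.0, as in the example output; a three-codon family that is unused in one table (Ile, say) would add 3 to the not-counted total.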
  


SUGGESTIONS

If you plan to compare many codon frequency tables, give them all the extension .cod so that you can specify them ambiguously as *.cod.


INPUT FILE

The codon frequency tables that ECorrespond compares should be in the same format as the tables from the CodonFrequency program. ECorrespond reads only the fourth column (the normalized /1000 values) when calculating frequencies.
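Because only the fourth column is used, reading a table reduces to collecting one number per codon. The Python sketch below assumes each data line has the codon in the second whitespace-separated field and the /1000 value in the fourth, and it simply skips lines that do not fit (headers, separators, blank lines). This layout is an assumption about the CodonFrequency format, and `read_fourth_column` is a hypothetical helper.

```python
import re
from typing import Dict

# A codon field: exactly three nucleotide letters (U accepted for RNA tables).
CODON = re.compile(r"^[ACGTU]{3}$")


def read_fourth_column(text: str) -> Dict[str, float]:
    """Map codon -> fourth-column value, skipping non-data lines.

    Assumed data-line layout: AmAcid  Codon  Number  /1000  Fraction
    """
    values: Dict[str, float] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4 or not CODON.match(fields[1].upper()):
            continue
        try:
            value = float(fields[3])
        except ValueError:
            continue
        # Normalize to DNA letters so RNA and DNA tables compare directly.
        values[fields[1].upper().replace("U", "T")] = value
    return values
```

The resulting dictionary is the kind of per-codon input a D-squared calculation would consume.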


COMMAND-LINE SUMMARY

All parameters for this program may be put on the command line. Use the option -CHEck to see the summary below and to have a chance to add things to the command line before the program executes. In the summary below, the capitalized letters in the qualifier names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose qualifiers or parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the GCG User's Guide.

  
  
  Minimal Syntax: % ecorrespond [-INfile1=]*.cod -Default
  
  Prompted Parameters:
  
  [-INfile2=]*.cod        tables to compare to
  -FILE                   should output file be used?
  [-OUTfile=]xxx.corr     output file name
  
  Optional Parameters:
  
  -CONtinue1=1            continue after each set
  


LOCAL DATA FILES

None.


OPTIONAL PARAMETERS

The parameters and switches listed below can be set from the command line. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the User's Guide.

-CONtinue

makes this program loop back to the beginning and prompt for more input files after the comparison is done.


REFERENCES

Grantham R., Gautier C., Gouy M., Jacobzone M. and Mercier R. (1981). "Codon catalog usage is a genome strategy modulated for gene expressivity." Nucl. Acids Res. 9, r43-r74.

Printed: April 22, 1996 15:52 (1162)