ECODONFREQUENCY


FUNCTION

ECodonFrequency tabulates codon usage from sequences and/or existing codon usage tables. The output file is correctly formatted for input to the CodFish, CodonPreference, Correspond, and Frames programs.

ECodonFrequency is a modified version of GCG version 7's CodonFrequency with command line control added.


DESCRIPTION

ECodonFrequency counts codons and writes their frequencies into codon frequency tables. It counts the codons from ranges within sequences or existing codon frequency tables. The output table is a file with the sum of all the observations for each of the 64 possible codons. This file is suitable for input to other GCG programs, including BackTranslate, CodonPreference, and Correspond.

ECodonFrequency supports the assembly of fragments from circular molecules by letting you define a range in the sequence that extends across the end and into the beginning of a molecule. The terminal bell rings when a circular range is chosen.
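
As an illustration, here is a minimal Python sketch of how a range that crosses the end of a circular molecule could be extracted. The function name extract_range, and the convention that a begin position greater than the end position marks a circular range, are assumptions made for this example; they do not describe how ECodonFrequency itself is implemented.

```python
def extract_range(seq: str, begin: int, end: int) -> str:
    """Return the symbols from begin to end, 1-based and inclusive.

    Assumption for this sketch: begin > end means the range runs off
    the end of a circular molecule and continues from position 1.
    """
    if not (1 <= begin <= len(seq) and 1 <= end <= len(seq)):
        raise ValueError("range lies outside the sequence")
    if begin <= end:
        # Ordinary linear range.
        return seq[begin - 1:end]
    # Circular range: tail of the molecule followed by its head.
    return seq[begin - 1:] + seq[:end]


plasmid = "AAACCCGGGTTT"
linear = extract_range(plasmid, 4, 6)     # positions 4 to 6
circular = extract_range(plasmid, 10, 3)  # positions 10 to 12, then 1 to 3
print(linear, circular)
```

In this toy molecule the circular range yields the last three symbols followed by the first three.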

To count codons from sequences, specify ranges until you have assembled a sequence you want to count. For each range, ECodonFrequency shows you the starting and ending symbols to double check that you have chosen the range and strand accurately.

After choosing each range, you must decide whether to add another exon to the gene or to count the codons in the gene you have assembled. For multi-exon genes, it is critical that you count the codons only after assembling all the ranges, because intervening sequences often interrupt such genes within a codon, thus destroying the reading frame of any exon counted on its own.
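
The following Python sketch shows why the exons must be joined before counting. The toy sequence, the ranges, and the helper names assemble and count_codons are invented for illustration; the intron is placed inside the second codon so that counting each exon separately gives the wrong codons.

```python
from collections import Counter


def assemble(seq, ranges):
    """Join 1-based inclusive (begin, end) ranges into one coding sequence."""
    return "".join(seq[begin - 1:end] for begin, end in ranges)


def count_codons(coding):
    """Count non-overlapping triplets, ignoring an incomplete final codon."""
    usable = len(coding) - len(coding) % 3
    return Counter(coding[i:i + 3] for i in range(0, usable, 3))


# Toy gene ATG GCT GAA TGA, split inside the codon GCT by an intron.
genomic = "ATGGC" + "GTAAGTTTCAG" + "TGAATGA"
exons = [(1, 5), (17, 23)]

# Assemble first, then count: the reading frame is preserved.
joined = count_codons(assemble(genomic, exons))

# Count each exon on its own: the frame is lost after the first exon.
separate = Counter()
for exon in exons:
    separate += count_codons(assemble(genomic, [exon]))

print(sorted(joined.items()))
print(sorted(separate.items()))
```

Counting the assembled gene finds ATG, GCT, GAA, and TGA once each. Counting the exons separately loses GCT and GAA and reports ATG twice.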

After ECodonFrequency counts all the codons in your gene, you may specify another gene from the current sequence file or get other sequence files or codon frequency tables.

You can specify multiple sequences (such as a list file or sequence specification using an asterisk (*) wildcard) to count codons from more than one sequence at a time. By default, each sequence in a multiple sequence specification is treated as a separate gene and counted separately by ECodonFrequency. If you add the -ONEPEPtide command-line qualifier, then all sequences in a multiple sequence specification are concatenated into a single sequence before the codons are counted. If you use a list file to specify multiple sequences, you can add begin, end, and strand sequence attributes to specify the range and strand for each sequence. For more information about list files, see "Using List Files (formerly Files of Sequence Names)" in Chapter 2, Using Sequences in the User's Guide.

After each sequence range is counted or each new codon frequency table is read, ECodonFrequency asks if you want to write the data to a file. If you choose to do so, the program writes a file with the number of observations for each codon. In addition, ECodonFrequency normalizes the codon observations to a frequency per thousand and to a fraction for each codon within its synonymous family.
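
The two normalizations can be sketched in Python as follows. The function name normalize_family is hypothetical. The glycine counts are taken from the G-Gamma example output in this document; the total of 148 codons is not stated there but is inferred from the three exon ranges (92 + 223 + 129 = 444 bases, or 148 codons).

```python
def normalize_family(family_counts, total_codons):
    """Return {codon: (count, per_thousand, fraction_of_family)}.

    family_counts holds the observations for one synonymous family;
    total_codons is the number of codons counted in the whole gene.
    """
    family_total = sum(family_counts.values())
    rows = {}
    for codon, count in family_counts.items():
        per_thousand = 1000.0 * count / total_codons if total_codons else 0.0
        fraction = count / family_total if family_total else 0.0
        rows[codon] = (float(count), per_thousand, fraction)
    return rows


# Glycine family from the G-Gamma example; 148 is an inferred total.
glycine = {"GGG": 0, "GGA": 6, "GGT": 1, "GGC": 6}
rows = normalize_family(glycine, total_codons=148)

for codon, (count, per_thousand, fraction) in rows.items():
    print(f"Gly  {codon}  {count:8.2f}  {per_thousand:8.2f}  {fraction:6.2f}")
```

Under these assumptions the sketch gives 40.54 per thousand and a fraction of 0.46 for GGA, the same values as the example output.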


AUTHOR

This GCG program was modified by Jaakko Hattula (Tampere University of Technology, Finland) and Peter Rice (E-mail: pmr@sanger.ac.uk Post: Informatics Division, The Sanger Centre, Hinxton Hall, Cambridge, CB10 1RQ, UK).

All EGCG programs are supported by the EGCG Support Team, who can be contacted by E-mail (egcg@embnet.org).


EXAMPLE

Here is a session using ECodonFrequency to generate a codon frequency table for the human fetal beta globin gene G-Gamma:

  
  
  %  ecodonfrequency
  
   You can add codon frequencies from either:
  
     E)xisting codon usage files
     S)equence files
  
   Please select one (* S *):
  
   ECODONFREQUENCY uses nucleotide sequence data
  
   ECODONFREQUENCY of what sequence ?  gamma.seq
  
               Start (* 1 *) ?  2179
             End (* 11375 *) ?  2270
            Reverse (* No *) ?
  
   That begins ATGGG and ends GGAAG.  Is this correct (* Yes *) ?
  
   Get another exon from this gene (* No *) ?  y
  
               Start (* 1 *) ?  2393
             End (* 11375 *) ?  2615
            Reverse (* No *) ?
  
   That begins GCTCC and ends TCAAG.  Is this correct (* Yes *) ?
  
   Get another exon from this gene (* No *) ?  y
  
               Start (* 1 *) ?  3502
             End (* 11375 *) ?  3630
            Reverse (* No *) ?
  
   That begins CTCCT and ends ACTGA.  Is this correct (* Yes *) ?
  
   Get another exon from this gene (* No *) ?
  
   That's done, now would you like to:
  
  1) Get a new sequence input file
  2) Get a new codon table input file
  3) Specify another gene from this sequence file
  
  W)rite the frequencies to your output file
  
   Please choose one (* W *):
  
   What should I call the output file (* gamma.cod *) ?  ggammacod.cod
  
  %
  


OUTPUT

Note that the multi-exon G-Gamma gene is assembled from three exons before the codons are counted. Here is part of the output file:

  
  
  
   ECODONFREQUENCY  December 12, 1995 16:13
  
  From          :   check: 6474  from: 2179  to: 2270
   continuing on:   check: 6474  from: 2393  to: 2615
   continuing on:   check: 6474  from: 3502  to: 3630
  
  Human fetal beta globins G and A gamma
  from Shen, Slightom and Smithies,  Cell 26; 191-203.
  Analyzed by Smithies et al. Cell 26; 345-353.
  
  
  Gly     GGG        0.00      0.00      0.00
  Gly     GGA        6.00     40.54      0.46
  Gly     GGT        1.00      6.76      0.08
  Gly     GGC        6.00     40.54      0.46
  
  Glu     GAG        4.00     27.03      0.50
  Glu     GAA        4.00     27.03      0.50
  Asp     GAT        5.00     33.78      0.62
  Asp     GAC        3.00     20.27      0.38
  
   ///////////////////////////////////////////
  
  Leu     CTG       12.00     81.08      0.71
  Leu     CTA        0.00      0.00      0.00
  Leu     CTT        0.00      0.00      0.00
  Leu     CTC        3.00     20.27      0.18
  
  Pro     CCG        0.00      0.00      0.00
  Pro     CCA        1.00      6.76      0.25
  Pro     CCT        2.00     13.51      0.50
  Pro     CCC        1.00      6.76      0.25
  


RELATED PROGRAMS

BackTranslate, CodonPreference, and Correspond need to have codon frequency tables like the ones written by ECodonFrequency as input.


RESTRICTIONS

Unknown. ECodonFrequency reads the third column of data in existing codon usage files. This column should not have normalized data in it (e.g., percentages) if you plan to add it to data that has not been normalized. Look at the file structure in the example output and under the FILES USED topic below.


FILES USED

Existing codon usage files may be written by ECodonFrequency or generated by hand. If you write a codon table yourself, it should be documented with text followed by a line with two adjacent periods. Below this heading and dividing line, write the data with the first three columns of information as in the output file shown above. The lines can be in any order and only codons whose use is greater than zero need be present. The spacing of the columns is not significant and blank lines are allowed. After creating a codon table with the first three columns of information, you should generate the complete codon usage table (five columns of information) by using the table you created as the input to CodonFrequency.
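
A minimal Python sketch of a reader for such a hand-made table is shown below. It follows the rules given above (documentation text, a dividing line with two adjacent periods, then three columns in any order with free spacing and optional blank lines). The function name parse_codon_table is hypothetical. Treating the first line that contains ".." as the divider and summing repeated codons are assumptions of this sketch, not documented behavior of the program.

```python
from itertools import product


def parse_codon_table(text):
    """Return {codon: observations} for all 64 codons from a hand-made table."""
    lines = text.splitlines()
    for index, line in enumerate(lines):
        if ".." in line:  # assumed: first line with ".." ends the heading
            data_lines = lines[index + 1:]
            break
    else:
        raise ValueError("no dividing line with two adjacent periods found")

    # Codons absent from the file are treated as zero observations.
    counts = {"".join(c): 0.0 for c in product("GATC", repeat=3)}
    for line in data_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # blank or incomplete line
        codon = fields[1].upper().replace("U", "T")
        if codon not in counts:
            raise ValueError(f"unrecognized codon: {fields[1]}")
        counts[codon] += float(fields[2])  # assumed: repeats are summed
    return counts


EXAMPLE = """\
Hand-made table for illustration only
..
Gly  GGA   6
Gly  GGT   1

Leu  CTG  12
"""

counts = parse_codon_table(EXAMPLE)
print(counts["GGA"], counts["CTG"], counts["TTT"])
```

Only the first three columns are read, so the same sketch also accepts rows that carry the two normalized columns.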


COMMAND-LINE SUMMARY

All parameters for this program may be put on the command line. Use the option -CHEck to see the summary below and to have a chance to add things to the command line before the program executes. In the summary below, the capitalized letters in the qualifier names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose qualifiers or parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the GCG User's Guide.

  
  
  Minimal Syntax: % ecodonfrequency [-INfile1=]gamma.seq -Default
  
  Prompted Parameters:
  
  [-OUTfile1=]gamma.cod     output file name
  -TYPE1=S                  S=sequence input, E=existing table
  -MENU1=W                  Menu choice, W=write first set
  
  Local Data Files:
  
  -TRANSlate=translate.txt  contains the genetic code
  
  Optional Parameters:
  
  -EXONs1=1               number of exons for each sequence
  -BEGin=1 -END=100       range of interest for each exon
  -REVerse1               strand for each exon
  


LOCAL DATA FILES

The files described below supply auxiliary data to this program. The program automatically reads them from a public data directory unless you either 1) have a data file with exactly the same name in your current working directory; or 2) name a file on the command line with an expression like -DATa1=myfile.dat. For more information see Chapter 4, Using Data Files in the User's Guide.

The translation of codons to amino acids, the identification of potential start codons and stop codons, and the mappings of one-letter to three-letter amino acid codes are all defined in a translation table in the file translate.txt. If the standard genetic code does not apply to your sequence, you can provide a modified version of this file in your working directory or name an alternative file on the command line with an expression like -TRANSlate=mycode.txt. Translation tables are discussed in more detail in the Data Files manual.


OPTIONAL PARAMETERS

The parameters and switches listed below can be set from the command line. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the User's Guide.

-CODonfile=ecohigh.cod

specifies an existing codon frequency file to be used as input to ECodonFrequency.

-BEGin=1

sets the beginning position for all input sequences. When the beginning position is set from the command line, ECodonFrequency ignores beginning positions specified for individual sequences in a list file. ECodonFrequency recognizes -BEGin only when more than one input sequence is specified or when -Default is on the command line.

-END=100

sets the ending position for all input sequences. When the ending position is set from the command line, ECodonFrequency ignores ending positions specified for sequences in a list file. ECodonFrequency recognizes -END only when more than one input sequence is specified or when -Default is on the command line.

-REVerse

sets the program to use the reverse strand for each input sequence. When -REVerse or -NOREVerse is on the command line, ECodonFrequency ignores any strand designation for individual sequences in a list file. ECodonFrequency recognizes -REVerse and -NOREVerse only when more than one input sequence is specified or when -Default is on the command line.

-ONEPEPtide

concatenates all input sequences in a multiple sequence specification together before processing.

-CONtinue

causes ECodonFrequency to loop back to the beginning of the program after you write the output file. If you leave the file specification blank when the program loops back to the top, ECodonFrequency stops.

-TRANSlate=filename.txt

Usually, translation is based on the translation table in a default or local data file called translate.txt. This option allows you to use a translation table in a different file. (See the Data Files manual for information about translation tables.)

-MONitor

This program normally monitors its progress on your screen. However, when you use the -Default option to suppress all program interaction, you also suppress the monitor. You can turn it back on with this option. If your program is running in batch, the monitor will appear in the log file. If the monitor is slowing the program down, suppress it with -NOMONitor.

Printed: April 22, 1996 15:52 (1162)