ECodonFrequency tabulates codon usage from sequences and/or existing codon usage tables. The output file is correctly formatted for input to the CodFish, CodonPreference, Correspond, and Frames programs.
ECodonFrequency is a modified version of GCG version 7's CodonFrequency with command-line control added.
ECodonFrequency counts codons and writes their frequencies into codon frequency tables. It counts codons either from ranges within sequences or from existing codon frequency tables. The output table is a file with the sum of all the observations for each of the 64 possible codons. This file is suitable for input to other GCG programs, including BackTranslate, CodonPreference, and Correspond.
ECodonFrequency supports the assembly of fragments from circular molecules by letting you define a range in the sequence that extends across the end and into the beginning of a molecule. The terminal bell rings when a circular range is chosen.
To count codons from sequences, specify ranges until you have assembled the sequence you want to count. For each range, ECodonFrequency shows you the starting and ending symbols so that you can double-check that you have chosen the range and strand correctly.
After choosing each range, you must decide whether to add another exon to the gene or to count the codons in the gene you have assembled. For multi-exon genes, it is critical to count codons only after assembling all the ranges: intervening sequences often interrupt such genes within a codon, so counting each exon separately would destroy the reading frame.
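The range selection and confirmation steps described above can be sketched in Python. This is a minimal illustration, not ECodonFrequency's code: the function names are hypothetical, coordinates are assumed to be 1-based and inclusive (as in the example session below), and treating Start greater than End as a range that crosses the end of a circular molecule is an assumption, since this document does not say how such ranges are entered.

```python
# Complement of the four unambiguous bases; IUPAC ambiguity codes are
# left unchanged in this simplified sketch.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def extract_range(seq: str, start: int, end: int,
                  reverse: bool = False, circular: bool = False) -> str:
    """Return bases start..end (1-based, inclusive) of seq.

    Assumption: for a circular molecule, start > end means the range runs
    off the last base and continues from base 1.
    """
    if start <= end:
        fragment = seq[start - 1:end]
    elif circular:
        fragment = seq[start - 1:] + seq[:end]
    else:
        raise ValueError("start > end is only meaningful for circular molecules")
    if reverse:
        # Reverse strand: reverse complement of the chosen range.
        fragment = fragment.translate(COMPLEMENT)[::-1]
    return fragment


def confirm(fragment: str, width: int = 5) -> str:
    """Build a confirmation line like the one ECodonFrequency prints."""
    return f"That begins {fragment[:width]} and ends {fragment[-width:]}."


# Toy 15-base circular molecule; the range 10..3 wraps past base 15.
molecule = "GGCATTTAAATGGCC"
print(confirm(extract_range(molecule, 10, 3, circular=True)))
# That begins ATGGC and ends CCGGC.
```

Showing only the first and last few bases keeps the check short while still revealing an off-by-one start or a wrong strand.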
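Why assembly matters can be checked with the exon ranges from the example session below: exon 1 (2179-2270) is 92 bases and exon 2 (2393-2615) is 223 bases, so the first intron falls inside a codon, while the assembled 444 bases give exactly 148 codons (consistent with the per-thousand values in the example output, where 6 of 148 codons is 40.54 per thousand). The sketch also counts a toy two-exon gene both ways; count_codons and the toy sequence are hypothetical, and a trailing partial codon is simply dropped here.

```python
from collections import Counter


def count_codons(cds: str) -> Counter:
    """Count codons in frame 1; a trailing partial codon is ignored."""
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


# Exon ranges (1-based, inclusive) from the G-Gamma session.
ranges = [(2179, 2270), (2393, 2615), (3502, 3630)]
lengths = [end - start + 1 for start, end in ranges]
print(lengths, [n % 3 for n in lengths], sum(lengths) // 3)
# [92, 223, 129] [2, 1, 0] 148

# Toy gene whose intron splits the second codon: ATG GC|C TGA
exons = ["ATGGC", "CTGA"]
assembled = count_codons("".join(exons))
piecewise = count_codons(exons[0]) + count_codons(exons[1])
print(sorted(assembled))   # ['ATG', 'GCC', 'TGA']
print(sorted(piecewise))   # ['ATG', 'CTG']  (GCC and TGA lost, CTG spurious)
```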
After ECodonFrequency counts all the codons in your gene, you may specify another gene from the current sequence file or get other sequence files or codon frequency tables.
You can specify multiple sequences (such as a list file, or a sequence specification using an asterisk (*) wildcard) to count codons from more than one sequence at a time. By default, ECodonFrequency treats each sequence in a multiple sequence specification as a separate gene and counts it separately. If you add the -ONEPEPtide command-line qualifier, all sequences in a multiple sequence specification are concatenated into a single sequence before codons are counted. If you use a list file to specify multiple sequences, you can add begin, end, and strand sequence attributes to specify the range and strand for each sequence. For more information about list files, see "Using List Files (formerly Files of Sequence Names)" in Chapter 2, Using Sequences in the User's Guide.
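The difference between the default behavior and -ONEPEPtide can be sketched as follows. The function names and the in-memory dictionary of sequences are hypothetical; the sketch only contrasts one table per sequence with one table for the concatenation. In this sketch, concatenating first also means that a sequence whose length is not a multiple of three shifts the reading frame of every sequence after it.

```python
from collections import Counter


def count_codons(cds: str) -> Counter:
    """Count codons in frame 1; a trailing partial codon is ignored."""
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def tabulate(sequences: dict[str, str], one_peptide: bool = False) -> dict[str, Counter]:
    """Default: one table per sequence; one_peptide=True: concatenate first."""
    if one_peptide:
        return {"(concatenated)": count_codons("".join(sequences.values()))}
    return {name: count_codons(seq) for name, seq in sequences.items()}


genes = {"geneA": "ATGAAATAA", "geneB": "ATGCCCTGA"}
print(tabulate(genes))                    # two tables, one per gene
print(tabulate(genes, one_peptide=True))  # one table; ATG counted twice
```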
After each sequence range is counted or each new codon frequency table is read, ECodonFrequency asks if you want to write the data to a file. If you choose to do so, the program writes a file with the number of observations for each codon. In addition, ECodonFrequency normalizes the codon observations to a frequency per thousand and to a fraction for each codon within its synonymous family.
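The two normalizations can be written out directly. In this minimal sketch (all names are hypothetical), the per-thousand value is 1000 * count / total codons, and the fraction is count divided by the total for all codons of the same amino acid; treating the stop codons as one family is an assumption of the sketch. The genetic code is the standard one, written in the compact NCBI table 1 form. With the Gly counts from the example output below and 148 codons in total, it reproduces the printed Gly values.

```python
from collections import Counter

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons in TTT, TTC, TTA, TTG, TCT ... order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def normalize(counts: Counter, code: dict[str, str] = STANDARD_CODE) -> dict:
    """Return {codon: (number, per_thousand, fraction)} for all 64 codons."""
    total = sum(counts[codon] for codon in code)
    family_total = Counter()
    for codon, aa in code.items():
        family_total[aa] += counts[codon]   # stop codons ('*') form one family
    table = {}
    for codon, aa in code.items():
        n = counts[codon]
        per_thousand = 1000 * n / total if total else 0.0
        fraction = n / family_total[aa] if family_total[aa] else 0.0
        table[codon] = (n, per_thousand, fraction)
    return table


# Gly counts from the example output; "ATG": 135 is only a placeholder for the
# gene's other 135 codons so that the total is 148.
counts = Counter({"GGA": 6, "GGT": 1, "GGC": 6, "ATG": 135})
table = normalize(counts)
for codon in ("GGG", "GGA", "GGT", "GGC"):
    n, per_thousand, fraction = table[codon]
    print(f"Gly {codon} {n:6.2f} {per_thousand:6.2f} {fraction:5.2f}")
# Gly GGG   0.00   0.00  0.00
# Gly GGA   6.00  40.54  0.46
# Gly GGT   1.00   6.76  0.08
# Gly GGC   6.00  40.54  0.46
```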
This GCG program was modified by Jaakko Hattula (Tampere University of Technology, Finland) and Peter Rice (E-mail: pmr@sanger.ac.uk Post: Informatics Division, The Sanger Centre, Hinxton Hall, Cambridge, CB10 1RQ, UK).
All EGCG programs are supported by the EGCG Support Team, who can be contacted by E-mail (egcg@embnet.org).
Here is a session using ECodonFrequency to generate a codon frequency table for the human fetal beta globin gene G-Gamma:
% ecodonfrequency

 You can add codon frequencies from either:

   E)xisting codon usage files
   S)equence files

 Please select one (* S *):

 ECODONFREQUENCY uses nucleotide sequence data

 ECODONFREQUENCY of what sequence ? gamma.seq

                  Start (* 1 *) ? 2179
                End (* 11375 *) ? 2270
               Reverse (* No *) ?

 That begins ATGGG and ends GGAAG.
 Is this correct (* Yes *) ?

 Get another exon from this gene (* No *) ? y

                  Start (* 1 *) ? 2393
                End (* 11375 *) ? 2615
               Reverse (* No *) ?

 That begins GCTCC and ends TCAAG.
 Is this correct (* Yes *) ?

 Get another exon from this gene (* No *) ? y

                  Start (* 1 *) ? 3502
                End (* 11375 *) ? 3630
               Reverse (* No *) ?

 That begins CTCCT and ends ACTGA.
 Is this correct (* Yes *) ?

 Get another exon from this gene (* No *) ?

 That's done, now would you like to:

   1) Get a new sequence input file
   2) Get a new codon table input file
   3) Specify another gene from this sequence file
   W)rite the frequencies to your output file

 Please choose one (* W *):

 What should I call the output file (* gamma.cod *) ? ggammacod.cod

%
Note that the multi-exon G-Gamma gene is assembled from three exons before the codons are counted! Here is part of the output file:
ECODONFREQUENCY  December 12, 1995 16:13

From : check: 6474 from: 2179 to: 2270
continuing on: check: 6474 from: 2393 to: 2615
continuing on: check: 6474 from: 3502 to: 3630

Human fetal beta globins G and A gamma from Shen, Slightom and Smithies,
Cell 26; 191-203. Analyzed by Smithies et al. Cell 26; 345-353.

Gly  GGG   0.00   0.00  0.00
Gly  GGA   6.00  40.54  0.46
Gly  GGT   1.00   6.76  0.08
Gly  GGC   6.00  40.54  0.46
Glu  GAG   4.00  27.03  0.50
Glu  GAA   4.00  27.03  0.50
Asp  GAT   5.00  33.78  0.62
Asp  GAC   3.00  20.27  0.38
///////////////////////////////////////////
Leu  CTG  12.00  81.08  0.71
Leu  CTA   0.00   0.00  0.00
Leu  CTT   0.00   0.00  0.00
Leu  CTC   3.00  20.27  0.18
Pro  CCG   0.00   0.00  0.00
Pro  CCA   1.00   6.76  0.25
Pro  CCT   2.00  13.51  0.50
Pro  CCC   1.00   6.76  0.25
BackTranslate, CodonPreference, and Correspond require codon frequency tables like the ones written by CodonFrequency as input.
Unknown.
ECodonFrequency reads only the third column of data (the number of observations) in existing codon usage files. If you plan to add an existing table to data that has not been normalized, that column must not contain normalized values such as percentages. For the file structure, see the example output above and the FILES USED topic below.
Existing codon usage files may be written by ECodonFrequency or generated by hand. If you write a codon table yourself, begin it with documentation text followed by a line containing two adjacent periods (..). Below this heading and dividing line, write the data with the first three columns of information as in the output file shown above. The lines can be in any order, and only codons used at least once need be present. The spacing of the columns is not significant, and blank lines are allowed. After creating a codon table with the first three columns of information, you should generate the complete codon usage table (five columns of information) by using the table you created as the input to CodonFrequency.
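A reader for the hand-written format described above might look like the sketch below. It is an illustration built on stated assumptions, not ECodonFrequency's parser: it treats the first line containing two adjacent periods as the divider (so the documentation text must not itself contain "..", including an ellipsis), skips blank lines, takes the codon from the second column and the count from the third, and adds repeated codons together. The function name is hypothetical.

```python
from collections import Counter


def read_codon_table(text: str) -> Counter:
    """Read codon counts (third column) from a codon usage table.

    Lines up to and including the first one containing ".." are
    documentation. After that, each non-blank line holds amino acid,
    codon, and count; lines may come in any order and zero-count codons
    may be omitted.
    """
    counts = Counter()
    in_data = False
    for line in text.splitlines():
        if not in_data:
            in_data = ".." in line      # divider ends the documentation
            continue
        fields = line.split()           # column spacing is not significant
        if len(fields) < 3:
            continue                    # blank lines are allowed
        counts[fields[1].upper()] += float(fields[2])
    return counts


hand_made = """Codon counts for a hand-assembled gene
..
Gly  GGA   6
Asp GAT 5

Gly  GGC  6
"""
existing = read_codon_table(hand_made)
combined = existing + Counter({"GGA": 1})   # add newly counted codons
print(combined["GGA"])   # 7.0
```

Because only raw counts are summed, mixing in a table whose third column holds percentages would silently skew the result, which is why the note above asks for unnormalized data.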
All parameters for this program may be put on the command line. Use the option -CHEck to see the summary below and to have a chance to add things to the command line before the program executes. In the summary below, the capitalized letters in the qualifier names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose qualifiers or parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the GCG User's Guide.
Minimal Syntax: % ecodonfrequency [-INfile1=]gamma.seq -Default

Prompted Parameters:

[-OUTfile1=]gamma.cod       output file name
-TYPE1=S                    S=sequence input, E=existing table
-MENU1=W                    Menu choice, W=write first set

Local Data Files:

-TRANSlate=translate.txt    contains the genetic code

Optional Parameters:

-EXONs1=1                   number of exons for each sequence
-BEGin=1 -END=100           range of interest for each exon
-REVerse1                   strand for each exon
The files described below supply auxiliary data to this program. The program automatically reads them from a public data directory unless you either 1) have a data file with exactly the same name in your current working directory; or 2) name a file on the command line with an expression like -DATa1=myfile.dat. For more information, see Chapter 4, Using Data Files in the User's Guide.
The translation of codons to amino acids, the identification of potential start codons and stop codons, and the mappings of one-letter to three-letter amino acid codes are all defined in a translation table in the file translate.txt. If the standard genetic code does not apply to your sequence, you can provide a modified version of this file in your working directory or name an alternative file on the command line with an expression like -TRANSlate=mycode.txt. Translation tables are discussed in more detail in the Data Files manual.
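Since the fraction column is computed within synonymous families, a different translation table presumably changes those families as well. The sketch below does not model the translate.txt file format; it represents a genetic code as a Python dictionary and applies the four codon reassignments of the vertebrate mitochondrial code (NCBI table 2) to the standard code. The dictionary names and the helper are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1) in TTT, TTC, TTA, TTG, TCT ... order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Vertebrate mitochondrial code (NCBI table 2) differs at four codons.
MITO_CODE = {**STANDARD_CODE, "AGA": "*", "AGG": "*", "ATA": "M", "TGA": "W"}


def family(code: dict[str, str], aa: str) -> list[str]:
    """Codons that translate to one amino acid ('*' = stop) under a code."""
    return sorted(codon for codon, a in code.items() if a == aa)


for aa in ("R", "W", "M", "*"):
    print(aa, family(STANDARD_CODE, aa), family(MITO_CODE, aa))
# R ['AGA', 'AGG', 'CGA', 'CGC', 'CGG', 'CGT'] ['CGA', 'CGC', 'CGG', 'CGT']
# W ['TGG'] ['TGA', 'TGG']
# M ['ATG'] ['ATA', 'ATG']
# * ['TAA', 'TAG', 'TGA'] ['AGA', 'AGG', 'TAA', 'TAG']
```

Under the mitochondrial code, for example, a TGA codon would count toward Trp rather than toward the stop codons.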
The parameters and switches listed below can be set from the command line. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the User's Guide.
specifies an existing codon frequency file to be used as input to ECodonFrequency.
sets the beginning position for all input sequences. When the beginning position is set from the command line, ECodonFrequency ignores beginning positions specified for individual sequences in a list file. ECodonFrequency recognizes -BEGin only when more than one input sequence is specified or when -Default is on the command line.
sets the ending position for all input sequences. When the ending position is set from the command line, ECodonFrequency ignores ending positions specified for sequences in a list file. ECodonFrequency recognizes -END only when more than one input sequence is specified or when -Default is on the command line.
sets the program to use the reverse strand for each input sequence. When -REVerse or -NOREVerse is on the command line, ECodonFrequency ignores any strand designation for individual sequences in a list file. ECodonFrequency recognizes -REVerse and -NOREVerse only when more than one input sequence is specified or when -Default is on the command line.
concatenates all sequences in a multiple sequence specification into a single sequence before counting codons.
causes ECodonFrequency to loop back to the beginning of the program after you write the output file. If you leave the file specification blank when the program loops back to the top, ECodonFrequency stops.
Usually, translation is based on the translation table in a default or local data file called translate.txt. This option allows you to use a translation table in a different file. (See the Data Files manual for information about translation tables.)
This program normally monitors its progress on your screen. However, when you use the -Default option to suppress all program interaction, you also suppress the monitor. You can turn it back on with this option. If your program is running in batch, the monitor will appear in the log file. If the monitor is slowing the program down, suppress it with -NOMONitor.
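The precedence rules given above for -BEGin, -END, and -REVerse/-NOREVerse can be summarized in a small sketch. Everything in it is hypothetical (the Range class, the function, and its arguments); None stands for a qualifier that was not given. When the qualifiers are not recognized, the sketch simply falls back to the per-sequence values, although for a single sequence without -Default the real program prompts for the range, as in the example session.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Range:
    begin: int
    end: int
    reverse: bool


def effective_range(per_sequence: Range,
                    begin: Optional[int] = None,
                    end: Optional[int] = None,
                    reverse: Optional[bool] = None,
                    multiple_inputs: bool = False,
                    default_on: bool = False) -> Range:
    """Apply command-line -BEGin/-END/-REVerse over list-file attributes.

    reverse=True stands for -REVerse, False for -NOREVerse, None for neither.
    """
    if not (multiple_inputs or default_on):
        # The qualifiers are not recognized; per-sequence values apply.
        return per_sequence
    return Range(
        begin=per_sequence.begin if begin is None else begin,
        end=per_sequence.end if end is None else end,
        reverse=per_sequence.reverse if reverse is None else reverse,
    )


from_list_file = Range(begin=10, end=90, reverse=True)
print(effective_range(from_list_file, begin=1, reverse=False, multiple_inputs=True))
# Range(begin=1, end=90, reverse=False)
```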
Printed: April 22, 1996 15:52 (1162)