EComposition determines the composition of sequence(s). For nucleotide sequence(s), EComposition also determines dinucleotide and trinucleotide content.
EComposition measures the composition of one or a group of sequences. If you specify only one sequence, you can choose a range within the sequence. Lowercase letters are converted to uppercase and counted with their uppercase equivalents. If you specify a group of sequences, EComposition displays the name of each sequence as it finishes the measurement for that sequence.
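The counting described above can be sketched in a few lines of Python. This is a minimal illustration, not EComposition's implementation. The function name `composition` is hypothetical. The range is treated as 1-based and inclusive, like the Start/End prompts. Dinucleotides and trinucleotides are assumed to be counted in overlapping windows.

```python
from collections import Counter
from typing import Dict, Optional


def composition(seq: str, begin: int = 1, end: Optional[int] = None) -> Dict[int, Counter]:
    """Count mono-, di- and trinucleotides in seq[begin..end] (1-based, inclusive).

    Hypothetical helper: lowercase letters are converted to uppercase so they
    are counted with their uppercase equivalents. Di- and trinucleotides are
    counted in overlapping windows (an assumption, not documented behaviour).
    """
    if end is None:
        end = len(seq)
    region = seq[begin - 1:end].upper()
    counts: Dict[int, Counter] = {}
    for k in (1, 2, 3):
        # Slide a window of width k one base at a time across the region.
        counts[k] = Counter(region[i:i + k] for i in range(len(region) - k + 1))
    return counts


if __name__ == "__main__":
    result = composition("acgTACGt")
    print("bases:", dict(result[1]))
    print("dinucleotides:", dict(result[2]))
    print("trinucleotides:", dict(result[3]))
```

Letters other than A, C, G and T (for example N) simply appear as extra keys in the counts.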
This GCG program was modified by David Mathog (E-mail: MATHOG@seqaxp.bio.caltech.edu Post: Sequence Analysis Facility, Biology Division, Caltech), and modified for EGCG by Peter Rice (E-mail: pmr@sanger.ac.uk Post: Informatics Division, The Sanger Centre, Hinxton Hall, Cambridge, CB10 1RQ, UK).
All EGCG programs are supported by the EGCG Support Team, who can be contacted by E-mail (egcg@embnet.org).
Here is a session using EComposition to calculate the molecular weight of sequence gamma.seq.
% ecomposition -mw
ECOMPOSITION uses nucleotide sequences
ECOMPOSITION of what sequence(s) ? gamma.seq
Start (* 1 *) ?
End (* 11375 *) ?
What should I call the output file (* gamma.composition *) ?
ECOMPOSITION complete.
Sequences: 1 Total Length: 11,375 CPU time: 00.18
Output file: gamma.composition
%
Here is part of the output file:
ECOMPOSITION of: gamma.seq Check: 6474 from: 1 to: 11,375
March 19, 1996 15:07 *****
A: 3,374
C: 2,209
G: 2,496
T: 3,296
Molecular weight: 3455812.25
Other: 0
Total: 11,375
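The molecular weight in this output is derived from the base counts. The Python sketch below shows one common way to do that arithmetic, together with the adjustments that the -BOTHstrands and -DEPhosphorylation options describe. It is an illustration under stated assumptions, not EComposition's algorithm. The residue masses are commonly quoted approximate values for DNA nucleotides within a chain. The 5' phosphate mass (about 79.98 for HPO3) is assumed. The helper names are hypothetical. Because EComposition's own weight table is not given here, the result for gamma.seq does not match 3455812.25.

```python
from typing import Dict

# Approximate masses (g/mol) of DNA nucleotide residues within a chain.
# Illustrative values only; EComposition's own table may differ.
DNA_RESIDUE_MASS: Dict[str, float] = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}

# Assumed mass of the 5' phosphate group (HPO3), removed when dephosphorylated.
PHOSPHATE_MASS = 79.98

# Watson-Crick pairing used to infer the bottom strand's composition.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def strand_weight(counts: Dict[str, int], masses: Dict[str, float],
                  dephosphorylated: bool = False) -> float:
    """Sum count * residue mass for one strand; bases without a mass are ignored."""
    weight = sum(n * masses[base] for base, n in counts.items() if base in masses)
    if dephosphorylated:
        weight -= PHOSPHATE_MASS
    return weight


def molecular_weight(counts: Dict[str, int],
                     masses: Dict[str, float] = DNA_RESIDUE_MASS,
                     both_strands: bool = False,
                     dephosphorylated: bool = False) -> float:
    """Single-strand weight, or the duplex weight when both_strands is True."""
    weight = strand_weight(counts, masses, dephosphorylated)
    if both_strands:
        # Bottom-strand composition inferred from the top strand (A<->T, C<->G).
        bottom = {COMPLEMENT[b]: n for b, n in counts.items() if b in COMPLEMENT}
        weight += strand_weight(bottom, masses, dephosphorylated)
    return weight


if __name__ == "__main__":
    # Base counts taken from the gamma.seq output above.
    gamma = {"A": 3374, "C": 2209, "G": 2496, "T": 3296}
    print(f"single strand: {molecular_weight(gamma):,.2f}")
    print(f"double strand: {molecular_weight(gamma, both_strands=True):,.2f}")
```

In this sketch a dephosphorylated duplex loses one phosphate per strand, which is an assumption. An -RNA variant would pass a table keyed by U instead of T, with RNA residue masses; no such table is included here.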
Unknown.
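You can infer the composition of the bottom strand of a nucleic acid sequence from the composition of the top strand. The -BOTHstrands option measures both strands, but strand-specific information is lost: in the combined counts G equals C, A equals T, and each dinucleotide or trinucleotide has the same count as its reverse complement (for example, AG and CT).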
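The inference described above can be sketched in Python. This is a minimal illustration with hypothetical function names, and it assumes overlapping k-mer counting. The bottom strand's counts come from the top strand's counts alone, by reverse-complementing each k-mer. Adding the two strands then makes every k-mer's count equal to its reverse complement's count, which is why the combined figures carry less information.

```python
from collections import Counter

# Translation table for complementing A, C, G and T (other letters pass through).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Bottom strand read 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def kmer_counts(seq: str, k: int) -> Counter:
    """Overlapping k-mer counts on one strand."""
    s = seq.upper()
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def infer_bottom(top_counts: Counter) -> Counter:
    """Bottom-strand counts derived from top-strand counts alone (no rescan)."""
    return Counter({reverse_complement(kmer): n for kmer, n in top_counts.items()})


def both_strands(seq: str, k: int) -> Counter:
    """Combined counts for the top and bottom strands."""
    top = kmer_counts(seq, k)
    return top + infer_bottom(top)


if __name__ == "__main__":
    print(both_strands("AAAG", 1))
    print(both_strands("AAAG", 2))
```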
CodonFrequency tabulates codon frequencies for any range of a sequence in a particular reading frame, as opposed to counting all trinucleotides.
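To illustrate the difference, here is a hedged Python sketch. The function names are hypothetical and this is not the code of either program. Codons are read without overlap in one of three frames, while all-trinucleotide counting slides one base at a time. As a result, the three frames together account for every trinucleotide.

```python
from collections import Counter


def all_trinucleotides(seq: str) -> Counter:
    """Every overlapping trinucleotide, regardless of reading frame."""
    s = seq.upper()
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def codon_counts(seq: str, frame: int = 1) -> Counter:
    """Non-overlapping codons in frame 1, 2 or 3; a trailing partial codon is ignored."""
    if frame not in (1, 2, 3):
        raise ValueError("frame must be 1, 2 or 3")
    s = seq.upper()
    return Counter(s[i:i + 3] for i in range(frame - 1, len(s) - 2, 3))


if __name__ == "__main__":
    seq = "ATGAAATAA"
    print("frame 1 codons:", dict(codon_counts(seq, 1)))
    print("all trinucleotides:", dict(all_trinucleotides(seq)))
```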
If you need to stop this program, use
BATCH QUEUE

You can run this program in the batch queue using a script that we supply. Use Fetch to copy a file whose name starts with this program's name. Modify the file with any text editor so that it specifies the experiment you want to do, and then queue the script.

NAMING SETS OF SEQUENCES

See the sections on specifying sequences in Chapter 2, Using Sequences, of the User's Guide.

COMMAND-LINE SUMMARY

All parameters for this program may be put on the command line. Use the option -CHEck to see the summary below and to have a chance to add things to the command line before the program executes. In the summary below, the capitalized letters in the qualifier names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose qualifiers or parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs, in the GCG User's Guide.

Minimal Syntax: % ecomposition [-INfile=]Primate:* -Default

Prompted Parameters:

-BEGin=1 -END=1000              range (for single sequences only)
[-OUTfile=]primate.composition  output file name

Local Data Files: None

Optional Parameters:

-BOTHstrands        determines composition of both strands of nucleic acids
-NOCOMmas           removes the commas from the numbers in the output
-NOMONitor          suppresses the screen monitor showing each sequence
-NOSUMmary          suppresses the screen summary at the end of the program
-MW                 calculates molecular weight instead
-RNA                uses U instead of T in calculations
-DEPhosphorylation  calculates without the 5' phosphate for nucleic acid

LOCAL DATA FILES

None.

OPTIONAL PARAMETERS

The parameters and switches listed below can be set from the command line. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs, in the User's Guide.

-BOTHstrands

measures the composition of both strands of a nucleic acid sequence. If the -MW option is also used, it calculates the molecular weight of the double-stranded nucleic acid.

-NOCOMmas

EComposition normally displays numbers greater than 999 with commas to make them easier to read; for example, the number 1234567 would look like 1,234,567. These commas make the numbers hard for other programs to read. If you are going to use the output file from this program as input to another program, you can suppress the commas with this option.

-MW

calculates the molecular weight and suppresses other output forms. Add -BOTHstrands to force the program to calculate the molecular weight of double-stranded DNA.

-RNA

uses the RNA bases (U instead of T) in the molecular weight calculation.

-DEPhosphorylation

subtracts the 5' phosphate weight from calculated molecular weight values.

-MONitor

This program normally monitors its progress on your screen. However, when you use the -Default option to suppress all program interaction, you also suppress the monitor. You can turn it back on with this option. If your program is running in batch, the monitor will appear in the log file. If the monitor is slowing the program down, suppress it with -NOMONitor.

-SUMmary

writes a summary of the program's work to the screen when you've used the -Default qualifier to suppress all program interaction. A summary typically displays at the end of a program run interactively. You can suppress the summary for a program run interactively with -NOSUMmary. Use this qualifier also to include a summary of the program's work in the log file for a program run in batch.

Printed: April 22, 1996 15:52 (1162)