Ecomposition

ECOMPOSITION


FUNCTION

EComposition determines the composition of sequence(s). For nucleotide sequence(s), EComposition also determines dinucleotide and trinucleotide content.


DESCRIPTION

EComposition measures the composition of one or a group of sequences. If you specify only one sequence, you can choose a range within the sequence. Lowercase letters are converted to uppercase and counted with their uppercase equivalents. If you specify a group of sequences, EComposition displays the name of each sequence as it finishes the measurement for that sequence.
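
The counting described above can be sketched in a few lines of Python. This is only an illustration, not EComposition's own code: the function name composition and its begin/end arguments are hypothetical, and counting di- and trinucleotides as overlapping windows is an assumption.

```python
from collections import Counter

def composition(sequence, begin=1, end=None):
    """Count single symbols plus overlapping di- and trinucleotides.

    begin and end are 1-based and inclusive, like the Start/End prompts.
    """
    if end is None:
        end = len(sequence)
    # Lowercase letters are counted with their uppercase equivalents.
    region = sequence[begin - 1:end].upper()
    mono = Counter(region)
    # Assumption: every overlapping window of 2 or 3 symbols is counted.
    di = Counter(region[i:i + 2] for i in range(len(region) - 1))
    tri = Counter(region[i:i + 3] for i in range(len(region) - 2))
    return mono, di, tri

# Count positions 1 through 8 of a short mixed-case sequence.
mono, di, tri = composition("acgtACGTaa", begin=1, end=8)
print(dict(mono))
print(dict(di))
print(dict(tri))
```

Converting to uppercase before counting is what makes "a" and "A" land in the same bin.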


AUTHOR

This GCG program was modified by David Mathog (E-mail: MATHOG@seqaxp.bio.caltech.edu Post: Sequence Analysis Facility, Biology Division, Caltech), and modified for EGCG by Peter Rice (E-mail: pmr@sanger.ac.uk Post: Informatics Division, The Sanger Centre, Hinxton Hall, Cambridge, CB10 1RQ, UK).

All EGCG programs are supported by the EGCG Support Team, who can be contacted by E-mail (egcg@embnet.org).


EXAMPLE

Here is a session using EComposition to calculate the molecular weight of sequence gamma.seq.

  
  
  % ecomposition -mw
  
    ECOMPOSITION uses nucleotide sequences
  
    ECOMPOSITION of what sequence(s) ?  gamma.seq
  
                Start (* 1 *) ?
                End (* 11375 *) ?
  
    What should I call the output file (* gamma.composition *) ?
  
    ECOMPOSITION complete.
  
   Sequences: 1
Total Length: 11,375
    CPU time: 00.18
  
   Output file: gamma.composition
  
  %
  


OUTPUT

Here is part of the output file:

  
  
   ECOMPOSITION of: gamma.seq  Check: 6474  from: 1  to: 11,375
  
   March 19, 1996 15:07
  
                         *****
  
  A: 3,374        C: 2,209        G: 2,496        T: 3,296
  
  Molecular weight:   3455812.25
  
                       Other: 0
  
                       Total: 11,375
  
  


RESTRICTIONS

Unknown.


CONSIDERATIONS

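You can infer the composition of the bottom strand of a nucleic acid sequence from the composition of the top strand. The -BOTHstrands option measures both strands, but strand-specific information is lost because the combined counts are symmetric: the G count equals the C count, the A count equals the T count, and each dinucleotide or trinucleotide count equals the count of its reverse complement.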
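
As a rough illustration of this point, the Python sketch below derives bottom-strand counts from the top-strand counts in the example output (A: 3,374, C: 2,209, G: 2,496, T: 3,296). The function names are hypothetical, and it is an assumption that combining strands simply adds the two sets of counts.

```python
# Watson-Crick complements for DNA.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def bottom_strand_counts(top_counts):
    """Each base on the top strand pairs with its complement below."""
    return {COMPLEMENT[base]: n for base, n in top_counts.items()}

def both_strand_counts(top_counts):
    """Add top and bottom counts (assumed meaning of 'both strands')."""
    bottom = bottom_strand_counts(top_counts)
    return {base: top_counts.get(base, 0) + bottom.get(base, 0)
            for base in "ACGT"}

top = {"A": 3374, "C": 2209, "G": 2496, "T": 3296}
bottom = bottom_strand_counts(top)
both = both_strand_counts(top)
print(bottom)  # the top-strand A count becomes the bottom-strand T count
print(both)    # A equals T and C equals G once strands are combined
```

The combined counts no longer show that the top strand has more A than T, which is the information that is lost.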


RELATED PROGRAMS

CodonFrequency tabulates codon frequencies for any range of a sequence in a particular reading frame, as opposed to counting all trinucleotides.
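
The difference between the two kinds of count can be sketched in Python. This is an illustration only, not code from either program; the function names and the 1-based frame argument are hypothetical.

```python
from collections import Counter

def all_trinucleotides(seq):
    """Count every overlapping window of three bases."""
    seq = seq.upper()
    return Counter(seq[i:i + 3] for i in range(len(seq) - 2))

def codons_in_frame(seq, frame=1):
    """Count non-overlapping triplets starting at position `frame` (1-based)."""
    seq = seq.upper()
    return Counter(seq[i:i + 3] for i in range(frame - 1, len(seq) - 2, 3))

seq = "ATGAAATGA"
trinucleotides = all_trinucleotides(seq)
codons = codons_in_frame(seq, frame=1)
print(dict(trinucleotides))  # 7 windows; ATG and TGA each appear twice
print(dict(codons))          # 3 codons: ATG, AAA, TGA
```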


CTRL-C

If you need to stop this program, use <Ctrl>C to reset your terminal and session as gracefully as possible. Searches and comparisons write out the results from the part of the search that is complete when you use <Ctrl>C.


BATCH QUEUE

You can run this program in the batch queue using a script that we supply. Use Fetch with a file name that starts with this program's name. Modify the file with any text editor so that it specifies the experiment you want to do and queue the script.


NAMING SETS OF SEQUENCES

See the sections on specifying sequences in Chapter 2, Using Sequences, of the User's Guide.


COMMAND-LINE SUMMARY

All parameters for this program may be put on the command line. Use the option -CHEck to see the summary below and to have a chance to add things to the command line before the program executes. In the summary below, the capitalized letters in the qualifier names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose qualifiers or parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the GCG User's Guide.

  
  
  Minimal Syntax: % ecomposition [-INfile=]Primate:* -Default
  
  Prompted Parameters:
  
  -BEGin=1 -END=1000              range (for single sequences only)
  [-OUTfile=]primate.composition  output file name
  
  Local Data Files: None
  
  Optional Parameters:
  
  -BOTHstrands  determines composition of both strands of nucleic acids
  -NOCOMmas     removes the commas from the numbers in the output
  -NOMONitor    suppresses the screen monitor showing each sequence
  -NOSUMmary    suppresses the screen summary at the end of the program
  -MW           calculates molecular weight instead
  -RNA          uses U instead of T in calculations
  -DEPhosphorylation  calculates without the 5' phosphate for nucleic acid
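
The capitalization convention in the summary above can be illustrated with a short Python sketch. It is not how the GCG command-line parser is implemented: the function name is hypothetical, and it assumes that matching ignores case and that anything typed beyond the required letters must continue to spell the full qualifier name.

```python
import string

def matches_qualifier(typed, qualifier):
    """Return True if `typed` is an acceptable abbreviation of `qualifier`.

    `qualifier` is written as in the summary, e.g. "-BOTHstrands": the
    leading uppercase letters are required, the rest is optional.
    """
    name = qualifier.lstrip("-")
    # Length of the leading run of capital letters, e.g. 4 for BOTHstrands.
    n_required = len(name) - len(name.lstrip(string.ascii_uppercase))
    word = typed.lstrip("-").lower()
    return len(word) >= n_required and name.lower().startswith(word)

for typed in ("-both", "-BOTHSTR", "-bot", "-bothx"):
    print(typed, matches_qualifier(typed, "-BOTHstrands"))
```

Under these assumptions -both and -BOTHSTR are accepted, -bot is too short, and -bothx does not spell the qualifier.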
  


LOCAL DATA FILES

None.


OPTIONAL PARAMETERS

The parameters and switches listed below can be set from the command line. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the User's Guide.

-BOTHstrands

measures the composition of both strands of a nucleic acid sequence. It also calculates the molecular weight of the double-stranded nucleic acid if the -MW option is used.

-NOCOMmas

EComposition normally displays numbers greater than 999 with commas to make them easier to read; for example, the number 1234567 would look like 1,234,567. These commas make the numbers unreadable to a computer. If you are going to use the output file from this program for input to another program, you can suppress the commas with this option.
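
If you already have an output file that contains commas, a downstream script can strip them itself. The Python sketch below shows one way to do that for a line shaped like the composition line in the example output; the function name parse_counts is hypothetical, and the pattern only assumes "label: number" pairs.

```python
import re

def parse_counts(line):
    """Parse 'A: 3,374  C: 2,209' style text into {'A': 3374, 'C': 2209}."""
    pairs = re.findall(r"(\w+):\s*(\d[\d,]*)", line)
    # int("3,374") raises ValueError, so the commas are removed first.
    return {label: int(number.replace(",", "")) for label, number in pairs}

line = "A: 3,374        C: 2,209        G: 2,496        T: 3,296"
counts = parse_counts(line)
print(counts)
```

Running the program with -NOCOMmas avoids the need for this step.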

-MW

calculates the molecular weight and suppresses other forms of output. Add -BOTHstrands to make the program calculate the molecular weight of double-stranded DNA.

-RNA

uses the RNA bases (U instead of T) in molecular weight calculation.

-DEPhosphorylation

subtracts the 5' phosphate weight from calculated molecular weight values.
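
To show how -MW, -RNA, and -DEPhosphorylation could combine, here is a minimal Python sketch of a single-strand molecular weight calculation. Everything in it is an assumption made for illustration: the residue masses are approximate, commonly quoted averages rather than the values EComposition uses, the function name is hypothetical, and the result should not be expected to reproduce the figure in the example output.

```python
# ASSUMED approximate average masses (g/mol) of nucleotide residues in a
# chain (nucleoside monophosphate minus water). Not taken from EComposition.
DNA_RESIDUE = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}
RNA_RESIDUE = {"A": 329.21, "C": 305.18, "G": 345.21, "U": 306.17}
WATER = 18.02       # added once for the chain ends
PHOSPHATE = 79.98   # HPO3, removed when the 5' phosphate is absent

def molecular_weight(counts, rna=False, dephosphorylate=False):
    """Estimate the weight of one strand from its base counts."""
    table = RNA_RESIDUE if rna else DNA_RESIDUE
    merged = {}
    for base, n in counts.items():
        # With rna=True, T is treated as U (assumed meaning of -RNA).
        key = "U" if (rna and base == "T") else base
        merged[key] = merged.get(key, 0) + n
    weight = sum(table[base] * n for base, n in merged.items()) + WATER
    if dephosphorylate:
        weight -= PHOSPHATE
    return weight

counts = {"A": 2, "C": 1, "G": 1, "T": 1}
print(round(molecular_weight(counts), 2))
print(round(molecular_weight(counts, dephosphorylate=True), 2))
print(round(molecular_weight(counts, rna=True), 2))
```

Keeping the mass tables separate from the function makes it easy to substitute other values.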

-MONitor

This program normally monitors its progress on your screen. However, when you use the -Default option to suppress all program interaction, you also suppress the monitor. You can turn it back on with this option. If your program is running in batch, the monitor will appear in the log file. If the monitor is slowing the program down, suppress it with -NOMONitor.

-SUMmary

writes a summary of the program's work to the screen when you've used the -Default qualifier to suppress all program interaction. A summary typically displays at the end of a program run interactively. You can suppress the summary for a program run interactively with -NOSUMmary.

Use this qualifier also to include a summary of the program's work in the log file for a program run in batch.

Printed: April 22, 1996 15:52 (1162)