GAPFRAME

FUNCTION

GapFrame moves all gaps within the reading frame of a DNA sequence to codon boundaries.

DESCRIPTION

GapFrame adjusts gaps within a defined coding region of a sequence so that they are all on codon boundaries.

In practice this involves shifting a gap by at most one base in either direction.

Note that the begin and end positions refer to the ungapped sequence, so the coding region stays in the same place no matter how many gaps have been inserted.
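As an illustration of the ungapped numbering, the following minimal Python sketch converts an ungapped position into the corresponding position in the gapped sequence. The function name gapped_index and the use of "." as the gap character are assumptions made for this example; it is not taken from the GapFrame source.

```python
def gapped_index(seq, ungapped_pos, gap="."):
    """Return the 1-based gapped position of a 1-based ungapped position."""
    seen = 0  # number of real bases counted so far
    for i, ch in enumerate(seq, start=1):
        if ch != gap:
            seen += 1
            if seen == ungapped_pos:
                return i
    raise ValueError("position lies beyond the end of the sequence")
```

For the first ten symbols of the sample output, GGTA..CCGC, ungapped position 5 (the first C) is found at gapped position 7, while ungapped position 4 is unchanged because no gap precedes it.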

GCG's FrameAlign program can be used to check the results by comparing the output sequence to the correct translation.

GapFrame was originally written during a course in Oslo in response to a request from one of the students, as an example of how quickly a new EGCG application could be created.

AUTHOR

This program was written by Peter Rice (E-mail: pmr@sanger.ac.uk Post: Informatics Division, The Sanger Centre, Hinxton Hall, Cambridge, CB10 1RQ, UK).

All EGCG programs are supported by the EGCG Support Team, who can be contacted by E-mail (egcg@embnet.org).

EXAMPLE

Here is a sample session with GapFrame.


% gapframe

 GAPFRAME uses (gapped) nucleotide sequence data

 GAPFRAME of what sequence ?  testgap.seq

                   Start (* 1 *) ?  135
                   End (* 2167 *) ?  1292

  What should I call the output file (* testgap.gseq *) ?

%

OUTPUT

The output from GapFrame is a sequence identical to the input sequence except that gaps in the specified range have been moved to codon boundaries.


testgap.gseq  Length: 2176  March 19, 1996 15:34  Type: N  Check: 9566  ..

       1  GGTA..CCGC TGGCCGAGCA TCTGCTCGAT CACCACCAGC CGGGCGACGG

      51  GAACTGCACG ATCTACCTGG CGAGCCTGGA GCACGAGCGG GTTCGCTTCG

     101  TACGGCGCTG AGCGACAGTC ACAGGAGAGG AAACGGATGG GATCGCACCA

     151  GGAGCGGCCG CTGATCGGCC TGCTGTTCTC CGAAACCGGC GTCACCGCCG

     201  AT....ATCG AGCGCTCGCA CGCGTATGGC GCATTGCTCG CG..GTCGAG

     251  CAACTGAACC GCGAGGGCGG CGTC.GGCGG TCGCCCGATC GAAACGCTGT


         ////////////////////////////////////////////////////

ALGORITHM

To realign gaps, any gap that starts at the end of a codon needs no adjustment. Any gap after the first base of a codon can be moved one base earlier. Any gap after the second base of a codon can be moved one base later.
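The three rules can be sketched in Python as follows. This is a minimal illustration, not the EGCG implementation: the function name gap_frame, its parameters, the use of "." as the gap character, and the choice to merge gap runs that end up adjacent after shifting are all assumptions made for the example. As in the program, begin and end are 1-based positions in the ungapped sequence.

```python
def gap_frame(seq, begin, end, gap="."):
    """Shift gap runs inside the coding region onto codon boundaries."""
    bases = []  # the ungapped bases, in order
    gaps = {}   # number of bases before a gap run -> length of that run
    for ch in seq:
        if ch == gap:
            gaps[len(bases)] = gaps.get(len(bases), 0) + 1
        else:
            bases.append(ch)

    coding_length = end - begin + 1
    shifted = {}
    for pos, length in gaps.items():
        offset = pos - (begin - 1)  # coding bases that precede the gap
        if 0 < offset < coding_length:
            phase = offset % 3
            if phase == 1:
                pos -= 1  # after first base of a codon: one base earlier
            elif phase == 2:
                pos += 1  # after second base of a codon: one base later
        # Assumption: runs that land on the same position are merged.
        shifted[pos] = shifted.get(pos, 0) + length

    out = []
    for i in range(len(bases) + 1):
        out.append(gap * shifted.get(i, 0))
        if i < len(bases):
            out.append(bases[i])
    return "".join(out)
```

For example, gap_frame("ATGC..GTAA", 1, 6) returns "ATG..CGTAA": the gap followed the first base of the second codon, so it is moved one base earlier. The ungapped sequence is never altered; only the gap positions change.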

INPUT FILE

The input file for GapFrame is a gapped GCG nucleotide sequence file.

COMMAND-LINE SUMMARY

All parameters for this program may be put on the command line. Use the option -CHEck to see the summary below and to have a chance to add things to the command line before the program executes. In the summary below, the capitalized letters in the qualifier names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose qualifiers or parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the GCG User's Guide.


Minimum syntax: % gapframe [-INfile=]amir.seq -Default

Prompted Parameters:

-BEGin=1 -END=100           Range of interest
[-OUTfile=]amir.seq         Output file

Local Data Files: None

Optional Parameters: None

LOCAL DATA FILES

None.

Printed: April 22, 1996 15:53 (1162)