Gapframe

GAPFRAME


FUNCTION

GapFrame moves all gaps within the reading frame of a DNA sequence to codon boundaries.


DESCRIPTION

GapFrame adjusts gaps within a defined coding region of a sequence so that they are all on codon boundaries.

In practice, each gap needs to be shifted by at most one base, in either direction.

Note that the begin and end positions refer to the ungapped sequence, so the coding region is in the same place no matter how many gaps have been inserted.
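
The relationship between ungapped positions and columns in the gapped file can be illustrated with a short Python sketch. This is not part of GapFrame; the function name ungapped_to_gapped is hypothetical, and the sketch assumes that gaps are written as "." as in the sample output.

```python
GAP = "."  # assumption: gaps are written as "." as in GCG sequence files


def ungapped_to_gapped(seq, pos):
    """Return the 1-based gapped column of the pos-th base (1-based, gaps not counted)."""
    bases_seen = 0
    for column, char in enumerate(seq, start=1):
        if char != GAP:
            bases_seen += 1
            if bases_seen == pos:
                return column
    raise ValueError("position lies beyond the end of the ungapped sequence")


if __name__ == "__main__":
    # First ten columns of the sample output: the fifth base sits in column 7.
    print(ungapped_to_gapped("GGTA..CCGC", 5))
```

Because positions are counted on bases only, inserting further gaps changes the column returned but not the base that the position refers to.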

GCG's FrameAlign program can be used to check the results by comparing the output sequence to the correct translation.

GapFrame was originally written during a course in Oslo in response to a request from one of the students, as an example of how quickly a new EGCG application could be created.


AUTHOR

This program was written by Peter Rice (E-mail: pmr@sanger.ac.uk Post: Informatics Division, The Sanger Centre, Hinxton Hall, Cambridge, CB10 1RQ, UK).

All EGCG programs are supported by the EGCG Support Team, who can be contacted by E-mail (egcg@embnet.org).


EXAMPLE

Here is a sample session with GapFrame:

  
  
  % gapframe
  
   GAPFRAME uses (gapped) nucleotide sequence data
  
   GAPFRAME of what sequence ?  testgap.seq
  
                Start (* 1 *) ?  135
                End (* 2167 *) ?  1292
  
    What should I call the output file (* testgap.gseq *) ?
  
  %
  


OUTPUT

The output from GapFrame is a sequence identical to the input sequence, except that gaps in the specified range have been moved to codon boundaries.

  
  
  testgap.gseq  Length: 2176  March 19, 1996 15:34  Type: N  Check: 9566  ..
  
    1  GGTA..CCGC TGGCCGAGCA TCTGCTCGAT CACCACCAGC CGGGCGACGG
  
   51  GAACTGCACG ATCTACCTGG CGAGCCTGGA GCACGAGCGG GTTCGCTTCG
  
  101  TACGGCGCTG AGCGACAGTC ACAGGAGAGG AAACGGATGG GATCGCACCA
  
  151  GGAGCGGCCG CTGATCGGCC TGCTGTTCTC CGAAACCGGC GTCACCGCCG
  
  201  AT....ATCG AGCGCTCGCA CGCGTATGGC GCATTGCTCG CG..GTCGAG
  
  251  CAACTGAACC GCGAGGGCGG CGTC.GGCGG TCGCCCGATC GAAACGCTGT
  
  
      ////////////////////////////////////////////////////
  


ALGORITHM

To realign gaps, any gap that starts at the end of a codon needs no adjustment. Any gap after the first base of a codon can be moved one base earlier. Any gap after the second base of a codon can be moved one base later.
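
The three rules can be sketched in Python as follows. This is a minimal illustration and not the GapFrame source. The function name gap_frame is hypothetical. The sketch assumes that gaps are written as ".", that begin and end are 1-based ungapped positions, and that only gaps with coding bases on both sides are adjusted.

```python
GAP = "."  # assumption: gaps are written as "." as in GCG sequence files


def gap_frame(seq, begin, end):
    """Shift gap runs inside the coding region onto codon boundaries.

    begin and end are 1-based positions in the ungapped sequence.
    """
    chars = list(seq)
    bases_seen = 0  # number of bases (non-gap characters) to the left of index i
    i = 0
    while i < len(chars):
        if chars[i] != GAP:
            bases_seen += 1
            i += 1
            continue
        # Find the end of this run of gap characters: chars[i:j] is all gaps.
        j = i
        while j < len(chars) and chars[j] == GAP:
            j += 1
        # Assumption: only gaps with coding bases on both sides are adjusted.
        if begin <= bases_seen < end and j < len(chars):
            # Number of coding bases before the gap, modulo the codon length.
            phase = (bases_seen - begin + 1) % 3
            if phase == 1:
                # Gap after the first base of a codon: move it one base earlier,
                # so that base hops to the right-hand side of the gap.
                chars[i - 1], chars[j - 1] = chars[j - 1], chars[i - 1]
            elif phase == 2:
                # Gap after the second base: move it one base later, so the
                # next base hops to the left-hand side of the gap.
                chars[i], chars[j] = chars[j], chars[i]
                bases_seen += 1
                j += 1
        i = j
    return "".join(chars)


if __name__ == "__main__":
    print(gap_frame("ATGC..ATGAAA", 1, 10))  # gap after the first base of codon 2
    print(gap_frame("ATGCA.TGAAA", 1, 10))   # gap after the second base of codon 2
```

In the first call the gap follows four coding bases, so it moves one base earlier and the result is ATG..CATGAAA. In the second call it follows five coding bases, so it moves one base later and the result is ATGCAT.GAAA. In both cases the ungapped sequence is unchanged.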


INPUT FILE

The input file for GapFrame is a gapped GCG nucleotide sequence file.


COMMAND-LINE SUMMARY

All parameters for this program may be put on the command line. Use the option -CHEck to see the summary below and to have a chance to add things to the command line before the program executes. In the summary below, the capitalized letters in the qualifier names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose qualifiers or parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the GCG User's Guide.

  
  
  Minimum syntax: % gapframe [-INfile=]amir.seq -Default
  
  Prompted Parameters:
  
  -BEGin=1 -END=100           Range of interest
  [-OUTfile=]amir.seq         Output file
  
  Local Data Files: None
  
  Optional Parameters: None
  


LOCAL DATA FILES

None.

Printed: April 22, 1996 15:53 (1162)