GapFrame moves all gaps in a DNA sequence reading frame to be at codon boundaries.
GapFrame adjusts gaps within a defined coding region of a sequence so that they are all on codon boundaries.
In practice this only involves shifting gaps by one base in either direction.
Note that the begin and end positions are for ungapped sequence, so the coding region is in the same place no matter how many gaps have been inserted.
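The ungapped coordinate convention can be illustrated with a short Python sketch. This is not GapFrame's own code: the function name `gapped_index` and the use of `.` as the gap character (as in the sample output) are assumptions made for illustration only.

```python
def gapped_index(seq: str, ungapped_pos: int, gap: str = ".") -> int:
    """Return the 0-based index in `seq` of the base at the given
    1-based ungapped position (hypothetical helper, not part of GapFrame)."""
    seen = 0  # number of non-gap characters counted so far
    for index, char in enumerate(seq):
        if char != gap:
            seen += 1
            if seen == ungapped_pos:
                return index
    raise ValueError("position lies beyond the end of the ungapped sequence")
```

Under these assumptions, position 5 refers to the same base whether or not gaps precede it: it is index 4 in `GGTACCGC` and index 6 in `GGTA..CCGC`.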
GCG's FrameAlign program can be used to check the results by comparing the output sequence to the correct translation.
GapFrame was originally written during a course in Oslo in response to a request from one of the students, as an example of how quickly a new EGCG application could be created.
This program was written by Peter Rice (E-mail: pmr@sanger.ac.uk Post: Informatics Division, The Sanger Centre, Hinxton Hall, Cambridge, CB10 1RQ, UK).
All EGCG programs are supported by the EGCG Support Team, who can be contacted by E-mail (egcg@embnet.org).
Here is a sample session with GapFrame:
  % gapframe

  GAPFRAME uses (gapped) nucleotide sequence data

  GAPFRAME of what sequence ?  testgap.seq

  Start (* 1 *) ?  135
  End (* 2167 *) ?  1292

  What should I call the output file (* testgap.gseq *) ?

  %
The output from GapFrame is a sequence identical to the input sequence except that gaps in the specified range have moved.
  testgap.gseq  Length: 2176  March 19, 1996 15:34  Type: N  Check: 9566  ..

       1  GGTA..CCGC TGGCCGAGCA TCTGCTCGAT CACCACCAGC CGGGCGACGG
      51  GAACTGCACG ATCTACCTGG CGAGCCTGGA GCACGAGCGG GTTCGCTTCG
     101  TACGGCGCTG AGCGACAGTC ACAGGAGAGG AAACGGATGG GATCGCACCA
     151  GGAGCGGCCG CTGATCGGCC TGCTGTTCTC CGAAACCGGC GTCACCGCCG
     201  AT....ATCG AGCGCTCGCA CGCGTATGGC GCATTGCTCG CG..GTCGAG
     251  CAACTGAACC GCGAGGGCGG CGTC.GGCGG TCGCCCGATC GAAACGCTGT

  ////////////////////////////////////////////////////
To realign gaps, any gap that starts at the end of a codon needs no adjustment. Any gap after the first base of a codon can be moved one base earlier. Any gap after the second base of a codon can be moved one base later.
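The three rules above can be sketched in Python as follows. This is a minimal illustration, not the original EGCG implementation: the function name `gap_frame`, the `.` gap character, and the decision to leave gaps at the edges of the coding region untouched are all assumptions.

```python
def gap_frame(seq: str, begin: int, end: int, gap: str = ".") -> str:
    """Shift gaps inside the coding region onto codon boundaries.

    `begin` and `end` are 1-based positions in the ungapped sequence.
    Hypothetical sketch; gaps outside the region are left untouched.
    """
    chars = list(seq)
    bases_seen = 0  # ungapped bases to the left of index i
    i = 0
    while i < len(chars):
        if chars[i] != gap:
            bases_seen += 1
            i += 1
            continue
        # Find the end of this run of gap characters: chars[i:j].
        j = i
        while j < len(chars) and chars[j] == gap:
            j += 1
        if begin - 1 < bases_seen < end:
            offset = (bases_seen - (begin - 1)) % 3
            if offset == 1:
                # Gap after the first base of a codon: move it one base
                # earlier by hopping the preceding base over the gap.
                chars[i - 1], chars[j - 1] = chars[j - 1], chars[i - 1]
            elif offset == 2 and j < len(chars):
                # Gap after the second base of a codon: move it one base
                # later by hopping the following base over the gap.
                chars[i], chars[j] = chars[j], chars[i]
                bases_seen += 1
                j += 1
        i = j
    return "".join(chars)
```

For example, with a coding region starting at base 1, `gap_frame("ATGC..GTACGT", 1, 10)` gives `ATG..CGTACGT` and `gap_frame("ATGCA.GTACGT", 1, 10)` gives `ATGCAG.TACGT`. In both cases the ungapped sequence is unchanged; only the gap has moved by one base.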
The input file for GapFrame is a gapped GCG nucleotide sequence file.
All parameters for this program may be put on the command line. Use the option -CHEck to see the summary below and to have a chance to add things to the command line before the program executes. In the summary below, the capitalized letters in the qualifier names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose qualifiers or parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the GCG User's Guide.
  Minimum syntax: % gapframe [-INfile=]amir.seq -Default

  Prompted Parameters:

  -BEGin=1 -END=100       Range of interest
  [-OUTfile=]amir.seq     Output file

  Local Data Files: None

  Optional Parameters: None
None.
Printed: April 22, 1996 15:53 (1162)