ETerminator searches for prokaryotic factor-independent RNA polymerase terminators according to the method of Brendel and Trifonov. ETerminator is a version of GCG's old Terminator with command line control.
ETerminator uses a table of dinucleotide frequencies at each position, derived from a set of known terminators, to find places in a new sequence where terminator-like sequences occur. It reports every discrete location in the searched sequence where a measurement exceeds a user-defined threshold. The measurement for each alignment of the table over the sequence is the sum of the table values for each dinucleotide in the aligned sequence. The method can also restrict the terminator-like sequences shown to those that exceed a threshold for the presence of a GC-rich dyad symmetry near the poly-U region.
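The primary structure measurement described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not ETerminator's own code: the table layout (one dictionary of dinucleotide weights per position), the toy weights, and the names window_score and scan are all hypothetical. The real values are those in pmatrix.dat.

```python
from typing import Dict, List, Tuple

# Hypothetical position-specific table: one dict per dinucleotide position,
# mapping a dinucleotide to its weight. Toy numbers, NOT the pmatrix.dat values.
Table = List[Dict[str, float]]


def window_score(seq: str, start: int, table: Table) -> float:
    """Sum the table value of each dinucleotide in the window beginning at start."""
    total = 0.0
    for pos, weights in enumerate(table):
        dinuc = seq[start + pos:start + pos + 2].upper()
        total += weights.get(dinuc, 0.0)  # assumption: unlisted dinucleotides add 0
    return total


def scan(seq: str, table: Table, threshold: float) -> List[Tuple[int, float]]:
    """Return (1-based start, score) for every alignment scoring above threshold."""
    width = len(table) + 1  # n dinucleotide positions span n + 1 bases
    hits = []
    for start in range(len(seq) - width + 1):
        score = window_score(seq, start, table)
        if score > threshold:
            hits.append((start + 1, score))
    return hits


toy_table: Table = [{"GC": 1.0}, {"CT": 0.5}, {"TT": 2.0}]
hits = scan("AAGCTTAA", toy_table, threshold=3.0)
print(hits)  # [(3, 3.5)]
```

The sketch reports every window that passes the threshold and does not merge overlapping hits or apply the secondary structure filter.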
The method used by ETerminator is described in detail in two papers: Brendel, V. and Trifonov, E. N., Nucl. Acids Res. 12 4411-4427 (1984) and Brendel, V. and Trifonov, E. N. in CODATA Conference Proceedings, Jerusalem, 1984. Any use of ETerminator that results in publication should cite these papers.
This GCG program was modified by Jaakko Hattula (Tampere University of Technology, Finland) and Peter Rice (E-mail: pmr@sanger.ac.uk Post: Informatics Division, The Sanger Centre, Hinxton Hall, Cambridge, CB10 1RQ, UK).
All EGCG programs are supported by the EGCG Support Team, who can be contacted by E-mail (egcg@embnet.org).
The original GCG Terminator program was written by Volker Brendel at the Weizmann Institute of Science, Rehovot, Israel and was adapted to run with the Wisconsin Package(TM) by Greg Hamm of the European Molecular Biology Laboratory, Heidelberg, West Germany.
Here is a session using ETerminator to search for terminator-like sequences in pBR322:

  % eterminator

   ETERMINATOR uses nucleotide sequence data

   ETERMINATOR of what sequence ?  GenBank:pBR322

                    Start (* 1 *) ?
                    End (* 4363 *) ?
                    Reverse (* No *) ?

   Primary structure threshold value (* 3.5 *) ?

   Secondary structure threshold value (* 0 *) ?

   What should I call the output file (* pbr322.trm *) ?

   Searching . . .

  %
Here is the output file:
ETERMINATOR search on: pBR322  check: 3298  from: 1  to: 4363

LOCUS       PBR322 4363 bp DNA Circular SYN 06-MAY-1994
DEFINITION  Plasmid pBR322 complete sequence.
ACCESSION   V01119
KEYWORDS    ampicillin resistance; circular; cloning vector; drug resistance gene; plasmid.
SOURCE      None . . .

Primary structure threshold: 3.50
Secondary structure threshold: 0                      July 27, 1994 09:58 ..

      -40  -35  -30  -25  -20  -15  -10   -5  -1+  +5         p      s
        .    .    .    .    .    .    .    .   ..   .

  921=> CCATTATCGCCGGCATGGCGGCCGACGCGCTGGGCTACGTCTTGCTGGCGT  3.80    0
 1398=> CATCTCCAGCAGCCGCACGCGGCGCATCTCGGGCAGCGTTGGGTCCTGGCC  3.62    0
 1573=> TCTGCGACCTGAGCAACAACATGAATGGTCTTCGGTTTCCGTGTTTCGTAA  3.62    0
 1583=> GAGCAACAACATGAATGGTCTTCGGTTTCCGTGTTTCGTAAAGTCTGGAAA  4.32    0
 1916=> AGTGACCAAACAGGAAAAAACCGCCCTTAACATGGCCCGCTTTATCAGAAG  4.47    0
 2322=> GATGCGTAAGGAGAAAATACCGCATCAGGCGCTCTTCCGCTTCCTCGCTCA  3.73   48
        -- - --- --- - --
 2494=> GCAAAAGGCCAGGAACCGTAAAAAGGCCGCGTTGCTGGCGTTTTTCCATAG  4.35    0
 2499=> AGGCCAGGAACCGTAAAAAGGCCGCGTTGCTGGCGTTTTTCCATAGGCTCC  3.95    0
 3041=> TGATCCGGCAAACAAACCACCGCTGGTAGCGGTGGTTTTTTTGTTTGCAAG  6.92   95
        ----------- ----------- ---- ----
 3103=> GCGCAGAAAAAAAGGATCTCAAGAAGATCCTTTGATCTTTTCTACGGGGTC  4.18   68
        ------- - -- -- - ------- --------- ---------

        .    .    .    .    .    .    .    .   ..   .

 3201=> GATTATCAAAAAGGATCTTCACCTAGATCCTTTTAAATTAAAAATGAAGTT  4.62   19
        --------- ---------
 3504=> GTGGTCCTGCAACTTTATCCGCCTCCATCCAGTCTATTAATTGTTGCCGGG  3.59    0
 4228=> TATTTAGAAAAATAAACAAATAGGGGTTCCGCGCACATTTCCCCGAAAAGT  4.49    0
 4313=> ACATTAACCTATAAAAATAGGCGTATCACGAGGCCCTTTCGTCTTCAAGAA  3.69    0
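For downstream processing, the hit lines of an output file like the one above can be read with a few lines of Python. This is a minimal sketch that assumes each hit occupies its own line of the form "position=> sequence p s", a layout inferred from the example listing only. The names Hit, HIT_LINE, and parse_hits are hypothetical and not part of ETerminator.

```python
import re
from typing import List, NamedTuple


class Hit(NamedTuple):
    position: int
    sequence: str
    primary: float    # value from the "p" column
    secondary: int    # value from the "s" column


# Assumed hit-line layout, inferred from the example listing only:
# "<position>=> <sequence> <primary score> <secondary score>"
HIT_LINE = re.compile(r"^\s*(\d+)=>\s+([ACGTacgt]+)\s+(\d+\.\d+)\s+(\d+)\s*$")


def parse_hits(text: str) -> List[Hit]:
    """Collect hit lines; header, ruler, and dyad-marker lines are skipped."""
    hits = []
    for line in text.splitlines():
        m = HIT_LINE.match(line)
        if m:
            hits.append(Hit(int(m.group(1)), m.group(2),
                            float(m.group(3)), int(m.group(4))))
    return hits


sample = """\
 3041=> TGATCCGGCAAACAAACCACCGCTGGTAGCGGTGGTTTTTTTGTTTGCAAG  6.92   95
        ----------- ----------- ---- ----
 3504=> GTGGTCCTGCAACTTTATCCGCCTCCATCCAGTCTATTAATTGTTGCCGGG  3.59    0
"""
hits = parse_hits(sample)
with_dyad = [h.position for h in hits if h.secondary > 0]
print(with_dyad)  # [3041]
```

Filtering on the secondary column after the fact, as in the last lines, is one way to compare the effect of different secondary structure thresholds without rerunning the search.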
The pattern recognition method used by ETerminator is only applicable to the search for prokaryotic factor-independent terminators. As mentioned above, Terminator is not really a GCG program, but was adapted to run with the Wisconsin Package(TM) by Greg Hamm. Its behavior is not completely known, and we do not assert that all GCG conventions have been followed. We are very grateful to Drs. Brendel and Trifonov for generously allowing GCG to distribute their program. GCG will try to correct any misbehavior found.
The algorithm is described clearly in the CODATA paper.
The default primary structure threshold is set so that ETerminator finds about 95 percent of known terminators on the basis of primary structure alone.
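The relationship between a threshold and the fraction of known terminators it recovers can be illustrated with a short Python sketch. The scores below are made up for illustration and are not the Brendel and Trifonov data. The function name threshold_for_recall is hypothetical, and the sketch counts a score equal to the cutoff as passing, which may differ from ETerminator's own comparison.

```python
import math
from typing import Sequence


def threshold_for_recall(scores: Sequence[float], recall: float = 0.95) -> float:
    """Largest cutoff such that at least `recall` of the scores are >= the cutoff."""
    if not scores:
        raise ValueError("need at least one score")
    ranked = sorted(scores, reverse=True)
    # Number of known terminators that must pass; the small epsilon guards
    # against floating-point round-up in recall * len(ranked).
    keep = max(1, math.ceil(recall * len(ranked) - 1e-9))
    return ranked[keep - 1]


# Made-up primary structure scores for 20 hypothetical known terminators.
known = [3.2, 3.6, 3.7, 3.9, 4.0, 4.1, 4.3, 4.4, 4.5, 4.6,
         4.8, 5.0, 5.1, 5.3, 5.5, 5.8, 6.0, 6.3, 6.6, 6.9]
cutoff = threshold_for_recall(known)
recovered = sum(s >= cutoff for s in known) / len(known)
print(cutoff, recovered)  # 3.6 0.95
```

Raising the threshold above such a cutoff trades recall of true terminators for fewer terminator-like hits overall.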
The program predicts terminators in those parts of the sequence composed entirely of lower- and uppercase G, A, T, and C. Parts of the sequence containing other sequence symbols are given a primary structure value of 0.0 and a secondary structure value of 0.
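To see in advance which parts of a sequence are eligible for prediction, the unambiguous runs can be located with a regular expression. This is a sketch only. The function name scorable_regions is hypothetical, and how ETerminator treats a window that partly overlaps an ambiguous symbol is not modeled here.

```python
import re
from typing import List, Tuple

# Runs made up only of upper- or lowercase G, A, T, and C.
SCORABLE = re.compile(r"[GATCgatc]+")


def scorable_regions(seq: str) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (begin, end) spans free of other sequence symbols."""
    return [(m.start() + 1, m.end()) for m in SCORABLE.finditer(seq)]


regions = scorable_regions("ACGTNNacgtRYGGCC")
print(regions)  # [(1, 4), (7, 10), (13, 16)]
```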
ETerminator only accepts nucleotide sequences. If ETerminator rejects your nucleotide sequence, turn to Appendix VI to see how to change or set the type of a sequence.
All parameters for this program may be put on the command line. Use the option -CHEck to see the summary below and to have a chance to add things to the command line before the program executes. In the summary below, the capitalized letters in the qualifier names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose qualifiers or parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the GCG User's Guide.
Minimal Syntax: % eterminator [-INfile=]GenBank:pBR322 -Default

Prompted Parameters:

-BEGin=1 -END=4363       range of interest
-REVerse                 use the reverse strand
-PTHRESHold=3.50         primary structure threshold value
-STHRESHold=0            secondary structure threshold value
[-OUTfile=]pbr322.trm    output file name

Local Data Files:

-DATa1=pmatrix.dat       contains the normalized dinucleotide fractions
-DATa2=smatrix.dat       contains the significant GC-rich dyad diagonals

Optional Parameters: None
The files described below supply auxiliary data to this program. The program automatically reads them from a public data directory unless you either 1) have a data file with exactly the same name in your current working directory; or 2) name a file on the command line with an expression like -DATa1=myfile.dat. For more information see Chapter 4, Using Data Files in the User's Guide.
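The lookup order described above can be modeled as follows. This is a hypothetical Python sketch, not the Wisconsin Package implementation: the function name resolve_data_file and its arguments are invented for illustration, and giving a file named on the command line precedence over a same-named file in the working directory is an assumption.

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve_data_file(default_name: str, public_dir: Path,
                      cli_value: Optional[str] = None,
                      cwd: Optional[Path] = None) -> Path:
    """Pick the data file to read, following the order described in the text."""
    cwd = cwd or Path.cwd()
    if cli_value is not None:        # e.g. -DATa1=myfile.dat (assumed to win)
        return Path(cli_value)
    local = cwd / default_name       # same name in the current working directory
    if local.is_file():
        return local
    return public_dir / default_name  # fall back to the public data directory


with tempfile.TemporaryDirectory() as tmp:
    work, public = Path(tmp) / "work", Path(tmp) / "public"
    work.mkdir()
    public.mkdir()
    from_public = resolve_data_file("pmatrix.dat", public, cwd=work)
    (work / "pmatrix.dat").write_text("")  # drop in a local copy
    from_local = resolve_data_file("pmatrix.dat", public, cwd=work)
    from_cli = resolve_data_file("pmatrix.dat", public,
                                 cli_value="myfile.dat", cwd=work)
    picked = [from_public.parent.name, from_local.parent.name, from_cli.name]

print(picked)  # ['public', 'work', 'myfile.dat']
```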
The file pmatrix.dat is taken from Figure 3 of the CODATA paper, which is similar to Figure 3 of the NAR paper. It contains the normalized fractions of each dinucleotide observed in the set thought to be determining terminator structure. The file smatrix.dat is taken from Figure 2 of the CODATA paper. It contains the significant diagonals for the GC-rich dyad symmetry. Both pmatrix.dat and smatrix.dat must be available to ETerminator as local data files.
Brendel, V. and Trifonov, E.N. (1984) "A computer algorithm for testing potential prokaryotic terminators." Nucl. Acids Res. 12, 4411-4427.
Brendel, V. and Trifonov, E.N. (1984) in CODATA Conference Proceedings, Jerusalem, 1984.
Printed: April 22, 1996 15:53 (1162)