HelixTurnHelix uses the method of Dodd and Egan to determine the significance of possible helix-turn-helix matches in protein sequences.
HelixTurnHelix uses the Dodd and Egan matrix to test for the presence of a helix-turn-helix DNA-binding motif in a protein sequence.
This program was written by Peter Rice (E-mail: pmr@sanger.ac.uk Post: Informatics Division, The Sanger Centre, Hinxton Hall, Cambridge, CB10 1RQ, UK).
All EGCG programs are supported by the EGCG Support Team, who can be contacted by E-mail (egcg@embnet.org).
Here is a sample session with HelixTurnHelix:

    % helixturnhelix

     HELIXTURNHELIX uses protein sequences

     HELIXTURNHELIX of what sequence(s) ?  Sw:Laci_Ecoli

                      Start (* 1 *) ?
                      End (* 360 *) ?

     What should I call the output file (* laci_ecoli.hth *) ?

    %
The output from HelixTurnHelix is a file containing the highest-scoring hit and all hits that score above a threshold expressed in standard deviations from the mean.
    HELIXTURNHELIX of sw:laci_ecoli from: 1 to: 360

    Using distribution mean: 238.71 and SD: 293.61
    Report scores beyond +2.50 standard deviations

    Hits above +2.50 SD (972.73)

    Score 2202 (+6.69 SD) in SW:LACI_ECOLI at residue 5
    P03023 escherichia coli. lactose operon repressor. 8/91

     Sequence:  TLYDVAEYAGVSYQTVSRVVNQ
                |                    |
                5                    26
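The SD figures in the report follow from the distribution mean and SD printed at its top. As a minimal sketch (the function names are illustrative and are not part of HelixTurnHelix), the arithmetic can be reproduced like this:

```python
# Illustrative helpers; the defaults are the mean and SD shown in the report.
MEAN = 238.71   # mean score for non-HTH proteins
SD = 293.61     # standard deviation for non-HTH proteins


def sd_score(score, mean=MEAN, sd=SD):
    """Express a raw score as a number of standard deviations above the mean."""
    return (score - mean) / sd


def sd_threshold(min_sd=2.5, mean=MEAN, sd=SD):
    """Raw score that corresponds to min_sd standard deviations above the mean."""
    return mean + min_sd * sd


z = sd_score(2202)          # the LACI_ECOLI hit from the report
cutoff = sd_threshold(2.5)  # the reporting cutoff
print(f"{z:+.2f} SD")       # prints "+6.69 SD"
print(cutoff)
```

With these values the cutoff is 238.71 + 2.50 x 293.61 = 972.735, which the report prints as 972.73.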
HelixTurnHelix uses the method of Dodd and Egan (1990) Nucleic Acids Res. 18, 5019-5026, an update of the method in Dodd and Egan (1987) J. Mol. Biol. 194, 557-564.
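The general idea of scanning a sequence with a position-specific matrix can be sketched as follows. This is an illustration only, not the program's code: it assumes that the score of a window is the sum of the matrix entries for the residues found at each position, and the three-column `toy_weights` matrix is hypothetical (the real matrix spans the whole motif, which is 22 residues in the sample output).

```python
def scan(sequence, weights):
    """Score every window of the sequence against a position-specific matrix.

    weights is a list with one dict per motif position, mapping a residue
    to its weight; residues missing from a column score 0 (an assumption).
    Returns a list of (start, score) pairs, with start counted from 1.
    """
    width = len(weights)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(col.get(res, 0) for col, res in zip(weights, window))
        hits.append((start + 1, score))
    return hits


# Hypothetical toy matrix, three positions wide.
toy_weights = [{"A": 2, "G": 1}, {"Y": 3}, {"A": 1, "G": 2}]

hits = scan("DVAEYAGVS", toy_weights)
best = max(hits, key=lambda hit: hit[1])
print(best)  # (4, 4): the window "EYA" starting at residue 4 scores 0 + 3 + 1
```

The best raw score would then be compared with the mean and standard deviation of scores from non-HTH proteins to judge its significance.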
The input file for HelixTurnHelix is one or more GCG protein sequence files.
All parameters for this program may be put on the command line. Use the option -CHEck to see the summary below and to have a chance to add things to the command line before the program executes. In the summary below, the capitalized letters in the qualifier names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose qualifiers or parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the GCG User's Guide.
Minimum syntax: % helixturnhelix [-INfile=]Sw:Laci_Ecoli -Default

Prompted Parameters:

    -BEGin=1 -END=360        Range of interest
    -OUTFILE=laci_ecoli.hth  Output file

Local Data Files:

    -DATa=doddegan.dat

    Older version of matrix

    -EIGHTYSEVEN             Use DoddEgan87.Dat

Optional Parameters:

    -MINSD=2.5               Show all hits over 2.5 SDs above mean
    -TESTMEAN=238.71         Mean score for non-HTH proteins
    -TESTSD=293.61           Standard deviation for non-HTH proteins
The files described below supply auxiliary data to this program. The program automatically reads them from a public data directory unless you either 1) have a data file with exactly the same name in your current working directory; or 2) name a file on the command line with an expression like -DATa1=myfile.dat. For more information see Chapter 4, Using Data Files in the User's Guide.
The frequency table is stored in file doddegan.dat. You can Fetch this table and edit it (for example to include additional motif sequences). HelixTurnHelix will insist that each column has the same total.
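Because every column must keep the same total, adding a motif sequence means adding one count to every column. The sketch below illustrates that check on an in-memory table; the dict-of-lists layout and the function names are assumptions made for the example and do not describe the actual layout of doddegan.dat.

```python
def column_totals(table):
    """Sum the counts in each column.

    table maps a residue to a list of counts, one per motif position
    (a hypothetical layout chosen for this example).
    """
    n_cols = len(next(iter(table.values())))
    return [sum(row[i] for row in table.values()) for i in range(n_cols)]


def columns_consistent(table):
    """True when every column has the same total."""
    return len(set(column_totals(table))) == 1


def add_motif(table, motif):
    """Add one aligned motif sequence: one count per column."""
    n_cols = len(next(iter(table.values())))
    if len(motif) != n_cols:
        raise ValueError("motif length must match the number of columns")
    for position, residue in enumerate(motif):
        table.setdefault(residue, [0] * n_cols)[position] += 1


# Toy two-column table; the real table has one column per motif position.
toy_table = {"A": [3, 1], "G": [1, 3]}
add_motif(toy_table, "AG")
print(column_totals(toy_table))       # [5, 5]
print(columns_consistent(toy_table))  # True
```

Editing a single cell by hand, without a matching change in the other columns, leaves the totals unequal, which is the condition the program rejects.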
The parameters and switches listed below can be set from the command line. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the GCG User's Guide.
-MINSD=2.5 shows all hits that score at least this number of standard deviations above the mean.
-TESTMEAN=238.71 defines the mean score for a test run against a set of non-HTH proteins.
-TESTSD=293.61 defines the standard deviation for a test run against a set of non-HTH proteins.
displays the entry names as they are processed, for use when the input file is a wild-card entry name like SW:*_Ecoli.
Dodd I.B., Egan J.B. (1987) "Systematic method for the detection of potential lambda cro-like DNA-binding regions in proteins." J. Mol. Biol. 194, 557-564.
Dodd I.B., Egan J.B. (1990) "Improved detection of helix-turn-helix DNA-binding motifs in protein sequences." Nucleic Acids Res. 18, 5019-5026.
Printed: April 22, 1996 15:53 (1162)