DoDayhoffStat compares the amino acid composition of a protein sequence against the Dayhoff statistic for protein composition. For each amino acid, the closer the Dayhoff Stat value is to 1.0, the more closely its frequency in the sequence matches the expected value.
DoDayhoffStat compares the amino acid composition of a protein sequence with Dayhoff's statistic for amino acid composition. The comparison uses data from Dayhoff, M. O. (1978) Atlas of Protein Sequence and Structure, Vol. 5, supplement 3.
This program was written by Rodrigo Lopez S. (E-mail: rodrigol@biotek.uio.no; Post: Biotechnology Centre of Oslo, PO Box 1125 Blindern, N-0317 Oslo 3, Norway).
All EGCG programs are supported by the EGCG Support Team, who can be contacted by E-mail (egcg@embnet.org).
Here is a sample session with Dodayhoffstat.
% dodayhoffstat

 DODAYHOFFSTAT uses protein sequence data

 DODAYHOFFSTAT of what sequence ? sw:ach2_drome

                  Start (* 1 *) ?
                    End (* 576 *) ?

 What should I call the output file (* ach2_drome.dayhoff *) ?

%
The output from Dodayhoffstat is a simple report sent to the output file. Here is a sample of the output file from the above example:
 DODAYHOFFSTATS of: sw:ach2_drome  check: 7983  from: 1  to: 576

ID   ACH2_DROME     STANDARD;      PRT;   576 AA.
AC   P17644;
DT   01-AUG-1990 (REL. 15, CREATED)
DT   01-AUG-1990 (REL. 15, LAST SEQUENCE UPDATE)
DT   01-FEB-1994 (REL. 28, LAST ANNOTATION UPDATE)
DE   ACETYLCHOLINE RECEPTOR PROTEIN, ALPHA-LIKE CHAIN 2 PRECURSOR.
 . . .

Residue   Number   Mole Percent   Dayhoff Stat

A = Ala       28          4.861          0.565
C = Cys       12          2.083          0.718
D = Asp       36          6.250          1.136
E = Glu       23          3.993          0.666
F = Phe       25          4.340          1.206
G = Gly       34          5.903          0.703
H = His       14          2.431          1.215
I = Ile       40          6.944          1.543
K = Lys       34          5.903          0.894
L = Leu       78         13.542          1.830
M = Met       17          2.951          1.736
N = Asn       24          4.167          0.969
P = Pro       29          5.035          0.968
Q = Gln       18          3.125          0.801
R = Arg       25          4.340          0.886
S = Ser       36          6.250          0.893
T = Thr       31          5.382          0.882
V = Val       37          6.424          0.973
W = Trp       11          1.910          1.469
Y = Tyr       24          4.167          1.225
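The figures in this report are consistent with each Dayhoff Stat being the residue's mole percent divided by a reference percentage (for example, 4.861 / 8.6 gives 0.565 for Ala). The Python sketch below recomputes both columns on that assumption. The reference percentages are back-calculated from this sample report rather than read from dodayhoff.dat, and the names `dayhoff_stats`, `REFERENCE_PERCENT`, and `SAMPLE_COUNTS` are illustrative, not part of the program.

```python
# Sketch: recompute the "Mole Percent" and "Dayhoff Stat" columns of the
# sample ACH2_DROME report. ASSUMPTION: Dayhoff Stat = observed mole percent
# divided by a reference percentage. The reference values below were
# back-calculated from the sample report, not read from dodayhoff.dat.

# Reference composition in residues per 100 (assumed; see note above).
REFERENCE_PERCENT = {
    "A": 8.6, "C": 2.9, "D": 5.5, "E": 6.0, "F": 3.6,
    "G": 8.4, "H": 2.0, "I": 4.5, "K": 6.6, "L": 7.4,
    "M": 1.7, "N": 4.3, "P": 5.2, "Q": 3.9, "R": 4.9,
    "S": 7.0, "T": 6.1, "V": 6.6, "W": 1.3, "Y": 3.4,
}

# Residue counts copied from the "Number" column of the sample report.
SAMPLE_COUNTS = {
    "A": 28, "C": 12, "D": 36, "E": 23, "F": 25,
    "G": 34, "H": 14, "I": 40, "K": 34, "L": 78,
    "M": 17, "N": 24, "P": 29, "Q": 18, "R": 25,
    "S": 36, "T": 31, "V": 37, "W": 11, "Y": 24,
}


def dayhoff_stats(counts, reference):
    """Return {residue: (mole_percent, dayhoff_stat)} for each residue."""
    total = sum(counts.values())
    result = {}
    for residue, n in counts.items():
        mole_percent = 100.0 * n / total
        # Ratio of observed to expected; 1.0 means an exact match.
        result[residue] = (mole_percent, mole_percent / reference[residue])
    return result


if __name__ == "__main__":
    stats = dayhoff_stats(SAMPLE_COUNTS, REFERENCE_PERCENT)
    # Print in one-letter-code order, as the report does.
    for residue in sorted(stats):
        pct, stat = stats[residue]
        print(f"{residue}  {SAMPLE_COUNTS[residue]:4d}  {pct:7.3f}  {stat:6.3f}")
```

Run as a script, it prints one line per residue, and the computed values agree with the report to three decimal places. Because dodayhoff.dat holds occurrences per 1000 amino acids, dividing its values by 10 would put them on the same percentage scale, assuming the file applies no other scaling.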
Dodayhoffstat uses Dayhoff's statistic for amino acid occurrence per 1000 amino acids. These values are stored in the data file dodayhoff.dat.
The input file for Dodayhoffstat is a GCG protein sequence file.
All parameters for this program may be put on the command line. Use the option /CHEck to see the summary below and to have a chance to add things to the command line before the program executes. In the summary below, the capitalized letters in the qualifier names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose qualifiers or parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the GCG User's Guide.
Syntax: % DODAYhoffstat [-INfile=]swiss:fos_human -Default

Required Parameters:

 -BEGin=1 -END=500             the range of interest
[-OUTFile=]fos_human.dstats

Local Data Files:

[-DATa1=]DoDayhoff.Dat         this file contains the statistic
The files described below supply auxiliary data to this program. The program automatically reads them from a public data directory unless you either 1) have a data file with exactly the same name in your current working directory; or 2) name a file on the command line with an expression like /DATa1=MyFile.Dat. For more information see Chapter 4, Using Data Files in the User's Guide.
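The lookup order described above can be sketched in Python as follows. The function `resolve_data_file`, the public directory path, and the choice to let a file named on the command line take precedence over a same-named local copy are assumptions for illustration; the program's own implementation is not described here.

```python
# Sketch of the data-file lookup order described above. ASSUMPTIONS: the
# function name, the public directory argument, and the rule that an
# explicit -DATa1 name is used exactly as given are illustrative only.
import os
from typing import Optional


def resolve_data_file(default_name: str, public_dir: str,
                      command_line_name: Optional[str] = None,
                      cwd: str = ".") -> str:
    """Return the path of the data file the program would read."""
    # A file named on the command line (e.g. -DATa1=MyFile.Dat) is used first.
    if command_line_name:
        return command_line_name
    # A file with exactly the same name in the working directory
    # overrides the public copy.
    local = os.path.join(cwd, default_name)
    if os.path.isfile(local):
        return local
    # Otherwise fall back to the public data directory.
    return os.path.join(public_dir, default_name)
```

For example, `resolve_data_file("dodayhoff.dat", "/hypothetical/gcg/data")` returns the public path unless a dodayhoff.dat exists in the current directory.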
You can use Fetch to copy the dodayhoff.dat table into your working directory and edit it to reflect frequencies specific to a protein family.
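To obtain family-specific values in the same units as dodayhoff.dat (occurrences per 1000 amino acids), one option is to pool the residues of the family's member sequences. The Python sketch below does that; the function name `occurrence_per_1000` and the example sequences are hypothetical, and it does not write the values in dodayhoff.dat's own layout, which is not described here.

```python
# Sketch: derive family-specific amino acid occurrence per 1000 residues
# from a set of member sequences, as a starting point for editing a copy of
# dodayhoff.dat. ASSUMPTIONS: the function name and example sequences are
# hypothetical; writing the file's own layout is not shown.
from collections import Counter
from typing import Dict, Iterable

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def occurrence_per_1000(sequences: Iterable[str]) -> Dict[str, float]:
    """Pool the sequences and return occurrences per 1000 standard residues."""
    counts = Counter()
    for seq in sequences:
        # Ignore gaps, stops, and ambiguity codes such as X or B.
        counts.update(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acid residues found")
    return {aa: 1000.0 * counts[aa] / total for aa in AMINO_ACIDS}


if __name__ == "__main__":
    family = ["MKTAYIAKQR", "MKTLLVAGQR"]  # hypothetical family members
    for aa, value in occurrence_per_1000(family).items():
        print(aa, round(value, 1))
```

Pooling weights longer sequences more heavily; averaging per-sequence frequencies instead would give each family member equal weight. Check the layout of the fetched file before entering the new values.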
The parameters and switches listed below can be set from the command line. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the GCG User's Guide.
Dayhoff, M. O. (1978). Atlas of Protein Sequence and Structure, Vol. 5, supplement 3.
Printed: April 22, 1996 15:52 (1162)