Dodayhoffstat


DODAYHOFFSTAT


FUNCTION

DoDayhoffStat compares the amino acid composition of a protein sequence with the Dayhoff statistic for protein composition. The closer a residue's Dayhoff Stat value is to 1.0, the more closely that residue's abundance in the sequence matches the theoretical value.


DESCRIPTION

DoDayhoffStat compares the amino acid composition of a protein sequence with the Dayhoff statistic for amino acid composition. The comparison uses data from Dayhoff, M. O. (1978) Atlas of Protein Sequence and Structure, Vol. 5, supplement 3.


AUTHOR

This program was written by Rodrigo Lopez S. (E-mail: rodrigol@biotek.uio.no; Post: Biotechnology Centre of Oslo, PO Box 1125 Blindern, N-0317 Oslo 3, Norway).

All EGCG programs are supported by the EGCG Support Team, who can be contacted by E-mail (egcg@embnet.org).


EXAMPLE

Here is a sample session with Dodayhoffstat.

  
  
  % dodayhoffstat
  
   DODAYHOFFSTAT uses protein sequence data
  
   DODAYHOFFSTAT of what sequence  ?  sw:ach2_drome
  
               Start (* 1 *) ?
             End (*   576 *) ?
  
   What should I call the output file (* ach2_drome.dayhoff*) ?
  
  %
  


OUTPUT

The output from Dodayhoffstat is a simple report sent to the output file. Here is a sample of the output file from the above example:

  
  
   DODAYHOFFSTATS of: sw:ach2_drome  check: 7983  from: 1  to: 576
  
  ID   ACH2_DROME     STANDARD;      PRT;   576 AA.
  AC   P17644;
  DT   01-AUG-1990 (REL. 15, CREATED)
  DT   01-AUG-1990 (REL. 15, LAST SEQUENCE UPDATE)
  DT   01-FEB-1994 (REL. 28, LAST ANNOTATION UPDATE)
  DE   ACETYLCHOLINE RECEPTOR PROTEIN, ALPHA-LIKE CHAIN 2 PRECURSOR. . . .
  
  Residue            Number        Mole Percent       Dayhoff Stat
  A = Ala              28             4.861             0.565
  C = Cys              12             2.083             0.718
  D = Asp              36             6.250             1.136
  E = Glu              23             3.993             0.666
  F = Phe              25             4.340             1.206
  G = Gly              34             5.903             0.703
  H = His              14             2.431             1.215
  I = Ile              40             6.944             1.543
  K = Lys              34             5.903             0.894
  L = Leu              78            13.542             1.830
  M = Met              17             2.951             1.736
  N = Asn              24             4.167             0.969
  P = Pro              29             5.035             0.968
  Q = Gln              18             3.125             0.801
  R = Arg              25             4.340             0.886
  S = Ser              36             6.250             0.893
  T = Thr              31             5.382             0.882
  V = Val              37             6.424             0.973
  W = Trp              11             1.910             1.469
  Y = Tyr              24             4.167             1.225
  
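If you want to post-process the report, the residue rows can be pulled out with a regular expression. The following is a minimal Python sketch that assumes the column layout shown above: a one-letter code, "=", a three-letter name, then Number, Mole Percent and Dayhoff Stat. The names `parse_report` and `ResidueRow` are hypothetical and are not part of EGCG.

```python
import re
from typing import NamedTuple

# One residue row of the report, e.g. "A = Ala   28   4.861   0.565".
# Header and annotation lines (ID, AC, DE, ...) do not match.
ROW_RE = re.compile(
    r"^\s*([A-Z])\s*=\s*([A-Z][a-z]{2})\s+(\d+)\s+([\d.]+)\s+([\d.]+)\s*$"
)


class ResidueRow(NamedTuple):
    code: str
    name: str
    number: int
    mole_percent: float
    dayhoff_stat: float


def parse_report(text: str) -> list[ResidueRow]:
    """Collect the residue rows of a DODAYHOFFSTAT report, skipping all other lines."""
    rows = []
    for line in text.splitlines():
        m = ROW_RE.match(line)
        if m:
            code, name, number, mole, stat = m.groups()
            rows.append(ResidueRow(code, name, int(number), float(mole), float(stat)))
    return rows


sample = """\
  Residue            Number        Mole Percent       Dayhoff Stat
  A = Ala              28             4.861             0.565
  L = Leu              78            13.542             1.830
"""
rows = parse_report(sample)

# Consistency check: Mole Percent is 100 * Number / length (576 residues here).
for row in rows:
    assert abs(100 * row.number / 576 - row.mole_percent) < 0.001
```

The same check is a quick way to confirm that the Number column of a full report sums to the length of the analysed range (576 in the example).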


ALGORITHM

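Dodayhoffstat uses Dayhoff's frequencies of amino acid occurrence per 1000 amino acids, which are contained in the file dodayhoff.dat. For each residue, the Dayhoff Stat is the residue's mole percent in the sequence divided by its Dayhoff frequency expressed as a percentage. For example, Leu makes up 13.542 mole percent of ACH2_DROME and has a Dayhoff Stat of 1.830, which corresponds to a reference frequency of 7.4 percent (74 per 1000).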
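A minimal Python sketch of this calculation. The reference percentages in `DAYHOFF_PERCENT` are an assumption: they were back-calculated from the sample report above (Mole Percent divided by Dayhoff Stat, rounded to one decimal) rather than read from dodayhoff.dat. With these values the sketch reproduces the Dayhoff Stat column of the example. The function name `dayhoff_stats` is hypothetical, and taking mole percent over the full length of the range is also an assumption.

```python
from collections import Counter

# ASSUMPTION: reference frequencies in percent (per-1000 values divided by 10),
# back-calculated from the sample report as Mole Percent / Dayhoff Stat.
# They are not read from dodayhoff.dat.
DAYHOFF_PERCENT = {
    "A": 8.6, "C": 2.9, "D": 5.5, "E": 6.0, "F": 3.6,
    "G": 8.4, "H": 2.0, "I": 4.5, "K": 6.6, "L": 7.4,
    "M": 1.7, "N": 4.3, "P": 5.2, "Q": 3.9, "R": 4.9,
    "S": 7.0, "T": 6.1, "V": 6.6, "W": 1.3, "Y": 3.4,
}


def dayhoff_stats(seq: str) -> dict[str, tuple[int, float, float]]:
    """Return {residue: (number, mole percent, Dayhoff Stat)}.

    ASSUMPTION: mole percent is taken over the full length of `seq`,
    which matches the sample, where the 20 counts sum to 576.
    """
    seq = seq.upper()
    counts = Counter(seq)
    length = len(seq)
    stats = {}
    for aa, ref_percent in DAYHOFF_PERCENT.items():
        number = counts[aa]
        mole_percent = 100.0 * number / length if length else 0.0
        stats[aa] = (number, mole_percent, mole_percent / ref_percent)
    return stats


# 78 Leu out of 576 residues, as in the ACH2_DROME example:
number, pct, stat = dayhoff_stats("L" * 78 + "A" * 498)["L"]
print(number, round(pct, 3), round(stat, 3))  # 78 13.542 1.83
```

Dividing by the reference percentage, rather than by a count, makes the statistic independent of sequence length: a value of 1.0 means the residue is exactly as frequent as in Dayhoff's data.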


INPUT FILE

The input file for Dodayhoffstat is a GCG protein sequence file.
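The `check: 7983` value in the report header is GCG's sequence checksum. Below is a minimal Python sketch that reads a single-sequence GCG file and recomputes that checksum. It assumes the usual layout, in which the annotation ends at a divider line ending in "..". The function names are hypothetical, and the sample entry is made up.

```python
def gcg_checksum(seq: str) -> int:
    """GCG checksum: each residue's upper-case character code is weighted
    by its position, with weights cycling 1..57, and the sum is taken mod 10000."""
    check = 0
    for i, ch in enumerate(seq):
        check += (i % 57 + 1) * ord(ch.upper())
    return check % 10000


def read_gcg_sequence(text: str) -> str:
    """Return the residues from a single-sequence GCG file.

    ASSUMPTION: annotation ends at the first line ending in '..' (the
    divider line). After it, only letters are kept, so position numbers
    and spacing are dropped. Without a divider, the whole text is used.
    """
    lines = text.splitlines()
    for idx, line in enumerate(lines):
        if line.rstrip().endswith(".."):
            lines = lines[idx + 1:]
            break
    return "".join(ch for line in lines for ch in line if ch.isalpha())


# A made-up entry, not a real database record.
example = """Hypothetical test protein
 TEST  Length: 6  Type: P  Check: 1622  ..

       1  MKTAYI
"""
seq = read_gcg_sequence(example)
print(seq, gcg_checksum(seq))  # MKTAYI 1622
```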


COMMAND-LINE SUMMARY

All parameters for this program may be put on the command line. Use the option -CHEck to see the summary below and to have a chance to add things to the command line before the program executes. In the summary below, the capitalized letters in the qualifier names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose qualifiers or parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs, in the GCG User's Guide.

  
  
  Syntax: % DODAYhoffstat [-INfile=]swiss:fos_human -Default
  
  Required Parameters:
  
  -BEGin=1 -END=500          the range of interest
  [-OUTFile=]fos_human.dstats
  
  Local Data Files:
  
  [-DATa1=]DoDayhoff.Dat     this file contains the statistic
  
  
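The abbreviation rule described above (type at least the capitalized letters; any further letters must continue the qualifier's full name) can be sketched in Python. This illustrates the rule as stated and is not GCG's actual command-line parser. `SPECS` lists only the qualifiers named in this document, and the helper names are hypothetical.

```python
# Qualifiers named in this document; the capital letters are the
# minimum that must be typed.
SPECS = ("INfile", "BEGin", "END", "OUTFile", "DATa1", "CHEck", "Default")


def required_letters(spec: str) -> str:
    """Leading capital letters of a qualifier, lower-cased: 'BEGin' -> 'beg'."""
    n = 0
    while n < len(spec) and spec[n].isupper():
        n += 1
    return spec[:n].lower()


def matches(typed: str, spec: str) -> bool:
    """True if `typed` contains the required letters and is still a prefix
    of the full qualifier name (case-insensitive)."""
    t = typed.lower()
    return t.startswith(required_letters(spec)) and spec.lower().startswith(t)


def resolve(arg: str, specs: tuple[str, ...] = SPECS) -> list[str]:
    """Qualifiers that a command-line word such as '-beg=20' could name."""
    name = arg.lstrip("-/").split("=", 1)[0]
    return [s for s in specs if matches(name, s)]


print(resolve("-beg=20"))  # ['BEGin']
print(resolve("-OUT=x"))   # [] because OUTFile needs at least 'OUTF'
print(resolve("-d"))       # ['Default']
```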


LOCAL DATA FILES

The files described below supply auxiliary data to this program. The program automatically reads them from a public data directory unless you either 1) have a data file with exactly the same name in your current working directory; or 2) name a file on the command line with an expression like -DATa1=MyFile.Dat. For more information, see Chapter 4, Using Data Files, in the User's Guide.
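The lookup order above can be sketched in Python. Two points are assumptions not stated in this document: a file named with -DATa1 takes precedence over a local copy, and that name is resolved relative to the current directory. The function name `resolve_data_file` is hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve_data_file(default_name: str, public_dir: Path,
                      cwd: Path, override: Optional[str] = None) -> Path:
    """Choose which data file to read.

    ASSUMED precedence: 1) a file named on the command line,
    2) a file with exactly the same name in the working directory,
    3) the copy in the public data directory.
    """
    if override is not None:
        return cwd / override  # an absolute path replaces cwd
    local = cwd / default_name
    if local.is_file():
        return local
    return public_dir / default_name


with tempfile.TemporaryDirectory() as tmp:
    public = Path(tmp, "public")
    public.mkdir()
    work = Path(tmp, "work")
    work.mkdir()
    print(resolve_data_file("dodayhoff.dat", public, work).parent.name)  # public
    (work / "dodayhoff.dat").write_text("edited copy\n")
    print(resolve_data_file("dodayhoff.dat", public, work).parent.name)  # work
```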

You can Fetch the dodayhoff.dat table and edit it to reflect the amino acid frequencies of a specific protein family.


OPTIONAL PARAMETERS

The parameters and switches listed below can be set from the command line. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the GCG User's Guide.


REFERENCES

Dayhoff, M. O. (1978). Atlas of Protein Sequence and Structure, Vol. 5, supplement 3.

Printed: April 22, 1996 15:52 (1162)