FUNCTION
PepStats gives a short statistical summary of the composition of a protein sequence, together with its molecular weight and isoelectric point.
DESCRIPTION
PepStats is a smaller and less powerful version of PeptideSort. Its main function is to provide details of protein properties that are only obtainable from PeptideSort by carefully specifying "no enzyme". It can also provide statistics for the longest open reading frame(s) from a file written by Extract.
AUTHOR
This program was written by Rodrigo Lopez S. (E-mail: rodrigol@biotek.uio.no; Post: Biotechnology Centre of Oslo, PO Box 1125 Blindern, N-0317 Oslo 3, Norway).
All EGCG programs are supported by the EGCG Support Team, who can be contacted by E-mail (egcg@embnet.org).
EXAMPLE
Here is a session with PepStats.
% pepstats
PEPSTATS uses protein (with stop codons) sequence data
PEPSTATS of what sequence ? Sw:Gshr_Human
Start (* 1 *) ?
End (* 478 *) ?
Length = 478
Peptide Sw:Gshr_Human has 1 peptide(s)
and the last stop is at position: 479
What should I call the output file (* gshr_human.stats *) ?
%
OUTPUT
Here is the output file:
PEPSTATS of: Gshr_Human check: 4050 from: 1 to: 478
ID GSHR_HUMAN STANDARD; PRT; 478 AA.
AC P00390;
DT 21-JUL-1986 (REL. 01, CREATED)
DT 21-JUL-1986 (REL. 01, LAST SEQUENCE UPDATE)
DT 01-NOV-1990 (REL. 16, LAST ANNOTATION UPDATE) . . .
Continuous From: 1 To: 478 Length: 478
Summary for whole sequence:
Molecular weight = 51569.02 Residues = 478
Average Residue Weight = 107.885 Charge = 1
Isoelectric point = 7.67
Residue Number Mole Percent
A = Ala 42 8.787
B = Asx 0 0.000
C = Cys 10 2.092
D = Asp 21 4.393
E = Glu 29 6.067
F = Phe 14 2.929
G = Gly 43 8.996
H = His 16 3.347
I = Ile 29 6.067
K = Lys 34 7.113
L = Leu 34 7.113
M = Met 15 3.138
N = Asn 17 3.556
P = Pro 24 5.021
Q = Gln 11 2.301
R = Arg 17 3.556
S = Ser 31 6.485
T = Thr 31 6.485
V = Val 44 9.205
W = Trp 3 0.628
Y = Tyr 13 2.720
Z = Glx 0 0.000
Small (A+G) 85 17.782
Hydroxyl (S+T) 62 12.971
Acidic (D+E) 50 10.460
Acid/Amide (D+E+N+Q) 78 16.318
Basic (H+K+R) 67 14.017
Charged (D+E+H+K+R) 117 24.477
Small hphob (I+L+M+V) 122 25.523
Aromatic (F+W+Y) 30 6.276
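The derived figures in the output above can be checked with a little arithmetic. The following is a minimal Python sketch, not the program's own code: the residue counts and molecular weight are copied from the example output, while the helper names (mole_percent, group_totals) are hypothetical. The charge rule (K+R) minus (D+E) is an inference that happens to reproduce "Charge = 1" for this sequence; the documentation does not state the formula PepStats uses.

```python
# Residue counts copied from the example output for Sw:Gshr_Human.
COUNTS = {
    "A": 42, "B": 0, "C": 10, "D": 21, "E": 29, "F": 14, "G": 43, "H": 16,
    "I": 29, "K": 34, "L": 34, "M": 15, "N": 17, "P": 24, "Q": 11, "R": 17,
    "S": 31, "T": 31, "V": 44, "W": 3, "Y": 13, "Z": 0,
}

# Residue groups as labelled in the example output.
GROUPS = {
    "Small": "AG",
    "Hydroxyl": "ST",
    "Acidic": "DE",
    "Acid/Amide": "DENQ",
    "Basic": "HKR",
    "Charged": "DEHKR",
    "Small hphob": "ILMV",
    "Aromatic": "FWY",
}

# Molecular weight copied from the example output (not recomputed here).
MOLECULAR_WEIGHT = 51569.02

TOTAL = sum(COUNTS.values())


def mole_percent(count: int, total: int) -> float:
    """Percentage of all residues represented by `count` (hypothetical helper)."""
    return 100.0 * count / total


def group_totals(counts: dict, groups: dict) -> dict:
    """Sum the counts of the residues in each named group (hypothetical helper)."""
    return {name: sum(counts[r] for r in members)
            for name, members in groups.items()}


GROUP_TOTALS = group_totals(COUNTS, GROUPS)
AVERAGE_RESIDUE_WEIGHT = MOLECULAR_WEIGHT / TOTAL

# ASSUMPTION: charge taken as (K + R) - (D + E); inferred from the example only.
CHARGE = COUNTS["K"] + COUNTS["R"] - COUNTS["D"] - COUNTS["E"]

if __name__ == "__main__":
    print(f"Residues = {TOTAL}")
    print(f"Average Residue Weight = {AVERAGE_RESIDUE_WEIGHT:.3f}")
    for name, n in GROUP_TOTALS.items():
        print(f"{name:12s} {n:4d} {mole_percent(n, TOTAL):7.3f}")
```

The mole percent of each residue or group is simply its count divided by the total number of residues, and the average residue weight is the molecular weight divided by the same total.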
RELATED PROGRAMS
PeptideSort shows the peptide fragments from a digest of an amino acid sequence. It sorts the peptides by weight, position, and HPLC retention at pH 2.1, and shows the composition of each peptide. It also prints a summary of the composition of the whole protein. PepStats is based on PeptideSort.
RESTRICTIONS
None known.
INPUT FILE
The input file of PepStats is a protein sequence file.
COMMAND-LINE SUMMARY
All parameters for this program may be put on the command line. Use the option -CHEck to see the summary below and to have a chance to add things to the command line before the program executes. In the summary below, the capitalized letters in the qualifier names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose qualifiers or parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the GCG User's Guide.
Minimum syntax: % pepstats [-INfile=]Sw:Gshr_Human -Default

Prompted parameters:

-BEGin=1 -END=478                range of interest
[-OUTfile=]gshr_human.stats      output file name

Local Data Files:

[-DATa1=]aminoacid.dat           contains amino acid data

Optional parameters:

-MINLen=0                        minimum peptide length
-NONTERM                         first residue is not the N-terminus
-NOCTERM                         last residue is not the C-terminus
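The abbreviation rule described above (the capitalized letters of a qualifier name are the ones you must type) can be illustrated with a short Python sketch. This is not how the GCG command-line parser is implemented; it assumes that matching is case-insensitive and that any extra letters typed must continue the full qualifier name. The function names are hypothetical.

```python
def required_prefix(spec: str) -> str:
    """Return the leading run of capital letters in a qualifier name.

    For "MINLen" this is "MINL"; for "NONTERM" it is the whole name.
    """
    prefix = []
    for ch in spec:
        if not ch.isupper():
            break
        prefix.append(ch)
    return "".join(prefix)


def matches_qualifier(typed: str, spec: str) -> bool:
    """Check whether `typed` is an acceptable abbreviation of `spec`.

    ASSUMPTIONS: comparison ignores case, and the typed text must be a
    prefix of the full name that covers all of the capitalized letters.
    """
    typed_lower = typed.lower()
    needed = required_prefix(spec).lower()
    return typed_lower.startswith(needed) and spec.lower().startswith(typed_lower)


if __name__ == "__main__":
    for typed in ("min", "minl", "minlen", "minlength"):
        print(typed, matches_qualifier(typed, "MINLen"))
```

Under these assumptions, -MINL and -MINLEN would both select -MINLen, while -MIN would be too short.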
LOCAL DATA FILES
The files described below supply auxiliary data to this program. The program automatically reads them from a public data directory unless you either 1) have a data file with exactly the same name in your current working directory; or 2) name a file on the command line with an expression like -DATa1=myfile.dat. For more information see Chapter 4, Using Data Files in the User's Guide.
File AminoAcid.Dat contains values for calculation of properties. You can Fetch this file and edit the values. The file is supplied by GCG for use by the program PeptideSort.
OPTIONAL PARAMETERS
The parameters and switches listed below can be set from the command line. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the GCG User's Guide.
-MINLen=0
specifies a minimum length for peptides to be considered, for example when using the output of EExtractPeptide as an input file.
-NONTERM
specifies that the first residue in the input file is not the N-terminus of the protein.
-NOCTERM
specifies that the last residue in the input file is not the C-terminus of the protein.
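The effect of a minimum peptide length can be pictured with a small Python sketch. It is an illustration only: the function name is hypothetical, and it assumes that stop codons appear in the protein sequence as "*" and that peptides are the stretches between them. The documentation says only that PepStats accepts protein sequences with stop codons and reports how many peptides it found.

```python
from typing import List


def split_peptides(sequence: str, min_len: int = 0) -> List[str]:
    """Split a translated sequence at stop symbols and filter by length.

    ASSUMPTION: "*" marks a stop codon. Empty fragments (for example after
    a trailing stop) are discarded, and only fragments with at least
    `min_len` residues are kept, in the spirit of -MINLen.
    """
    fragments = [frag for frag in sequence.split("*") if frag]
    return [frag for frag in fragments if len(frag) >= min_len]


if __name__ == "__main__":
    translated = "MKT*AG*MSTNPKPQR*"  # made-up sequence for illustration
    print(split_peptides(translated))             # all peptides
    print(split_peptides(translated, min_len=3))  # short fragment dropped
```

With the default of 0, every peptide is kept, which matches the documented default value -MINLen=0.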
Printed: April 22, 1996 15:54 (1162)