SigCleave uses the von Heijne method to locate signal sequences, and to identify the cleavage site. The method is 95% accurate in resolving signal sequences from non-signal sequences with a cutoff score of 3.5, and 75-80% accurate in identifying the cleavage site. The program reports all hits above a minimum value.
SigCleave uses the von Heijne matrix method to search for a signal peptide sequence.
This program was written by Peter Rice (E-mail: pmr@sanger.ac.uk Post: Informatics Division, The Sanger Centre, Hinxton Hall, Cambridge, CB10 1RQ, UK).
All EGCG programs are supported by the EGCG Support Team, who can be contacted by E-mail (egcg@embnet.org).
Here is a sample session with SigCleave:

% sigcleave

SIGCLEAVE uses protein sequence data

What sequence ? Sw:ach2_drome

Start (* 1 *) ?
End (* 576 *) ?

What should I call the output file (* ach2_drome.sig *) ?

%
The output from SigCleave is a simple report to the terminal. MaxSite is the amino acid position immediately after the predicted cleavage site. MinWeight is the calculated weight value for the predicted cleavage site. Sequence shows the amino acid positions used to calculate the MinWeight, with "-" used to indicate the predicted cleavage site.
The value of MinWeight should be greater than 3.5. At this level, the method should correctly identify 95% of signal peptides, and reject 95% of non-signal peptides. The cleavage site should be correctly predicted in 75-80% of cases.
SIGCLEAVE of Sw:Ach2_Drome  Check: 7983  from: 1  to: 576

ID   ACH2_DROME     STANDARD;      PRT;   576 AA.
AC   P17644;
DT   01-AUG-1990 (REL. 15, CREATED)
DT   01-AUG-1990 (REL. 15, LAST SEQUENCE UPDATE)
DT   01-NOV-1990 (REL. 16, LAST ANNOTATION UPDATE) . . .

Report scores over 3.50

Maximum score 13.7 at residue 42

 Sequence:  LLVLLLLCETVQA-NPDAKRLYDDLLSNYNRLIRPVSNNTDTVLVKLGLRLSQLIDLNLKDQIL
            | (signal)    | (mature peptide)
           29            42

 Other entries above 3.50

Score 12.1 at residue 39

 Sequence:  LCLLLVLLLLCET-VQANPDAKRLYDDLLSNYNRLIRPVSNNTDTVLVKLGLRLSQLIDLNLKD
            | (signal)    | (mature peptide)
           26            39

///////////////////////////////////////////////////////////////
SigCleave uses the method of von Heijne (Nuc. Acids Res. 14:4683 (1986)), as modified by von Heijne in his book "Sequence Analysis in Molecular Biology: Treasure Trove or Trivial Pursuit" (Acad. Press, (1987), 113-117), where the treatment of positions -1 and -3 in the matrix is slightly altered.
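The sliding weight-matrix scan behind this method can be illustrated with a minimal Python sketch. It is not the SigCleave source. The function name scan_signal_sites, the in-memory layout of the weights, the choice to let unknown residues contribute nothing, and the toy weight values are all assumptions made for illustration; the published von Heijne tables are not reproduced here. The sketch also assumes that the window holds -PVal columns before the cleavage site followed by NVal columns after it.

```python
def scan_signal_sites(sequence, weights, pval=-13, nval=2, min_weight=3.5):
    """Slide a weight matrix along a protein sequence and report candidate sites.

    weights maps a residue letter to a list of per-column weights
    (hypothetical in-memory layout, not the SigCleave data file format).
    """
    before = -pval                 # columns before the cleavage site
    width = before + nval          # total window length
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = 0.0
        for col, residue in enumerate(window):
            row = weights.get(residue)
            if row is not None:    # assumption: unknown residues add nothing
                score += row[col]
        if score > min_weight:
            site = start + before + 1   # 1-based residue just after the cut
            hits.append((site, score))
    # Best-scoring candidate first, as in the report layout.
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits


# Toy weights for a 3-column window (2 before, 1 after); values are made up.
toy_weights = {"A": [1.0, 2.0, -1.0], "G": [0.0, 0.0, 1.5]}
print(scan_signal_sites("LAAGL", toy_weights, pval=-2, nval=1))  # prints [(4, 4.5)]
```

With the defaults -PVal=-13 and -NVal=2 the window in this sketch is 15 residues long, and the reported position is the residue 13 places after the window start. That matches the sample report, where the signal window starts at residue 29 and the reported site is residue 42.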
The input file for SigCleave is a GCG protein sequence file.
All parameters for this program may be put on the command line. Use the option -CHEck to see the summary below and to have a chance to add things to the command line before the program executes. In the summary below, the capitalized letters in the qualifier names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose qualifiers or parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the GCG User's Guide.
Minimum syntax: % sigcleave [-INfile=]Sw:ach2_drome -Default

Prompted Parameters:

-BEGin=1 -END=576            Range of interest
[-OUTfile=]ach2_drome.sig    Output file

Local Data Files:

-DATa=sigweighteuk.dat       Weight matrix file

Optional Parameters:

-MINWeight=3.5               Show all hits above this weight
-PROKaryote                  Use table for prokaryote signal sequences
-PVal=-13                    (-)matrix columns before cleavage site
-NVal=2                      Matrix columns after cleavage residue
The files described below supply auxiliary data to this program. The program automatically reads them from a public data directory unless you either 1) have a data file with exactly the same name in your current working directory; or 2) name a file on the command line with an expression like -DATa1=myfile.dat. For more information see Chapter 4, Using Data Files in the User's Guide.
For the von Heijne matrix, SigCleave uses the signal profile found in either sigweightprok.dat or sigweighteuk.dat. These are the tables from the original papers.
You can Fetch these tables and edit them (for example to include additional signal sequences). SigCleave will insist that each column has the same total.
If you use matrix tables with a different number of residues before or after the cleavage site, you must also set the optional parameters NVal and PVal.
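Before running SigCleave with an edited table, the two constraints above (equal column totals, and a column count that agrees with PVal and NVal) can be checked by hand or with a short script. The following Python sketch is one hedged way to do so. It works on a table already loaded into memory as a dictionary of residue counts; the function name check_count_table and that dictionary layout are hypothetical, and the sketch does not parse sigweighteuk.dat or sigweightprok.dat, whose file format is not described here. It also assumes the table has -PVal + NVal columns in total.

```python
def check_count_table(counts, pval=-13, nval=2):
    """Validate an edited count table before handing it to the program.

    counts maps a residue letter to a list of per-column counts
    (hypothetical layout chosen for this example).
    Returns the common column total, or raises ValueError.
    """
    expected_cols = -pval + nval   # assumed: columns before plus columns after
    for residue, row in counts.items():
        if len(row) != expected_cols:
            raise ValueError(
                f"row {residue!r} has {len(row)} columns, expected {expected_cols}"
            )
    # Sum each column over all residues.
    totals = [sum(row[col] for row in counts.values())
              for col in range(expected_cols)]
    if len(set(totals)) != 1:
        raise ValueError(f"column totals differ: {totals}")
    return totals[0]


# Toy 3-column table (2 columns before, 1 after); the counts are made up.
toy_counts = {"A": [3, 1, 2], "L": [2, 4, 3]}
print(check_count_table(toy_counts, pval=-2, nval=1))  # prints 5
```

If a sequence is added to the table, every column must gain exactly one count, otherwise the totals diverge and a check like this one fails.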
The parameters and switches listed below can be set from the command line. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the GCG User's Guide.
-MINWeight=3.5

shows all hits above this weight score.

-PROKaryote

uses the weight matrix for prokaryote rather than eukaryote signal sequences.

-PVal=-13

specifies (minus) the number of columns before the cleavage site in the weight matrix table.

-NVal=2

specifies the number of columns after the residue at the cleavage site in the weight matrix table.
von Heijne G. (1986) "A new method for predicting signal sequence cleavage sites." Nucleic Acids Res. 14, 4683-4690.
von Heijne G. (1987) "Sequence Analysis in Molecular Biology: Treasure Trove or Trivial Pursuit." Academic Press, 113-117.
Printed: April 22, 1996 15:55 (1162)