ECompTable creates a scoring matrix using equivalences defined in a simplification scheme such as the one used for Simplify. ECompTable is a version of GCG's CompTable with command line control added. (See Chapter 4, Using Data Files, in the GCG User's Guide for more information.)
Scientists comparing protein sequences sometimes want to consider similar amino acids as equivalent. Sequence simplification can be done either by changing the symbols in the sequences being compared (see Simplify) or, for programs that use scoring matrices, by creating a table that scores matches between the symbols you consider to be equivalent.
This GCG program was modified by Jaakko Hattula (Tampere University of Technology, Finland) and Peter Rice (E-mail: pmr@sanger.ac.uk Post: Informatics Division, The Sanger Centre, Hinxton Hall, Cambridge, CB10 1RQ, UK).
All EGCG programs are supported by the EGCG Support Team, who can be contacted by E-mail (egcg@embnet.org).
Here is a session using ECompTable to make a scoring matrix with the standard simplification file used by Simplify (you can use Fetch to make a copy of simplify.txt and modify it to create the input file for CompTable):
  % ecomptable

    ECOMPTABLE of what file ?  simplify.txt

    What is the symbol match value (* 1.0 *) ?

    What is the symbol mismatch value (* -0.20 *) ?  0.0

    What should I call the output file (* simplify.cmp *) ?

  %
Here is part of the output file:
  COMPTABLE of: Simplify.Txt  FileCheck: 5908

  A standard simplification used by SIMPLIFY and WORDSEARCH to simplify
  peptide sequences. The first line below means "for all of the P, A, G, S,
  or T characters in the sequence, substitute A." The program COMPTABLE can
  construct a symbol comparison table with the equivalences from this file.

                                                      July 8, 1994 17:11

      A    B    C    D    E    F    G    H    I    J    K    L  ...  ..

    1.0  0.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0  ...  A
         1.0  0.0  1.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...  B
              1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...  C
                   1.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...  D
                        1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...  E

  ////////////////////////////////////////////////
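The excerpt above appears to store only the upper triangle of the matrix: each row holds the scores for one symbol against itself and every later column, followed by the row symbol. The Python sketch below reads a table laid out that way. It is an illustration only: the layout is inferred from the excerpt, the function name read_upper_triangle is hypothetical, and the four-symbol sample is made up for the demonstration rather than taken from a real output file.

```python
def read_upper_triangle(lines):
    """Parse an upper-triangular table into {frozenset({a, b}): score}.

    Assumed layout (inferred from the excerpt, not from a format
    specification): a header row of column symbols ending in "..", then
    one row per symbol with the scores followed by the row symbol.
    """
    rows = []
    for line in lines:
        tokens = line.split()
        if tokens and tokens != [".."]:      # skip blanks and a bare divider
            rows.append(tokens)
    columns = [tok for tok in rows[0] if tok != ".."]
    scores = {}
    for i, fields in enumerate(rows[1:]):
        *values, symbol = fields
        if symbol != columns[i] or len(values) != len(columns) - i:
            raise ValueError(f"unexpected row for {symbol!r}")
        for column, value in zip(columns[i:], values):
            scores[frozenset((symbol, column))] = float(value)
    return scores


# Made-up four-symbol table following the same equivalences (A = G, B = D).
SAMPLE = """\
    A    B    D    G  ..
  1.0  0.0  0.0  1.0  A
       1.0  1.0  0.0  B
            1.0  0.0  D
                 1.0  G
"""

scores = read_upper_triangle(SAMPLE.splitlines())
print(scores[frozenset("AG")], scores[frozenset("AD")])
```

Keying the result by an unordered pair means a lookup does not depend on which of the two symbols comes first, which matches a table that stores each pair only once.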
Simplify simplifies a sequence file with the simplifications from a simplification table.
Here is the input file for the example above:
  A standard simplification used by SIMPLIFY and WORDSEARCH to simplify
  peptide sequences. The first line below means "for all of the P, A, G, S,
  or T characters in the sequence, substitute A." The program COMPTABLE can
  construct a symbol comparison table with the equivalences from this file.

                                                      10/7/84 ..

  A PAGST
  D QNEDBZ
  H HKR
  I LIVM
  F FYW
  C C
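The relationship between this input file and the scoring matrix can be sketched in Python. This is not ECompTable's implementation. It is a minimal illustration that assumes everything up to the first ".." is documentation, that each later line is a replacement symbol followed by its member symbols, and that a symbol always matches itself. The function names, the 26-letter alphabet, and the shortened header text are choices made for the example.

```python
from itertools import product


def parse_simplification(text):
    """Return {replacement_symbol: member_symbols} from a simplification file.

    Assumes the text before the first ".." is free-form documentation and
    every later non-blank line reads "<replacement> <members>".
    """
    _, _, body = text.partition("..")
    groups = {}
    for line in body.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            groups[fields[0].upper()] = fields[1].upper()
    return groups


def build_table(groups, match=1.0, mismatch=-0.2,
                alphabet="ABCDEFGHIJKLMNOPQRSTUVWXYZ"):
    """Score each symbol pair: match if identical or in the same group."""
    group_of = {}
    for members in groups.values():
        for symbol in members:
            group_of[symbol] = members   # a symbol listed twice keeps the last group
    table = {}
    for a, b in product(alphabet, repeat=2):
        same = a == b or (a in group_of and group_of[a] == group_of.get(b))
        table[a, b] = match if same else mismatch
    return table


# Groups copied from the input file above; the header text is shortened.
SAMPLE = """A standard simplification used by SIMPLIFY. 10/7/84 ..

A PAGST
D QNEDBZ
H HKR
I LIVM
F FYW
C C
"""

table = build_table(parse_simplification(SAMPLE), match=1.0, mismatch=0.0)
print(table["A", "G"], table["B", "D"], table["A", "B"])
```

With a match value of 1.0 and a mismatch value of 0.0, as in the session above, the pairs A/G and B/D score 1.0 and A/B scores 0.0, which agrees with the corresponding entries in the output excerpt.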
All parameters for this program may be put on the command line. Use the option -CHEck to see the summary below and to have a chance to add things to the command line before the program executes. In the summary below, the capitalized letters in the qualifier names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose qualifiers or parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the GCG User's Guide.
Minimum Syntax: % ecomptable [-INfile=]simplify.txt -Default

Prompted Parameters:

  -MATch=1.0                 Symbol match value
  -MISMATch=-0.2             Symbol mismatch value
  [-OUTfile=]simplify.cmp    Output file

Local Data Files: None

Optional Parameters: None
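For scripted runs, the command line can be assembled from the qualifiers in the summary above. The Python sketch below only builds and prints the argument list. The helper name ecomptable_argv is hypothetical, and whether the program accepts the values exactly as formatted here has not been checked against a real installation.

```python
def ecomptable_argv(infile, outfile=None, match=None, mismatch=None,
                    use_defaults=True):
    """Build an argument list from the qualifiers in the command-line summary."""
    argv = ["ecomptable", f"-INfile={infile}"]
    if match is not None:
        argv.append(f"-MATch={match}")
    if mismatch is not None:
        argv.append(f"-MISMATch={mismatch}")
    if outfile is not None:
        argv.append(f"-OUTfile={outfile}")
    if use_defaults:
        argv.append("-Default")          # take defaults for anything not given
    return argv


# Same choices as the interactive session: mismatch value 0.0.
argv = ecomptable_argv("simplify.txt", outfile="simplify.cmp",
                       match=1.0, mismatch=0.0)
print(" ".join(argv))
# To execute it on a system with EGCG installed (assumption):
#   import subprocess; subprocess.run(argv, check=True)
```

Keeping the arguments as a list rather than one string avoids shell quoting problems if a file name contains spaces.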
Printed: April 22, 1996 15:52 (1162)