CheckLenComp compares two sorted CheckLen output files, and produces a list of entries from the first file which are not found in the second.
CheckLenComp is one of the programs used to generate the PirOnly database by comparison of SwissProt and Pir entries.
The program does not prompt for values, so everything must be specified on the command line. See the PirOnly documentation for more details of the procedure.
This program was written by Peter Rice (E-mail: pmr@sanger.ac.uk Post: Informatics Division, The Sanger Centre, Hinxton Hall, Cambridge, CB10 1RQ, UK).
All EGCG programs are supported by the EGCG Support Team, who can be contacted by E-mail (egcg@embnet.org).
Here is a sample session with CheckLenComp:
% checklencomp piro.sorted sw.sorted sw-piro.comp pironlyrest.dat %
The main output from CheckLenComp is the second output file, which lists the entries from the first input file that have no match in the second input file. This file can then be used as input to the DataSet program to build the new subset database.
The first output file lists the identical sequences matched between the two input files.
The example below shows part of the output from a run on the Pir and SwissProt databases.
PIR entries not in SwissProt January 28, 1993 23:49 ..

PIR3:C38746
PIR2:A31369
PIR2:PS0147
PIR3:D33356
//////////////////////////////

PIR3:A34516 = SW:KPBA_MOUSE
PIR2:S02185 = SW:HEMX_ECOLI
PIR3:A29501 = SW:FIBA_MACFU
PIR1:SMHU2 = SW:MT2_HUMAN
PIR2:JQ0234 = SW:YCR3_ORYSA
PIR1:GGICE6 = SW:GLB6_CHITH
PIR1:XNECGM = SW:GLMS_ECOLI
PIR3:JQ0835 = SW:BXB5_BOMMO
PIR1:Q1BP87 = SW:Y18_BPT7
PIR2:JN0016 = SW:PERI_RAT
PIR2:S19418 = SW:YCZ6_YEAST
////////////////////////////
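The matched-entry lines shown above can be read back into pairs with a few lines of Python. This is only a sketch: it assumes the file holds one "entry = entry" pair per line, and the function name parse_matches is hypothetical and not part of CheckLenComp.

```python
def parse_matches(lines):
    """Return (first_entry, second_entry) pairs from lines like 'A = B'."""
    pairs = []
    for line in lines:
        line = line.strip()
        # Skip blank lines, heading text, and the row of slashes that ends the list.
        if " = " not in line:
            continue
        left, right = line.split(" = ", 1)
        pairs.append((left.strip(), right.strip()))
    return pairs


sample = [
    "PIR3:A34516 = SW:KPBA_MOUSE",
    "PIR2:S02185 = SW:HEMX_ECOLI",
    "////////////////////////////",
]
print(parse_matches(sample))
```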
The input for CheckLenComp is two sorted output files from CheckLen. The first file contains the entries in the database to be used, the second contains the comparison entries for exclusion of duplicates.
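Because both input files are sorted, a comparison of this kind can be done in a single pass over each file. The Python sketch below illustrates the general technique; it is not the program's actual source. It assumes each record has been reduced to a (key, name) pair, where the key is whatever value the files are sorted on. The record layout, the numeric keys, and the name SW:EXAMPLE are invented for illustration.

```python
def compare_sorted(first, second):
    """Merge-compare two lists of (key, name) records, both sorted by key.

    Returns (matched, rest): matched pairs a name from `first` with a name
    from `second` that has the same key; rest holds the names from `first`
    whose key does not occur in `second`.
    """
    matched, rest = [], []
    j = 0
    for key, name in first:
        # Advance through the second list past keys smaller than the current one.
        while j < len(second) and second[j][0] < key:
            j += 1
        if j < len(second) and second[j][0] == key:
            matched.append((name, second[j][1]))
        else:
            rest.append(name)
    return matched, rest


# Made-up keys; the real CheckLen record format is not described here.
first = [((100, 61), "PIR1:SMHU2"), ((200, 300), "PIR3:C38746")]
second = [((100, 61), "SW:MT2_HUMAN"), ((150, 10), "SW:EXAMPLE")]

matched, rest = compare_sorted(first, second)
print(matched)  # [('PIR1:SMHU2', 'SW:MT2_HUMAN')]
print(rest)     # ['PIR3:C38746']
```

In this sketch the matched list corresponds to the first output file and the rest list to the second.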
All parameters for this program may be put on the command line. Use the option -CHEck to see the summary below and to have a chance to add things to the command line before the program executes. In the summary below, the capitalized letters in the qualifier names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose qualifiers or parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the GCG User's Guide.
Minimum syntax: % checklencomp -Default

Prompted Parameters:

[-INfile=]piro.checklen       Sorted CheckLen file of original database
[-INfile2=]sw.checklen        Sorted CheckLen file of comparison database
[-OUTfile=]sw-piro.comp       List of matched identical entries
[-RESTfile=]pironlyrest.dat   List of unique entries for use as DataSet input
Printed: April 22, 1996 15:52 (1162)