PirOnly is a utility that excludes entries from NBRF if they are also present in the latest release of SwissProt.
The result is a database called PirOnly, in GCG/NBRF format, that can be used together with SwissProt to search all sequences in both protein sequence databases.
The reason for using SwissProt as the main database is twofold. SwissProt is fully annotated, and it has sensible entry names that allow GCG programs to easily select all entries from, for example, Escherichia coli with the specification SwissProt:*_Ecoli.
PirOnly is a command procedure, pironly.csh, that runs a series of programs to cross-reference entries in SwissProt and NBRF. PirOnly can be run each time SwissProt or NBRF is updated to produce a new database.
This program was written by Peter Rice (E-mail: pmr@sanger.ac.uk; Post: Informatics Division, The Sanger Centre, Hinxton Hall, Cambridge, CB10 1RQ, UK).
All EGCG programs are supported by the EGCG Support Team, who can be contacted by E-mail (egcg@embnet.org).
The command procedure pironly.csh should be submitted as a batch job. Before submitting it, edit the command procedure to check that the database locations are correct for your system, and insert the correct release number and date in the DataSet run at the end of the file.
PirOnly produces a new database called PirOnly in a directory called "pironlydir". Both logical names must be added to your GCG startup procedure (usually in GenCom:sitelogicals) to make them available to all users.
Several files are created by the command procedure.
SwissProtDir:pir.numbers contains the NBRF cross reference lines from SwissProt, sorted by NBRF accession number.
SwissProtDir:sw-pir.comp contains a list of equivalent NBRF and SwissProt entries (in addition to those already cross referenced in SwissProt).
SwissProtDir:pironlyrest.dat contains a list of NBRF entries that were not found in SwissProt. This file is used by the DataSet utility program to create the final PirOnly database.
The SwissProt database includes cross references to entries in the NBRF (or PIR) database. These cross references are extracted using grep. The output is sorted by NBRF accession number and stored in file SwissProtDir:pir.numbers.
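The extraction step can be sketched in Python as follows. This is only an illustration and is not the grep command used in pironly.csh. It assumes that SwissProt cross reference lines have the form "DR   PIR; accession; entry name." and that each entry starts with an ID line; the function name extract_pir_xrefs is hypothetical.

```python
def extract_pir_xrefs(swissprot_lines):
    """Return (NBRF accession, SwissProt entry name) pairs sorted by accession.

    Assumes flat-file lines such as:
        ID   LACY_ECOLI     STANDARD;      PRT;   417 AA.
        DR   PIR; B12345; GRECLY.
    """
    pairs = []
    entry_name = None
    for line in swissprot_lines:
        if line.startswith("ID "):
            # The entry name is the first token after the ID line code.
            entry_name = line.split()[1]
        elif line.startswith("DR ") and line[2:].strip().startswith("PIR;"):
            # Split "PIR; accession; name." into its three fields.
            fields = [f.strip().rstrip(".") for f in line[2:].split(";")]
            pairs.append((fields[1], entry_name))
    # Sorting the tuples orders them by accession number first.
    return sorted(pairs)
```

Sorting by accession number means a later step can look up an NBRF accession without rereading SwissProt.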
The file pir.numbers is then read by the program PirOnly, which scans the accession numbers of entries in NBRF (in both the PROTEIN and NEW databases) and writes a list of the entries for which no accession number is listed in SwissProt.
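The filtering rule in this step can be illustrated with a short Python sketch. It is not the PirOnly program itself. The input layout (a dictionary mapping each NBRF entry name to its accession numbers) and the function name entries_not_in_swissprot are assumptions made for the example.

```python
def entries_not_in_swissprot(nbrf_entries, xref_accessions):
    """List NBRF entries with no accession number cross-referenced in SwissProt.

    nbrf_entries    -- dict: NBRF entry name -> list of its accession numbers
    xref_accessions -- iterable of NBRF accession numbers found in SwissProt
    """
    known = set(xref_accessions)
    # Keep an entry only if none of its accession numbers is known.
    return [name for name, accessions in nbrf_entries.items()
            if known.isdisjoint(accessions)]
```

An entry is dropped as soon as any one of its accession numbers appears in the SwissProt cross references, which matches the wording "do not have any accession number listed".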
Program CheckLen then writes a file containing the checksum, length and entry name of (a) all entries listed by PirOnly and (b) all entries in SwissProt.
Program CheckLenComp reads the CheckLen files and, if two entries have the same checksum and length, compares their sequences. If the sequences are identical, the NBRF and SwissProt entry names are written to file SwissProtDir:sw-piro.comp. NBRF entries that do not match any SwissProt sequence are listed in file SwissProtDir:pironlyrest.dat.
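The two-stage comparison performed by CheckLen and CheckLenComp can be sketched in Python as shown here. This is an illustration under stated assumptions, not the EGCG code: zlib.crc32 stands in for the GCG checksum, sequences are held in memory as dictionaries rather than read from CheckLen files, and the function names are hypothetical.

```python
import zlib
from collections import defaultdict


def check_len(sequence):
    """Return a (checksum, length) key; CRC-32 is a stand-in checksum."""
    data = sequence.upper().encode("ascii")
    return (zlib.crc32(data), len(data))


def compare_by_check_len(nbrf, swissprot):
    """Split NBRF entries into exact SwissProt matches and the rest.

    nbrf, swissprot -- dict: entry name -> sequence string
    Returns (matched, rest): matched is a list of (NBRF name, SwissProt
    name) pairs and rest is a list of unmatched NBRF names.
    """
    # Index SwissProt by (checksum, length) so candidates are found cheaply.
    index = defaultdict(list)
    for name, seq in swissprot.items():
        index[check_len(seq)].append(name)

    matched, rest = [], []
    for name, seq in nbrf.items():
        candidates = index.get(check_len(seq), [])
        # Equal keys only suggest identity; confirm with a full comparison.
        hit = next((sw for sw in candidates
                    if swissprot[sw].upper() == seq.upper()), None)
        if hit is None:
            rest.append(name)
        else:
            matched.append((name, hit))
    return matched, rest
```

The checksum and length act as an inexpensive filter, so the full sequence comparison is needed only for the few pairs whose keys agree.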
The DataSet utility program (the normal GCG program) is used to create a new database from the entries listed in pironlyrest.dat.
\ the new database logical name to file .datasetrc in your default directory \
Printed: April 22, 1996 15:54 (1162)