Filteroverlap


FILTEROVERLAP


FUNCTION

FilterOverlap reads the output file from EOverlap and keeps only those overlaps that meet the specified values once the alignments are actually built. Output from GCG's Overlap program may also be used, but only if it was generated from a self-comparison of a single database.


DESCRIPTION

FilterOverlap processes the output from the EGCG program EOverlap (a modified version of GCG's Overlap) and extracts only those candidate overlaps which meet specified values for their actual alignment scores.


AUTHOR

This program was written by Peter Rice (E-mail: pmr@sanger.ac.uk Post: Informatics Division, The Sanger Centre, Hinxton Hall, Cambridge, CB10 1RQ, UK).

All EGCG programs are supported by the EGCG Support Team, who can be contacted by E-mail (egcg@embnet.org).


EXAMPLE

Here is a sample session with FilterOverlap:

  
  
  % filteroverlap
  
   FILTEROVERLAP of what file ?  overlap.dat
  
   What should I call the output file (* overlap.filter *) ?
  
   What gap weight (* 0.0 *) ?
  
   What gap weight (* 1.0 *) ?
  
   What stringency (* 0.80 *) ?
  
   Aligning ...........-..
  Accepted match "mu:MU10" "mu:MU5", 230 203.3 0.8839
  
  %
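
The three numbers on the "Accepted match" line are consistent with an overlap length (230), an alignment score (203.3) and their ratio (0.8839). The Python sketch below reproduces that arithmetic. It assumes that the ratio is the score divided by the length and that an overlap is accepted when the ratio reaches the stringency; neither rule is stated explicitly in this manual, and the function names are hypothetical.

```python
# Sketch only: assumes ratio = score / length, and that a candidate overlap
# is kept when the ratio is at least the stringency. Both rules are inferred
# from the sample session, not taken from the program source.

def overlap_ratio(score: float, length: int) -> float:
    """Alignment score per aligned position."""
    if length <= 0:
        raise ValueError("length must be positive")
    return score / length


def is_accepted(score: float, length: int, stringency: float = 0.80) -> bool:
    """Hypothetical acceptance test: the ratio must reach the stringency."""
    return overlap_ratio(score, length) >= stringency


# Values from the sample session: length 230, score 203.3.
ratio = overlap_ratio(203.3, 230)
print(f"{ratio:.4f}", is_accepted(203.3, 230))  # prints: 0.8839 True
```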
  


OUTPUT

The output from FilterOverlap is a revised version of the original input file, with only the accepted overlaps remaining.

  
  
   OVERLAP of: mu:*
      to: mu:*
   Min overlap fraction: 0.80  Min overlap length: 10  Integral width: 3
                    December 12, 1995 14:02
  
  Filter with Stringency: 0.80 MinOverlap: 10 Integrate; 3
  
  Sequence1 Strand Pos Sequence2 Strand Pos Length Matches Ratio  Len1  Len2 ..
  
  MU10        +      2 MU5         -      1    230     203  0.88   361   230
  
               //////////////////////////////////////////
  
  MU32        +      6 MU9         -      1     35      35  1.00    40    39
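
For downstream processing, the table of accepted overlaps can be read with a few lines of Python. The sketch below assumes that the table starts after the header line ending in "..", that every overlap row has exactly the eleven whitespace-separated fields shown above, and that any other line (blank lines, the "////" separator) can be skipped. The column keys and the function name are hypothetical.

```python
from typing import Dict, List

# Column keys follow the header line of the sample output; the field types
# and the parsing rules are assumptions made for this sketch.
COLUMNS = ["seq1", "strand1", "pos1", "seq2", "strand2", "pos2",
           "length", "matches", "ratio", "len1", "len2"]
INT_FIELDS = {"pos1", "pos2", "length", "matches", "len1", "len2"}


def parse_overlaps(text: str) -> List[Dict[str, object]]:
    """Return one dict per overlap row found after the '..' header line."""
    rows: List[Dict[str, object]] = []
    in_table = False
    for line in text.splitlines():
        stripped = line.strip()
        if not in_table:
            # The column header line ends with the ".." divider.
            in_table = stripped.endswith("..")
            continue
        fields = stripped.split()
        if len(fields) != len(COLUMNS):
            continue  # blank lines and the "////" separator
        row: Dict[str, object] = {}
        for name, value in zip(COLUMNS, fields):
            if name in INT_FIELDS:
                row[name] = int(value)
            elif name == "ratio":
                row[name] = float(value)
            else:
                row[name] = value
        rows.append(row)
    return rows


SAMPLE = """\
 OVERLAP of: mu:*
    to: mu:*
 Min overlap fraction: 0.80  Min overlap length: 10  Integral width: 3

Sequence1 Strand Pos Sequence2 Strand Pos Length Matches Ratio  Len1  Len2 ..

MU10        +      2 MU5         -      1    230     203  0.88   361   230

             //////////////////////////////////////////

MU32        +      6 MU9         -      1     35      35  1.00    40    39
"""

for row in parse_overlaps(SAMPLE):
    print(row["seq1"], row["seq2"], row["length"], row["ratio"])
```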
  


INPUT FILE

The input file for FilterOverlap is an output file from EOverlap, although an output file from GCG's Overlap for a self-comparison of a GCG database is also suitable.


COMMAND-LINE SUMMARY

All parameters for this program may be put on the command line. Use the option -CHEck to see the summary below and to have a chance to add things to the command line before the program executes. In the summary below, the capitalized letters in the qualifier names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose qualifiers or parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the GCG User's Guide.

  
  
  Minimum Syntax: % filteroverlap [-INfile=]test.overlap -Default
  
  Prompted Parameters:
  
  [-OUTfile=]test.filter      Output file for accepted overlaps
  
  Local Data Files:
  
  -DATa=overdna.cmp           Comparison matrix for overlap testing
  
  Optional Parameters:
  
  -ALIGNfile=test.align       File to contain accepted alignments
  -REJALIGNfile=test.rejalign File to contain rejected alignments
  -REJECTfile=test.reject     File to contain rejected overlaps
  -ADDINTegrate=0             Extra diagonals to include in the alignment
  -MONitor                    Show results of each test
  -SUMmary                    Show summary statistics
  


LOCAL DATA FILES

The files described below supply auxiliary data to this program. The program automatically reads them from a public data directory unless you either 1) have a data file with exactly the same name in your current working directory; or 2) name a file on the command line with an expression like -DATa=myfile.dat. For more information see Chapter 4, Using Data Files in the User's Guide.

This program uses the local data file overdna.cmp as the comparison matrix. This matrix scores 1.0 for a match, -5.0 for a mismatch and 0.1 for any match to "N" or "X" (a low value so that these show up with ":" in the display of sequence alignments).
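
The effect of these matrix values can be illustrated with a short Python sketch. It scores two already-aligned sequences of equal length, position by position, using the three values quoted above. It does not read overdna.cmp, and it ignores gaps and gap weights entirely; the function names are hypothetical, and treating every pairing that involves "N" or "X" (including "N" against "N") as 0.1 is an assumption.

```python
# Values quoted in the text for overdna.cmp. This sketch hard-codes them
# instead of reading the matrix file.
MATCH = 1.0       # identical bases
MISMATCH = -5.0   # different bases
AMBIGUOUS = 0.1   # any pairing that involves "N" or "X" (assumed)

AMBIGUITY_CODES = {"N", "X"}


def pair_score(a: str, b: str) -> float:
    """Score a single aligned pair of symbols."""
    a, b = a.upper(), b.upper()
    if a in AMBIGUITY_CODES or b in AMBIGUITY_CODES:
        return AMBIGUOUS
    return MATCH if a == b else MISMATCH


def ungapped_score(seq1: str, seq2: str) -> float:
    """Sum the pair scores of two aligned sequences without gaps."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    return sum(pair_score(a, b) for a, b in zip(seq1, seq2))


# Four matches plus one pairing with "N": 4 * 1.0 + 0.1
print(ungapped_score("ACGTN", "ACGTA"))
# Three matches and one mismatch: 3 * 1.0 - 5.0
print(ungapped_score("ACGT", "ACGA"))
```

With a mismatch costing five matches, a single mismatch in a short overlap is enough to pull the score well below the default stringency.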


OPTIONAL PARAMETERS

The parameters and switches listed below can be set from the command line. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the GCG User's Guide.

-ADDINTegrate=2

includes a further 2 diagonals in the alignment, in addition to the number summed together by EOverlap.

-ALIGNfile[=overlap.align]

specifies a file to contain the actual alignments used in calculating the scores for accepted overlaps.

-REJECTfile[=overlap.reject]

specifies a file to contain a list of candidate overlaps rejected by the specified criteria.

-REJALIGNfile[=overlap.rejalign]

specifies a file to contain the actual alignments used in calculating the scores for rejected overlaps.

-MONitor

shows results of each test on the screen.

-SUMmary

shows run statistics (overlaps tested, overlaps accepted) on the screen.
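
When FilterOverlap is driven from a script, the qualifiers described above can be assembled programmatically. The Python sketch below only builds the argument list; it does not run the program, and the helper name and its keyword arguments are hypothetical. The qualifier spellings are copied from the command-line summary, and -Default is appended as in the minimum syntax.

```python
from typing import List, Optional


def build_command(infile: str,
                  outfile: Optional[str] = None,
                  alignfile: Optional[str] = None,
                  rejectfile: Optional[str] = None,
                  rejalignfile: Optional[str] = None,
                  addintegrate: Optional[int] = None,
                  monitor: bool = False,
                  summary: bool = False) -> List[str]:
    """Assemble a filteroverlap argument list from the documented qualifiers."""
    args = ["filteroverlap", f"-INfile={infile}"]
    if outfile is not None:
        args.append(f"-OUTfile={outfile}")
    if alignfile is not None:
        args.append(f"-ALIGNfile={alignfile}")
    if rejectfile is not None:
        args.append(f"-REJECTfile={rejectfile}")
    if rejalignfile is not None:
        args.append(f"-REJALIGNfile={rejalignfile}")
    if addintegrate is not None:
        args.append(f"-ADDINTegrate={addintegrate}")
    if monitor:
        args.append("-MONitor")
    if summary:
        args.append("-SUMmary")
    args.append("-Default")  # suppress prompts, as in the minimum syntax
    return args


print(" ".join(build_command("overlap.dat",
                             outfile="overlap.filter",
                             rejectfile="overlap.reject",
                             summary=True)))
```

The resulting list could be handed to a process launcher on a system where EGCG is installed; that step is left out here because it depends on the local installation.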

Printed: April 22, 1996 15:53 (1162)