BasePairPlot plots the percentage occurrence and the observed over expected frequency of a dinucleotide relative to its position in a nucleic acid sequence.
BasePairPlot uses the method described by Gardiner-Garden (1987; J. Mol. Biol. 196:261-282) for detecting CpG islands, modified to identify any dinucleotide.
This program was written by Rodrigo Lopez S. (E-mail: rodrigol@biotek.uio.no; Post: Biotechnology Centre of Oslo, PO Box 1125 Blindern, N-0317 Oslo 3, Norway).
All EGCG programs are supported by the EGCG Support Team, who can be contacted by E-mail (egcg@embnet.org).
Here is a session with BasePairPlot that was used to see the CG content of human H4 histone mRNA between positions 200 and 700.
% basepairplot

 BASEPAIRPLOT uses nucleotide sequence data

 BASEPAIRPLOT of what sequence: ? GenEMBL:hsh4bhis

                  Start (* 1 *) ?
                  End (* 814 *) ?

 What window size (* 100 *) ?

 What shift increment (* 1 *) ?

 What dinucleotide (* CG *) ?

 What should I call the output file (* hsh4bhis.islands *) ?

 The minimum density for a one-page plot is 626.2 residues/100 units.
 What density do you want (* 626.2 *) ?

%
This is the plot from the example session.
StatPlot plots a set of parallel curves from a table of numbers like the table written by the Window program. The statistics in each column of the table are associated with a position in the analyzed sequence.
Window makes a table of the frequencies of different sequence patterns within a window as it is moved along a sequence. A pattern is any short sequence like GC or R or ATG. You can plot the output with the program StatPlot.
Composition determines the composition of sequence(s). For nucleotide sequence(s), Composition also determines dinucleotide and trinucleotide content.
CpGPlot plots the frequency of occurrence of CpG dinucleotides and the C and G percentage relative to their position in a sequence by the method described by Gardiner-Garden (1987).
When attempting to plot identical dinucleotide pairs, only the percentage curve will be plotted.
BasePairPlot uses the method described by Gardiner-Garden (1987; J. Mol. Biol. 196:261-282) for detecting CpG islands, modified to identify any dinucleotide.
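The windowed calculation can be sketched in Python as follows. This is an illustrative sketch, not BasePairPlot's own source. The observed over expected ratio follows the Gardiner-Garden formula for CpG, (number of CpG / (number of C x number of G)) x window length, applied to an arbitrary pair. The definition of the percentage (pair occurrences divided by the number of dinucleotide positions in the window), the choice to return no ratio when both bases are the same, and the names window_stats and dinucleotide_profile are assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def window_stats(window: str, pair: str) -> Tuple[float, Optional[float]]:
    """Return (percentage, observed/expected) of `pair` in one window.

    Assumption: percentage = 100 * occurrences / dinucleotide positions.
    """
    window, pair = window.upper(), pair.upper()
    if len(pair) != 2:
        raise ValueError("pair must be exactly two bases, e.g. 'CG'")
    n = len(window)
    if n < 2:
        raise ValueError("window must contain at least two residues")

    # Count overlapping occurrences of the pair in the window.
    observed = sum(1 for i in range(n - 1) if window[i:i + 2] == pair)
    percentage = 100.0 * observed / (n - 1)

    first, second = pair[0], pair[1]
    if first == second:
        # Assumption: mirrors the restriction that identical pairs
        # get a percentage curve only.
        return percentage, None

    # Gardiner-Garden style ratio: observed * N / (count(X) * count(Y)).
    product = window.count(first) * window.count(second)
    ratio = observed * n / product if product else 0.0
    return percentage, ratio


def dinucleotide_profile(
    seq: str, pair: str = "CG", window: int = 100, shift: int = 1
) -> List[Tuple[int, float, Optional[float]]]:
    """Slide a window along `seq`; report 1-based window start positions."""
    profile = []
    for start in range(0, len(seq) - window + 1, shift):
        pct, ratio = window_stats(seq[start:start + window], pair)
        profile.append((start + 1, pct, ratio))
    return profile
```

The defaults window=100 and shift=1 match the defaults offered in the example session. Each tuple in the result holds a window start position, the percentage, and the ratio, which is enough to draw the two curves.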
Plots of very long nucleic acid sequences are difficult to interpret on a single page. For such sequences, either choose a lower density so that the plot spans multiple pages, or select a shorter range from within the sequence.
The input file for BasePairPlot is a GCG-formatted nucleic acid sequence.
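The relationship between density and page count can be estimated with a short Python sketch. It assumes, as an inference from the example session rather than from documented behaviour, that the plot area is the distance between the default left and right margins (10.0 and 140.0 platen units, so 130 units) and that density is residues per 100 platen units. Under that assumption an 814-residue sequence gives the 626.2 reported in the session. The function names are hypothetical.

```python
import math


def min_one_page_density(residues: int,
                         left_margin: float = 10.0,
                         right_margin: float = 140.0) -> float:
    """Smallest density (residues per 100 platen units) that fits one page.

    Assumption: the usable plot width is right_margin - left_margin.
    """
    width = right_margin - left_margin
    return residues * 100.0 / width


def pages_needed(residues: int, density: float,
                 left_margin: float = 10.0,
                 right_margin: float = 140.0) -> int:
    """Estimate how many pages a plot takes at the given density."""
    residues_per_page = density * (right_margin - left_margin) / 100.0
    return math.ceil(residues / residues_per_page)


# The example session: 814 residues with default margins.
print(round(min_one_page_density(814), 1))
print(pages_needed(814, 300.0))
```

Lowering the density below the one-page minimum spreads the same range over more pages, which is the trade-off the suggestion above describes.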
The Wisconsin Package must be configured for graphics before you run any program with graphics output! If the % setplot command is available in your installation, this is the easiest way to establish your graphics configuration, but you can also use commands like % postscript that correspond to the graphics languages the Wisconsin Package supports. See Chapter 5, Using Graphics in the User's Guide for more information about configuring your process for graphics.
If you need to stop this program,
use
All parameters for this program may be put on the command line. Use the option -CHEck to see the summary below and to have a chance to add things to the command line before the program executes. In the summary below, the capitalized letters in the qualifier names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose qualifiers or parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the GCG User's Guide.
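The capitalization convention can be illustrated with a small Python sketch. It assumes that matching is case-insensitive and that any truncation between the capitalized minimum and the full qualifier name is accepted; neither detail is stated in this document, and the helper names required_prefix and accepts are hypothetical.

```python
def required_prefix(qualifier: str) -> str:
    """Return the leading capitalized letters of a qualifier such as '-WINdow'."""
    name = qualifier.lstrip("-")
    n = 0
    while n < len(name) and name[n].isupper():
        n += 1
    return name[:n]


def accepts(qualifier: str, typed: str) -> bool:
    """Check whether `typed` is an acceptable spelling of `qualifier`.

    Assumption: comparison ignores case, and the typed word must contain
    at least the capitalized letters and be a truncation of the full name.
    """
    full = qualifier.lstrip("-").upper()
    word = typed.lstrip("-").upper()
    return word.startswith(required_prefix(qualifier)) and full.startswith(word)


print(required_prefix("-WINdow"))
print(accepts("-WINdow", "-win"))
print(accepts("-WINdow", "-wi"))
```

Under these assumptions -WIN, -WIND and -WINDOW would all select the window length, while -WI would be too short.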
The files described below supply auxiliary data to this program. The program automatically reads them from a public data directory unless you either 1) have a data file with exactly the same name in your current working directory; or 2) name a file on the command line with an expression like -DATa1=myfile.dat. For more information see Chapter 4, Using Data Files in the User's Guide.
If you are studying a sequence with known features, this program marks the plot with small boxes showing the positions of these features. The presence of a file in your directory with the same name as your sequence and the file name extension .mrk causes the program to mark each range specified in the file. You can provide a marking file on the command line with an expression like -MARk=hsh4bhis.mrk. The file gamma.mrk contains information about the format of marking files. The figure for the example session shows marked regions.
The parameters and switches listed below can be set from the command line. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the GCG User's Guide.

-THRLine
plots a line indicating possible islands of unusual composition.

-TITLEText="A Title"
specifies an alternative title for the plot.

-NOTITle
suppresses the plot title.

-MARk=hsh4bhis.mrk
If you are studying a sequence with known features, this program marks the plot with small boxes showing the positions of these features. The presence of a file in your directory with the same name as your sequence and the file name extension .mrk causes the program to mark each range specified in the file. The file gamma.mrk contains information about the format of marking files.

These options apply to most EGCG graphics programs. These and other options will be described in detail in the EGCG User's Guide.

-DENsity=1.2
sets the plot density in standard units of residues per 100 platen units. GCG defines the page as 150 platen units across in landscape orientation, although the plot normally uses only 120 to 140 units of this, the rest being used for the left and right margins.

-LEFTMARgin=10.0
sets the left margin position (width) of the plot area in GCG platen units.

-RIGHTMARgin=140.0
sets the right margin position (the right edge is at 150.0) of the plot area in GCG platen units.

-BOTTOMMARgin=10.0
sets the bottom margin position (width) of the plot area in GCG platen units.

-TOPMARgin=90.0
sets the top margin position (the top edge is at 100.0) of the plot area in GCG platen units.

-BORder
puts a line border around the plot.

-NOBORder
suppresses any line border around the plot.

-PAGENUMber
always puts a page number on the plot. By default page numbers appear from the second page onward.

-NOPAGENUMber
removes all page numbers from the plot.

These options let you specify the text to appear around the plot.

-TITletext="text"
overrides the default plot title of program name and sequence name.

-NOTITletext
removes the main title from the plot.

-SUBTITletext="text"
overrides the default plot subtitle of date and time.

-NOSUBTITletext
removes the subtitle from the plot.

-SHOWSEQuence
plots the sequence along the top of the plot area.

-NOSHOWSEQuence
removes the sequence from the plot.

-CHEIGHT=1.5
sets the character height for the plot in GCG platen units. The default value is 1.5.

These options let you specify how the plot axes appear.

-LABCHEIGHT=1.5
sets the character height for the axis labels in GCG platen units. The default value is 1.5.

-LABELPOSition=7.0
sets the distance in GCG platen units between an axis and its label.

-TICKCHEIGHT=1.5
sets the character height for the tick labels in GCG platen units. The default value is 1.5.

-XTICKLen=0.7
sets the length in GCG platen units of a major tick on the x axis.

-YTICKLen=0.7
sets the length in GCG platen units of a major tick on the y axis.

-XTICKDIVisions=5
sets the number of minor divisions between labelled ticks on the x axis. By default there are 5 minor divisions.

-YTICKDIVisions=0
sets the number of minor divisions between labelled ticks on the y axis. By default there are no minor divisions.

-MINMARK=1.0
sets the position in GCG platen units of the bottom of the box drawn when the x axis is marked.

-MAXMARK=1.0
sets the position in GCG platen units of the top of the box drawn when the x axis is marked.

These options let you specify how the main data lines on the plot appear.

-LINESTyle1=1
sets the line style for the first data line on the plot, using a standard set of line definitions. -LINESTyle2 sets the style for a second data line (if any) and so on.

-LINEPERiod1=3.0
sets the line period size in GCG platen units for the first data line on the plot, using a standard set of line definitions. -LINEPERiod2 sets the period size for a second data line (if any) and so on.

-LINECOLor1=0
sets the line colour for the first data line on the plot, using a standard set of line definitions. -LINECOLor2 sets the colour for a second data line (if any) and so on.

-HEAD=1.0
sets the size in GCG platen units of arrowheads.

These options let you specify how the horizontal and vertical lines appear.

-VLINETYpe=1
sets the line type for a vertical reference line, using a standard set of EGCG line types. See the EGCG User's Guide for more information.

-VLINESTyle=1
sets the line style for a vertical reference line, using a standard set of line definitions. See the GCG User's Guide for more information.

-VLINEPERiod=3.0
sets the line period size in GCG platen units for a dashed or dotted vertical reference line.

-VLINECOLor=0
sets the line colour for a vertical reference line.

-HLINETYpe=1
sets the line type for a horizontal reference line, using a standard set of EGCG line types. See the EGCG User's Guide for more information.

-HLINESTyle=1
sets the line style for a horizontal reference line, using a standard set of line definitions. See the GCG User's Guide for more information.

-HLINEPERiod=3.0
sets the line period size in GCG platen units for a dashed or dotted horizontal reference line.

-HLINECOLor=0
sets the line colour for a horizontal reference line.

These options apply to all GCG graphics programs. These and many others are described in detail in Chapter 5, Using Graphics of the User's Guide.

-FIGure=programname.figure
writes the plot as a text file of plotting instructions suitable for input to the Figure program instead of drawing the plot on your plotter.

-FONT=3
draws all text characters on the plot using Font 3 (see Appendix I).

-COLor=1
draws the entire plot with the pen in stall 1.

These options let you expand or reduce the plot (zoom), move it in either direction (pan), or rotate it 90 degrees (rotate).

-SCAle=1.2
expands the plot by 20 percent by resetting the scaling factor (normally 1.0) to 1.2 (zoom in). You can expand the axes independently with -XSCAle and -YSCAle. Numbers less than 1.0 contract the plot (zoom out).

-XPAN=30.0
moves the plot to the right by 30 platen units (pan right).

-YPAN=30.0
moves the plot up by 30 platen units (pan up).

-PORtrait
rotates the plot 90 degrees. Usually, plots are displayed with the horizontal axis longer than the vertical (landscape). Note that plots are reduced or enlarged, depending on the platen size, to fill the page.

Gardiner-Garden, M. (1987). J. Mol. Biol. 196, 261-282.
Printed: April 22, 1996 15:52 (1162)
COMMAND-LINE SUMMARY

Minimum syntax: % basepairplot [-INfile=]GenEMBL:hsh4bhis -default

Prompted Parameters:

-BEGin=1 -END=221       the range of interest
-WINdow=100             the window length
-SHIFT=1                the window shift
-STRing=CG              the di-nucleotide to plot

Local Data Files: None

Optional Parameters:

-THRLine                plots a line identifying possible 'islands'
-TITLEText="A Title"    alternative plot title
-NOTITle                suppresses plot title

Most EGCG graphics programs accept these and other switches. See the Using
Graphics chapter of the EGCG USERS GUIDE for descriptions.

-DENSity=150.0          plot density in bases per 100 platen units
-LEFTMARgin=10.0        sets the left plot margin position
-RIGHTMARgin=140.0      sets the right plot margin position
-BOTTOMMARgin=10.0      sets the bottom plot margin position
-TOPMARgin=90.0         sets the top plot margin position
-BORDer                 puts a line border around the plot
-NOBORDer               suppresses a line border
-PAGENUMber             forces page numbering
-NOPAGENUMber           suppresses page numbering
-TITletext="text"       overrides the default plot title
-NOTITletext            suppresses the plot title
-SUBTITletext="text"    overrides the default plot subtitle
-NOSUBTITletext         suppresses the plot subtitle
-CHEIGHT=1.5            default plot character height
-LINESTyle1=1           plot line style 1 (set for each line)
-LINEPERiod1=1          plot line period 1 (set for each line)
-LINECOLor1=0           plot line colour 1 (set for each line)

All GCG graphics programs accept these and other switches. See the Using
Graphics chapter of the USERS GUIDE for descriptions.

-FIGure[=FileName]      stores plot in a file for later input to FIGURE
-FONT=3                 draws all text on the plot using font 3
-COLor=1                draws entire plot with pen in stall 1
-SCAle=1.2              enlarges the plot by 20 percent (zoom in)
-XPAN=10.0              moves plot to the right 10 platen units (pan right)
-YPAN=10.0              moves plot up 10 platen units (pan up)
-PORtrait               rotates plot 90 degrees
LOCAL DATA FILES
OPTIONAL PARAMETERS
-THRLine
-TITLEText="A Title"
-NOTITle
-MARk=hsh4bhis.mrk
-DENsity=1.2
-LEFTMARgin=10.0
-RIGHTMARgin=140.0
-BOTTOMMARgin=10.0
-TOPMARgin=90.0
-BORder
-NOBORder
-PAGENUMber
-NOPAGENUMber
-TITletext="text"
-NOTITletext
-SUBTITletext="text"
-NOSUBTITletext
-SHOWSEQuence
-NOSHOWSEQuence
-CHEIGHT=1.5
-LABCHEIGHT=1.5
-LABELPOSition=7.0
-TICKCHEIGHT=1.5
-XTICKLen=0.7
-YTICKLen=0.7
-XTICKDIVisions=5
-YTICKDIVisions=0
-MINMARK=1.0
-MAXMARK=1.0
-LINESTyle1=1
-LINEPERiod1=3.0
-LINECOLor1=0
-HEAD=1.0
-VLINETYpe=1
-VLINESTyle=1
-VLINEPERiod=3.0
-VLINECOLor=0
-HLINETYpe=1
-HLINESTyle=1
-HLINEPERiod=3.0
-HLINECOLor=0
-FIGure=programname.figure
-FONT=3
-COLor=1
-SCAle=1.2
-XPAN=30.0
-YPAN=30.0
-PORtrait
REFERENCES