EPeptideSort shows the peptide fragments from a digest of an amino acid sequence. It sorts the peptides by weight, position, and HPLC retention at pH 2.1, and shows the composition of each peptide. It also prints a summary of the composition of the whole protein. EPeptideSort is a modified version of GCG's PeptideSort, with additional options to control the output of peptides sorted by weight, retention, and position.
EPeptideSort cuts a peptide sequence with any or all of the proteolytic enzymes and reagents listed in the public or local data file proenzall.dat. The peptides from each digest are sorted by position, weight, and retention time in a high-pressure liquid chromatograph at pH 2.1. For each peptide in each sorting, the following data are displayed: beginning and ending positions, molecular weight, HPLC retention at pH 2.1, HPLC retention at pH 7.4, charge, number of aromatic residues, number of acidic residues, number of basic residues, number of residues containing sulfur, number of hydrophilic residues, and number of hydrophobic residues. The content, isoelectric point, and molar extinction coefficient at 280 nm of each peptide are shown with the table of peptides sorted by position. The content can be displayed in the order of expected elution from an amino acid analyzer.
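The digest-and-sort step described above can be pictured with a minimal Python sketch. It is not EPeptideSort's code: the cleavage rule (cut after every K or R) is a simplified assumption rather than a rule read from proenzall.dat, and the function name and example sequence are chosen only for illustration.

```python
def digest(sequence, cut_after="KR"):
    """Split a protein sequence after each residue listed in cut_after.

    Returns (start, end, peptide) tuples with 1-based inclusive positions.
    The cleavage rule is a simplified assumption, not taken from proenzall.dat.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        if residue in cut_after:
            peptides.append((start + 1, i + 1, sequence[start:i + 1]))
            start = i + 1
    if start < len(sequence):
        # Trailing fragment that does not end in a cleavage residue.
        peptides.append((start + 1, len(sequence), sequence[start:]))
    return peptides


# Illustrative sequence only.
fragments = digest("MGHFTEEDKATITSLWGKVNVEDAGGETLGR")
by_position = sorted(fragments, key=lambda p: p[0])
by_length = sorted(fragments, key=lambda p: len(p[2]))
```

Sorting by weight or retention would follow the same pattern, with a different key function computed from the amino acid data.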
This GCG program was modified by Jaakko Hattula (Tampere University of Technology, Finland) and Peter Rice (E-mail: pmr@sanger.ac.uk Post: Informatics Division, The Sanger Centre, Hinxton Hall, Cambridge, CB10 1RQ, UK).
All EGCG programs are supported by the EGCG Support Team, who can be contacted by E-mail (egcg@embnet.org).
Here is a session using EPeptideSort to sort the tryptic peptides from the protein sequence in the file ggamma.pep:
  % peptidesort -weight

  EPEPTIDESORT uses protein sequence data

  EPEPTIDESORT of what sequence ?  ggamma.pep

                  Start (* 1 *) ?
                 End (* 148 *) ?

  Select the enzymes:  Type nothing or "*" to get all enzymes.  Type "?"
  for help on which enzymes are available and how to select them.

  Enzyme (* * *):  tryp

  Tryp Tryp

    "TRYP" selected 2 enzymes, new total: 2.

  Enzyme:

  What should I call the output file (* ggamma.pepsort *) ?

  %
Here is the output file:
  EPEPTIDESORT of:  check: 6924  from: 1  to: 148

  TRANSLATE of: gamma.seq check: 6474 from: 2179 to: 2270
  and of: gamma.seq check: 6474 from: 2393 to: 2615
  and of: gamma.seq check: 6474 from: 3502 to: 3630
  generated symbols 1 to: 148.
  Human fetal beta globins G and A gamma
  from Shen, Slightom and Smithies, Cell 26; 191-203. . . .

  With Enzymes: TRYP

                         March 19, 1996 12:50  ..

  Digest with: Tryp.

  Peptides Sorted by Weight

  Pos  From    To   Mol Wt  Ret2.1  Ret7.4   Chg  Aro Acid Base Sulf Phil Phob

    8    67 -  67    146.2     3.3    -0.5   1.0    0    0    1    0    1    0
    6    61 -  62    245.3     6.6     2.2   1.0    0    0    1    0    1    1
   16   146 - 148    300.3    15.5     2.0   0.0    1    0    0    0    1    1
    7    63 -  66    411.5     3.5    -3.5   1.0    0    0    1    0    2    2
   10    78 -  83    739.8    18.5    -2.8  -1.0    0    2    1    0    4    2
    2    10 -  18    976.1    41.9    44.2   1.0    1    0    1    0    4    5
    9    68 -  77   1016.2    32.8    29.9   0.0    0    1    1    0    4    6
    1     1 -   9   1093.2     8.3   -25.3  -2.0    1    3    1    1    6    3
   12    97 - 105   1098.2    27.4     0.9  -1.0    1    2    1    0    6    3
   15   134 - 145   1178.4    15.4    26.5   1.0    0    0    1    1    5    7
    4    32 -  41   1274.5    61.9    48.2   1.0    2    0    1    0    4    6
    3    19 -  31   1316.4    -0.9   -25.2  -2.0    0    3    1    0    6    7
   11    84 -  96   1448.6    20.2    -5.5  -1.0    1    2    1    1    7    6
   14   122 - 133   1449.6    24.3    -2.6  -1.0    2    2    1    0    8    4
   13   106 - 121   1694.1    78.5    68.8   1.0    1    0    1    0    4   12
    5    42 -  60   1990.2    59.9    68.7   0.0    3    1    1    1    9   10
PeptideMap creates a peptide map with an output format similar to the DNA restriction maps. Isoelectric plots the charge as a function of pH for any peptide sequence.
The algorithm used by EPeptideSort to estimate HPLC retention times (Meek, Proc. Natl. Acad. Sci. USA 77; 1632 (1980)) is based on the assumption that the retention of a peptide correlates with its amino acid composition. This assumption holds for peptides of up to about 20 amino acids, but steric and conformational factors can affect the retention of longer peptides. Retention times calculated by EPeptideSort for peptides longer than 20 amino acids should not be considered accurate.
The formula for estimating the retention time is the sum of the retention coefficients for the amino acids in the peptide, plus the coefficients for the end groups, plus a value t0, which is the time for elution of unretained compounds. The retention time reported by EPeptideSort does not include the t0 value. You will have to determine this time for your HPLC system and add it to the reported times.
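The formula can be written out as a short Python sketch. The function name and arguments are assumptions made for illustration, and the coefficient values below are invented placeholders, not Meek's published coefficients; real values would come from the paper or from the program's amino acid data file.

```python
def estimated_retention(peptide, coefficients, n_term, c_term, t0=0.0):
    """Sum residue coefficients, add end-group coefficients, then add t0.

    With t0 left at 0.0 the result corresponds to the kind of value the
    program reports; pass your own system's t0 to get a full estimate.
    """
    residue_sum = sum(coefficients[aa] for aa in peptide)
    return residue_sum + n_term + c_term + t0


# Placeholder coefficients for illustration only (NOT Meek's values).
placeholder = {"G": 1.0, "A": 2.0, "K": -3.0}

without_t0 = estimated_retention("GAK", placeholder, n_term=0.5, c_term=1.5)
with_t0 = estimated_retention("GAK", placeholder, n_term=0.5, c_term=1.5, t0=2.5)
```

Keeping t0 as a separate argument mirrors the text: the reported value omits it, and the user adds the value measured on their own HPLC system.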
Meek's paper does not report retention coefficients for cysteine, only for cystine. EPeptideSort assumes that these are the same. Therefore the estimated retention time for a peptide containing cysteines may be inaccurate.
The retention times reported by EPeptideSort should be regarded as estimates, since the actual retention times can vary according to the elution conditions. Meek's retention coefficients were determined empirically using a linear gradient of acetonitrile, starting at 0% at 0 min and increasing to 60% at 80 min (0.75% per min). Increasing the gradient rate to 1.5% acetonitrile per min resulted in retention times that were 70% of normal. Decreasing the gradient rate to 0.5% per min resulted in retention times that were 120% of normal. Meek also noted minor differences in relative retention rates with columns made by different manufacturers.
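As a rough illustration of the gradient effect, the sketch below applies the three factors quoted above to a reported retention value. Applying the factor directly to the program's estimate (which excludes t0) is an assumption, and the helper name is hypothetical. Only the three gradient rates mentioned in the text are supported, since no interpolation rule is given.

```python
# Factors quoted in the text: gradient rate (% acetonitrile per min) -> fraction
# of the retention time observed at the reference rate of 0.75% per min.
GRADIENT_FACTORS = {0.75: 1.00, 1.5: 0.70, 0.5: 1.20}


def rescale_for_gradient(reported_retention, gradient_rate):
    """Scale a reported retention by the factor quoted for a gradient rate."""
    try:
        factor = GRADIENT_FACTORS[gradient_rate]
    except KeyError:
        raise ValueError(
            "no factor is quoted for a gradient of %s%% per min" % gradient_rate
        )
    return reported_retention * factor
```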
A digest may not produce more than 1,000 peptides. If you choose all enzymes by typing * at the Select enzymes: prompt and your protein sequence is over 500 residues long, there may be a great deal of output. Remember to delete the output file when you are finished looking at the data.
The program presents you with an enzyme selection prompt that lets you enter enzymes individually or collectively. To get help with selecting enzymes, type a ? at the enzyme prompt. Here is what you see:
  Select enzymes:

    Type "*" to select all enzymes.
    Type "**" to select all enzymes including isoschizomers.
    Type individual names like "AluI" to select specific enzymes.
    Type "?" to see this message and all available enzymes.
    Type "??" to see the available enzymes AND their recognition sites.
    Type "?A*" to see what enzymes start with "A."
    Type "A*" to select all enzymes starting with "A."
    Type parts of names like "Al*" to select all enzymes starting with "AL."
    Type "~A*" to unselect all selected enzymes starting with "A."
    Type "/*" to see what enzymes you have selected so far.
    Type "#" to select no enzymes at all.

    Press <Return> after each selection.
    Press <Return> and nothing else to end your selections.

    Spaces are allowed and letter case is ignored.
We maintain our enzyme files with a semicolon (;) character in front of all but one member of a family of isoschizomers. (Isoschizomers are restriction endonucleases with the same recognition site.) The isoschizomers beginning with a semicolon are normally not displayed by our mapping programs unless you specifically select them by name or type "**" instead of "*" at the enzyme prompt.
There is more information on enzyme files in the Data Files manual.
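The selection rules for "*", "**", and names with or without a leading semicolon can be sketched in Python. This is a guess at the behavior from the description, not the program's actual matching code: the function name and enzyme names are hypothetical, and treating a wildcard pattern such as "A*" like "*" with respect to hidden entries is an assumption.

```python
from fnmatch import fnmatchcase


def select_enzymes(pattern, file_entries):
    """Return the enzyme names chosen by one selection pattern.

    file_entries are names as stored in the enzyme file; a leading ';'
    marks an entry that is normally hidden.
    """
    include_hidden = pattern == "**"
    glob = "*" if include_hidden else pattern.lower()  # letter case is ignored
    has_wildcard = "*" in pattern
    chosen = []
    for entry in file_entries:
        hidden = entry.startswith(";")
        name = entry.lstrip(";")
        if not fnmatchcase(name.lower(), glob):
            continue
        # Hidden entries appear only for "**" or when named explicitly.
        if hidden and has_wildcard and not include_hidden:
            continue
        chosen.append(name)
    return chosen


# Hypothetical file contents, for illustration only.
entries = ["Tryp", ";TrypB", "Chymo"]
```

With these entries, "*" skips the hidden entry, while "**" or typing its name in full selects it.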
A command-line expression like -ENZymes=AluI,EcoRII would choose AluI and EcoRII and suppress interactive enzyme selection.
EPeptideSort only accepts protein sequences. If EPeptideSort rejects your protein sequence, turn to Appendix VI to see how to change or set the type of a sequence.
All parameters for this program may be put on the command line. Use the option -CHEck to see the summary below and to have a chance to add things to the command line before the program executes. In the summary below, the capitalized letters in the qualifier names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose qualifiers or parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the GCG User's Guide.
Minimal Syntax: % epeptidesort [-INfile=]gzeinaa.pep -Default

Prompted Parameters:

-BEGin=18 -END=243                 range of interest
-ENZymes=*[,...]                   enzymes of interest
[-OUTfile=]gzeinaa.pepsort         output file name

Local Data Files:

-DATa1=proenzall.dat               contains enzyme data
-DATa2=aminoacid.dat               contains amino acid data

Optional parameters:

-7                                 sorts on HPLC retention at pH 7.4 instead of pH 2.1
-MINCuts=2                         shows only enzymes that cut at least 2 times
-MAXCuts=4                         shows only enzymes that cut no more than 4 times
-ELUtion[=DNEQSGHRTAPYVMCILFKW]    sets the order of the composition display
-SUMmary                           output only summary (or NOSUMmary to stop)
-POSition                          If none of these three, then output all three
-RETention                         types of values. If one or more, then just
-WEIght                            output those specified
GCG's original PeptideSort was written by John Devereux in the GCG laboratory. It was designed to handle several suggestions made to us by Drs. Michael Gribskov and Roland Rueckert. HPLC retention is from Meek, Proc. Natl. Acad. Sci. USA 77; 1632 (1980). Molar extinction coefficient is from Gill, S.C. and von Hippel, P.H., Anal. Biochem. 182; 319-326 (1989).
The files described below supply auxiliary data to this program. The program automatically reads them from a public data directory unless you either 1) have a data file with exactly the same name in your current working directory; or 2) name a file on the command line with an expression like -DATa1=myfile.dat. For more information see Chapter 4, Using Data Files in the User's Guide.
EPeptideSort needs three data files that can be either local or public. proenzall.dat (see the Data Files manual) contains information about the enzymes and proteolytic reagents. aminoacid.dat has information on the physical properties of the amino acids. extinctcoef.dat contains extinction coefficient data for the amino acids.
The parameters and switches listed below can be set from the command line. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the User's Guide.
-POSition causes EPeptideSort to report the peptides sorted by position, but not the other sections unless specifically requested. -NOPOSition specifically blocks the "sorted by position" section, but reports the rest.
-RETention causes EPeptideSort to report the peptides sorted by retention, but not the other sections unless specifically requested. -NORETention specifically blocks the "sorted by retention" section, but reports the rest.
-WEIght causes EPeptideSort to report the peptides sorted by weight, but not the other sections unless specifically requested. -NOWEIght specifically blocks the "sorted by weight" section, but reports the rest.
-SUMmary causes EPeptideSort to report the amino acid summary, but not the other sections unless specifically requested. -NOSUMmary specifically blocks the summary section, but reports the rest.
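How the section switches combine can be sketched as a small Python function. This is one reading of the descriptions above, not the program's own logic: treating the summary as a fourth section on the same footing as the three sorted tables, and letting a blocking switch win over a request for the same section, are both assumptions. The function and section names are hypothetical.

```python
SECTIONS = ("position", "retention", "weight", "summary")


def sections_to_report(requested=(), blocked=()):
    """Decide which output sections appear.

    requested: sections named by switches such as -WEIght.
    blocked:   sections named by switches such as -NOWEIght.
    With nothing requested, every section that is not blocked is reported.
    """
    chosen = list(requested) if requested else list(SECTIONS)
    return [s for s in SECTIONS if s in chosen and s not in blocked]
```

For example, requesting only "weight" yields just the weight-sorted table, which matches the sample session run with -weight.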
-7 causes EPeptideSort to sort each digest on HPLC retention at pH 7.4 instead of on HPLC retention at pH 2.1 (default).
-MINCuts=n excludes enzymes that do not cut at least n times.
-MAXCuts=n excludes enzymes that cut more than n times.
-ELUtion[=DNEQSGHRTAPYVMCILFKW] sets the order for the composition data display. If you leave the optional parameter blank but use the -ELUtion switch, the order is changed from alphabetical to DNE..., as expected from the Waters analyzer.
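The effect of the elution-order switch on the composition display can be illustrated with a short Python sketch. The order string is the default shown in the parameter summary; the function name is hypothetical, and interpreting "alphabetical" as alphabetical by one-letter code is an assumption.

```python
from collections import Counter

# Default order from the parameter summary (-ELUtion with no value).
ELUTION_ORDER = "DNEQSGHRTAPYVMCILFKW"


def composition(peptide, order=None):
    """Return (residue, count) pairs for the residues present in peptide.

    order=None lists residues alphabetically by one-letter code (assumed);
    otherwise residues follow the given order string.
    """
    counts = Counter(peptide)
    keys = order if order is not None else sorted(counts)
    return [(aa, counts[aa]) for aa in keys if counts[aa] > 0]


alphabetical = composition("GAKG")
by_elution = composition("GAKG", ELUTION_ORDER)
```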
Meek (1980) Proc. Natl. Acad. Sci. USA 77, 1632.
Gill, S.C. and von Hippel, P.H. (1989) Anal. Biochem. 182, 319-326.
Janin (1979) Nature 277, 491-492.
Printed: April 22, 1996 15:53 (1162)