Eprotpars

EPROTPARS*


FUNCTION

EProtPars (Protein Sequence Parsimony Method) infers an unrooted phylogeny from protein sequences, using a new method intermediate between the approaches of Eck and Dayhoff (1966) and Fitch (1971).

EProtPars is a modified version of PROTPARS from PHYLIP version 3.572c, by Joseph Felsenstein, with command-line control added.


DESCRIPTION

EProtPars estimates phylogenies from protein sequences (input using the standard one-letter code for amino acids) using the parsimony method, in a variant which counts only those nucleotide changes that change the amino acid, on the assumption that silent changes are more easily accomplished.

The input file for EProtPars can be an MSF or PHYLIP formatted file.


AUTHOR

This program was originally written by Joe Felsenstein (E-mail: joe@evolution.genetics.washington.edu; Post: Department of Genetics, University of Washington, Box 357360, Seattle, Washington 98195-7360, U.S.A.).

This version was modified for inclusion in EGCG by Maria Jesus Martin (E-mail: martin@ebi.ac.uk; Post: EMBL Outstation Hinxton, The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SQ or E-mail: martin@tdi.es; Post: Tecnologia para Diagnostico e Investigacion, Condes de Torreanaz 5, 28028 Madrid).

All EGCG programs are supported by the EGCG Support Team, who can be contacted by E-mail (egcg@embnet.org).


EXAMPLE

Here is a session with EProtPars

  
  
  % eprotpars -options
  
   EPROTPARS of what sequences file ?  fos.msf{*}
  
   What should I call the output file (* fos.eprotpars *) ?
  
   Randomize input order of sequences  (* No *) ?
  
   OutGroup root (* No *) ?
  
   Use threshold parsimony (* No *) ?
  
   Print out the data at start of run (* No *) ?
  
   Print out steps in each site (* No *) ?
  
   Print sequences at all nodes of tree (* No *) ?
  
  Adding species:
     FOSAVINK
     FOSCHICK
     FOSMOUSE
     FOSRAT
     FOSHUMAN
     FOSXMSVFR
     FOSMSVFB
     FOSBMOUSE
     FOSBSTAEP
  
  Doing global rearrangements
    !-----------------!
     .................
  
  Output written to fos.eprotpars
  
  Trees also written onto fos.eprotparstrees
  
  
  %
  


INPUT FILE

The input file for EProtPars is either a GCG MSF protein sequence file or a PHYLIP protein sequence file.

In the PHYLIP format the first line contains the number of species and the number of amino acid positions (counting any stop codons that you want to include), separated by blanks. Next come the species data. Each sequence starts on a new line and has a ten-character species name, which must be blank-filled to that length, followed immediately by the species data in the one-letter code. The sequences must be in either the "interleaved" or the "sequential" format. In the interleaved format, the first lines give the first part of each of the sequences, the next lines give the next part of each, and so on. Thus the sequences might look like this:

  
  
   9 50
  FOSAVINK   ---------- ---------- ----------
  FOSCHICK   MMYQGFAGEY EAPSSRCSSA SPAGDSLTYY
  FOSMOUSE   MMFSGFNADY EASSSRCSSA SPAGDSLSYY
  FOSRAT     MMFSGFNADY EASSSRCSSA SPAGDSLSYY
  FOSHUMAN   MMFSGFNADY EASSSRCSSA SPAGDSLSYY
  FOSXMSVFR  ---------- ---------- ----DSLSYY
  FOSMSVFB   MMFSGFNADY EASSFRCSSA SPAGDSLSYY
  FOSBMOUSE  -MFQAFPGDY DSGSRCSSSP SAESQ----Y
  FOSBSTAEP  ---------- ---------- ----------
  
  ---------- -----SQDFC
  PSPADSFSSM GSPVNSQDFC
  HSPADSFSSM GSPVNTQDFC
  HSPADSFSSM GSPVNTQDFC
  HSPADSFSSM GSPVNAQDFC
  HSPADSFSSM GSPVNTQDFC
  HSPADSFSSM GSPVNTQDFC
  LSSVDSFGSP PTAAASQE-C
  ---------- ----------
  
  
The "sequential" format has all of the data for the first species, then all of the data for the next species, and so on. For the PHYLIP formats, there is an option (Use user trees in input file?) which signals that one or more user-defined trees, written in nested-pairs parenthesis notation, are to be provided for evaluation. These "user trees" are supplied in the input file after the species data, preceded by a line containing the number of user-defined trees.

Here is an example with one user-defined tree in a sequential PHYLIP format:

  
  
   9 50
  FOSAVINK    ---------- ---------- ----------
  ---------- -----SQDFC
  FOSCHICK    MMYQGFAGEY EAPSSRCSSA SPAGDSLTYY
  PSPADSFSSM GSPVNSQDFC
  FOSMOUSE    MMFSGFNADY EASSSRCSSA SPAGDSLSYY
  HSPADSFSSM GSPVNTQDFC
  FOSRAT      MMFSGFNADY EASSSRCSSA SPAGDSLSYY
  HSPADSFSSM GSPVNTQDFC
  FOSHUMAN    MMFSGFNADY EASSSRCSSA SPAGDSLSYY
  HSPADSFSSM GSPVNAQDFC
  FOSXMSVFR   ---------- ---------- ----DSLSYY
  HSPADSFSSM GSPVNTQDFC
  FOSMSVFB    MMFSGFNADY EASSFRCSSA SPAGDSLSYY
  HSPADSFSSM GSPVNTQDFC
  FOSBMOUSE   -MFQAFPGDY DSGSRCSSSP SAESQ----Y
  LSSVDSFGSP PTAAASQE.C
  FOSBSTAEP   ---------- ---------- ----------
  ---------- ----------
  1
  ((((FOSBMOUSE,(FOSBSTAEP,FOSXMSVFR)),(FOSHUMAN,(FOSRAT,(FOSMSVFB,
  FOSMOUSE)))),FOSCHICK),FOSAVINK);
  

For more information about the Phylip format, please see the "main.doc" file from PHYLIP (Phylogeny Inference Package) distribution Version 3.57c by Joseph Felsenstein, available by anonymous FTP at evolution.genetics.washington.edu in directory pub/phylip.
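
The layout rules above (a header line with the two counts, ten-character blank-filled names, and interleaved or sequential blocks) can be sketched as a small reader. This is only an illustration of the format as described in this section, not code from EProtPars or PHYLIP; the function name parse_phylip and its interleaved argument are hypothetical, and anything after the last sequence (such as user trees) is simply ignored.

```python
def parse_phylip(text, interleaved=True):
    """Read PHYLIP-style protein data into a {name: sequence} dict.

    Assumes the layout described in this section: a header line with the
    number of species and positions, then names padded to ten characters.
    Blanks inside the sequence data are ignored.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n_species, n_sites = (int(tok) for tok in lines[0].split()[:2])
    body = lines[1:]
    names, seqs = [], []

    def residues(chunk):
        # Drop the blanks that separate groups of ten residues.
        return "".join(chunk.split())

    if interleaved:
        # First block: one named line per species.
        for ln in body[:n_species]:
            names.append(ln[:10].strip())
            seqs.append(residues(ln[10:]))
        # Later blocks: unnamed continuation lines, in the same species order.
        i = 0
        for ln in body[n_species:]:
            if all(len(s) >= n_sites for s in seqs):
                break  # remaining lines are not sequence data
            seqs[i % n_species] += residues(ln)
            i += 1
    else:
        # Sequential: all lines of one species before the next begins.
        it = iter(body)
        for _ in range(n_species):
            first = next(it)
            names.append(first[:10].strip())
            seq = residues(first[10:])
            while len(seq) < n_sites:
                seq += residues(next(it))
            seqs.append(seq)

    if any(len(s) != n_sites for s in seqs):
        raise ValueError("sequence length does not match the header")
    return dict(zip(names, seqs))
```

Calling parse_phylip(text) on an interleaved file and parse_phylip(text, interleaved=False) on the sequential form of the same data should give the same dictionary.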


OUTPUT FILE

EProtPars writes two output files: one containing an ASCII representation of the most parsimonious tree, and another containing the tree in nested-pairs parenthesis notation.

Here is the output file from the example session.

  
  
  Eprotpars Phylogram of fos.msf{*}. August 19, 1996 12:20
  
  
  One most parsimonious tree found:
  
  
  
  
                      +-----FOSBMOUSE
          +-----------7
          !           !  +--FOSBSTAEP
          !           +--8
       +--5              +--FOSXMSVFR
       !  !
       !  !        +--------FOSHUMAN
       !  +--------4
       !           !  +-----FOSRAT
    +--2           +--3
    !  !              !  +--FOSMSVFB
    !  !              +--6
  --1  !                 +--FOSMOUSE
    !  !
    !  +--------------------FOSCHICK
    !
    +-----------------------FOSAVINK
  
    remember: this is an unrooted tree!
  
  
  requires a total of   1677.000
  
  

Here is the output tree file from the example session.

  
  
  ((((FOSBMOUSE,(FOSBSTAEP,FOSXMSVFR)),(FOSHUMAN,(FOSRAT,(FOSMSVFB,
  FOSMOUSE)))),FOSCHICK),FOSAVINK);
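
The nested-pairs parenthesis notation used in the tree file can be read with a short recursive parser. The sketch below is a minimal illustration that handles only what appears in the tree above (species names, commas, parentheses, and the closing semicolon); the names parse_tree and leaves are hypothetical and are not part of EProtPars.

```python
def parse_tree(text):
    """Turn a nested-pairs tree such as '((A,B),C);' into nested tuples."""
    s = "".join(text.split())  # the tree may be wrapped over several lines
    if not s.endswith(";"):
        raise ValueError("tree must end with ';'")
    pos = 0

    def node():
        nonlocal pos
        if s[pos] == "(":
            pos += 1                      # consume '('
            children = [node()]
            while s[pos] == ",":
                pos += 1                  # consume ','
                children.append(node())
            if s[pos] != ")":
                raise ValueError("expected ')' at position %d" % pos)
            pos += 1                      # consume ')'
            return tuple(children)
        start = pos
        while s[pos] not in ",();":
            pos += 1
        return s[start:pos]               # a species name

    tree = node()
    if s[pos] != ";":
        raise ValueError("unexpected text after the tree")
    return tree


def leaves(tree):
    """List the species names in the order they appear in the tree."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaves(child)]
```

Applied to the tree file above, leaves(parse_tree(...)) lists the nine species starting with FOSBMOUSE and ending with FOSAVINK.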
  
  


RELATED PROGRAMS

PileUp creates a multiple sequence alignment from a group of related sequences using progressive pairwise alignments. It can also plot a tree showing the clustering relationships used to create the alignment. The user should note that this tree is not a phylogenetic tree. LineUp creates and edits multiple sequence alignments. Pretty displays multiple sequence alignments.

ToPhylip writes GCG sequences into a single file in PHYLIP format. Phylip2Tree displays trees computed with one of the PHYLIP programs or with EProtPars, EDnaPars, EDnaML, EDnaMLK, ENeighbor, EFitch and EKitsch, in GCG style. ESeqBoot produces multiple data sets from a molecular sequence data set by bootstrap, jackknife, or permutation resampling. EDnaPars estimates phylogenies from nucleic acid sequences using the parsimony method. EDnaDist computes a distance matrix from nucleic acid sequences, under four different models of nucleotide substitution (Jukes and Cantor (1969), Kimura (1980), Jin and Nei (1990) and a maximum likelihood model (Felsenstein, 1981)). EProtDist computes a distance measure for protein sequences, using maximum likelihood estimates based on the Dayhoff PAM matrix, Kimura's 1983 approximation to it, or a model based on the genetic code plus a constraint on changing to a different category of amino acid. ENeighbor estimates phylogenies from distance matrix data using the Neighbor-Joining method or the UPGMA method of clustering. EFitch estimates phylogenies from distance matrix data under the "additive tree model", according to which the distances are expected to equal the sums of branch lengths between the species. EKitsch estimates phylogenies from distance matrix data under the "ultrametric" model, which is the same as the additive tree model except that an evolutionary clock is assumed. EConsense computes consensus trees by the majority-rule consensus tree method. It can be used as the final step in doing bootstrap analyses.


ALGORITHM

EProtPars infers an unrooted phylogeny from protein sequences, using a new method intermediate between the approaches of Eck and Dayhoff (1966) and Fitch (1971). Eck and Dayhoff (1966) allowed any amino acid to change to any other, and counted the number of such changes needed to evolve the protein sequences on each given phylogeny. This has the problem that it allows replacements which are not consistent with the genetic code, counting them equally with replacements that are consistent. Fitch, on the other hand, counted the minimum number of nucleotide substitutions that would be needed to achieve the given protein sequences. This counts silent changes equally with those that change the amino acid.

The present method insists that any changes of amino acid be consistent with the genetic code so that, for example, lysine is allowed to change to methionine but not to proline. However, changes between two amino acids via a third are allowed and counted as two changes if each of the two replacements is individually allowed. This sometimes allows changes that at first sight you would think should be outlawed. Thus we can change from phenylalanine to glutamine via leucine in two steps total. Consulting the genetic code, you will find that there is a leucine codon one step away from a phenylalanine codon, and a leucine codon one step away from glutamine. But they are not the same leucine codon. It actually takes three base substitutions to get from either of the phenylalanine codons UUU and UUC to either of the glutamine codons CAA or CAG. Why then does this program count only two? The answer is that recent DNA sequence comparisons seem to show that synonymous changes are considerably faster and easier than ones that change the amino acid. We are assuming that, in effect, synonymous changes occur so much more readily that they need not be counted. Thus, in the chain of changes UUU (Phe) --> CUU (Leu) --> CUA (Leu) --> CAA (Gln), the middle one is not counted because it does not change the amino acid (leucine).

To maintain consistency with the genetic code, it is necessary for the program internally to treat serine as two separate states (ser1 and ser2) since the two groups of serine codons are not adjacent in the code. Changes to the state "deletion" are counted as three steps to prevent the algorithm from assuming unnecessary deletions. The state "unknown" is simply taken to mean that the amino acid, which has not been determined, will in each tree that is evaluated be assumed to be whichever one causes the fewest steps.
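
The counting rule described above can be checked with a small calculation over the standard genetic code. The sketch below is not the program's internal algorithm; it is an illustration that walks between codons one base at a time, charging one step when the amino acid changes and nothing for a synonymous change. The names CODE and min_steps are hypothetical, and the exclusion of stop codons as intermediates is an assumption made here for simplicity.

```python
from collections import deque
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... ('*' = stop).
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def neighbours(codon):
    """Yield every codon that differs from `codon` at exactly one base."""
    for i, old in enumerate(codon):
        for new in BASES:
            if new != old:
                yield codon[:i] + new + codon[i + 1:]


def min_steps(aa_from, aa_to):
    """Fewest amino-acid-changing substitutions between two amino acids.

    Synonymous substitutions cost 0, all others cost 1 (a 0-1 shortest path).
    Assumption of this sketch: paths may not pass through stop codons.
    """
    dist = {c: 0 for c, aa in CODE.items() if aa == aa_from}
    queue = deque(dist)
    while queue:
        codon = queue.popleft()
        for nxt in neighbours(codon):
            if CODE[nxt] == "*":
                continue
            cost = 0 if CODE[nxt] == CODE[codon] else 1
            new_dist = dist[codon] + cost
            if new_dist < dist.get(nxt, float("inf")):
                dist[nxt] = new_dist
                # Zero-cost moves go to the front so they are expanded first.
                if cost == 0:
                    queue.appendleft(nxt)
                else:
                    queue.append(nxt)
    return min(d for c, d in dist.items() if CODE[c] == aa_to)
```

With one-letter codes, min_steps("K", "M") gives 1 (lysine to methionine is a single allowed change), min_steps("K", "P") gives 2 (lysine to proline needs an intermediate), and min_steps("F", "Q") gives 2, matching the phenylalanine to glutamine example even though three base substitutions are involved. Because the two serine codon groups are not one base apart, the walk never moves between them for free, which mirrors the ser1/ser2 split.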

The assumptions of this method (which has not been described in the literature) are thus something like this:

(1) Change in different sites is independent.
(2) Change in different lineages is independent.
(3) The probability of a base substitution that changes the amino acid sequence is small over the lengths of time involved in a branch of the phylogeny.
(4) The expected amounts of change in different branches of the phylogeny do not vary by so much that two changes in a high-rate branch are more probable than one change in a low-rate branch.
(5) The expected amounts of change do not vary enough among sites that two changes in one site are more probable than one change in another.
(6) The probability of a base change that is synonymous is much higher than the probability of a change that is not synonymous.

That these are the assumptions of parsimony methods has been documented by Felsenstein (1973, 1978, 1979, 1981, 1983, 1988). For an opposing view arguing that the parsimony methods make no substantive assumptions such as these, see the works by Farris (1983) and Sober (1983a, 1983b, 1988), but also read the exchange between Felsenstein and Sober (1986).


CONSIDERATIONS

Phylip format

When using a PHYLIP formatted input file, EProtPars shows some extra options.

If a "user tree" or "user trees" are supplied in the input file, as described in the INPUT FILE section, EProtPars reads the tree or trees from the input file and evaluates them. To do this, answer 'yes' to the 'Use user trees in input file?' question or use the -USERTree command-line option. When more than one tree is supplied, the program also performs a statistical test of each of these trees against the best tree. This test is a version of the test proposed by Alan Templeton (1983) and evaluated in a test case by Felsenstein (1985). It is closely parallel to a test using log likelihood differences described by Kishino and Hasegawa (1989), and uses the mean and variance of step differences between trees, taken across positions. If the mean difference is more than 1.96 standard deviations from zero, the trees are declared significantly different. The program prints out a table of the steps for each tree, the differences of each from the best one, the variance of that quantity as determined by the step differences at individual positions, and a conclusion as to whether that tree is or is not significantly worse than the best one.
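
The comparison of a user tree against the best tree can be sketched as follows. This is one plausible reading of the description above, not the program's actual code: the per-position step differences are summed, the variance of that sum is estimated from the spread of the differences across positions, and the tree is flagged when the sum exceeds 1.96 standard deviations. The function name compare_trees and the exact variance estimator are assumptions made for the illustration.

```python
import math


def compare_trees(steps_best, steps_other, critical=1.96):
    """Compare per-position step counts of a tree against the best tree.

    Returns (total_difference, variance_of_total, significantly_different).
    Assumed estimator: n / (n - 1) times the sum of squared deviations of
    the per-position differences from their mean.
    """
    diffs = [other - best for best, other in zip(steps_best, steps_other)]
    n = len(diffs)
    total = sum(diffs)
    mean = total / n
    variance = n / (n - 1) * sum((d - mean) ** 2 for d in diffs)
    significant = abs(total) > critical * math.sqrt(variance)
    return total, variance, significant
```

For example, with ten positions where the alternative tree needs one extra step at five of them, the total difference is 5 and its estimated standard deviation is about 1.67, so the tree would be reported as significantly worse. With one extra step at a single position, the difference of 1 is within 1.96 standard deviations and the trees would not be distinguished.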

If you have a "multiple data sets" input file, answer 'yes' to the 'Analyze multiple data sets ?' question or use the -SETS=n command-line option (where n is the number of data sets). The data sets have the same format as the first data set. Here is a (very small) input file with two five-species data sets:

  
  
  

   5 6
  Alpha     CCACCA
  Beta      CCAAAA
  Gamma     CAACCA
  Delta     AACAAC
  Epsilon   AACCCA
   5 6
  Alpha     CACACA
  Beta      CCAACC
  Gamma     CAACAC
  Delta     GCCTGG
  Epsilon   TGCAAT

Using the program ESeqBoot you can make multiple data sets by bootstrapping. Trees can be produced for all of these using this option.

Output file

The exact contents of the output file depend on which options you have selected. If you select all possible output information, the output will consist of (1) the name of the program and date, (2) the input information printed out, (3) a series of phylogenies, some with associated information indicating how much change there was in each character or on each part of the tree.

Answer 'yes' to the 'Print out the data at start of run ?' question, or use the -SHOWData command-line option, for the data to appear in the output file, with the convention that "." means "the same as in the first species".

It is important to realize that the lengths of the segments of the printed tree are not significant, but purely conventional and are presented just to make the topology visible.

If you answer yes to 'Print out steps in each site ?' or use the -SHOWSteps command-line option, the program prints out a table containing the number of steps that different characters (or sites) require on the tree.

If you answer yes to 'Print sequences at all nodes of tree?' or use -SHOWChanges command-line option, a table is printed out after each tree, showing for each branch whether there are known to be changes in the branch, and what the states are inferred to have been at the top end of the branch. If the inferred state is a "?" there will be multiple equally-parsimonious assignments of states; the users must work these out for themselves by hand.

Other considerations

The exact details of the search of different trees depend on the order of input of species. You have the option to tell the program to use a random number generator to choose the input order of species (the Jumble option). The program will then prompt you for a "seed" for the random number generator (or you can supply it with the -RANDom=1 command-line option). The seed should be an integer between 1 and 32767, and should be of the form 4n+1, which means that it must give a remainder of 1 when divided by 4. This can be judged by looking at the last two digits of the number. Each different seed leads to a different sequence of addition of species. By simply changing the random number seed and re-running the program one can look for other, and better, trees. If the seed entered is not odd, the program will not proceed, but will prompt for another seed.

The Jumble option also causes the program to ask you how many times you want to restart the process. If you answer 10, the program will try ten different orders of species in constructing the trees, and the results printed out will reflect this entire search process (that is, the best trees found among all 10 runs will be printed out, not the best trees from each individual run). Of course this is slow, taking 10 times longer than a single run. But it does give a much greater chance of finding all of the most parsimonious trees. In practice, it is advisable to use the Jumble option to evaluate many different orderings of the input species and to specify that it be done many times (as many as ten).
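
The recommendation for the seed (an integer between 1 and 32767 of the form 4n+1) is easy to check before starting a run. The helper below is a hypothetical convenience, not part of EProtPars. Since 100 is divisible by 4, the remainder depends only on the last two digits, which is why the text says the last two digits are enough to judge.

```python
def is_recommended_seed(seed):
    """True if `seed` is an integer from 1 to 32767 of the form 4n+1."""
    return 1 <= seed <= 32767 and seed % 4 == 1
```

For instance, 4321 qualifies (21 leaves a remainder of 1 when divided by 4), while 32767 does not (67 leaves a remainder of 3).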

The Outgroup option (-OUTGroup=1 command-line option) specifies which species is to be used to root the tree by having it become the outgroup (the species being taken in the numerical order that they occur in the input file).

The Threshold option (-THREShold=1000 command-line option) sets a threshold such that if the number of steps counted in a character is higher than the threshold, it will be taken to be the threshold value rather than the actual number of steps. The default is a threshold so high that it will never be surpassed (this will be a positive real number greater than 1). The use of thresholds to obtain methods intermediate between parsimony and compatibility methods is described by Felsenstein (1981).
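
The effect of the threshold on a tree's score can be illustrated with a few lines. This is a sketch of the rule as stated above, not the program's code, and the function name threshold_score is hypothetical: each character contributes its step count, capped at the threshold.

```python
def threshold_score(steps_per_site, threshold):
    """Sum the steps over all sites, counting at most `threshold` per site."""
    return sum(min(steps, threshold) for steps in steps_per_site)
```

With per-site steps of 3, 1, 5, 3, 2 followed by five zeros, a threshold of 1000 leaves the total at 14, while a threshold of 2 reduces it to 9, because the sites needing 3, 5 and 3 steps each count as 2.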


COMMAND-LINE SUMMARY

All parameters for this program may be put on the command line. Use the option -CHEck to see the summary below and to have a chance to add things to the command line before the program executes. In the summary below, the capitalized letters in the qualifier names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose qualifiers or parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the GCG User's Guide.

  
  
  Minimum Syntax: % eprotpars [-INfile=]file.msf{*} -default
  
  Prompted Parameters:
  
  [-OUTfile=]file.eprotpars output file.
  -INTERLeaved              interleaved PHYLIP formatted input file
                             (only for PHYLIP formatted input file).
  -NOINTERLeaved            sequential PHYLIP formatted input file
                             (only for PHYLIP formatted input file).
  
  
  Optional Parameters:
  
  -OPTions                  makes the program ask for further specific
                             options.
  -USERTree                 one or more user-defined trees are to be
                             provided for evaluation in the input file
                             (only for PHYLIP formatted input file).
  -RANDom=1                 use a random number generator to choose the
                             input order of species. The seed should be
                             an integer between 1 and 32767.
  -JUMnumber=10             number of times to restart the process
                             (with different orders of species).
  -OUTGroup=1               species used to root the tree.
  -THREShold=1000           threshold for the number of steps counted in
                             a character.
  -SETS=2                   multiple data sets
                             (only for PHYLIP formatted input file).
  -SHOWData                 print data in the output file.
  -SHOWSteps                print out a table of the number of steps that
                             different characters require on the tree.
  -SHOWChanges              print sequences at all nodes of tree in the
                             output file.
  
  
  


OPTIONAL PARAMETERS

The parameters and switches listed below can be set from the command line. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the GCG User's Guide.

-OPTions

makes the program ask for all specific options.

-USERTree

tells the program that one or more user-defined trees are provided for evaluation in the input file. This is possible only for PHYLIP formatted input files. When more than one tree is supplied, the program also performs a statistical test of each of these trees against the best tree. The program prints out a table of the steps for each tree, the differences of each from the best one, the variance of that quantity as determined by the step differences at individual positions, and a conclusion as to whether that tree is or is not significantly worse than the best one.

-RANDom=1

uses a random number generator to choose the input order of species. The seed should be an integer between 1 and 32767, and should be of the form 4n+1, which means that it must give a remainder of 1 when divided by 4. Each different seed leads to a different sequence of addition of species. If the seed entered is not odd, the program will not proceed, but will prompt for another seed.

-JUMnumber=10

sets how many times the process is restarted with a different order of species. With a value of 10, the program will try ten different orders of species in constructing the trees, and the results printed out will reflect this entire search process (that is, the best trees found among all 10 runs will be printed out, not the best trees from each individual run). Of course this is slow, taking 10 times longer than a single run. But it does give a much greater chance of finding all of the most parsimonious trees.

-OUTGroup=1

specifies which species is to be used to root the tree by having it become the outgroup (the species being taken in the numerical order that they occur in the input file).

-THREShold=1000

sets a threshold such that if the number of steps counted in a character is higher than the threshold, it will be taken to be the threshold value rather than the actual number of steps. The default is a threshold so high that it will never be surpassed (this will be a positive real number greater than 1).

-SETS=2

tells the program how many data sets there are in the input file. This is possible only for PHYLIP formatted input files.

-SHOWData

prints the sequence data in the output file, with the convention that "." means "the same as in the first species".

-SHOWSteps

print out a table of the number of steps that different characters (or sites) require on the tree. A typical example looks like this:

  
   steps in each site:
      0   1   2   3   4   5   6   7   8   9
  *-----------------------------------------
 0!       2   2   2   2   1   1   2   2   1
10!   1   2   3   1   1   1   1   1   1   2
20!   1   2   2   1   2   2   1   1   1   2
30!   1   2   1   1   1   2   1   3   1   1
40!   1
  
  
The numbers across the top and down the side indicate which site is being referred to. Thus site 23 is column "3" of row "20" and has 1 step in this case.
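
Reading the table by row and column can be automated with a small fixed-width reader. The sketch below assumes the layout shown in the example (a row label, a "!" separator, then cells four characters wide); the function name parse_steps_table is hypothetical and not part of EProtPars.

```python
def parse_steps_table(rows):
    """Map site number -> steps from the data rows of a steps table.

    Each row looks like '20!   1   2   2 ...': the label before '!' is the
    row's base site number and every cell after it is four characters wide.
    Blank cells (such as site 0) are skipped.
    """
    steps = {}
    for row in rows:
        label, _, cells = row.partition("!")
        base = int(label)
        for col in range(0, len(cells), 4):
            cell = cells[col:col + 4].strip()
            if cell:
                steps[base + col // 4] = int(cell)
    return steps
```

Given the five data rows of the table above, the result has 40 entries, and looking up a site such as 12 returns the value in column "2" of row "10", which is 3.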

-SHOWChanges

print out a table after each tree, showing for each branch whether there are known to be changes in the branch, and what the states are inferred to have been at the top end of the branch. If the inferred state is a "?" there will be multiple equally-parsimonious assignments of states; the user must work these out for themselves by hand.

Below is an example of the output file when using these options.

  
  
  
  Name          Sequences
  ----          ---------
  
  Alpha        ABCDEFGHIK
  Beta         ..--......
  Gamma        ?...S...??
  Delta        CIK.......
  Epsilon      DIK.......
  
  
  
  
  3 trees in all found
  
  
  
  
       +--------Gamma
       !
    +--2     +--Epsilon
    !  !  +--4
    !  +--3  +--Delta
  --1     !
    !     +-----Beta
    !
    +-----------Alpha
  
    remember: this is an unrooted tree!
  
  
  requires a total of     14.000
  
  steps in each position:
      0   1   2   3   4   5   6   7   8   9
  *-----------------------------------------
 0!       3   1   5   3   2   0   0   0   0
10!   0
  
  
  From    To     Any Steps?    State at upper node
                          ( . means same as in the node below it on tree)
  
  
      1                ANCDEFGHIK
    1      2         no     ..........
    2   Gamma        yes    ?B..S...??
    2      3         yes    ..?.......
    3      4         yes    ?IK.......
    4   Epsilon     maybe   D.........
    4   Delta        yes    C.........
    3   Beta         yes    .B--......
    1   Alpha       maybe   .B........
  
  
  
  
  
             +--Epsilon
          +--4
       +--3  +--Delta
       !  !
    +--2  +-----Gamma
    !  !
  --1  +--------Beta
    !
    +-----------Alpha
  
    remember: this is an unrooted tree!
  
  
  requires a total of     14.000
  
  steps in each position:
      0   1   2   3   4   5   6   7   8   9
  *-----------------------------------------
 0!       3   1   5   3   2   0   0   0   0
10!   0
  
  From    To     Any Steps?    State at upper node
                          ( . means same as in the node below it on tree)
  
  
      1                ANCDEFGHIK
    1      2         no     ..........
    2      3        maybe   ?.........
    3      4         yes    .IK.......
    4   Epsilon     maybe   D.........
    4   Delta        yes    C.........
    3   Gamma        yes    ?B..S...??
    2   Beta         yes    .B--......
    1   Alpha       maybe   .B........
  
  
  
  
  
             +--Epsilon
       +-----4
       !     +--Delta
    +--3
    !  !     +--Gamma
  --1  +-----2
    !        +--Beta
    !
    +-----------Alpha
  
    remember: this is an unrooted tree!
  
  
  requires a total of     14.000
  
  steps in each position:
      0   1   2   3   4   5   6   7   8   9
  *-----------------------------------------
 0!       3   1   5   3   2   0   0   0   0
10!   0
  
  From    To     Any Steps?    State at upper node
                          ( . means same as in the node below it on tree)
  
  
      1                ANCDEFGHIK
    1      3         no     ..........
    3      4         yes    ?IK.......
    4   Epsilon     maybe   D.........
    4   Delta        yes    C.........
    3      2         no     ..........
    2   Gamma        yes    ?B..S...??
    2   Beta         yes    .B--......
    1   Alpha       maybe   .B........
  
  
  


REFERENCES

Eck, R. V., and M. O. Dayhoff. 1966. Atlas of Protein Sequence and Structure 1966. National Biomedical Research Foundation, Silver Spring, Maryland.

Farris, J. S. 1983. The logical basis of phylogenetic analysis. pp. 1-47 in Advances in Cladistics, Volume 2, Proceedings of the Second Meeting of the Willi Hennig Society. ed. Norman I. Platnick and V. A. Funk. Columbia University Press, New York.

Felsenstein, J. 1973. Maximum likelihood and minimum-steps methods for estimating evolutionary trees from data on discrete characters. Systematic Zoology 22: 240-249.

Felsenstein, J. 1978. Cases in which parsimony and compatibility methods will be positively misleading. Systematic Zoology 27: 401-410.

Felsenstein, J. 1979. Alternative methods of phylogenetic inference and their interrelationship. Systematic Zoology 28: 49-62.

Felsenstein, J. 1981. A likelihood approach to character weighting and what it tells us about parsimony and compatibility. Biological Journal of the Linnean Society 16: 183-196.

Felsenstein, J. 1983. Parsimony in systematics: biological and statistical issues. Annual Review of Ecology and Systematics 14: 313-333.

Felsenstein, J. 1985. Confidence limits on phylogenies with a molecular clock. Systematic Zoology 34: 152-161.

Felsenstein, J. and E. Sober. 1986. Parsimony and likelihood: an exchange. Systematic Zoology 35: 617-626.

Felsenstein, J. 1988. Phylogenies from molecular sequences: inference and reliability. Annual Review of Genetics 22: 521-565.

Fitch, W. M. 1971. Toward defining the course of evolution: minimum change for a specified tree topology. Systematic Zoology 20: 406-416.

Kishino, H. and M. Hasegawa. 1989. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. Journal of Molecular Evolution 29: 170-179.

Sober, E. 1983a. Parsimony in systematics: philosophical issues. Annual Review of Ecology and Systematics 14: 335-357.

Sober, E. 1983b. A likelihood justification of parsimony. Cladistics 1: 209-233.

Sober, E. 1988. Reconstructing the Past: Parsimony, Evolution, and Inference. MIT Press, Cambridge, Massachusetts.

Templeton, A. R. 1983. Phylogenetic inference from restriction endonuclease cleavage site maps with particular reference to the evolution of humans and the apes. Evolution 37: 221-244.

For further information please refer to the "main.doc" and "protpars.doc" files from the PHYLIP (Phylogeny Inference Package) distribution Version 3.57c by Joseph Felsenstein (available by anonymous FTP at evolution.genetics.washington.edu in directory pub/phylip).

Printed: November 15, 1996 11:47 (1162)