
Software - Readme File


Inference of Ancestral Amino Acid Sequences by the Distance-Based Bayesian Method
(c) Copyright April 1997 by Jianzhi Zhang and the Pennsylvania State University. Permission is granted to copy this document provided that no fee is charged for it and that this copyright notice is not removed. ANCESTOR is distributed free of charge by:

Jianzhi Zhang
Institute of Molecular Evolutionary Genetics
Department of Biology
322 Mueller Laboratory
The Pennsylvania State University
University Park, PA 16802 USA


Current Address:

Associate Professor of Ecology and Evolutionary Biology
University of Michigan
Ann Arbor, MI
E-mail: jianzhi@umich.edu


Suggested citation
Zhang J, Nei M (1997) Accuracies of ancestral amino acid sequences inferred by the parsimony, likelihood, and distance methods. J Mol Evol 44(Suppl 1):S139-S146
     ANCESTOR is designed to infer ancestral amino acid sequences from a set of homologous amino acid sequences whose phylogenetic relationships are known. This package contains one program, ancestor.exe, which is written in C. The program runs on IBM PC-compatible computers under Windows.
     First, unzip the file ancestor.zip using the program pkunzip.exe. This produces the following files:

ancestor.c (source code)
ancestor.exe (executable file)
jtt.pro (JTT substitution matrix)
lysozyme.aa (an example data file, see Stewart et al. 1987)
manual (this file)
result (output file from running ancestor lysozyme.aa)

     To install ANCESTOR on your computer's hard disk drive (the "C" drive in this example), first create a directory to hold the files of this package. To do this, type the following at the c:\ prompt: md ancestor (Enter)
     To copy the ANCESTOR files onto your hard disk drive, insert the floppy disk containing the programs into your floppy drive (the "A" drive in this example). Then, at the c:\ prompt, enter the following command: copy a:*.* c:\ancestor\*.* (Enter)

Input file
     To use the program, you need an input file containing the amino acid sequences and the tree topology of these sequences (see lysozyme.aa for an example). The file begins with two numbers: the number of sequences and the number of amino acid sites (sequence length). The second line gives the name of the first sequence, the third line gives the first sequence itself, and so on. Only the one-letter codes (upper case) for the 20 amino acids are allowed in the sequences. The sequences must already be aligned, with gaps and any other symbols removed. The last line of the file is the tree topology of the sequences, written in the same format as that used in the PHYLIP package (Felsenstein 1995). Note that the tree is unrooted, so the deepest node must be a trifurcation rather than a bifurcation. For example, the topology of the following tree can be expressed by

[Tree diagram: eight present-day sequences, numbered 1 to 8, with ancestral nodes numbered 9 to 14]

     Note that the numbers in the topology expression refer to the order of the present-day sequences in the input file. Also note that the topology expression contains only numbers, commas, and parentheses, without any spaces.
The tree of the lysozyme sequences in the example data file is (((1,2),3),4,(5,6)) (Stewart et al. 1987).

                 9 |------------------1 langur
           8 |-----|
             |     |------------------2 baboon
     7 |-----|
       |     |-------------------3 human
       |
       |--------------------4 rat
       |
       |        10 |------------5 cow
       |-----------|
                   |---------------------------6 horse
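     Before running the program, it can help to check that an input file follows the layout described above. The C sketch below is not part of ANCESTOR: the function name check_input and the line-length limit are hypothetical, and it assumes one line per sequence name and one line per sequence, with no blank lines in between. It also rejects a trailing semicolon on the tree line, following the rule that the topology expression contains only numbers, commas, and parentheses; relax that check if your files differ.

```c
#include <stdio.h>
#include <string.h>

#define MAX_LINE 10000   /* hypothetical line-length limit for this sketch */

/* One-letter codes of the 20 amino acids (upper case only). */
static const char *AMINO = "ACDEFGHIKLMNPQRSTVWY";

/* Strip a trailing newline and/or carriage return in place. */
static void chomp(char *s) {
    size_t n = strlen(s);
    while (n > 0 && (s[n - 1] == '\n' || s[n - 1] == '\r'))
        s[--n] = '\0';
}

/* Hypothetical checker for the input layout described in this manual.
   Returns 0 if the file looks valid, otherwise a code for the first problem:
   1 bad header, 2 missing name, 3 missing sequence, 4 wrong length,
   5 illegal character in a sequence, 6 missing tree, 7 illegal character in the tree. */
int check_input(FILE *fp, int *nseq, int *nsite) {
    static char line[MAX_LINE];
    /* An unrooted tree with a trifurcating deepest node needs at least 3 sequences. */
    if (fscanf(fp, "%d %d", nseq, nsite) != 2 || *nseq < 3 || *nsite < 1)
        return 1;
    if (!fgets(line, sizeof line, fp))        /* finish the header line */
        return 2;
    for (int i = 0; i < *nseq; i++) {
        if (!fgets(line, sizeof line, fp)) return 2;   /* sequence name */
        if (!fgets(line, sizeof line, fp)) return 3;   /* the sequence itself */
        chomp(line);
        if ((int)strlen(line) != *nsite) return 4;
        for (const char *p = line; *p; p++)
            if (!strchr(AMINO, *p)) return 5;          /* gap, lower case, etc. */
    }
    if (!fgets(line, sizeof line, fp)) return 6;       /* tree topology line */
    chomp(line);
    if (line[0] == '\0') return 6;
    for (const char *p = line; *p; p++)
        if (!strchr("0123456789,()", *p)) return 7;    /* spaces are not allowed */
    return 0;
}
```

A nonzero return code points to the first line that breaks the layout, which is usually easier to track down than an error from the program itself.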

     You should also understand how the ancestral nodes are numbered, because the output file presents the ancestral sequences by node number. The deepest node (the trifurcating node in your topology expression) is numbered N+1, where N is the number of sequences in the tree. In the lysozyme example above, N=6 and the deepest node links the groups ((1,2),3), 4, and (5,6), so it is numbered 7, as shown in the tree. The numbers of the other nodes can be determined from the output file, which describes the branches connecting the nodes. Note that the ancestral (interior) nodes are numbered from N+1 to 2N-2.
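     A topology expression can also be checked for the structure the program expects: the deepest node with three branches, every other interior node with two, and each sequence number from 1 to N appearing exactly once. The C sketch below (the names number_nodes and parse_node are hypothetical, and this is not ANCESTOR's code) does this and also assigns interior-node numbers N+1, N+2, ... in preorder, that is, each node before its descendants, from left to right. The preorder rule is an assumption: it reproduces the labels 7 to 10 in the lysozyme tree, but the numbering actually used by ancestor.exe should be read from the branch list in the output file.

```c
#include <ctype.h>
#include <string.h>

#define MAX_N 256   /* hypothetical limit on the number of sequences */

/* State shared by the recursive parser. */
typedef struct {
    const char *p;          /* current position in the topology string */
    int n;                  /* number of present-day sequences */
    int next_label;         /* next interior-node number to assign */
    int *parent;            /* parent[node] is filled in as the tree is read */
    char seen[MAX_N + 1];   /* seen[i] != 0 once sequence i has appeared */
} Parser;

/* Reads one subtree and returns its node number, or 0 on any error.
   Leaves keep their input-file number; interior nodes are numbered in
   preorder starting from N+1 (an assumed rule, see the text). */
static int parse_node(Parser *s, int is_deepest) {
    if (isdigit((unsigned char)*s->p)) {
        int leaf = 0;
        while (isdigit((unsigned char)*s->p)) {
            leaf = leaf * 10 + (*s->p - '0');
            s->p++;
            if (leaf > s->n) return 0;            /* no such sequence */
        }
        if (leaf < 1 || s->seen[leaf]) return 0;  /* 0 or repeated number */
        s->seen[leaf] = 1;
        return leaf;
    }
    if (*s->p != '(' || s->next_label > 2 * s->n - 2) return 0;
    s->p++;
    int me = s->next_label++;                     /* label before children */
    int children = 0;
    for (;;) {
        int child = parse_node(s, 0);
        if (child == 0) return 0;
        s->parent[child] = me;
        children++;
        if (*s->p == ',') { s->p++; continue; }
        if (*s->p == ')') { s->p++; break; }
        return 0;                                 /* unexpected character */
    }
    /* The deepest node must be a trifurcation, every other one a bifurcation. */
    return children == (is_deepest ? 3 : 2) ? me : 0;
}

/* Validates an unrooted topology such as "(((1,2),3),4,(5,6))" for n sequences.
   On success returns 1 and fills parent[1..2n-2] (parent of the deepest node = 0);
   parent must have room for at least 2n-1 ints. Returns 0 otherwise. */
int number_nodes(const char *topo, int n, int parent[]) {
    if (n < 3 || n > MAX_N || topo[0] != '(') return 0;
    Parser s;
    s.p = topo;
    s.n = n;
    s.next_label = n + 1;
    s.parent = parent;
    memset(s.seen, 0, sizeof s.seen);
    int root = parse_node(&s, 1);
    if (root != n + 1 || *s.p != '\0') return 0;
    for (int i = 1; i <= n; i++)
        if (!s.seen[i]) return 0;                 /* every sequence must appear */
    parent[root] = 0;
    return 1;
}
```

For the lysozyme tree with N=6, this gives parent 9 for sequences 1 and 2, parent 8 for node 9 and sequence 3, parent 7 for nodes 8 and 10 and sequence 4, and parent 10 for sequences 5 and 6. A rooted expression such as ((1,2),(3,4)) is rejected because its deepest node has only two branches.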

     To infer the ancestral sequences from the data file, type c:\ancestor\ancestor filename. For example, to try the lysozyme.aa data, type c:\ancestor\ancestor lysozyme.aa
     The computation is the same as described in Zhang and Nei (1997), except that the branch lengths are estimated by the ordinary least-squares method of Rzhetsky and Nei (1993) rather than by the FITCH algorithm in the PHYLIP package (Felsenstein 1995). Because of sampling errors, a branch length may be estimated to be negative; in this case, it is set to zero. Computer simulation has shown that this treatment does not affect the inference of the ancestral sequences. The program uses the JTT-f model of amino acid substitution (see Zhang and Nei 1997).
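     The least-squares step and the zero rule for negative estimates can be illustrated on the smallest case, four sequences with topology ((1,2),3,4) and five branches. The C sketch below is not ANCESTOR's code, which handles trees of any size; the name ols_quartet is hypothetical. It writes down which branches lie on the path between each pair of sequences (matrix A), solves the normal equations (A'A)b = A'd for the branch lengths b given the pairwise distances d, and then sets any negative estimate to zero.

```c
#include <math.h>

#define NPAIR 6   /* pairwise distances among 4 sequences */
#define NBR   5   /* branches: b1..b4 terminal, b5 interior */

/* Path matrix for the unrooted tree ((1,2),3,4).
   Rows: pairs 12,13,14,23,24,34. A[i][j] = 1 if branch j is on that path. */
static const double A[NPAIR][NBR] = {
    {1, 1, 0, 0, 0},  /* d12 */
    {1, 0, 1, 0, 1},  /* d13 */
    {1, 0, 0, 1, 1},  /* d14 */
    {0, 1, 1, 0, 1},  /* d23 */
    {0, 1, 0, 1, 1},  /* d24 */
    {0, 0, 1, 1, 0},  /* d34 */
};

/* Ordinary least-squares branch lengths minimizing sum (d - A b)^2,
   followed by setting negative estimates to zero. */
void ols_quartet(const double d[NPAIR], double b[NBR]) {
    double M[NBR][NBR + 1] = {{0}};           /* augmented [A'A | A'd] */
    for (int r = 0; r < NBR; r++) {
        for (int c = 0; c < NBR; c++)
            for (int i = 0; i < NPAIR; i++)
                M[r][c] += A[i][r] * A[i][c];
        for (int i = 0; i < NPAIR; i++)
            M[r][NBR] += A[i][r] * d[i];
    }
    /* Gaussian elimination with partial pivoting; A'A is nonsingular here. */
    for (int k = 0; k < NBR; k++) {
        int piv = k;
        for (int r = k + 1; r < NBR; r++)
            if (fabs(M[r][k]) > fabs(M[piv][k])) piv = r;
        for (int c = 0; c <= NBR; c++) {
            double t = M[k][c]; M[k][c] = M[piv][c]; M[piv][c] = t;
        }
        for (int r = k + 1; r < NBR; r++) {
            double f = M[r][k] / M[k][k];
            for (int c = k; c <= NBR; c++)
                M[r][c] -= f * M[k][c];
        }
    }
    for (int k = NBR - 1; k >= 0; k--) {      /* back substitution */
        double s = M[k][NBR];
        for (int c = k + 1; c < NBR; c++)
            s -= M[k][c] * b[c];
        b[k] = s / M[k][k];
    }
    for (int k = 0; k < NBR; k++)
        if (b[k] < 0.0) b[k] = 0.0;           /* negative estimate set to zero */
}
```

With distances that fit the tree exactly (d12 = 0.3, d13 = 0.45, d14 = 0.55, d23 = 0.55, d24 = 0.65, d34 = 0.7), the estimates recover the generating branch lengths 0.1, 0.2, 0.3, 0.4 and 0.05 for the interior branch. If instead d12 and d34 are both 0.8 and the other four distances are 0.5, the unconstrained interior estimate is -0.3, which the last loop replaces with zero.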
Output file
     The output of ancestor.exe is written to a file named "result". The ancestral sequences are presented in three different formats.
(1) Site by site and pathway by pathway. This format gives the probability of each pathway (the amino acids at all ancestral nodes) at a given site.
(2) Site by site and node by node. This format gives the most likely amino acid at a given node and site, together with its probability.
(3) The entire sequence for each node, together with the average probability of the entire sequence.
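     Formats (2) and (3) are related by a short calculation. The C sketch below assumes that, for one ancestral node, a table prob[site][aa] of probabilities over the 20 amino acids is already available; this in-memory layout, the column order ARNDCQEGHILKMFPSTWYV, and the name reconstruct_node are hypothetical and do not describe the "result" file. It picks the most probable amino acid at each site, as in format (2), and, assuming the "average probability" of format (3) is the arithmetic mean of these per-site maxima, returns that average.

```c
#define NAA 20

/* Hypothetical amino acid order for the columns of prob[][]; adjust to your data. */
static const char AA_ORDER[NAA + 1] = "ARNDCQEGHILKMFPSTWYV";

/* prob[s][a] = probability of amino acid a at site s for one node (not modified).
   Writes the most probable amino acid at each site into seq (nsite + 1 chars,
   NUL-terminated) and returns the mean of those per-site maximum probabilities. */
double reconstruct_node(int nsite, double prob[][NAA], char *seq) {
    double total = 0.0;
    for (int s = 0; s < nsite; s++) {
        int best = 0;
        for (int a = 1; a < NAA; a++)
            if (prob[s][a] > prob[s][best]) best = a;   /* ties keep the earlier one */
        seq[s] = AA_ORDER[best];
        total += prob[s][best];
    }
    seq[nsite] = '\0';
    return nsite > 0 ? total / nsite : 0.0;
}
```

For a two-site node where site 1 has A with probability 0.9 and site 2 has L with probability 0.6, the sketch returns the sequence "AL" with an average probability of 0.75.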
Limitations of the program
     The program infers ancestral amino acid sequences only. A program specifically for inferring ancestral protein-coding nucleotide sequences is under development.
References
     Felsenstein, J. (1995). PHYLIP: phylogeny inference package. Version 3.57c. University of Washington, Seattle.

     Rzhetsky, A., and M. Nei (1993). Theoretical foundation of the minimum-evolution model of phylogenetic inference. Mol. Biol. Evol. 10:1073-1095.

     Stewart, C.-B., J. W. Schilling, and A. C. Wilson (1987). Adaptive evolution in the lysozymes of foregut fermenters. Nature 330:401-404.

     Zhang, J., and M. Nei (1997). Accuracies of ancestral amino acid sequences inferred by the parsimony, likelihood, and distance methods. J. Mol. Evol. 44:S139-S146.


2002 The Pennsylvania State University
This page was last updated 6/11/09 by M. Ricardo.