
Software - Readme File


CAPE: Convergent and Parallel Evolution at the Amino Acid Sequence Level
(c) Copyright July 1997 by Jianzhi Zhang and the Pennsylvania State University. Permission is granted to copy this document provided that no fee is charged for it and that this copyright notice is not removed.  CAPE is distributed free of charge by

Jianzhi Zhang
Institute of Molecular Evolutionary Genetics
and Department of Biology
322 Mueller Laboratory
The Pennsylvania State University
University Park, PA 16802, USA

Current Address:

Associate Professor of Ecology and Evolutionary Biology
University of Michigan
Ann Arbor, MI
E-mail: jianzhi@umich.edu

Jianzhi Zhang Homepage

Suggested citation:

Zhang J. and S. Kumar (1997) Detection of convergent and parallel evolution at the amino acid sequence level.  Mol. Biol. Evol. 14:527-536.

     CAPE is designed to test for convergent and parallel evolution at the amino acid sequence level.  It computes the probabilities that the observed convergent and parallel substitutions are attributable to random chance.  This package contains one program, converg2.exe, which is written in C.  The program can be used on IBM PC compatible computers running Windows 95.
     First make sure that the diskette you have received contains the following files.

    converg2.c     (source code)
    converg2.exe   (executable file)
    jtt.pro        (matrix for JTT substitution model)
    poisson.pro    (matrix for Poisson substitution model)
    lysozyme.aa    (an example data file, see Stewart et al. 1987)
    manual         (this file)

     To install CAPE on your computer's hard disk drive (drive "C" is used here as an example), create a directory to hold the files of this package.  To do this, type the following
                c:\md cape  (Enter)

     To copy the CAPE files onto your hard disk drive, insert the floppy disk containing the programs into your floppy drive (drive "A" is used here as an example).  Then, enter the following command
                c:\copy a:*.* c:\cape\*.*  (Enter)
Input File
     To use the program, you need an input file containing the amino acid sequences and the tree topology of these sequences (see lysozyme.aa for an example).  This file begins with two numbers: the number of sequences and the number of amino acid sites (sequence length).  The second line is the name of the first sequence, the third line is the first sequence itself, and so on.  Only the capitalized one-letter codes for the 20 amino acids are allowed in the sequences.  The sequences should already be aligned, with gaps and any other symbols removed.  The last line of the file is the tree topology of the sequences.  The tree format is the same as that used in the PHYLIP package (Felsenstein 1995).  Note that the tree is unrooted, so a trifurcation rather than a bifurcation is required at the deepest branching node.  For example, the topology of the following tree can be expressed by


                     11 |---------- 1
            10 |--------|
     |---------|        |---------- 3
     |         |
     |         |------------------- 2
     |
   9 |----------------------------- 6
     |
     |                  13 |------- 4
     |         |-----------|
     |         |           |------- 7
     |---------|
            12 |        14 |------- 5
               |-----------|
                           |------- 8

     Note that in the topology expression, the numbers refer to the order of the sequences given in the input file.  Also note that the topology expression contains only numbers, parentheses, and commas (","), without any spaces.
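
     The format rules above can be checked before running the program.  The following Python sketch is not part of CAPE; it only illustrates the rules as described in this file.  It assumes that the two leading numbers sit on the first line separated by whitespace and that the topology line carries no trailing semicolon (as in the lysozyme example).  The function name check_input and the four short sequences are hypothetical.

```python
import re

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 one-letter codes

def check_input(text):
    """Validate CAPE-style input text; return (names, sequences, topology)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    # Assumption: both counts are on the first line, separated by whitespace.
    n_seq, n_sites = (int(x) for x in lines[0].split())
    if len(lines) != 2 * n_seq + 2:
        raise ValueError("expected %d non-blank lines, found %d"
                         % (2 * n_seq + 2, len(lines)))
    names = lines[1:-1:2]   # lines 2, 4, ... hold the sequence names
    seqs = lines[2:-1:2]    # lines 3, 5, ... hold the sequences
    for name, seq in zip(names, seqs):
        if len(seq) != n_sites:
            raise ValueError("%s: length %d, expected %d"
                             % (name, len(seq), n_sites))
        bad = set(seq) - AMINO_ACIDS
        if bad:
            raise ValueError("%s: symbols not allowed: %s" % (name, sorted(bad)))
    topology = lines[-1]
    if not re.fullmatch(r"[0-9(),]+", topology):
        raise ValueError("topology may contain only digits, commas, parentheses")
    labels = sorted(int(x) for x in re.findall(r"[0-9]+", topology))
    if labels != list(range(1, n_seq + 1)):
        raise ValueError("topology must mention each sequence number once")
    return names, seqs, topology

# Hypothetical four-sequence data set, five sites each.
EXAMPLE = """4 5
seqA
MKVLA
seqB
MKVLS
seqC
MRVIA
seqD
MRIIA
((1,2),3,4)
"""

names, seqs, topology = check_input(EXAMPLE)
print(names, topology)
```

     A file that contains a gap character, a lowercase letter, or a space inside the topology line makes check_input raise ValueError with a short message naming the problem.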

        The tree of the lysozyme sequences in the example data file is

          (((1,2),3),4,(5,6))   (Stewart et al. 1987)

               9 |---------- 1 langur
         8 |-----|
     |-----|     |---------- 2 baboon
     |     |
     |     |---------------- 3 human
   7 |
     |---------------------- 4 rat
     |
     |        10 |---------- 5 cow
     |-----------|
                 |---------- 6 horse

        You should also know how the interior (ancestral) nodes are numbered, because you will be asked to specify each branch on which convergent and parallel evolution is to be examined by the two nodes at its ends.  The deepest node (the trifurcation node specified in your expression of the tree topology) is denoted by N+1, where N is the number of sequences in the tree.  In the above lysozyme example with N=6, the deepest node links the groups ((1,2),3), 4, and (5,6), so "7" is given to the deepest node, as shown in the tree.  The numbering of the remaining nodes can be read from the output file (RESULT) of the program ancestor.exe, which you may use to infer the ancestral sequences.  In the file RESULT, I describe the branches by their two ends (nodes).  Note that the ancestral (interior) nodes are numbered from N+1 to 2N-2.
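
     As a small illustration of the numbering rule, the Python sketch below lists the interior node numbers N+1 to 2N-2 and splits a topology expression into the groups joined at the deepest node.  It is not part of CAPE, and the function names are hypothetical.  It does not attempt to assign numbers to the remaining interior nodes, since this file only says that they are to be read from the RESULT file of ancestor.exe.

```python
def interior_nodes(n_seq):
    """Interior (ancestral) node numbers N+1 .. 2N-2 for N sequences."""
    return list(range(n_seq + 1, 2 * n_seq - 1))

def deepest_groups(topology):
    """Split the outermost parentheses at commas of nesting depth 1."""
    if not (topology.startswith("(") and topology.endswith(")")):
        raise ValueError("topology must be enclosed in parentheses")
    groups, depth, start = [], 0, 1
    for i, ch in enumerate(topology):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0 and i != len(topology) - 1:
                raise ValueError("outermost parentheses close too early")
        elif ch == "," and depth == 1:
            groups.append(topology[start:i])
            start = i + 1
    if depth != 0:
        raise ValueError("unbalanced parentheses")
    groups.append(topology[start:-1])
    return groups

# Lysozyme example from this file: N = 6 sequences.
groups = deepest_groups("(((1,2),3),4,(5,6))")
print(interior_nodes(6))   # node 7 is the deepest node
print(groups)
if len(groups) != 3:
    print("warning: the deepest node should be a trifurcation")
```

     For the lysozyme tree this gives the interior nodes 7 to 10 and the three groups ((1,2),3), 4, and (5,6) named in the text above.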
     To compute the probabilities fc (the probability that the observed convergent substitutions are attributable to random chance) and fp (the probability that the observed parallel substitutions are attributable to random chance), you first have to decide which two branches to examine for convergent and parallel evolution, and then determine the observed numbers of convergent and parallel substitutions on these branches.  These observed numbers can be counted by inferring ancestral amino acid sequences at the interior nodes of the tree.  For this purpose, you may use the program ancestor.exe of the package ANCESTOR, distributed by Jianzhi Zhang.
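
     Once the sequences at both ends of each branch are known, the counting step can be sketched as follows.  This Python example is not part of CAPE or ANCESTOR.  This file does not define the two kinds of substitution, so the sketch assumes the usual definitions from the cited paper (Zhang and Kumar 1997): at a site where both branches change to the same amino acid, the change is parallel if the two branches started from the same amino acid and convergent if they started from different ones.  The function name and the four short sequences are hypothetical.

```python
def count_convergent_parallel(anc1, desc1, anc2, desc2):
    """Count (convergent, parallel) sites on two branches.

    anc1/desc1 are the sequences at the two ends of the first branch,
    anc2/desc2 those of the second branch; all must have equal length.
    """
    if not (len(anc1) == len(desc1) == len(anc2) == len(desc2)):
        raise ValueError("all four sequences must have the same length")
    convergent = parallel = 0
    for a1, d1, a2, d2 in zip(anc1, desc1, anc2, desc2):
        if a1 == d1 or a2 == d2:
            continue            # no substitution on at least one branch
        if d1 != d2:
            continue            # the branches end in different amino acids
        if a1 == a2:
            parallel += 1       # same start, same end
        else:
            convergent += 1     # different start, same end
    return convergent, parallel

# Hypothetical four-site example.
# Site 1: A->S and A->S (parallel); site 3: L->M and I->M (convergent).
result = count_convergent_parallel("AKLV", "SKMV", "AKIV", "SKMI")
print(result)
```

     The two counts returned here correspond to the observed numbers of convergent and parallel substitutions that the program asks for.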

        After you have obtained this information, you can compute fc and fp by typing
                c:\cape\converg2 filename
        For example, to try the lysozyme.aa data, type
                c:\cape\converg2 lysozyme.aa

     You will be asked to choose a substitution model.  I have provided the matrices for the Poisson and JTT models.  You will also be asked to input the branches to be examined and the observed numbers of convergent and parallel substitutions.
Limitations of the program
     The program is designed for testing convergent and parallel evolution on two branches.  For tests on 3 branches, please contact me.


| Department of Biology  |  Eberly College of Science |
| Institute of Molecular Evolutionary Genetics | Penn State |
2002 The Pennsylvania State University
This page was last updated 6/10/09 by M. Ricardo.