Home | CV | Databases | IMEG Seminars | Journals
MEP-online | People | Publications | SoftwareText only version

Software - Read me File


Estimation of Synonymous and Nonsynonymous Branch Lengths from Pairwise Distances

Jianzhi Zhang
Institute of Molecular Evolutionary Genetics
and Department of Biology
322 Mueller Laboratory
The Pennsylvania State University
University Park, PA 16802, USA

Associate Professor of Ecology and Evolutionary Biology
University of Michigan
Ann Arbor, MI
E-mail: jianzhi@umich.edu

Jianzhi Zhang Homepage

Suggested Citation
       Zhang J., H. F. Rosenberg, and M. Nei (1998) Positive Darwinian selection after gene duplication in primate ribonuclease genes. Proc. Natl. Acad. Sci. USA 95:3708-3713.
       BN-BS estimates branch lengths in terms of synonymous and nonsynonymous substitutions per site when the tree topology is given. The program uses the modified Nei-Gojobori method (Zhang et al. 1998) to estimate pairwise synonymous and nonsynonymous distances among present-day sequences and then estimates branch lengths and their variances by the ordinary least-squares method. The program is written in C and runs on IBM PC compatible computers with the Windows 95 operating system.
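The least-squares step can be illustrated on a small case. The following C sketch is not taken from bn-bs.c: the function name ols_branch_lengths, the array QUARTET_DESIGN, and the four-sequence tree ((1,2),3,4) are assumptions chosen for illustration. It solves the normal equations for the branch lengths from pairwise distances and does not compute the variances that BN-BS reports.

```c
#include <assert.h>
#include <math.h>

#define MAX_BRANCHES 16   /* illustrative limit, not a BN-BS constant */

/* Ordinary least squares: find b minimizing ||A b - d||^2 by solving the
   normal equations (A^T A) b = A^T d.
   A is an m x n 0/1 matrix (row-major): A[k*n + j] = 1 if branch j lies on
   the path between the two sequences of pair k.
   Returns 0 on success, -1 if the system is singular. */
int ols_branch_lengths(int m, int n, const double *A, const double *d, double *b)
{
    double M[MAX_BRANCHES][MAX_BRANCHES + 1];
    int i, j, k;

    assert(n > 0 && n <= MAX_BRANCHES && m >= n);

    /* Build the augmented matrix [A^T A | A^T d]. */
    for (i = 0; i < n; i++) {
        for (j = 0; j < n; j++) {
            M[i][j] = 0.0;
            for (k = 0; k < m; k++)
                M[i][j] += A[k * n + i] * A[k * n + j];
        }
        M[i][n] = 0.0;
        for (k = 0; k < m; k++)
            M[i][n] += A[k * n + i] * d[k];
    }

    /* Gaussian elimination with partial pivoting. */
    for (i = 0; i < n; i++) {
        int p = i;
        for (k = i + 1; k < n; k++)
            if (fabs(M[k][i]) > fabs(M[p][i]))
                p = k;
        if (fabs(M[p][i]) < 1e-12)
            return -1;
        if (p != i) {
            for (j = 0; j <= n; j++) {
                double t = M[i][j];
                M[i][j] = M[p][j];
                M[p][j] = t;
            }
        }
        for (k = i + 1; k < n; k++) {
            double f = M[k][i] / M[i][i];
            for (j = i; j <= n; j++)
                M[k][j] -= f * M[i][j];
        }
    }

    /* Back substitution. */
    for (i = n - 1; i >= 0; i--) {
        double s = M[i][n];
        for (j = i + 1; j < n; j++)
            s -= M[i][j] * b[j];
        b[i] = s / M[i][i];
    }
    return 0;
}

/* Design matrix for the hypothetical tree ((1,2),3,4).
   Columns: branches to sequences 1, 2, 3, 4, then the internal branch. */
const double QUARTET_DESIGN[6 * 5] = {
    1, 1, 0, 0, 0,   /* d12 */
    1, 0, 1, 0, 1,   /* d13 */
    1, 0, 0, 1, 1,   /* d14 */
    0, 1, 1, 0, 1,   /* d23 */
    0, 1, 0, 1, 1,   /* d24 */
    0, 0, 1, 1, 0    /* d34 */
};
```

Calling ols_branch_lengths(6, 5, QUARTET_DESIGN, d, b) with d holding the six distances in the order d12, d13, d14, d23, d24, d34 fills b with the four terminal branch lengths followed by the internal branch length. When the distances fit the tree exactly, the branch lengths that generated them are recovered.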
First make sure that the diskette you have received contains the following files.

bn-bs.c (source code)
bn-bs.exe (executable file)
ng-new.c (source code)
ng-new.exe (executable file)
manual (this file)
rnase.seq (example data file)
infile (input file)
outfile (output file)
result (output file)

To install BN-BS on your computer's hard disk drive (the "C" drive is used here as an example), first create a directory to hold the files of this package. To do this, type the following:
c:\md bn-bs (Enter)

To copy the BN-BS files onto your hard disk drive, insert the floppy disk containing the programs into your floppy drive (the "A" drive is used here as an example). Then enter the following command:
c:\copy a:*.* c:\bn-bs\*.* (Enter)
Input file
       To use the program, you need an input file containing the protein-coding DNA sequences (see infile for an example). The first line of this file contains two numbers: the number of sequences and the number of nucleotides per sequence (sequence length). The second line is the name of the first sequence, the third line is the first sequence itself, and so on for the remaining sequences. Only A, G, C, T, a, g, c, and t are allowed in sequences. Sequences should be aligned and gaps removed beforehand. The last line of the file is the tree topology of the sequences. The tree format is the same as that used in the PHYLIP package (Felsenstein 1995). Note that the tree is unrooted, so a trifurcation rather than a bifurcation is required at the deepest branching node. For example, the topology of the following tree can be expressed by


                             11 |----------- 1
                 10 |-----------|
         |----------|           |---------------- 3
         |          |------------------------ 2
         |               |----------------------------- 6
         |---------------|              |---------- 4
                       9 |     |--------|
                         |     |     13 |
                         |     |        |------ 7
                            12 |      |---- 5
                                   14 |----- 8

       Note that in the topology expression, the numbers refer to the order of the present-day sequences given in the input file. Also note that the topology expression contains only numbers, parentheses, and "," without any spaces.
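The input rules above can be checked before running the program. The following C sketch is only an illustration and is not part of BN-BS: the function names is_valid_sequence and is_valid_topology, the limit MAX_SEQS, and the acceptance of an optional trailing ";" are assumptions. The topology check works at the character level (allowed characters, balanced parentheses, each sequence number used exactly once) and does not verify the full tree grammar.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MAX_SEQS 1024   /* illustrative limit, not a BN-BS constant */

/* Return 1 if seq has exactly `length` characters and every character is
   one of A, G, C, T, a, g, c, t; otherwise return 0. */
int is_valid_sequence(const char *seq, int length)
{
    int i;

    assert(seq != NULL && length >= 0);
    if ((int)strlen(seq) != length)
        return 0;
    for (i = 0; i < length; i++)
        if (strchr("AGCTagct", seq[i]) == NULL)
            return 0;
    return 1;
}

/* Return 1 if the topology string contains only digits, ',', '(' and ')'
   (plus an optional final ';', which is an assumption), has balanced
   parentheses, and mentions each sequence number 1..nseq exactly once. */
int is_valid_topology(const char *tree, int nseq)
{
    char seen[MAX_SEQS + 1] = {0};
    int depth = 0, count = 0;
    const char *p = tree;

    assert(tree != NULL && nseq > 0 && nseq <= MAX_SEQS);
    while (*p != '\0') {
        if (*p == '(') {
            depth++;
            p++;
        } else if (*p == ')') {
            if (--depth < 0)
                return 0;
            p++;
        } else if (*p == ',') {
            p++;
        } else if (isdigit((unsigned char)*p)) {
            long id = 0;
            while (isdigit((unsigned char)*p)) {
                id = id * 10 + (*p - '0');
                if (id > nseq)
                    return 0;          /* number out of range */
                p++;
            }
            if (id < 1 || seen[id])
                return 0;              /* zero or repeated number */
            seen[id] = 1;
            count++;
        } else if (*p == ';' && p[1] == '\0') {
            p++;
        } else {
            return 0;                  /* spaces and other characters */
        }
    }
    return depth == 0 && count == nseq;
}
```

For a hypothetical four-sequence file, is_valid_topology("((1,2),3,4)", 4) returns 1, while the same string with a space after a comma is rejected.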
The tree of the ribonuclease sequences in the example data file is

                   16|------------1 human-ECP
           |---|     |------------2 chimp-ECP
         14|   |
       |---|   |------------------3 gorilla-ECP
     13|   |
    |--|   |----------------------4 orangutan-ECP
    |  | 
    |  |--------------------------5 macaque-ECP
    |              20|------------6 human-EDN
|---|12      19|-----|
|   |      |---|     |------------7 chimp-EDN
|   |    18|   |
|   |  |---|   |------------------8 gorilla-EDN
|   |  |   |
|   |--|   |----------------------9 orangutan-EDN
|    17| 
|      |--------------------------10 macaque-EDN
|---------------------------------11 tamarin-EDN 
       To compute bs and bn, first prepare the input data file and name it "infile", and then type c:\bn-bs\bn-bs
You will be asked to input the transition/transversion ratio (R), which should be estimated beforehand. Note that R is not the transition/transversion rate ratio, which is usually denoted kappa. Under Kimura's model, kappa = 2R. To use the original Nei-Gojobori method, input R=0.5. You will then be asked to choose either the p-distance or the Jukes-Cantor distance. The results are written to the files named "outfile" and "result".
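If your estimate is available as kappa rather than R, the relation above gives the value to enter. The following C sketch shows the conversion; the function names kappa_from_R and R_from_kappa are hypothetical and are not part of BN-BS.

```c
#include <assert.h>

/* Under Kimura's model the rate ratio kappa and the
   transition/transversion ratio R are related by kappa = 2R. */
double kappa_from_R(double R)
{
    assert(R >= 0.0);
    return 2.0 * R;
}

/* Inverse conversion: the R value to enter for a given kappa. */
double R_from_kappa(double kappa)
{
    assert(kappa >= 0.0);
    return kappa / 2.0;
}
```

With R=0.5 the conversion gives kappa = 1, which is the case of no transition bias used by the original Nei-Gojobori method.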
Output file
There are two major output files.
(1) result: gives synonymous and nonsynonymous branch lengths with variances
(2) outfile: gives S, N, s, n, ps, pn, ds, dn, and variances for pairwise distances
In "result", each branch is identified by the two nodes at its ends, so each branch can be matched to the tree.
R: transition/transversion ratio. R=0.5 means no transition bias. Note that R is not the transition/transversion rate ratio (which is often denoted by kappa). Under Kimura's model, kappa = 2R.
S: number of synonymous sites of a sequence.
N: number of nonsynonymous sites of a sequence.
s: number of synonymous differences between two sequences.
n: number of nonsynonymous differences between two sequences.
ps: synonymous p-distance (proportion of synonymous differences, s/S).
pn: nonsynonymous p-distance (proportion of nonsynonymous differences, n/N).
ds: Jukes-Cantor synonymous distance.
dn: Jukes-Cantor nonsynonymous distance.
bs: synonymous branch length.
bn: nonsynonymous branch length.


| Department of Biology  |  Eberly College of Science |
| Institute of Molecular Evolutionary Genetics | Penn State |
2002 The Pennsylvania State University
This page was last updated 6/11/09 by M. Ricardo.