
Software - Readme File


A Method for Computing Conservative and Radical Nonsynonymous Distances
Jianzhi Zhang
Laboratory of Host Defenses
National Institute of Allergy and Infectious Diseases
National Institutes of Health
Building 10, room 11N104
9000 Rockville Pike
Bethesda, MD 20892

Tel.: 301-402-1668
Fax: 301-402-4369
E-mail: jzhang@niaid.nih.gov
Suggested Citation
       Zhang J. (2000)  Rates of conservative and radical nonsynonymous nucleotide substitutions in mammalian nuclear genes. J. Mol. Evol. 50:56-68
       HON-NEW estimates conservative and radical nonsynonymous distances between protein-coding DNA sequences.  The method modifies the original method of Hughes, Ota, and Nei (1990) by taking transition bias into account.  Three amino acid classifications are provided (charge, polarity, and that of Miyata and Yasunaga).  You can also define conservative and radical amino acid changes yourself (see the next paragraph).  The program is written in C and runs on IBM PC compatible computers under the Windows 95 operating system.

       You can define amino acid groups so that changes between groups are radical and changes within a group are conservative.  To do that, create a file named self.div.  The first line of this file should give the number of amino acid groups (e.g., for charge there are three groups [-,0,+]).  The second line gives the number of amino acids in the first group, a space, and then the amino acids in that group, written in one-letter code.  The next line gives the same information for the second group, and so on.  If there are n groups in total, only the first n-1 groups need to be listed, because the last group consists of all amino acids not assigned to the first n-1 groups.  See charge.div for an example.
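The group-file layout described above can be sketched in a few lines of Python. This is only an illustration, not part of HON-NEW: the function names `parse_div` and `is_radical` are hypothetical, the first line is assumed to hold the number of groups, and the example text (negative D, E; positive R, H, K; everything else neutral) is an assumed charge grouping rather than the verbatim content of charge.div.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def parse_div(text):
    """Map each amino acid to a 0-based group index.

    Assumed layout: line 1 = number of groups n; each of the next n-1
    lines = member count, a space, then the one-letter codes.
    """
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    n_groups = int(lines[0][0])
    if len(lines) < n_groups:
        raise ValueError("expected n-1 group lines after the first line")
    group_of = {}
    for index, fields in enumerate(lines[1:n_groups]):
        count = int(fields[0])
        # Accept codes written together ("DE") or separated ("D E").
        members = "".join(fields[1:]).upper()
        if len(members) != count:
            raise ValueError(f"group {index + 1}: count does not match members")
        for aa in members:
            if aa not in AMINO_ACIDS or aa in group_of:
                raise ValueError(f"invalid or repeated amino acid {aa!r}")
            group_of[aa] = index
    # The last group is derived: every amino acid not listed so far.
    for aa in AMINO_ACIDS:
        group_of.setdefault(aa, n_groups - 1)
    return group_of


def is_radical(group_of, aa1, aa2):
    """A change is radical when the two amino acids lie in different groups."""
    return group_of[aa1.upper()] != group_of[aa2.upper()]


# Hypothetical three-group file in the assumed layout.
example = "3\n2 DE\n3 RHK\n"
groups = parse_div(example)
print(is_radical(groups, "D", "E"))  # False: same group, conservative
print(is_radical(groups, "D", "K"))  # True: different groups, radical
```

Checking a self.div file this way before running the program can catch a miscounted group or a repeated letter early.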
First make sure that the diskette you have received contains the following files.
hon-new.c (source code)
hon-new.exe (executable file)
manual (this file)
rnase.seq (an example data file)
outfile  (output file)
charge.div (amino acid classification by charge)
polarity.div (amino acid classification by polarity)
MY.div (amino acid classification by Miyata and Yasunaga)

       To install HON-NEW on your computer's hard disk drive ("C" drive used here as an example), first create a directory to hold the files of this package. To do this, type the following:  c:\md hon-new  (Enter).

       To copy the HON-NEW files onto your hard disk drive, insert the floppy disk containing the programs into your floppy drive ("A" drive used here as an example). Then enter the following command:  c:\copy a:*.* c:\hon-new\*.*  (Enter)

Input file
       To use the program, you need an input file containing the protein-coding DNA sequences with stop codons removed (see rnase.seq for an example). The file begins with two numbers: the number of sequences and the number of nucleotides per sequence (sequence length). The second line is the name of the first sequence and the third line is that sequence; the remaining sequences follow in the same name-then-sequence pattern. Only A, G, C, T, a, g, c, and t are allowed.  Sequences must be aligned beforehand, and all gaps must be removed.
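The input layout described above can be checked before running the program with a short Python sketch. It is not part of HON-NEW: `read_sequences` is a hypothetical helper, each sequence is assumed to occupy a single line, and the stop-codon check assumes the standard genetic code (TAA, TAG, TGA) read in frame from the first nucleotide.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code


def read_sequences(text):
    """Return a list of (name, sequence) pairs, raising ValueError on bad input."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    n_seq, length = (int(x) for x in lines[0].split()[:2])
    body = lines[1:]
    if len(body) != 2 * n_seq:
        raise ValueError("expected one name line and one sequence line per sequence")
    records = []
    for i in range(n_seq):
        name, seq = body[2 * i], body[2 * i + 1].upper()
        if set(seq) - set("AGCT"):
            raise ValueError(f"{name}: only A, G, C, T are allowed (no gaps)")
        if len(seq) != length:
            raise ValueError(f"{name}: length {len(seq)} differs from header {length}")
        # Walk the sequence codon by codon and reject in-frame stop codons.
        for pos in range(0, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                raise ValueError(f"{name}: stop codon at position {pos + 1}")
        records.append((name, seq))
    return records


# Made-up two-sequence example in the described layout.
sample = "2 9\nseqA\nATGAAAGGG\nseqB\natgaagggc\n"
for name, seq in read_sequences(sample):
    print(name, seq)
```

The sketch upper-cases the sequences only for checking; the program itself accepts both upper and lower case.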
To compute C, R, c, r, etc., type c:\hon-new\hon-new filename 
For example, to try the rnase.seq data, type c:\hon-new\hon-new rnase.seq

You will be asked to enter the transition/transversion ratio (Ts/Tv), which should be estimated beforehand.  To use the original method of Hughes, Ota, and Nei (1990), enter Ts/Tv = 0.5. The variances and covariances of the distances are computed according to Ota and Nei (1994).
Output file
There are several output files with different formats.
(1) outfile: the most useful file; it includes C, R, c, r, pc, pr, and their variances.
(2) cr.rst:  includes the covariances, in addition to the quantities given in outfile.
(3) sn.rst:  includes S, N, s, n, ps, pn, ds, dn, and their variances and covariances.
Ts/Tv: transition/transversion ratio.  Ts/Tv = 0.5 means no transition bias.  Note that this ratio, written R in the relation that follows (not the R that denotes radical sites below), is not the transition/transversion rate ratio, which is often denoted kappa. Under Kimura's model, 2R = kappa.
S:  number of synonymous sites of a sequence.
N:  number of nonsynonymous sites of a sequence.
s:  number of synonymous differences between two sequences.
n:  number of nonsynonymous differences between two sequences.
ps: p-distance (proportion) of synonymous difference.
pn: p-distance (proportion) of nonsynonymous difference.
ds: Jukes-Cantor distance of synonymous difference.
dn: Jukes-Cantor distance of nonsynonymous difference.
C:  number of conservative nonsynonymous sites of a sequence.
R:  number of radical nonsynonymous sites of a sequence.
c:  number of conservative nonsynonymous differences between two sequences.
r:  number of radical nonsynonymous differences between two sequences.
pc: p-distance (proportion) of conservative nonsynonymous difference.
pr: p-distance (proportion) of radical nonsynonymous difference.
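
As a rough illustration of how the quantities defined above relate to one another, the following Python sketch computes a p-distance as differences divided by sites, applies the standard Jukes-Cantor correction d = -(3/4) ln(1 - 4p/3), and converts Ts/Tv to kappa under Kimura's model. The function names and the numbers are made up for illustration; the sketch does not reproduce HON-NEW's site counting or its variance formulas.

```python
import math


def p_distance(differences, sites):
    """Proportion of differences per site, e.g. ps = s / S or pc = c / C."""
    return differences / sites


def jukes_cantor(p):
    """Standard Jukes-Cantor correction of a p-distance (e.g. ps -> ds)."""
    if p >= 0.75:
        raise ValueError("Jukes-Cantor distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def kappa_from_ts_tv(ts_tv):
    """Rate ratio kappa under Kimura's model: kappa = 2 * (Ts/Tv)."""
    return 2.0 * ts_tv


# Hypothetical counts for one pair of sequences.
s, S = 12.0, 100.0   # synonymous differences and sites
c, C = 9.0, 150.0    # conservative nonsynonymous differences and sites
r, R = 6.0, 150.0    # radical nonsynonymous differences and sites

ps = p_distance(s, S)
ds = jukes_cantor(ps)
pc = p_distance(c, C)
pr = p_distance(r, R)
print(f"ps={ps:.4f} ds={ds:.4f} pc={pc:.4f} pr={pr:.4f}")
print("kappa for Ts/Tv = 0.5:", kappa_from_ts_tv(0.5))  # 1.0, no transition bias
```

Because the correction grows with p, ds is always at least as large as ps, and the two are nearly equal when sequences are closely related.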


| Department of Biology  |  Eberly College of Science |
| Institute of Molecular Evolutionary Genetics | Penn State |
2002 The Pennsylvania State University
This page was last updated 11/18/03 by K. Seasholtz.