GENETIC DISTANCE AND PHYLOGENETIC ANALYSIS
(c) Copyright 1993 by Tatsuya Ota and the Pennsylvania State University. Permission is granted to copy this document provided that no fee is charged for it and that this copyright notice is not removed. DISPAN is distributed free of charge by
Institute of Molecular Evolutionary Genetics
The Pennsylvania State University
328 Mueller Laboratory
University Park, PA 16802, USA
Department of Biosystems Sciences
School of Advanced Sciences
The Graduate University for Advanced Studies (SOKENDAI)
Hayama, Kanagawa, 240-0193
DISPAN (Genetic Distance and Phylogenetic Analysis) is designed to compute the following:
(1) Average heterozygosity and its standard error for each population
(2) Gene diversity (Ht) and its associated parameters, Hs and Gst (Nei 1973)
(3) Standard genetic distances (D) between populations (Nei 1972)
(4) Standard errors of standard genetic distances (Nei 1978)
(5) DA distances between populations (Nei et al. 1983)
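As a rough illustration of items (3) and (5), the following Python sketch computes the standard genetic distance D and the DA distance from allele frequencies, using the textbook definitions D = -ln(Jxy / sqrt(Jx Jy)) and DA = 1 - (1/r) * sum over loci of sum over alleles of sqrt(x * y). The function names and the data layout are assumptions made for this example. The sketch applies no sample-size correction, so its values need not match GNKDST output exactly.

```python
import math

def nei_standard_distance(x, y):
    """Standard genetic distance D (Nei 1972), without sample-size correction.

    x and y are lists of loci; each locus is a list of allele frequencies,
    given in the same allele order for both populations.
    """
    r = len(x)  # number of loci
    jx = sum(sum(p * p for p in locus) for locus in x) / r
    jy = sum(sum(q * q for q in locus) for locus in y) / r
    jxy = sum(sum(p * q for p, q in zip(lx, ly)) for lx, ly in zip(x, y)) / r
    return -math.log(jxy / math.sqrt(jx * jy))

def da_distance(x, y):
    """DA distance (Nei et al. 1983) for the same data layout."""
    r = len(x)
    shared = sum(sum(math.sqrt(p * q) for p, q in zip(lx, ly))
                 for lx, ly in zip(x, y))
    return 1.0 - shared / r

# Two hypothetical populations scored at two loci with two alleles each.
pop1 = [[0.5, 0.5], [1.0, 0.0]]
pop2 = [[0.5, 0.5], [0.0, 1.0]]
print(nei_standard_distance(pop1, pop2))  # ln(3), about 1.0986
print(da_distance(pop1, pop2))            # 0.5
```

Loci with missing alleles in one population simply carry a frequency of 0 for that allele, so both populations must list the same alleles in the same order.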
It also constructs phylogenetic trees (dendrograms) by using the neighbor-joining (NJ) method (Saitou and Nei 1987) and the unweighted pair-group method with arithmetic mean (UPGMA) (Sneath and Sokal 1973) from matrices of either D or DA distances. Bootstrap tests (Efron 1982, Felsenstein 1985) for these trees can be performed.
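To show what UPGMA does with a distance matrix, here is a minimal Python sketch of the clustering procedure. It is a generic textbook version, not the code used in GNKDST; the function name, the nested-tuple tree format, and the example matrix are all assumptions made for illustration.

```python
def upgma(names, matrix):
    """Cluster populations by UPGMA.

    names  : list of population labels
    matrix : symmetric list of lists of pairwise distances
    Returns a nested tuple (left, right, height), where height is the
    distance from that node down to its tips; a tip is just its label.
    """
    n = len(names)
    # Active clusters: key -> (subtree, number of tips in it).
    clusters = {i: (names[i], 1) for i in range(n)}
    # Pairwise distances between active clusters, keyed by (small, large).
    dist = {(i, j): matrix[i][j] for i in range(n) for j in range(i + 1, n)}
    next_key = n
    while len(clusters) > 1:
        # Join the two clusters separated by the smallest distance.
        (a, b), d_ab = min(dist.items(), key=lambda item: item[1])
        (tree_a, n_a), (tree_b, n_b) = clusters.pop(a), clusters.pop(b)
        # Distance from the merged cluster to each remaining cluster is the
        # size-weighted mean, i.e. the average over all pairs of tips.
        new_dist = {}
        for c in clusters:
            d_ac = dist[(min(a, c), max(a, c))]
            d_bc = dist[(min(b, c), max(b, c))]
            new_dist[(c, next_key)] = (n_a * d_ac + n_b * d_bc) / (n_a + n_b)
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        dist.update(new_dist)
        clusters[next_key] = ((tree_a, tree_b, d_ab / 2.0), n_a + n_b)
        next_key += 1
    (tree, _), = clusters.values()
    return tree

# Hypothetical distances among three populations.
example = upgma(["A", "B", "C"], [[0, 2, 6], [2, 0, 6], [6, 6, 0]])
print(example)  # ('C', ('A', 'B', 1.0), 3.0)
```

In the example, A and B join first at height 1.0 (half of their distance of 2), and C joins the pair at height 3.0, so every tip is the same distance from the root.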
This software contains two programs: (1) GNKDST.EXE and (2) TREEVIEW.EXE.
The first program is written in C by T. Ota, whereas the second is written in Modula-2 by Koichiro Tamura. The first program is a rewritten version of a program originally written in FORTRAN by A.K. Roychoudhury, Y. Tateno, D. Graur, N. Saitou, and R. Schwartz.
PHYLTST.exe is a self-extracting file. This compressed module contains the executables and examples for the program. It can be used in DOS, and in a DOS box under Windows 3.1, 95, and NT. If you have any problems installing this program in your directory (or fetching it), please write to the above address.
First make sure that the diskette you have received contains the following files:
GNKDST.EXE
TREEVIEW.EXE
TEST.DAT
The first two files contain the executable programs and the third is a data file.
To install DISPAN on your computer's hard disk drive (drive "C" in this example), create a directory to hold the two executable programs. To do this, type the following:
C:\> MD DISPAN (Enter)
To copy the DISPAN programs and files onto your hard disk drive, insert the floppy disk containing the programs into your floppy drive (drive "A" in this example). Then enter the following command:
C:\> COPY A:*.* C:\DISPAN (Enter)
If your computer does not have a hard disk, you can use the distribution diskette in your "A" drive.
Arrange your data as shown in the example data file, TEST.DAT. A section bounded by "/*" and "*/" is a comment and is ignored by the program. The line with "#P = " specifies either the number of populations or the list of population names. To identify populations by number, write "#P = 13" or "#Population = 13". To list population names, write "#P = (one,two,three,four,....,thirteen)". "(" and ")" mark the beginning and the end of the list (do not use ")" in a population name), and "," separates the population names. When you list population names, the program counts the number of populations from the list, so do not specify the number of populations in this case. The line with "#M = " specifies the number of monomorphic loci or the names of monomorphic loci, in the same way as "#P" (see TEST.DAT for an example).
After you make the above specifications, type the gene frequencies for each locus separately. Data for a locus consist of three sections: 1) a line (comment line) with "@L" at the beginning of each locus, 2) the number of alleles at the locus, specified by "#A = ", and 3) gene frequencies and the number of sampled genes in the following lines. The number of alleles can be specified in the same manner as the number of populations or monomorphic loci. Gene frequencies for each locus should be given in the same order for all populations, and they should be followed by the number of genes sampled (i.e., two times the number of diploid individuals sampled). Make sure that the sum of the gene frequencies is not below 0.9989 or above 1.0011. The gene frequencies for different populations should be presented in the same order for all loci.
Once you are satisfied that the data file is correct, start the program by typing GNKDST with the proper options:
C:\DISPAN\GNKDST -da -ds -fTEST.DAT -g -r1000 -s516 -tn -tu
Options used here are as follows:
Estimation of standard genetic distances and DA distances
-da: DA distance
-ds: standard genetic distance
TEST.DAT as input file
-fTEST.DAT: Note that no space is allowed between -f and the name of the input file.
Estimation of Hs, Ht, and Gst for each locus
-g: Hs, Ht, and Gst for each locus
Bootstrap test
-r1000: The number after -r is the number of bootstrap replications, and any number can be used. Note that no space is allowed between -r and the number of replications. If you do not include the -r option, the bootstrap test will not be performed. Bootstrapping is done with loci as units.
Seed number for the random number generator
-s516: The number after -s is the seed number for the random number generator. Any integer except zero can be used. Any positive integer is converted to a negative integer.
Construction of NJ trees and UPGMA trees
-tn: NJ tree
-tu: UPGMA tree
If you do not need one of the options -da, -ds, -g, -r1000, -s516, -tn, or -tu, omit it. For example, to obtain an NJ tree with standard genetic distances but without a bootstrap test, type the following:
C:\DISPAN\GNKDST -ds -fTEST.DAT -tn
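If you assemble GNKDST command lines from a script, the rules above (no space after -f, -r, or -s; a nonzero seed) are easy to get wrong. The Python sketch below builds the argument list. The function name and its parameter names are hypothetical and exist only for this example; the option strings are the ones listed in this manual, and the reading of -g as the per-locus Hs, Ht, and Gst option is taken from the option list.

```python
def gnkdst_args(input_file, da=False, ds=False, per_locus=False,
                replications=None, seed=None, nj=False, upgma=False):
    """Return the GNKDST command line as a list of arguments."""
    args = ["GNKDST"]
    if da:
        args.append("-da")            # DA distance
    if ds:
        args.append("-ds")            # standard genetic distance
    args.append("-f" + input_file)    # no space between -f and the file name
    if per_locus:
        args.append("-g")             # assumed: Hs, Ht, and Gst for each locus
    if replications is not None:
        args.append("-r%d" % replications)  # no space after -r
    if seed is not None:
        if seed == 0:
            raise ValueError("the seed may be any integer except zero")
        args.append("-s%d" % seed)    # no space after -s
    if nj:
        args.append("-tn")            # NJ tree
    if upgma:
        args.append("-tu")            # UPGMA tree
    return args

full = " ".join(gnkdst_args("TEST.DAT", da=True, ds=True, per_locus=True,
                            replications=1000, seed=516, nj=True, upgma=True))
short = " ".join(gnkdst_args("TEST.DAT", ds=True, nj=True))
print(full)   # GNKDST -da -ds -fTEST.DAT -g -r1000 -s516 -tn -tu
print(short)  # GNKDST -ds -fTEST.DAT -tn
```

The two printed lines reproduce the two example commands given above, without the directory prefix.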
The program will run, check the input file, and report the data specifications found in the input file. If the numbers reported are correct, press the "Y" and "Return" keys to proceed with the calculation. Otherwise, press the "N" and "Return" keys to stop the program so that you can check the input file or switch to another input file.
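If you would like to pre-check a data file yourself before running GNKDST, two of the rules described earlier are simple to test: the "#P = " style specification lines and the allowed range for the sum of gene frequencies. The Python sketch below is only an illustration. The function names are hypothetical, it handles a specification written on a single line only, and it is not the checking code that GNKDST itself uses.

```python
def parse_spec(line):
    """Parse a line such as '#P = 13' or '#P = (one,two,three)'.

    Returns (count, names); names is None when only a number is given.
    """
    _, _, value = line.partition("=")
    value = value.strip()
    if value.startswith("("):
        # A list in parentheses: the count is taken from the list itself.
        end = value.index(")")
        names = [name.strip() for name in value[1:end].split(",")]
        return len(names), names
    return int(value), None

def frequencies_sum_ok(freqs, low=0.9989, high=1.0011):
    """True if the gene frequencies sum to a value within the allowed range."""
    return low <= sum(freqs) <= high

print(parse_spec("#P = 13"))                # (13, None)
print(parse_spec("#P = (one,two,three)"))   # (3, ['one', 'two', 'three'])
print(frequencies_sum_ok([0.25, 0.25, 0.5]))  # True
print(frequencies_sum_ok([0.33, 0.33, 0.33])) # False
```

The same parser can be applied to "#M = " and "#A = " lines, since they follow the same convention as "#P = ".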
Once the program stops, you can get your results by opening the file "GNKDST.DST", which contains the following results. (New results are appended to previous results if a file called "GNKDST.DST" already exists.)
(1) Average heterozygosity and its standard error for each population
(2) Hs, Ht, and Gst over all loci
(3) Hs, Ht, and Gst for each locus (optional)
(4) Standard genetic distances for all pairs of populations (optional)
(5) Standard errors of standard genetic distances for all pairs of populations (optional)
(6) DA distances for all pairs of populations (optional)
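For readers unfamiliar with Hs, Ht, and Gst, the following Python sketch computes them for a single locus from the basic definitions of Nei (1973): Hs is the average within-population gene diversity, Ht is the gene diversity of the pooled (averaged) allele frequencies, and Gst = (Ht - Hs) / Ht. The function name is an assumption for this example. Populations are weighted equally and no sample-size correction is applied, so the values may differ from those written to GNKDST.DST.

```python
def gene_diversity(freqs_by_pop):
    """Return (Hs, Ht, Gst) for one locus.

    freqs_by_pop is a list of populations; each population is a list of
    allele frequencies in the same allele order.
    """
    s = len(freqs_by_pop)          # number of populations
    k = len(freqs_by_pop[0])       # number of alleles
    # Hs: mean over populations of 1 - sum of squared allele frequencies.
    hs = sum(1.0 - sum(p * p for p in pop) for pop in freqs_by_pop) / s
    # Ht: the same quantity computed from the mean allele frequencies.
    mean = [sum(pop[i] for pop in freqs_by_pop) / s for i in range(k)]
    ht = 1.0 - sum(m * m for m in mean)
    # Gst is undefined at a monomorphic locus (Ht = 0).
    gst = (ht - hs) / ht if ht > 0 else None
    return hs, ht, gst

# Two hypothetical populations fixed for different alleles.
print(gene_diversity([[1.0, 0.0], [0.0, 1.0]]))  # (0.0, 0.5, 1.0)
# Two populations with identical frequencies.
print(gene_diversity([[0.5, 0.5], [0.5, 0.5]]))  # (0.5, 0.5, 0.0)
```

The first example shows complete differentiation (all diversity is between populations), and the second shows none.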
While running the GNKDST program, other output files with "TRE" extension may appear. These files are for phylogenetic trees:
DANJ.TRE: Neighbor-joining tree with DA distances
DAUPGMA.TRE: UPGMA tree with DA distances
STDNJ.TRE: Neighbor-joining tree with standard genetic distances
STDUPGMA.TRE: UPGMA tree with standard genetic distances
(If a file with the same name exists before you run the program, it will be overwritten by the new result. To keep the old file, rename it first.)
You can see the tree structure on screen by typing TREEVIEW at the DOS prompt.
The screen will show all the files, including the TRE files. Move the cursor to one of the TRE files and press "Enter". The screen will show a tree with proper branch lengths and population numbers (names) at the end of each branch.
You can swap branches as you wish. To do this, press the "Esc" key, and a box with a number of options will appear in the upper left-hand corner of the screen. Move the cursor to "Swap Branches" and press "Enter". An arrow symbol (red on a color monitor) will appear on a nodal point; it can be moved up and down with the up and down arrow keys. Placing the arrow symbol on a nodal point and pressing the "Enter" key swaps the branches at that nodal point. By repeating this at different nodal points, you can rearrange the populations of the tree as you wish. When you are satisfied with the design of the tree, press the "Esc" key to remove the arrow symbol.
To print the tree in graphic mode, press the "Esc" key again, move the cursor to "Draw with Graphics", and press "Enter". The tree will appear on the screen in graphic mode. To print it, press the letter "P" and wait. If your computer is connected to a printer, you will get a printout of the tree. To leave the program, press the "Esc" key twice, move the cursor to "Exit Program", and press "Enter". Alternatively, you can get the tree by printing the file called TREEVIEW.OUT, which is produced by the "Save tree" option in the TREEVIEW program.
Note for UPGMA. In a UPGMA tree, the branch length from the root should be the same for all populations, but in a printout of the TREEVIEW.OUT file the branch lengths may not appear equal. To avoid this problem, use the graphic mode to print the tree.
Note for NJ. Since an NJ tree is unrooted, you can re-root the tree. Press the "Esc" key; as before, a box with a number of options will appear in the upper left-hand corner of the screen. Move the cursor to "Move Root" and press the "Enter" key. A diamond symbol (green on a color monitor) will appear on an interior branch of the tree. Use the up and down arrow keys to move the diamond symbol to the interior branch where you want to place the root, and press the "Enter" key. The tree will be redrawn with the root you assigned. To get rid of the diamond symbol, press the "Esc" key.
Note for the bootstrap test. The numbers shown on the tree are the bootstrap numbers. To remove them, press the "Esc" key; as before, a box with a number of options will appear in the upper left-hand corner of the screen. Move the cursor to "Hide bootstrap number" and press the "Enter" key.
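To give an idea of where bootstrap numbers come from, the Python sketch below resamples loci with replacement (loci are the bootstrap units, as with the -r option) and reports how often a grouping of interest is recovered. It is a simplified illustration under several assumptions: the function names are hypothetical, the check for the grouping is supplied by the caller instead of being read from a tree, the result is expressed as a percentage, and Python's random number generator is not the one used by GNKDST, so the same seed will not reproduce GNKDST results.

```python
import random

def bootstrap_support(loci, has_cluster, replications=1000, seed=516):
    """Percentage of bootstrap replicates in which has_cluster() is true.

    loci        : list of per-locus data (any objects)
    has_cluster : function taking a resampled list of loci and returning
                  True if the grouping of interest is recovered
    """
    rng = random.Random(seed)       # fixed seed makes the run repeatable
    hits = 0
    for _ in range(replications):
        # Draw as many loci as in the original data, with replacement.
        sample = [rng.choice(loci) for _ in loci]
        if has_cluster(sample):
            hits += 1
    return 100.0 * hits / replications

# Toy data: each "locus" is a number; the grouping is "recovered" when
# the resampled mean is positive. Real use would rebuild a tree instead.
toy_loci = [0.4, -0.1, 0.3, 0.2, -0.2]
support = bootstrap_support(toy_loci, lambda s: sum(s) / len(s) > 0)
print(support)
```

In a real analysis, the supplied function would recompute the distance matrix from the resampled loci, rebuild the tree, and test whether a given cluster of populations appears in it.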
Efron, B. (1982) The jackknife, the bootstrap, and other resampling plans. CBMS-NSF Regional Conference Series in Applied Mathematics, No. 38. Society for Industrial and Applied Mathematics, Philadelphia, PA.
Felsenstein, J. (1985) Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39:783-791.
Nei, M. (1972) Genetic distance between populations. Am. Nat. 106:283-292.
Nei, M. (1973) Analysis of gene diversity in subdivided populations. Proc. Natl. Acad. Sci. USA 70:3321-3323.
Nei, M. (1978) Estimation of average heterozygosity and genetic distance from a small number of individuals. Genetics 89:583-590.
Nei, M., Tajima, F., and Tateno, Y. (1983) Accuracy of estimated phylogenetic trees from molecular data. J. Mol. Evol. 19:153-170.
Saitou, N. and Nei, M. (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406-425.
Sneath, P.H.A. and Sokal, R.R. (1973) Numerical Taxonomy. Freeman, San Francisco.