Instructions for the Program for Computing the Standard Errors of Nucleotide Diversity (π) and Nucleotide Divergence (d) (SEND.FOR)
This is a FORTRAN 77 program for computing the standard errors of the average number of nucleotide substitutions per site within populations (nucleotide diversity, π) and between populations (nucleotide divergence, d). The algorithm is described in Nei and Jin's paper (Mol. Biol. Evol. 6:290-300). Either DNA sequence data or restriction-site data can be used. The program can handle up to 50 sequences, 5 populations, and 5 classes of restriction enzymes. It was written by Li Jin on July 25, 1988.
DNA sequence data
The file containing the distance matrix must be prepared before the program is executed. Either the proportions of nucleotide differences (p_ij) or the Jukes-Cantor distances (d_ij) can be used. Sequences from different populations must not be interleaved: all sequences from each population must be grouped together. For example, if there are three sequences from population 1 and two from population 2, the sequences from population 1 should be numbered 1, 2, and 3, and those from population 2 should be numbered 4 and 5. The distance values should be entered in the following order: D_12, D_13, ..., D_1n, D_23, ..., D_2n, ..., D_{n-1,n}, where n is the number of sequences. All distance values are placed in a single column, one value per line. An example input file (TEST.DAT) follows. Only the numeric values appear in the file; the labels (D_12 =, and so on) are shown for reference only.
    D_12 = .0500
    D_13 = .0507
    D_14 = .0768
    D_15 = .0486
    D_23 = .0380
    D_24 = .0912
    D_25 = .1433
    D_34 = .0253
    D_35 = .0496
    D_45 = .0621
In this example, the total number of sequences is five, and sequences 1, 2, and 3 belong to population 1, whereas sequences 4 and 5 belong to population 2.
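The required order is simply the upper triangle of the distance matrix read row by row. The Python sketch below is not part of SEND, and its function names are hypothetical. It converts between a full symmetric matrix and the one-column layout. Values are written with four decimals and no leading zero to mimic TEST.DAT; whether SEND requires that exact number format is an assumption.

```python
from itertools import combinations


def pair_order(n):
    """1-based pairs (i, j), i < j, in the order D_12, D_13, ..., D_1n, D_23, ..., D_{n-1,n}."""
    return list(combinations(range(1, n + 1), 2))


def format_distance_lines(matrix):
    """Upper triangle of a symmetric n x n matrix as one value per line.

    Four decimals, leading zero stripped (".0500"), to mimic TEST.DAT (assumed format).
    """
    n = len(matrix)
    return [f"{matrix[i - 1][j - 1]:.4f}".lstrip("0") for i, j in pair_order(n)]


def parse_distance_lines(lines, n):
    """Rebuild the full symmetric n x n matrix from the one-column values."""
    values = [float(s) for s in lines if s.strip()]
    pairs = pair_order(n)
    if len(values) != len(pairs):
        raise ValueError(f"expected {len(pairs)} values for n = {n}, got {len(values)}")
    matrix = [[0.0] * n for _ in range(n)]
    for (i, j), v in zip(pairs, values):
        matrix[i - 1][j - 1] = matrix[j - 1][i - 1] = v
    return matrix


if __name__ == "__main__":
    # The ten values of the TEST.DAT example (n = 5).
    test_dat = [".0500", ".0507", ".0768", ".0486", ".0380",
                ".0912", ".1433", ".0253", ".0496", ".0621"]
    d = parse_distance_lines(test_dat, 5)
    print(d[1][4])  # D_25 -> 0.1433
    assert format_distance_lines(d) == test_dat  # round trip reproduces the file
```

For n sequences the file must contain exactly n(n-1)/2 values (10 for the example), which the parser checks.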
After starting the program by typing SEND, answer all the questions that appear on the screen.
First, specify the type of data. There are two options: (1) nucleotide sequences and (2) restriction-site data. Type 1 for DNA sequence data or 2 for restriction-site data. The computer will then ask for the number of sequences and the number of populations.
Two tree-making methods are available in this program. Type 1 for UPGMA or 2 for the neighbor-joining method; to use both, type 3. The use of these two methods is explained in Nei and Jin's paper.
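UPGMA itself is short enough to sketch. The Python below is a textbook UPGMA (repeatedly join the closest pair of clusters, place the new node at half their distance, and replace the pair by the size-weighted average of its distances); it is not taken from SEND.FOR, and the name `upgma` is hypothetical. It returns only the topology and node heights; the patristic-distance matrix and standardized discrepancy that SEND reports are not computed here.

```python
from itertools import combinations


def upgma(dist, labels):
    """Textbook UPGMA on a symmetric distance matrix (hypothetical helper, not SEND's code).

    Returns (tree, merges): tree is a nested tuple of labels; merges lists
    (left, right, node_height) in the order the clusters were joined.
    """
    clusters = {k: (label, 1) for k, label in enumerate(labels)}  # id -> (subtree, size)
    d = {(a, b): dist[a][b] for a, b in combinations(range(len(labels)), 2)}
    merges = []
    next_id = len(labels)
    while len(clusters) > 1:
        a, b = min(d, key=d.get)          # closest pair of clusters (keys satisfy a < b)
        height = d[(a, b)] / 2            # node height in an ultrametric tree
        tree_a, n_a = clusters.pop(a)
        tree_b, n_b = clusters.pop(b)
        merges.append((tree_a, tree_b, height))
        for c in clusters:                # size-weighted average distance to the new cluster
            d_ac = d[(min(a, c), max(a, c))]
            d_bc = d[(min(b, c), max(b, c))]
            d[(c, next_id)] = (n_a * d_ac + n_b * d_bc) / (n_a + n_b)
        # Forget every distance that involves the two clusters just merged.
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        clusters[next_id] = ((tree_a, tree_b), n_a + n_b)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree, merges


if __name__ == "__main__":
    # TEST.DAT values expanded into a full 5 x 5 matrix.
    values = [.0500, .0507, .0768, .0486, .0380, .0912, .1433, .0253, .0496, .0621]
    n = 5
    m = [[0.0] * n for _ in range(n)]
    for (i, j), v in zip(combinations(range(n), 2), values):
        m[i][j] = m[j][i] = v
    tree, merges = upgma(m, ["1", "2", "3", "4", "5"])
    print(tree)  # ('2', (('3', '4'), ('1', '5')))
```

Ties between equal distances are broken by dictionary order here, which may differ from SEND's choice.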
The computer will then ask which population each sequence belongs to. The screen will show "Population 1 : 1 to ?". If population 1 consists of sequences 1 to 3, type 3 to indicate that sequence 3 is the last one in population 1. This is repeated until all the sequences are assigned.
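Because each population is a contiguous block, the numbers typed at these prompts determine the grouping completely, and each prompt starts at one past the previous answer. A hypothetical Python helper that reproduces this bookkeeping, for example to check a planned list of answers before a session:

```python
def assign_populations(last_indices, n):
    """Return [population of sequence 1, ..., population of sequence n].

    last_indices holds the answers typed at the "Population k : first to ?"
    prompts, e.g. [3, 5] for the TEST.DAT example (hypothetical helper).
    """
    membership = []
    first = 1
    for pop, last in enumerate(last_indices, start=1):
        if last < first:
            raise ValueError(f"population {pop}: last sequence {last} < first {first}")
        membership.extend([pop] * (last - first + 1))
        first = last + 1
    if first != n + 1:
        raise ValueError(f"the last population must end at sequence {n}, not {first - 1}")
    return membership


def prompt_starts(last_indices):
    """First sequence number shown in each prompt: 1, then previous answer + 1."""
    return [1] + [last + 1 for last in last_indices[:-1]]


if __name__ == "__main__":
    print(assign_populations([3, 5], 5))  # [1, 1, 1, 2, 2]
    print(prompt_starts([3, 5]))          # [1, 4], i.e. "Population 2 : 4 to ?"
```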
The same questions will be asked for restriction-site data.
For DNA sequence data, the computer will then ask for the total number of nucleotides examined, followed by the type of distance: type 1 if the data are proportions of nucleotide differences (p_ij) or 2 if they are Jukes-Cantor distances (d_ij).
Finally, the computer will ask for the name of the input file you have prepared.
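For reference, p_ij is the number of differing sites divided by the number of nucleotides compared, and the Jukes-Cantor distance is d = -(3/4) ln(1 - 4p/3). A minimal Python sketch of both, assuming gap-free aligned sequences of equal length; the function names are hypothetical, and how SEND uses the nucleotide count internally is not described in these instructions.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned, gap-free sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(x != y for x, y in zip(seq_a, seq_b))
    return diffs / len(seq_a)  # denominator = number of nucleotides examined


def jukes_cantor(p):
    """Jukes-Cantor distance d = -(3/4) ln(1 - 4p/3); undefined for p >= 3/4."""
    if not 0 <= p < 0.75:
        raise ValueError("J-C distance is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


if __name__ == "__main__":
    p = p_distance("ACGTACGTAA", "ACGTACGTAC")
    print(p, round(jukes_cantor(p), 4))  # 0.1 0.1073
```

The J-C distance is always at least as large as p, and the two agree closely only when p is small.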
The following example session shows how to enter DNA sequence data. The user's responses are indented further than the program's prompts.

    Choose type of data to be analyzed:
    (1) nucleotide sequences;
    (2) restriction-site data
        1
    Please enter the number of SEQUENCES
        5
    Please enter the number of POPULATIONS
        2
    Choose type of tree-making method:
    (1) UPGMA;
    (2) Neighbor-joining method;
    (3) Both.
        1
    Population 1 : 1 to ?
        3
    Population 2 : 4 to ?
        5
    Please enter the number of NUCLEOTIDES considered
        42
    Choose type of distance to be analyzed
    (1) Proportion of nucleotide differences;
    (2) J-C distance.
        1
    Please supply the name of input file
        test.dat
How to read the output file
The output file, SEND.OUT, consists of three parts. The first part is the matrix of d_ij (or p_ij) or, for restriction-site data, the matrices of m_i and m_ij. The second part describes the UPGMA tree and/or the neighbor-joining tree, together with the matrix of patristic distances and the standardized discrepancy of each tree (see Nei and Jin's paper). The third part gives the nucleotide diversity (π) and divergence (d) with their standard errors.
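The estimates in the third part are, as usually defined, averages of the pairwise values in the input file: π for a population is the mean of d_ij (or p_ij) over all pairs of sequences within it, and d for two populations is the mean over all pairs with one sequence from each. The Python sketch below computes only these point estimates for the TEST.DAT example. It does not implement Nei and Jin's standard-error formulas, which are the main purpose of SEND, and its function names are hypothetical.

```python
from itertools import combinations


def full_matrix(values, n):
    """Expand one-column values (D_12, D_13, ..., D_{n-1,n}) into an n x n matrix."""
    pairs = list(combinations(range(n), 2))
    if len(values) != len(pairs):
        raise ValueError(f"expected {len(pairs)} values, got {len(values)}")
    m = [[0.0] * n for _ in range(n)]
    for (i, j), v in zip(pairs, values):
        m[i][j] = m[j][i] = v
    return m


def mean(xs):
    return sum(xs) / len(xs)


def diversity_and_divergence(dist, membership):
    """Point estimates of pi (within) and d (between populations); no standard errors."""
    pops = sorted(set(membership))
    members = {p: [i for i, q in enumerate(membership) if q == p] for p in pops}
    pi = {}
    for p in pops:
        within = [dist[i][j] for i, j in combinations(members[p], 2)]
        # A population with a single sequence has no within-population pairs.
        pi[p] = mean(within) if within else float("nan")
    d = {}
    for p, q in combinations(pops, 2):
        d[(p, q)] = mean([dist[i][j] for i in members[p] for j in members[q]])
    return pi, d


if __name__ == "__main__":
    test_dat = [.0500, .0507, .0768, .0486, .0380,
                .0912, .1433, .0253, .0496, .0621]
    pi, d = diversity_and_divergence(full_matrix(test_dat, 5), [1, 1, 1, 2, 2])
    print({p: round(v, 4) for p, v in pi.items()},
          {k: round(v, 4) for k, v in d.items()})
    # {1: 0.0462, 2: 0.0621} {(1, 2): 0.0725}
```

Population 2 in the example has only two sequences, so its π rests on a single pairwise value; this is the kind of situation in which the standard errors SEND reports matter most.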
Department of Biology
Eberly College of Science