Home | CV | Databases | IMEG Seminars | Journals
 
MEP-online | People | Publications | SoftwareText only version



Software - Read me File

 

 
SEND:                              
Instructions for the Program for Computing the Standard Errors of Nucleotide Diversity (π) and Nucleotide Divergence (d) (SEND.FOR)
 
General Remarks
     This is a FORTRAN 77 program for computing the standard errors of the average number of nucleotide substitutions per site within populations (nucleotide diversity; π) and between populations (nucleotide divergence; d). The algorithm of this program is presented in Nei and Jin's paper (Mol. Biol. Evol. 6:290-300). Either DNA sequence data or restriction-site data can be used. This program can handle data for up to 50 sequences, 5 populations, and 5 classes of restriction enzymes. The program was written by Li Jin on July 25, 1988.
 
Data Entry
DNA sequence data
     The distance-matrix file should be prepared before the program is executed. Either the proportions of nucleotide differences (p|ij|) or the Jukes-Cantor distances (d|ij|) can be used. Sequences belonging to different populations must not be interleaved; all sequences from each population should be grouped together. For example, if there are three sequences from population 1 and two sequences from population 2, the sequences from population 1 should be numbered 1, 2, and 3, and those from population 2 should be numbered 4 and 5. The distance values between sequences should be entered in the following order: D|12|, D|13|, ..., D|1n|, D|23|, ..., D|2n|, ..., D|n-1,n|, where n is the number of sequences. All distance values should be placed in one column. The following is an example file, TEST.DAT (the data used in the input file are written in boldface in the example).

┌────────────────────┐
│                    │
│  D|12| = .0500     │
│  D|13| = .0507     │
│  D|14| = .0768     │
│  D|15| = .0486     │
│  D|23| = .0380     │
│  D|24| = .0912     │
│  D|25| = .1433     │
│  D|34| = .0253     │
│  D|35| = .0496     │
│  D|45| = .0621     │
│                    │
└────────────────────┘


In this example, the total number of sequences is five, and sequences 1, 2, and 3 belong to population 1, whereas sequences 4 and 5 belong to population 2.  
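
The required order of the distance values can be generated mechanically. The following Python sketch is only an illustration and is not part of SEND; the function names are hypothetical, and it assumes that the input file holds just the numerical values, one per line, and that a leading zero (0.0500 rather than .0500) is acceptable to the program's free-format input.

```python
from itertools import combinations

def pair_order(n):
    """Return 1-based pairs (i, j) in the order D12, D13, ..., D1n, D23, ..., D(n-1)n."""
    return list(combinations(range(1, n + 1), 2))

def distance_column(dist, n):
    """Format the distances as one value per line, in the required pair order.

    `dist` maps a 1-based pair (i, j) with i < j to its distance.
    """
    return "\n".join(f"{dist[pair]:.4f}" for pair in pair_order(n)) + "\n"

# Values from the TEST.DAT example (sequences 1-3: population 1; 4-5: population 2).
values = [.0500, .0507, .0768, .0486, .0380, .0912, .1433, .0253, .0496, .0621]
dist = dict(zip(pair_order(5), values))

text = distance_column(dist, 5)
# To create the file (assumed layout): open("test.dat", "w").write(text)
```

Because the pairs are produced in the same order as the list of values, the resulting text reproduces the column of the example file line by line.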
     After starting the program by typing SEND, you must answer all the questions that appear on the screen.
     First, specify the type of data used. There are two options in this program: (1) nucleotide sequences and (2) restriction-site data. Type 1 for DNA sequence data or 2 for restriction-site data. The computer will then ask you for the number of sequences and the number of populations.
     Two tree-making methods can be used in this program. Type 1 for UPGMA or 2 for the neighbor-joining method. If both methods are to be used, type 3. The reason for using these two methods is explained in Nei and Jin's paper.
     The computer will then ask which population each sequence belongs to. On the screen you will see "Population 1 : 1 to  ? ". If population 1 includes sequences 1 to 3, type 3 to indicate that sequence 3 is the last one from population 1. Repeat this process until all the sequences are classified.
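
The meaning of the "1 to ?" answers can be sketched in Python as follows. This is an illustration of the numbering scheme only, not code taken from SEND, and the function names are hypothetical.

```python
def population_ranges(last_indices):
    """Turn the answers to "Population k : first to ?" into (first, last) ranges."""
    ranges, first = [], 1
    for last in last_indices:
        if last < first:
            raise ValueError("each population needs at least one sequence")
        ranges.append((first, last))
        first = last + 1  # the next population starts right after this one
    return ranges

def population_of(seq, ranges):
    """Return the 1-based population number of the 1-based sequence `seq`."""
    for k, (first, last) in enumerate(ranges, start=1):
        if first <= seq <= last:
            return k
    raise ValueError(f"sequence {seq} is not assigned to any population")

# Answers 3 and 5: sequences 1-3 form population 1, sequences 4-5 form population 2.
ranges = population_ranges([3, 5])
```

Each answer closes one population and fixes the first sequence of the next, which is why the sequences of a population must be numbered consecutively.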

     The same questions will be asked for restriction-site data. 
     In the case of DNA sequence data, the computer will also ask you to enter the total number of nucleotides examined and the type of distance used. Type 1 if the data are the proportions of nucleotide differences (p|ij|) or 2 if they are the Jukes-Cantor distances (d|ij|).
     Finally, the computer will ask for the name of the input file that you have already prepared.
    The following is an example to show how to enter DNA sequence data.

┌────────────────────────────────────────────────────────────────┐
│                                                                │
│    Choose type of data to be analyzed:                         │
│        (1) nucleotide sequences;                               │
│        (2) restriction-site data                               │
│ 1                                                              │
│    Please enter the number of SEQUENCES                        │
│ 5                                                              │
│    Please enter the number of POPULATIONS                      │
│ 2                                                              │
│    Choose type of tree-making method:                          │
│        (1) UPGMA;                                              │
│        (2) Neighbor-joining method;                            │
│        (3) Both.                                               │
│ 1                                                              │
│    Population 1 : 1 to  ?                                      │
│ 3                                                              │
│    Population 2 : 4 to  ?                                      │
│ 5                                                              │
│    Please enter the number of NUCLEOTIDES considered           │
│ 42                                                             │
│    Choose type of distance to be analyzed                      │
│        (1) Proportion of nucleotide differences;               │
│        (2) J-C distance.                                       │
│ 1                                                              │
│    Please supply the name of input file                        │
│ test.dat                                                       │
│                                                                │
└────────────────────────────────────────────────────────────────┘
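
The two distance types offered in the last menu are related by the standard Jukes-Cantor formula, d = -(3/4) ln(1 - 4p/3). The Python sketch below shows how p and d could be obtained for one pair of aligned sequences before the input file is prepared. It is an illustration only: SEND itself expects the distances to be already computed, and the function names and the two short sequences are hypothetical.

```python
import math

def proportion_different(seq_a, seq_b):
    """Proportion of sites that differ between two aligned sequences of equal length."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, non-empty, and of equal length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)

def jukes_cantor(p):
    """Convert a proportion of nucleotide differences p into a Jukes-Cantor distance."""
    if not 0.0 <= p < 0.75:
        raise ValueError("the Jukes-Cantor distance is defined only for 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Hypothetical aligned sequences with 1 difference in 10 sites.
p = proportion_different("ACGTACGTAC", "ACGTACGTAA")
d = jukes_cantor(p)
```

The Jukes-Cantor distance is always at least as large as p, because it corrects for multiple substitutions at the same site.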
 

Restriction-site Data
     First make a file of the numbers of restriction sites (m|i|) for all sequences and the numbers of shared sites (m|ij|) between sequences i and j for all pairwise comparisons. The m|i| values should be given on the diagonal of the matrix (see the example below). More than one file may be needed when several different classes of restriction enzymes are used (see Nei and Jin's paper); make one file for each class of restriction enzymes. Since a free format is used in the READ statement, the data can be entered in any format as long as they are in the right order. In the case of restriction-site data, it is also necessary to give the number of recognition sites (r) for each class of enzymes (6 in the example below, which uses six-base enzymes). The following is an example of the input file for restriction-site data.

┌─────────────────────────────────────────────────────┐
│                                                     │
│  The number of recognition sites = 6.               │
│                                                     │
│                1    2    3    4    5    6    7      │
│                                                     │
│      1         38.  33.  32.  34.  32.  29.  31.    │
│      2              35.  34.  32.  31.  30.  29.    │
│      3                   35.  33.  32.  30.  29.    │
│      4                        36.  34.  29.  31.    │
│      5                             35.  28.  29.    │
│      6                                  37.  33.    │
│      7                                       36.    │
│                                                     │
└─────────────────────────────────────────────────────┘


     The above example is taken from Table 1 of Nei and Jin's paper. The values on and above the diagonal are the m|i|'s and m|ij|'s for 13 six-base enzymes. Sequences 1, 2, 3, 4, and 5 are from common chimpanzees, and the others are from pygmy chimpanzees.
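
Before running the program, the triangular layout can be checked with a short script. The Python sketch below is not part of SEND, and its function names are hypothetical. It takes only the numerical entries of the matrix, row by row (which labels and header lines the program actually expects in the file is not specified in these instructions), rebuilds the full symmetric matrix, and reports any pair whose number of shared sites exceeds the number of sites in either sequence.

```python
def read_upper_triangle(numbers, n):
    """Rebuild a symmetric n x n matrix from its upper triangle, given row by row.

    The diagonal holds m_i (sites in sequence i); off-diagonal entries hold
    m_ij (sites shared by sequences i and j).
    """
    if len(numbers) != n * (n + 1) // 2:
        raise ValueError("expected n(n+1)/2 values")
    m = [[0.0] * n for _ in range(n)]
    it = iter(numbers)
    for i in range(n):
        for j in range(i, n):
            m[i][j] = m[j][i] = next(it)
    return m

def check_shared_sites(m):
    """Return 1-based pairs (i, j) whose shared-site count exceeds either site count."""
    n = len(m)
    return [(i + 1, j + 1) for i in range(n) for j in range(i + 1, n)
            if m[i][j] > min(m[i][i], m[j][j])]

# Upper triangle of the chimpanzee example, row by row.
numbers = [38, 33, 32, 34, 32, 29, 31,
           35, 34, 32, 31, 30, 29,
           35, 33, 32, 30, 29,
           36, 34, 29, 31,
           35, 28, 29,
           37, 33,
           36]
m = read_upper_triangle(numbers, 7)
bad_pairs = check_shared_sites(m)  # an empty list means no inconsistent pair was found
```

An entry that fails this check usually points to a value typed in the wrong row or column.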
     After the program is started, the same five questions as for DNA sequence data will be asked. The computer will also ask you to enter the number of classes of restriction enzymes and then the names of the input files, one by one.
     Here is an example for the restriction-site data.

┌────────────────────────────────────────────────────────────────┐
│                                                                │
│    Choose type of data to be analysed:                         │
│        (1) nucleotide sequences;                               │
│        (2) restriction sites.                                  │
│ 2                                                              │
│    Please enter the number of SEQUENCES                        │
│ 7                                                              │
│    Please enter the number of POPULATIONS                      │
│ 2                                                              │
│    Choose type of tree-making method:                          │
│        (1) UPGMA;                                              │
│        (2) Neighbor-joining method;                            │
│        (3) Both.                                               │
│ 1                                                              │
│    Population 1: 1 to ?                                        │
│ 5                                                              │
│    Population 2: 6 to ?                                        │
│ 7                                                              │
│    Please enter the number of classes of restriction enzymes   │
│ 1                                                              │
│    Please provide the input-file name of class 1               │
│ chimp.6                                                        │
│                                                                │
└────────────────────────────────────────────────────────────────┘
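
The replies in the two example sessions follow a fixed order, so they can be written down in advance. The Python sketch below only assembles the replies as text; it does not run SEND. The function name and its arguments are hypothetical, and the order of the replies is taken from the examples in these instructions.

```python
def build_answers(data_type, n_seq, last_indices, tree_method, tail):
    """List the replies in the order in which the prompts appear in the examples.

    `last_indices` holds the last sequence number of each population;
    `tail` holds the remaining replies, which differ by data type.
    """
    replies = [data_type, n_seq, len(last_indices), tree_method, *last_indices, *tail]
    return "".join(f"{reply}\n" for reply in replies)

# DNA sequence example: 42 nucleotides, distance type 1 (p), file test.dat.
dna = build_answers(1, 5, [3, 5], 1, [42, 1, "test.dat"])

# Restriction-site example: one class of enzymes, file chimp.6.
rst = build_answers(2, 7, [5, 7], 1, [1, "chimp.6"])
```

If the program reads its replies from standard input (an assumption that these instructions do not confirm), such a text could be saved to a file and redirected to SEND instead of being typed at each prompt.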

 
How to read the output file
     The output file, SEND.OUT, consists of three parts. The first part is the matrix of d|ij| (or p|ij|) or the matrices of m|i| and m|ij|. The second part is the information on the UPGMA tree and/or the neighbor-joining tree with the matrix of the patristic distances and the standardized discrepancy (see Nei and Jin's paper) of the tree(s).  The third part is the nucleotide diversity (π) and divergence (d) and their standard errors. 
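
As a rough cross-check on the third part of the output, the plain averages of the pairwise distances within and between populations can be computed directly from the input data. The Python sketch below does only that. It is not the algorithm of SEND: the program's estimates follow Nei and Jin's paper and may differ from these unweighted means, and no standard errors are computed here. The function name and its arguments are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def mean_pairwise(values, pop_sizes):
    """Average the pairwise distances within each population and between populations.

    `values` lists D12, D13, ..., D(n-1)n in input-file order;
    `pop_sizes` gives the number of sequences in populations 1, 2, ... in order.
    """
    # 1-based population label of every sequence, in sequence order.
    labels = [k for k, size in enumerate(pop_sizes, start=1) for _ in range(size)]
    pairs = list(combinations(range(len(labels)), 2))
    if len(values) != len(pairs):
        raise ValueError("expected n(n-1)/2 distances")
    groups = defaultdict(list)
    for (i, j), d in zip(pairs, values):
        groups[(labels[i], labels[j])].append(d)
    return {key: sum(ds) / len(ds) for key, ds in groups.items()}

# Distances of the TEST.DAT example; populations of 3 and 2 sequences.
values = [.0500, .0507, .0768, .0486, .0380, .0912, .1433, .0253, .0496, .0621]
means = mean_pairwise(values, [3, 2])
# means[(1, 1)] and means[(2, 2)] are within-population averages;
# means[(1, 2)] is the between-population average.
```

Large disagreements between these averages and the values in SEND.OUT would suggest that the distances were entered in the wrong order or that the population boundaries were mistyped.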



| Department of Biology  |  Eberly College of Science |
 
| Institute of Molecular Evolutionary Genetics | Penn State |
©2002 The Pennsylvania State University
This page was last updated 6/11/09 by M. Ricardo.