
Software - Readme File


Restriction Data and Phylogenetic Analysis

(c) Copyright 1994 by Tatsuya Ota and the Pennsylvania State University.  Permission is granted to copy this document provided that no fee is charged for it and that this copyright notice is not removed.  RESTDATA is distributed free of charge by:
Institute of Molecular Evolutionary Genetics
The Pennsylvania State University
328 Mueller Laboratory
University Park, PA 16802, USA

Tel: 814-863-7334
Fax: 814-863-7336
E-mail: imeg@psuvm.psu.edu
RESTDATA is designed to compute the following:

For restriction site data:
       The numbers of nucleotide substitutions per site for pairs of DNA sequences and their standard errors (Nei and Tajima 1983).

For restriction fragment data:
       The numbers of nucleotide substitutions per site for pairs of DNA sequences (Nei and Li 1979).  If two or more classes of restriction enzymes are used, the average number of nucleotide substitutions is estimated by the following weighted average of the estimates from the individual classes of enzymes:

            \[ d = \frac{\sum_k m_k d_k}{\sum_k m_k}, \]

       where m_k and d_k are the average number of restriction fragments and the number of nucleotide substitutions estimated for the k-th class of restriction enzymes, respectively.
See Nei (1987, Chapter 5, pp. 96-110) and the references therein for details of the methods used in this software.
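
The weighted average over enzyme classes, with the average fragment numbers as weights, can be illustrated with a short Python sketch.  This is not RESTD's own code: the function name weighted_average_d and the numbers in the example are made up for illustration, and the sketch assumes that the weights are normalized by their sum.

```python
def weighted_average_d(classes):
    """Return sum(m_k * d_k) / sum(m_k) over the enzyme classes.

    classes: iterable of (m_k, d_k) pairs, where m_k is the average number
    of restriction fragments and d_k is the estimated number of nucleotide
    substitutions for the k-th class of restriction enzymes.
    """
    classes = list(classes)
    total_weight = sum(m for m, _ in classes)
    if total_weight <= 0:
        raise ValueError("the m_k values must sum to a positive number")
    return sum(m * d for m, d in classes) / total_weight


# Made-up illustrative values: (m_k, d_k) for two classes of enzymes.
example = [(10.0, 0.020), (30.0, 0.040)]
print(round(weighted_average_d(example), 4))  # 0.035
```

In this example the class with more fragments pulls the average toward its own estimate (0.035 rather than the unweighted mean of 0.030).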

       The software also constructs phylogenetic trees (dendrograms) by using the neighbor-joining (NJ) method (Saitou and Nei 1987) and the unweighted pair group method with arithmetic mean (UPGMA) (Sneath and Sokal 1973) from estimates of the numbers of nucleotide substitutions.  Bootstrap tests (Efron 1982, Felsenstein 1985) for these trees can be performed by resampling restriction enzymes.  The tests are available only for restriction site data, when data from ten or more restriction enzymes are listed separately in the input file.  The test in this program is conducted as described in MEGA (Kumar et al. 1993) rather than in Felsenstein (1985).  Please note that the test is not reliable if only a small number of restriction enzymes are used.
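
The idea of resampling restriction enzymes can be sketched in Python as follows.  This is only an illustration of drawing enzymes with replacement, not RESTD's own code: the function name bootstrap_enzyme_samples and the enzyme labels are hypothetical, and Python's random module stands in for RESTD's random number generator, so a given seed will not reproduce RESTD's replicates.

```python
import random


def bootstrap_enzyme_samples(n_enzymes, n_replicates, seed):
    """Yield one list of enzyme indices per bootstrap replicate.

    Each replicate draws n_enzymes indices with replacement, so an enzyme
    may appear several times in a replicate or not at all.
    """
    rng = random.Random(seed)  # Python's generator, an assumption of this sketch
    for _ in range(n_replicates):
        yield [rng.randrange(n_enzymes) for _ in range(n_enzymes)]


# Hypothetical labels for ten enzymes listed separately in an input file.
enzymes = ["enzyme_%d" % i for i in range(1, 11)]
for sample in bootstrap_enzyme_samples(len(enzymes), n_replicates=3, seed=516):
    # The data of the drawn enzymes would form one replicate data set,
    # from which distances and a tree are recomputed.
    print([enzymes[i] for i in sample])
```

With only a few enzymes, many replicates contain almost the same set of enzymes, which is one way to see why the test is unreliable when few enzymes are used.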
       This software contains two programs: (1) RESTD.EXE and (2) TREEVIEW.EXE.  The first program was written in C by T. Ota, whereas the second program was written in Modula-2 by Koichiro Tamura.
Getting started
       First make sure that the diskette you have received contains the following files.

              RESTD.EXE
              TREEVIEW.EXE
              TEST1.DAT
              TEST2.DAT

       The first two files contain the executable programs.  The last two files are data files (examples).
       To install RESTDATA on your computer's hard disk drive (the "C" drive in this example), first create a directory to hold the two executable programs.  To do this, type the following:

                   C:\> MD RESTDATA       (Enter)

       To copy the RESTDATA programs and files onto your hard disk drive, insert the floppy disk containing the programs into your floppy drive (the "A" drive in this example).  Then enter the following command:

             C:\> COPY A:*.* C:\RESTDATA        (Enter)

       If your computer does not have a hard disk, you can use the distribution diskette in your "A" drive.

      Arrange the data as shown in the example data files: TEST1.DAT for restriction site data and TEST2.DAT for restriction fragment data.  A section bounded by "/*" and "*/" is a comment and is ignored by the program.

      The line with "?T = " specifies the type of data.  Use "(S)" for restriction site data or "(F)" for restriction fragment data.

      The line with "#S" specifies either the number of DNA sequences or a list of sequence titles.  To identify the sequences by number, write "#S = 13" or "#Sequences = 13".  To list sequence titles, write "#S = (one,two,three,....,thirteen)".  "(" and ")" mark the beginning and the end of the list, respectively, and "," separates the titles.  Do not use ")" as part of a sequence title.  When titles are listed, the program counts the sequences from the list, so do not specify the number of sequences as well.

      After these specifications, enter the data for each enzyme separately.  The data for an enzyme consist of four sections:
             1) A comment line with "@R", placed at the beginning of the data for each restriction enzyme.
             2) The number of recognition sites (r) for the restriction enzyme, on a line with "#R =".  (See Nei 1987 for the definition of r.)
             3) The list of the numbers of restriction sites (fragments) for the sequences.
             4) The list of the numbers of restriction sites (fragments) shared between pairs of sequences, in the form of an upper triangular matrix.
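
The per-sequence counts and the upper triangular matrix of shared counts can be tallied by hand or with a short script.  The Python sketch below assumes that the sites observed in each sequence are recorded as a set of map positions; the function names, sequence titles, and positions are all hypothetical.  It only computes the numbers: the exact layout expected by RESTD should be copied from TEST1.DAT or TEST2.DAT.

```python
from itertools import combinations


def site_counts(sites_by_sequence):
    """Return (titles, per-sequence counts, shared counts for each pair)."""
    names = list(sites_by_sequence)
    counts = [len(sites_by_sequence[name]) for name in names]
    shared = {
        (a, b): len(sites_by_sequence[a] & sites_by_sequence[b])
        for a, b in combinations(names, 2)
    }
    return names, counts, shared


def upper_triangle_rows(names, shared):
    """Arrange the pairwise shared counts as rows of an upper triangle."""
    return [
        [shared[(a, b)] for b in names[i + 1:]]
        for i, a in enumerate(names[:-1])
    ]


# Made-up map positions of the sites cut by one enzyme in three sequences.
sites = {
    "one": {100, 250, 400},
    "two": {100, 400, 700},
    "three": {250, 400},
}
names, counts, shared = site_counts(sites)
print(counts)                               # [3, 3, 2]
print(upper_triangle_rows(names, shared))   # [[2, 2], [1]]
```

Here the first row holds the counts shared by "one" with "two" and with "three", and the second row holds the count shared by "two" with "three".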

Note for the restriction site data:
       You may list the restriction site data for each restriction enzyme or for each class of restriction enzymes (the group of restriction enzymes with the same r value).  However, you must list the data for each restriction enzyme separately if you want to conduct a bootstrap test.

Note for the restriction fragment data:
       Please combine the data of restriction enzymes with the same r value.  If the data for each restriction enzyme are listed separately, the estimation error will become large.

       Once you are satisfied that the data file is correct, start the program by typing RESTD with the appropriate options:

              C:\RESTDATA\RESTD -fTEST1.DAT -r1000 -s516 -tn -tu

       The options used here are as follows:
              TEST1.DAT as input file
                     -fTEST1.DAT:         Note that no space is allowed between -f and the
                                          name of the input file.
              Bootstrap tests
                     -r1000:              The number after -r is the number of bootstrap
                                          replications, and any number can be used.  Note
                                          that no space is allowed between -r and the
                                          number of replications.  If you do not include -r
                                          option, the bootstrap test will not be performed.
                                          Bootstrapping will be done with enzymes as units.

              Seed number for a random number generator
                     -s516:               The number after -s is the seed number for a
                                          random number generator.  Any integer except
                                          zero can be used.  Any positive integer is
                                          converted to a negative integer.
              Construction of NJ trees and UPGMA trees
                     -tn: NJ tree
                     -tu: UPGMA tree

       Omit any of the options -r1000, -s516, -tn, and -tu that you do not need.  For example, to obtain an NJ tree without a bootstrap test, type the following:

              C:\RESTDATA\RESTD -fTEST1.DAT -tn
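
For users who prepare many runs, the rules for writing the options (no space between an option letter and its value, and a non-zero seed) can be captured in a small helper.  The Python sketch below only assembles the command text from the options described above; the function name restd_command and its parameters are hypothetical, and it does not run RESTD.

```python
def restd_command(input_file, replications=None, seed=None, nj=False, upgma=False):
    """Build the list of words for a RESTD command line."""
    args = ["RESTD", "-f" + input_file]       # no space between -f and the name
    if replications is not None:
        args.append("-r%d" % replications)    # omit -r to skip the bootstrap test
    if seed is not None:
        if seed == 0:
            raise ValueError("the seed may be any integer except zero")
        args.append("-s%d" % seed)
    if nj:
        args.append("-tn")                    # NJ tree
    if upgma:
        args.append("-tu")                    # UPGMA tree
    return args


print(" ".join(restd_command("TEST1.DAT", 1000, 516, nj=True, upgma=True)))
# RESTD -fTEST1.DAT -r1000 -s516 -tn -tu
print(" ".join(restd_command("TEST1.DAT", nj=True)))
# RESTD -fTEST1.DAT -tn
```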
       The program will run, check the input file, and report the data specifications in the input file.  If the numbers reported are correct, press the "Y" and "return" keys to proceed with the calculation.  Otherwise, press the "N" and "return" keys to stop the program so that you can check the input file or switch to a different one.
       When the program finishes, the results are in the file "RESTD.DST".  (If a file called "RESTD.DST" already exists, the new results are appended to the previous results.)
       While running the RESTD program, other output files with the "TRE" extension may appear.  These files are for phylogenetic trees:

              UPGMA.TRE: UPGMA tree
              NJ.TRE: Neighbor-joining tree

If a file with the same name exists before running the program, the file will be overwritten by the new result.  If you want to keep the old file, you need to rename the previous file.
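
Renaming the previous tree files before each run can be automated.  The Python sketch below is one possible way to do it, assuming the output files are written to the directory from which RESTD is run; the function name back_up_tree_files and the ".OLD" extension are arbitrary choices made for this example.

```python
from pathlib import Path


def back_up_tree_files(directory, names=("UPGMA.TRE", "NJ.TRE")):
    """Rename existing tree files to *.OLD and return the new paths."""
    saved = []
    for name in names:
        source = Path(directory) / name
        if source.exists():
            target = source.with_suffix(".OLD")   # e.g. NJ.TRE -> NJ.OLD
            source.replace(target)                # replaces an earlier backup
            saved.append(target)
    return saved
```

Calling back_up_tree_files with the RESTDATA directory before starting RESTD keeps one earlier copy of each tree; note that a second call replaces that earlier copy.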

Tree editing
       You can see the tree structure on screen by typing TREEVIEW as follows:

                          C:\RESTDATA\TREEVIEW           (Enter)

       The screen will list all the files, including the TRE files.  Move the cursor to one of the TRE files and press "Enter".  The screen will show a tree with proper branch lengths and with population numbers (names) at the end of each branch.

       You can swap branches as you wish.  To do this, press "Esc"; a box with a number of options will appear in the upper left-hand corner of the screen.  Move the cursor to "Swap Branches" and press "Enter".  An arrow symbol (a red arrow on a color monitor) will appear on a nodal point, and it can be moved up and down with the up and down arrow keys.  Placing the arrow on a nodal point and pressing the "Enter" key swaps the branches at that point.  By repeating this at different nodal points, you can rearrange the populations of the tree as you wish.  When you are satisfied with the layout of the tree, press the "Esc" key to remove the arrow symbol.

       To print the tree in graphic mode, press the "Esc" key again, move the cursor to "Draw with Graphics", and press "Enter".  The tree will appear on the screen in graphic mode.  To print it, press the letter "P" and wait.  If your computer is connected to a printer, you will get a printout of the tree.  To leave the program, press the "Esc" key twice, move the cursor to "Exit Program", and press "Enter".  Alternatively, you can obtain the tree by printing the file called TREEVIEW.OUT, which is created by the "Save tree" option of the TREEVIEW program.
       Note for UPGMA.  In a printout of the TREEVIEW.OUT file, the branch lengths from the root may not appear equal for all the populations under study, although in a UPGMA tree they should be the same for all populations.  To avoid this problem, print the tree in graphic mode.
       Note for NJ.  Since an NJ tree is unrooted, you can re-root the tree.  Press the "Esc" key; as before, a box with a number of options will appear in the upper left-hand corner of the screen.  Move the cursor to "Move Root" and press "Enter".  A diamond symbol (a green diamond on a color monitor) will appear on an interior branch of the tree.  Use the up and down arrow keys to move the diamond to the interior branch where you want to place the root, and press the "Enter" key.  The tree will be redrawn with the root you assigned.  To remove the diamond symbol, press the "Esc" key.
       Note for the bootstrap test.  The numbers shown on the tree are the bootstrap values.  To remove them, press the "Esc" key; as before, a box with a number of options will appear in the upper left-hand corner of the screen.  Move the cursor to "Hide bootstrap number" and press "Enter".

References

     Efron, B. (1982) The jackknife, the bootstrap, and other resampling plans.  CBMS-NSF Regional Conference Series in Applied Mathematics, No. 38.  Society for Industrial and Applied Mathematics, Philadelphia, PA.

     Felsenstein, J. (1985) Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39:783-791.

     Kumar, S., K. Tamura, and M. Nei (1993)  MEGA: Molecular Evolutionary Genetics Analysis, version 1.01.  The Pennsylvania State University, University Park, PA 16802.

     Nei, M. (1987) Molecular Evolutionary Genetics.  Columbia University Press, New York.

     Nei, M. and W.-H. Li (1979) Mathematical model for studying genetic variation in terms of restriction endonucleases.  Proc. Natl. Acad. Sci. USA 76:5269-5273.

     Nei, M. and F. Tajima (1983) Maximum likelihood estimation of the number of nucleotide substitutions for restriction sites data.  Genetics 105:207-216.

     Saitou, N. and M. Nei (1987) The neighbor-joining method: A new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406-425.

     Sneath, P.H.A. and R.R. Sokal (1973) Numerical Taxonomy. Freeman, San Francisco.


2002 The Pennsylvania State University
This page was last updated 11/18/03 by K. Seasholtz.