Home | CV | Databases | IMEG Seminars | Journals
 
MEP-online | People | Publications | SoftwareText only version



Software - Read me File

 

 
RESTSITE:
GENERAL DOCUMENTATION FOR RESTSITE PROGRAMS v1.2

Programs for analyzing restriction site or fragment data. 
Copyright (c) 1990, 1991 by Joyce C. Miller.  All Rights Reserved.  This package contains a series of programs designed to analyze restriction site or restriction fragment data for use in molecular systematics studies.
 
Disk Contents
     These disks contain executable versions of the programs, documentation, sample data, and tree-making programs -- in short, everything you need to start analyzing data right now.

     Your disks should contain the following files:

 

"DOC": This directory contains the documentation for the distance programs and the tree-making programs:
README.1ST -- a brief introduction to the programs.
GENERAL.DOC -- general documentation; you are reading it right now.
READTEXT.DOC -- documentation for the program that reads text data files into binary data files.
RESTSITE.DOC -- documentation for the program that pools the data in the binary data files and calculates the distances between OTUs.
WRITETXT.DOC -- documentation for the program that converts binary data files back into text data files.
REPORT.DOC -- documentation for the program that reads the pooled data files created by RESTSITE and writes their contents to text files.
COMBINE.DOC -- documentation for the program that combines two pooled data files into one.
REMDATA.DOC -- documentation for the program that removes the data found in one OTU from another OTU.
REMFILE.DOC -- documentation for the program that removes an OTU from the data set.
01_TO_RS.DOC -- documentation for the program that converts data from 0/1 matrices to RESTSITE format.
RS_TO_01.DOC -- documentation for the program that converts data from RESTSITE to 0/1 matrix format.
CALCD.DOC -- documentation for the program that calculates F-hat and d-hat via equations 5.53-5.55 of Nei (1987) from data entered by the user.
UPGMA.DOC -- documentation for the program that produces Unweighted Pair-Group Method trees from distance matrices.
NJTREE.DOC -- documentation for the program that produces Neighbor-Joining trees from distance matrices.
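The conversion done by 01_TO_RS and RS_TO_01 can be pictured as switching between two views of the same data: a list of site positions for each individual, and a presence/absence (0/1) matrix with one column per distinct site. The C sketch below shows only the list-to-matrix direction. It is not the code in 01_TO_RS.C or RS_TO_01.C. The function name sites_to_matrix, the limit N_SITES_MAX, and the assumption that a site is identified by an integer map position are hypothetical choices made for this illustration.

```c
#define N_SITES_MAX 64  /* hypothetical limit for this sketch only */

/*
 * Convert per-individual site lists into a presence/absence (0/1) matrix.
 * sites[i][0..n_sites[i]-1] holds the map positions of the sites found in
 * individual i (in any order, without duplicates).  On return all_sites[]
 * holds every distinct position seen in any individual, in ascending order,
 * and matrix[i][j] is 1 if individual i has site all_sites[j], else 0.
 * Returns the number of columns, or -1 if N_SITES_MAX would be exceeded.
 */
int sites_to_matrix(int n_ind, int sites[][N_SITES_MAX], const int n_sites[],
                    int all_sites[], unsigned char matrix[][N_SITES_MAX])
{
    int n_cols = 0;

    /* Pass 1: insert each position into the sorted union of all sites. */
    for (int i = 0; i < n_ind; i++) {
        for (int s = 0; s < n_sites[i]; s++) {
            int pos = sites[i][s], j = 0;
            while (j < n_cols && all_sites[j] < pos)
                j++;
            if (j < n_cols && all_sites[j] == pos)
                continue;                      /* already a column */
            if (n_cols == N_SITES_MAX)
                return -1;
            for (int m = n_cols; m > j; m--)   /* shift to make room */
                all_sites[m] = all_sites[m - 1];
            all_sites[j] = pos;
            n_cols++;
        }
    }

    /* Pass 2: mark 1 where an individual has the site, 0 elsewhere. */
    for (int i = 0; i < n_ind; i++) {
        for (int j = 0; j < n_cols; j++)
            matrix[i][j] = 0;
        for (int s = 0; s < n_sites[i]; s++)
            for (int j = 0; j < n_cols; j++)
                if (all_sites[j] == sites[i][s])
                    matrix[i][j] = 1;
    }
    return n_cols;
}
```

For two individuals with sites {100, 250, 400} and {100, 300}, the sketch produces the columns 100, 250, 300, 400 with rows 1 1 0 1 and 1 0 1 0. Going the other way (matrix to site lists) amounts to reading off the columns that hold a 1.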
   
"PROGRAMS" : This directory contains the compiled versions of the programs:
   
READTEXT.EXE -- a program that reads text data files into binary data files.
RESTSITE.EXE -- a program that pools the data in the binary data files and calculates the distances between OTUs.
WRITETXT.EXE -- a program that converts binary data files back into text data files.
REPORT.EXE -- a program that reads the pooled data files created by RESTSITE and writes their contents to text files.
COMBINE.EXE -- a program that combines two pooled data files into one.
REMDATA.EXE -- a program that removes the data found in one OTU from another OTU.
REMFILE.EXE -- a program that removes an OTU from the data set.
01_TO_RS.EXE -- a program that converts data from 0/1 matrices to RESTSITE format.
RS_TO_01.EXE -- a program that converts data from RESTSITE to 0/1 matrix format.
CALCD.EXE -- a program that calculates F-hat and d-hat via equations 5.53-5.55 of Nei (1987) from data entered by the user.
UPGMA.EXE -- a UPGMA tree-making program.
NJTREE.EXE -- a Neighbor-Joining tree-making program.

"SAMPLE" : This directory has sample data and its output, so that you can check that the programs are running correctly on your computer:
   
SAMPLE1.TXT -- a file with sample data in text form.
SAMPLE2.TXT -- another file with sample data in text form.
SAMPLE.LST -- a sample "list" file naming the binary data files that RESTSITE is to analyze.
SAMPLE.OUT -- the output of the sample data after analysis.

"SOURCE" : This directory contains the source code and header files for the distance programs and the tree-making programs, as well as some general advice for computer programmers:
   
PROGRAMR.DOC -- information for anyone who wants to alter these programs or re-compile them on another type of computer.
READTEXT.C -- source code for the program that reads text data files into binary data files.
WRITETXT.C -- source code for the program that converts binary data files back into text data files.
RESTSITE.C -- source code for the program that pools the data in the binary data files and calculates the distances between OTUs.
REPORT.C -- source code for the program that reads the pooled data files created by RESTSITE and writes their contents to text files.
COMBINE.C -- source code for the program that combines two pooled data files into one.
REMDATA.C -- source code for the program that removes the data found in one OTU from another OTU.
REMFILE.C -- source code for the program that removes an OTU from the data set.
01_TO_RS.C -- source code for the program that converts data from 0/1 matrices to RESTSITE format.
RS_TO_01.C -- source code for the program that converts data from RESTSITE to 0/1 matrix format.
CALCD.C -- source code for the program that calculates F-hat and d-hat via equations 5.53-5.55 of Nei (1987) from data entered by the user.
RSTYPES.H -- header file containing the type definitions, symbolic constants, and upper limits for the programs on these disks.
RSFUNCS.H -- header file containing some commonly used functions.
RSERRORS.H -- header file containing the error messages used by the programs on these disks.
UPGMA.C -- source code for the UPGMA.EXE program.
NJTREE.C -- source code for the NJTREE.EXE program.

     I have included this last directory in case some enterprising soul wishes to re-compile and run these programs on a different kind of computer or under a non-MS-DOS operating system.
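If you open UPGMA.C, the following C sketch may serve as a reference point for the method itself. It is a generic textbook UPGMA, not a transcription of UPGMA.C. It repeatedly joins the two closest clusters, places the new node at half their distance, and replaces their rows with a size-weighted average. The function name upgma, the limit MAXN, and the layout of the output arrays are hypothetical.

```c
#define MAXN 32  /* hypothetical limit on the number of OTUs for this sketch */

/*
 * UPGMA clustering of a symmetric n x n distance matrix d (modified in place).
 * For merge k, join_a[k] and join_b[k] receive the indices of the two
 * clusters joined (the new cluster keeps index join_a[k]), and height[k]
 * receives the node height, i.e. half the joining distance.
 * Returns the number of merges performed (n - 1).
 */
int upgma(int n, double d[MAXN][MAXN], int join_a[], int join_b[], double height[])
{
    int size[MAXN], active[MAXN];
    if (n < 2)
        return 0;
    for (int i = 0; i < n; i++) {
        size[i] = 1;
        active[i] = 1;
    }

    for (int k = 0; k < n - 1; k++) {
        /* Find the closest pair of still-active clusters. */
        int a = -1, b = -1;
        for (int i = 0; i < n; i++) {
            if (!active[i])
                continue;
            for (int j = i + 1; j < n; j++)
                if (active[j] && (a < 0 || d[i][j] < d[a][b])) {
                    a = i;
                    b = j;
                }
        }
        join_a[k] = a;
        join_b[k] = b;
        height[k] = d[a][b] / 2.0;

        /* Distance from the merged cluster: average weighted by cluster size. */
        for (int m = 0; m < n; m++) {
            if (!active[m] || m == a || m == b)
                continue;
            double nd = (size[a] * d[a][m] + size[b] * d[b][m]) / (size[a] + size[b]);
            d[a][m] = d[m][a] = nd;
        }
        size[a] += size[b];
        active[b] = 0;   /* cluster b is absorbed into a */
    }
    return n - 1;
}
```

Consider four OTUs with distances A-B 2, A-C 6, A-D 10, B-C 6, B-D 12, and C-D 8. The sketch joins A and B at height 1, then adds C at height 3, then adds D at height 5. The last step uses the weighted distance (2 x 11 + 1 x 8) / 3 = 10 from the three-member cluster to D.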

 
Getting Started
     First things first.  Make a copy of these disks and keep the originals in a safe place.

     The "SAMPLE" directory includes two sample text data files ("SAMPLE1.TXT" and "SAMPLE2.TXT") and an output file called "SAMPLE.OUT".  Please take time to analyze the data in the sample data files and compare the results to "SAMPLE.OUT" to be sure the programs are working properly on your computer.  Contrary to popular belief, not all IBM "clones" are, in fact, clones.  Some do their math differently, and by analyzing the sample data you'll find out whether everything is running correctly.

     The data set in "SAMPLE1.TXT" and "SAMPLE2.TXT" contains real restriction site data from Ferris, Wilson, and Brown (1981) and Ferris et al. (1981).  Mitochondrial DNA from ten individuals of Pan troglodytes (common chimpanzee) and three of P. paniscus (pygmy chimpanzee) was cut with thirteen 6-base pair enzymes, two 5.33-base pair enzymes, and one 4-base pair enzyme.  Restriction maps were made, and the sites are listed in the two files.  To analyze this data, follow the three steps below:

          Step 1:  The data must be converted into binary (machine-readable) form.  Type:

                                  READTEXT SAMPLE1.TXT TEST1.DB
                     The program READTEXT will translate the data in "SAMPLE1.TXT" from text form into binary form, and write it to the file "TEST1.DB".  Repeat this step for "SAMPLE2.TXT":

                                  READTEXT SAMPLE2.TXT TEST2.DB

          Step 2:  Now that the data is in machine-readable form, it can be analyzed.  Type:

                                RESTSITE SAMPLE.LST TEST.OUT 1SL

                     The program RESTSITE will analyze the data files listed in "SAMPLE.LST" ("TEST1.DB" and "TEST2.DB") and will print the results to file "TEST.OUT".  Specifically, it will pool the data on KEY1 (by species, in this case), and calculate the genetic distances between the different taxa that were found under KEY1.

          Step 3:  Print "SAMPLE.OUT" and "TEST.OUT", and compare the two files to be sure that the distances calculated are the same.

          For more specific information on how the programs "READTEXT" and "RESTSITE" work, please refer to the documentation (".DOC") file for each program.
 
Power Users
     The programs are presently configured to handle up to 200 different OTUs and up to 500 probe/enzyme combinations.  Other limits of interest are a maximum of 50 restriction fragments or sites per individual per probe/enzyme combination, and a maximum of 50 different restriction sites or fragments found across all individuals in one OTU per probe/enzyme combination.  If your data set approaches or exceeds these limits, please contact me.  These limits are easy to change: they can be raised in a matter of minutes by altering the appropriate symbolic constants in the file "RSTYPES.H".  I wrote these programs to handle large data sets and to be expandable to even larger ones, so let me know if you need a "bigger" program.
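Raising a limit means editing the corresponding symbolic constant in RSTYPES.H and recompiling. The C fragment below sketches that pattern, together with a simple check of a data set against the limits. The constant names MAX_OTUS, MAX_COMBOS, MAX_FRAGS_PER_IND, and MAX_FRAGS_PER_OTU and the function check_limits are hypothetical. The names actually used in RSTYPES.H may differ, so consult PROGRAMR.DOC before changing anything.

```c
#include <stdio.h>

/* Hypothetical constant names; the values are the limits stated above. */
#define MAX_OTUS          200  /* different OTUs */
#define MAX_COMBOS        500  /* probe/enzyme combinations */
#define MAX_FRAGS_PER_IND  50  /* fragments or sites per individual per combination */
#define MAX_FRAGS_PER_OTU  50  /* distinct fragments or sites in one OTU per combination */

/* Print every limit the data set exceeds; return 0 if it fits, 1 otherwise. */
int check_limits(int n_otus, int n_combos, int max_per_ind, int max_per_otu)
{
    int over = 0;
    if (n_otus > MAX_OTUS) {
        fprintf(stderr, "OTUs: %d exceeds MAX_OTUS (%d)\n", n_otus, MAX_OTUS);
        over = 1;
    }
    if (n_combos > MAX_COMBOS) {
        fprintf(stderr, "probe/enzyme combinations: %d exceeds MAX_COMBOS (%d)\n",
                n_combos, MAX_COMBOS);
        over = 1;
    }
    if (max_per_ind > MAX_FRAGS_PER_IND) {
        fprintf(stderr, "fragments per individual: %d exceeds MAX_FRAGS_PER_IND (%d)\n",
                max_per_ind, MAX_FRAGS_PER_IND);
        over = 1;
    }
    if (max_per_otu > MAX_FRAGS_PER_OTU) {
        fprintf(stderr, "fragments per OTU: %d exceeds MAX_FRAGS_PER_OTU (%d)\n",
                max_per_otu, MAX_FRAGS_PER_OTU);
        over = 1;
    }
    return over;
}
```

Because every array sized by these constants grows with them, raising a limit also raises the program's memory use. This was a real constraint under MS-DOS.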
 
Distribution of these programs
     I distribute these programs free of charge.  If you would like a copy, simply send me one of the following sets of diskettes (formatted, please):
               -- four 5-1/4" double-sided, double-density diskettes,
               -- two 5-1/4" high-density diskettes (1.2Mb),
               -- two 3-1/2" diskettes (1.44Mb)
     Please tell me a little bit about your project and what kind of data you want to push through these programs.  This information will help me improve and refine the programs in the future.  And of course, I am open to comments and suggestions on ways to make the programs more useful.  Something I am working on now is a "haplotype" program that will go through the data files and determine how many different restriction pattern profiles there are in each OTU and how many individuals share each profile (the "nucleon diversity" or "haplotype diversity" of Nei & Tajima (1981) Genetics 97:145-163).  This program is still some way off, however.
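The two steps the planned "haplotype" program would carry out can be sketched in C. First, group individuals by identical restriction profile. Second, compute the nucleon diversity of Nei and Tajima (1981), h = n/(n - 1) x (1 - sum of x_i^2), where x_i is the frequency of the i-th profile among the n individuals of an OTU. This is a hypothetical illustration, not the author's program. It assumes each profile is stored as a 0/1 string (for example, one row of a 0/1 matrix). The names count_haplotypes, nucleon_diversity, and MAX_HAPS are made up for the example.

```c
#include <string.h>

#define MAX_HAPS 64  /* hypothetical limit on distinct profiles per OTU */

/* Group n individuals by identical restriction profile (one 0/1 string each).
   counts[j] receives the number of individuals carrying the j-th distinct
   profile, in order of first appearance.  Returns the number of distinct
   profiles (haplotypes), or -1 if there are more than MAX_HAPS. */
int count_haplotypes(const char *profiles[], int n, int counts[])
{
    const char *seen[MAX_HAPS];
    int k = 0;
    for (int i = 0; i < n; i++) {
        int j = 0;
        while (j < k && strcmp(seen[j], profiles[i]) != 0)
            j++;
        if (j == k) {                 /* first time this profile appears */
            if (k == MAX_HAPS)
                return -1;
            seen[k] = profiles[i];
            counts[k] = 0;
            k++;
        }
        counts[j]++;
    }
    return k;
}

/* Nucleon (haplotype) diversity: h = n/(n-1) * (1 - sum of x_i^2),
   with x_i = counts[i] / n.  Returns -1.0 if fewer than two individuals. */
double nucleon_diversity(const int counts[], int k)
{
    int n = 0;
    for (int i = 0; i < k; i++)
        n += counts[i];
    if (n < 2)
        return -1.0;
    double sum_sq = 0.0;
    for (int i = 0; i < k; i++) {
        double x = (double)counts[i] / n;
        sum_sq += x * x;
    }
    return (double)n / (n - 1) * (1.0 - sum_sq);
}
```

Suppose five individuals have the profiles 1101, 1101, 1001, 0111, and 1101. There are three haplotypes, with counts 3, 1, and 1, and h = (5/4)(1 - 0.36 - 0.04 - 0.04) = 0.7.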
     Feel free to give out copies of these disks for temporary use.  Of course, you may not sell the programs, the header files, or any parts of them.  If you do give away a copy, ask the recipient to send me his or her address and request an "official" copy.  Most public-domain software gets altered as it is passed around, and I'd like everyone who needs it to have an unadulterated copy.

                                Dr. Joyce C. Miller  
                                Whitehead Institute  
                                9 Cambridge Center   
                                Cambridge, MA 02142  
                                Tel: 617-253-8582    
 
References
    
Ferris, S. D., A. C. Wilson, and W. M. Brown (1981) Evolutionary tree for apes and humans based on cleavage maps of mitochondrial DNA.  Proc. Natl. Acad. Sci. USA 78:2432-2436.

Ferris, S. D., W. M. Brown, W. S. Davidson, and A. C. Wilson (1981) Extensive polymorphism in the mitochondrial DNA of apes.  Proc. Natl. Acad. Sci. USA 78:6319-6323.



| Department of Biology  |  Eberly College of Science |
 
| Institute of Molecular Evolutionary Genetics | Penn State |
2002 The Pennsylvania State University
This page was last updated 6/11/09 by M. Ricardo.