
Software - Read me File


Modified Suzuki and Gojobori's method for detecting positive and negative selection at individual codon sites

(c) Copyright July 2000 by Chen Su and the Pennsylvania State University. Permission is granted to copy this document provided that no fee is charged for it and that this copyright notice is not removed. It is distributed free of charge by
Chen Su
Institute of Molecular Evolutionary Genetics
and Department of Biology
322 Mueller Laboratory
The Pennsylvania State University
University Park, PA 16802, USA

Email: cxs513@yahoo.com
Suggested Citation
Suzuki, Y., and T. Gojobori. 1999. A method for detecting positive selection at single amino acid sites. Mol Biol Evol 16: 1315-1328.
Su, C. 2000. SGI: Modified Suzuki and Gojobori's method for detecting positive and negative selection at individual codon sites. The Pennsylvania State University, University Park, PA, USA.
Suzuki and Gojobori's (1999) method was designed for detecting positive and negative selection at single codon sites. For each codon site, the method computes the probabilities of synonymous and nonsynonymous changes and counts the total number of changes, the number of synonymous changes, and the number of nonsynonymous changes. Under the assumption of a binomial distribution, if no selection is involved, the number of nonsynonymous changes should not differ significantly from its expected value. If, for a certain site, the observed number of nonsynonymous changes is significantly higher than expected, positive selection is inferred. If the observed number of synonymous changes is significantly higher than expected, purifying (negative) selection is inferred.
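
The binomial test described above can be sketched in a few lines of Python. This is only an illustration, not code from SGI.pl: the function names are hypothetical, and it assumes a one-tailed test in which the tail probability is the chance of seeing at least the observed number of changes of one type, given the total number of changes at the site and the per-change probability of that type.

```python
from math import comb


def binomial_upper_tail(n_total, n_observed, prob):
    """Return P(X >= n_observed) for X ~ Binomial(n_total, prob)."""
    return sum(
        comb(n_total, k) * prob**k * (1.0 - prob) ** (n_total - k)
        for k in range(n_observed, n_total + 1)
    )


def site_tail_probabilities(n_syn, n_nonsyn, p_nonsyn):
    """Tail probabilities for one codon site (hypothetical helper).

    n_syn, n_nonsyn: counted synonymous / nonsynonymous changes at the site.
    p_nonsyn: probability that a change at this site is nonsynonymous
              when no selection is acting (assumed to be supplied).
    """
    n_total = n_syn + n_nonsyn
    return {
        # A small value suggests an excess of nonsynonymous changes.
        "excess_nonsyn": binomial_upper_tail(n_total, n_nonsyn, p_nonsyn),
        # A small value suggests an excess of synonymous changes.
        "excess_syn": binomial_upper_tail(n_total, n_syn, 1.0 - p_nonsyn),
    }


if __name__ == "__main__":
    # Illustrative numbers only: 6 changes, all nonsynonymous, p_nonsyn = 0.5.
    print(site_tail_probabilities(n_syn=0, n_nonsyn=6, p_nonsyn=0.5))
```

With these illustrative inputs the tail probability for an excess of nonsynonymous changes is 0.5 to the sixth power, about 0.016, which would be significant at the 5% level.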

This program implements the above method with two minor modifications. First, the original method estimates the tree topology from the synonymous substitutions and uses it for estimating the branch lengths, the ancestral states, and so on. However, I found that the tree topology estimated this way is not reliable. For example, when the number of sequences is large and the number of sites is small, the uncorrected p-distance often gives better results (see Nei and Kumar 2000). In this program, you can simply input a tree topology and get the results. If you are not sure whether your tree topology is correct, you can try different ones and compare the results.

Second, the above method uses an (unweighted) parsimony method to estimate the ancestral states. If different pathways are possible, they are treated as having an equal probability of occurrence. An alternative, which this program uses, is to estimate the ancestral states by Zhang and Nei's (1997) distance method, since it is simpler and gives equally good results.
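
Once ancestral codons have been assigned, the changes along each branch have to be counted as synonymous or nonsynonymous. The Python sketch below illustrates one way to do that counting. It is not taken from SGI.pl or anc-gene, and the function name is hypothetical. It assumes the standard genetic code and, when two codons differ at more than one position, averages over all orders of single-nucleotide steps with equal weight. As a simplification, pathways that pass through stop codons are not excluded.

```python
from itertools import permutations

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def count_changes(ancestor, descendant):
    """Return (synonymous, nonsynonymous) change counts between two codons.

    Counts are averaged over every order in which the differing
    positions could have changed, each order weighted equally.
    """
    diff_positions = [i for i in range(3) if ancestor[i] != descendant[i]]
    if not diff_positions:
        return 0.0, 0.0
    orders = list(permutations(diff_positions))
    syn_total = 0
    nonsyn_total = 0
    for order in orders:
        current = ancestor
        for pos in order:
            # Apply one nucleotide change and compare the encoded amino acids.
            nxt = current[:pos] + descendant[pos] + current[pos + 1:]
            if CODON_TABLE[current] == CODON_TABLE[nxt]:
                syn_total += 1
            else:
                nonsyn_total += 1
            current = nxt
    return syn_total / len(orders), nonsyn_total / len(orders)


if __name__ == "__main__":
    print(count_changes("TTT", "TTC"))  # Phe -> Phe
    print(count_changes("TTT", "GTA"))  # two positions differ, two step orders
```

For TTT and GTA there are two step orders: TTT to GTT to GTA gives one nonsynonymous and one synonymous change, while TTT to TTA to GTA gives two nonsynonymous changes, so the averaged counts are 0.5 synonymous and 1.5 nonsynonymous.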

This program is written in Perl. For the ancestor reconstruction part, it makes system calls to Zhang's program anc-gene. (For details of the anc-gene program, follow the 'Ancestral Sequences' link above.) This program has been tested on IBM PC compatible computers.

First make sure that you have the following files.

SGI.pl (Perl source code)
nsyn.exe (executable file)
syncha.exe (executable file)
VH.dat (sample input)
VH.xls (sample excel file)
VH.dat.out (sample output)
SGI.txt (this file)

To install SGI on your computer's hard disk drive (the "C" drive is used here as an example), create a directory to hold the files of this package. To do this, at the c:\ prompt, type md SGI (Enter)

To copy the files onto your hard disk drive, insert the floppy disk containing the programs into your floppy drive (the "A" drive is used here as an example). Then, at the c:\ prompt, enter the command copy a:*.* c:\SGI\*.* (Enter)

Because Perl compiles scripts at run time, you need Perl installed on your local drive. I use version 5.0, but I believe an earlier version would work too. If you do not have Perl installed, you can download it for free at http://www.perl.com/pub.

To run the program,
1. open the file SGI.pl using a text editor such as Notepad, and replace the name of the input file (VH.dat) with the name of your file;
2. save SGI.pl and close it;
3. at the c:\SGI\ prompt, type perl SGI.pl;
4. follow the instructions on the screen.
Output file
The output file will have an extension of .out with the name of your input. For example, if your input file is named myseq.dat, the output file will be myseq.dat.out.

You can then copy the output into an Excel spreadsheet and calculate the probabilities (see VH.xls for an example). If, for a certain site, the value of 1-p (the last column in the example file) is higher than 0.95, then this site is inferred to be under positive or purifying selection, depending on whether you are testing nonsynonymous changes or synonymous changes.
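
The decision rule above can also be written as a small Python function for readers who prefer a script to a spreadsheet. This is a hypothetical helper, not part of the SGI package. It assumes you have already calculated 1-p for each site twice, once for nonsynonymous changes and once for synonymous changes, and it only applies the 0.95 cutoff.

```python
def classify_site(one_minus_p_nonsyn, one_minus_p_syn, threshold=0.95):
    """Label a codon site from its two 1-p values (assumed precomputed).

    one_minus_p_nonsyn: 1-p from the test on nonsynonymous changes.
    one_minus_p_syn:    1-p from the test on synonymous changes.
    """
    if one_minus_p_nonsyn > threshold:
        return "positive selection"
    if one_minus_p_syn > threshold:
        return "purifying selection"
    return "not significant"


if __name__ == "__main__":
    # Illustrative values only; they do not come from VH.xls.
    example_sites = {1: (0.98, 0.10), 2: (0.20, 0.97), 3: (0.60, 0.55)}
    for site, (nonsyn_value, syn_value) in example_sites.items():
        print(site, classify_site(nonsyn_value, syn_value))
```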

2002 The Pennsylvania State University
This page was last updated 6/11/09 by M. Ricardo.