Let's say you have identified a putative expansin gene sequence. How do you
know it is a new expansin, and if new, how do you assign it a gene number that
will not conflict with existing or future expansin gene designations?
Is it a new expansin? Here are the steps:
 | Run BLASTP with your protein sequence as the query against the nonredundant
sequence database at NCBI or its mirrors. If you find an identical or nearly
identical sequence in the same species and with an expansin name, then your
search is done: the gene has already been identified and annotated. As long as the
name conforms to the updated nomenclature
rules, you must use this name. If the name does not conform, then open a
dialog with the author (depositor) to get the name updated to current
nomenclature rules. If necessary, contact
D Cosgrove for advice. |
 | If your sequence is from a species with a lot of public genomic data, it
may already be named even though the sequence and gene name do not show up
in GenBank. Species that have already been deeply examined include
Arabidopsis and rice, and these sequences should be in GenBank. Other
species with nearly complete genomes include Populus,
Physcomitrella, and Selaginella; these sequences are not yet
in GenBank, but are found in specialized databases that must be queried
individually. Other species with extensive genomic data include maize,
tomato, wheat, soybean, and Medicago; the list is continually growing.
There is a high likelihood that the expansins in these species have already
been identified and named. Publications may be in press already. Check
the expansin gene tables to see if your
sequence is already publicly registered. Check with
D Cosgrove to reserve a name for your
sequence, if you think it is new. |
 | If your sequence is from a species with little genomic data and if your
GenBank search comes up empty, then probably it is a new sequence and should
be named following the standard nomenclature
rules. It is best to publish the sequence in GenBank as soon as possible, to
prevent others from using the same name for a different sequence, or
assigning a different name to the identical sequence. |
 | Although in most cases it is straightforward to determine whether a gene
sequence is already represented in the public databases, sometimes it
becomes more complicated, e.g., for highly similar but not identical sequences
when different cultivars of the same species are involved or when the
species is polyploid. Then, careful analysis is needed to avoid naming
collisions or confusions, such as assigning multiple gene names to different
alleles of the same gene. |
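As a rough illustration of the first step, the sketch below screens BLASTP results for identical or nearly identical hits. It assumes the search was saved in the standard 12-column tabular format of BLAST+ (-outfmt 6). The function name, the sample rows, and the cutoffs (98% identity over at least 90% of the query) are hypothetical choices made for this example; they are not part of the nomenclature rules. The script only shortlists candidates. You still need to check by hand that each hit comes from the same species and carries an expansin name, and near-identical hits from other cultivars or from a polyploid species need the careful analysis described above.

```python
import csv
import io
from dataclasses import dataclass

# Column order of the standard BLAST+ tabular output (-outfmt 6).
FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


@dataclass
class Hit:
    subject: str      # subject sequence identifier
    identity: float   # percent identity over the aligned region
    coverage: float   # fraction of the query covered by the alignment


def near_identical_hits(tabular_text, query_length,
                        min_identity=98.0, min_coverage=0.9):
    """Return hits that are identical or nearly identical to the query.

    The default cutoffs are illustrative assumptions, not official values.
    """
    hits = []
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        # Skip blank lines and comment lines (as written by -outfmt 7).
        if not row or row[0].startswith("#"):
            continue
        rec = dict(zip(FIELDS, row))
        identity = float(rec["pident"])
        aligned = int(rec["qend"]) - int(rec["qstart"]) + 1
        coverage = aligned / query_length
        if identity >= min_identity and coverage >= min_coverage:
            hits.append(Hit(rec["sseqid"], identity, coverage))
    return hits


# Made-up example rows; the identifiers are placeholders, not real accessions.
SAMPLE = "\n".join([
    "query1\tsubjA\t99.2\t250\t2\t0\t1\t250\t1\t250\t1e-150\t500",
    "query1\tsubjB\t75.0\t240\t60\t0\t5\t244\t3\t242\t1e-90\t300",
    "query1\tsubjC\t100.0\t50\t0\t0\t1\t50\t1\t50\t1e-20\t100",
])

if __name__ == "__main__":
    for hit in near_identical_hits(SAMPLE, query_length=255):
        print(hit.subject, hit.identity, round(hit.coverage, 2))
```

Requiring query coverage as well as identity keeps a short, perfectly matching fragment (such as the third sample row) from being mistaken for a full-length match.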