AltExtron - gb147 - gene data

For each organism, the gene data is provided in three compressed (gzip) files:
	
    a nucleotide sequence file
    a protein sequence file
    an information file
	
Not all of the genes have protein sequences given.
The information file format is detailed below.
See also the notes at the bottom of this page.

Organism                 Nucleotide sequence             Protein sequence                Information
Homo sapiens             ae_gb147_human_genes.fna.gz     ae_gb147_human_genes.faa.gz     ae_gb147_human_genes.info.gz
Mus musculus             ae_gb147_mouse_genes.fna.gz     ae_gb147_mouse_genes.faa.gz     ae_gb147_mouse_genes.info.gz
Rattus norvegicus        ae_gb147_rat_genes.fna.gz       ae_gb147_rat_genes.faa.gz       ae_gb147_rat_genes.info.gz
Drosophila melanogaster  ae_gb147_dros_genes.fna.gz      ae_gb147_dros_genes.faa.gz      ae_gb147_dros_genes.info.gz
Caenorhabditis elegans   ae_gb147_elegan_genes.fna.gz    ae_gb147_elegan_genes.faa.gz    ae_gb147_elegan_genes.info.gz
Arabidopsis thaliana     ae_gb147_arab_genes.fna.gz      ae_gb147_arab_genes.faa.gz      ae_gb147_arab_genes.info.gz
Danio rerio              ae_gb147_zfish_genes.fna.gz     ae_gb147_zfish_genes.faa.gz     ae_gb147_zfish_genes.info.gz
Gallus gallus            ae_gb147_chicken_genes.fna.gz   ae_gb147_chicken_genes.faa.gz   ae_gb147_chicken_genes.info.gz
Xenopus laevis           ae_gb147_frog_genes.fna.gz      ae_gb147_frog_genes.faa.gz      ae_gb147_frog_genes.info.gz
Bos taurus               ae_gb147_cow_genes.fna.gz       ae_gb147_cow_genes.faa.gz       ae_gb147_cow_genes.info.gz
Anopheles gambiae        ae_gb147_mosquito_genes.fna.gz  ae_gb147_mosquito_genes.faa.gz  ae_gb147_mosquito_genes.info.gz
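
The file names listed above follow a regular pattern, so the three files for one organism can be located and read with a few lines of Python. This is only a minimal sketch: the names gene_file_names, read_lines and SUFFIXES are hypothetical helpers, the pattern is inferred from the list above, and the files are assumed to have been downloaded already and to contain plain ASCII text once decompressed.

```python
import gzip

# File-name suffixes for the nucleotide, protein and information files.
SUFFIXES = {"nucleotide": "fna", "protein": "faa", "info": "info"}


def gene_file_names(short_name, release="gb147"):
    """Build the three file names for an organism's short name (e.g. 'human').

    The pattern ae_<release>_<short_name>_genes.<suffix>.gz is inferred
    from the file list, not taken from any official specification.
    """
    return {kind: f"ae_{release}_{short_name}_genes.{suffix}.gz"
            for kind, suffix in SUFFIXES.items()}


def read_lines(path):
    """Yield the lines of a gzip-compressed text file without line endings."""
    # Assumption: the decompressed content is ASCII-compatible text.
    with gzip.open(path, "rt", encoding="ascii", errors="replace") as handle:
        for line in handle:
            yield line.rstrip("\r\n")
```

For example, gene_file_names("zfish")["info"] gives ae_gb147_zfish_genes.info.gz, which can then be passed to read_lines.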

The information file contains entries of the type:

>IDB1078032
LOCUS       AB005803               15499 bp    DNA     linear   PRI 20-MAR-1999
DEFINITION  Homo sapiens DNA for histidine-rich glycoprotein, complete cds.
ACC:    AB005803.1      (1801..14648)
GI:     2280513
ORG:    Homo sapiens
FET:    CDS             join(2301..2483,5205..5321,6208..6298,7892..8058,9055..9135,11357..11458,13312..14148)
ELMS:   uu:1..500,e1:501..683,i1:684..3404,e2:3405..3521,i2:3522..4407,e3:4408..4498,i3:4499..6091,e4:6092..6258,i4:6259..7254,e5:7255..7335,i5:7336..9556,e6:9557..9658,i6:9659..11511,e7:11512..12348,ud:12349..12848,
AFETS:
PRODUCT:        histidine-rich glycoprotein,
EVIDENCE:       not available
PROTEIN_ID:     BAA21613.1,
END


where the fields have the following meanings:

>       AltExtron gene identifier, always of the form IDB#.
LOCUS and DEFINITION are given as they appear in the GenBank flat file.
ACC:    The GenBank accession number, version, and the position of 
        the gene (not including complement information - see FET). 
GI:     The GenBank GI number (GenInfo Identifier)
ORG:    Organism (always Homo sapiens in the human data set; each file covers one organism)
FET:    The feature in the GenBank annotation used to define the gene.
ELMS:   The positions of the exons (e#:) and introns (i#:) in the local 
        sequence, including potentially up to 500 nts of unknown upstream
        (uu:) and downstream (ud:) sequence - based on FET.
AFETS:  Any other CDS or mRNA feature descriptions in the GenBank annotation.
PRODUCT:        The gene product if and as parsed from GenBank.
EVIDENCE:       "experimental", "not_experimental" or "unknown"
                (the example entry above shows "not available")
PROTEIN_ID:     The protein ID (may be more than one)
END             A useful tag for parsing purposes.


NOTES

* The genes provided here are the subset of genes extracted from GenBank
  that have one or more transcript-confirmed introns or exons.