The start-up gene data

1. The gene data set (extracted from GenBank release 117) is provided in three
main compressed files:
	
    human_genes.eds.info.Z  	- containing information about each gene
    human_genes.eds.fna.Z   	- containing the gene sequences
    human_genes.eds.faa.Z   	- containing the protein sequences
	
Protein sequences are not given for all of the genes.

The information file contains entries of the following form:

>IDB60265
ACC:    X87344  94885..100219
GI:     1054740
ORG:    Homo sapiens
TAXA:   Eukaryota; Metazoa; Chordata; Vertebrata; Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
FET:    mRNA            complement(join(94885..95043,95943..96084,96314..96443,97054..97185,98244..98311,100160..100219))
ELMS:   e1:1..60,i1:61..1908,e2:1909..1976,i2:1977..3034,e3:3035..3166,i3:3167..3776,e4:3777..3906,i4:3907..4135,e5:4136..4277,i5:4278..5176,e6:5177..5335,
AFETS:  CDS             complement(join(94916..95043,95943..96084,96314..96443,97054..97185,98244..98311,100160..100219))
PRODUCT:        not available
EVIDENCE:       unknown
PROTEIN_ID:     CAA60784.1
PROT:           given
END


where the fields have the following meanings:

>	Our local identifier.
ACC:    The GenBank accession number and the position of the gene (not including
        complement information). 
GI:     The GenBank GI (GenInfo Identifier) number.
ORG:    The organism (always Homo sapiens in this data set).
TAXA:   The taxonomic lineage of the organism.
FET:    The feature in the GenBank annotation used to define the gene.
ELMS:   The positions of the exons (e1, e2, ...) and introns (i1, i2, ...) in
        the local sequence.
AFETS:  Any other CDS or mRNA feature descriptions in the GenBank annotation.
PRODUCT:        The gene product, if it could be parsed.
EVIDENCE:       "experimental", "not_experimental" or "unknown".
PROTEIN_ID:     The protein ID.
PROT:           "given" means the protein sequence is given in the local files.
END     Marks the end of the entry.
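
The entry layout described above can be read with a few lines of Python.
What follows is only a minimal sketch, not a tool shipped with the data set:
the function names parse_info and parse_elms are hypothetical, and the input
is assumed to be the text of human_genes.eds.info after it has been
uncompressed. Every field value is kept as a list, on the assumption that a
field such as AFETS may occur more than once in an entry.

```python
import re

def parse_info(text):
    """Split info-file text into one dict per entry.

    The local identifier is stored as a string under "ID"; every other
    field maps to a list of values, in case a field name is repeated.
    """
    entries = []
    current = None
    for line in text.splitlines():
        line = line.rstrip()
        if line.startswith(">"):
            # A ">" line opens a new entry.
            current = {"ID": line[1:].strip()}
        elif line == "END":
            # "END" closes the entry that is currently open.
            if current is not None:
                entries.append(current)
            current = None
        elif current is not None and ":" in line:
            # Split on the first colon only; values may contain colons.
            key, _, value = line.partition(":")
            current.setdefault(key.strip(), []).append(value.strip())
    return entries

def parse_elms(elms):
    """Convert an ELMS value into (name, start, end) tuples."""
    return [(name, int(start), int(end))
            for name, start, end
            in re.findall(r"([ei]\d+):(\d+)\.\.(\d+)", elms)]

# An abridged copy of the example entry, used only for illustration.
SAMPLE = """>IDB60265
ACC:    X87344  94885..100219
GI:     1054740
ELMS:   e1:1..60,i1:61..1908,e2:1909..1976,
PROT:           given
END
"""

entries = parse_info(SAMPLE)
elements = parse_elms(entries[0]["ELMS"][0])
```

Here entries[0]["ID"] holds the local identifier, and elements holds one
tuple per exon or intron, with the positions converted to integers.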