AltExtron - gb147 - flatfiles of transcript confirmed exons

Data on all observed exons are given in one flat file per species. Note that the intron flat files have a similar, but not identical, format and are described separately.


Homo sapiens ae_gb147_human_exons.flat.gz
Mus musculus ae_gb147_mouse_exons.flat.gz
Rattus norvegicus ae_gb147_rat_exons.flat.gz
Drosophila melanogaster ae_gb147_dros_exons.flat.gz
Caenorhabditis elegans ae_gb147_elegan_exons.flat.gz
Arabidopsis thaliana ae_gb147_arab_exons.flat.gz
Danio rerio ae_gb147_zfish_exons.flat.gz
Gallus gallus ae_gb147_chicken_exons.flat.gz
Xenopus laevis ae_gb147_frog_exons.flat.gz
Bos taurus ae_gb147_cow_exons.flat.gz
Anopheles gambiae ae_gb147_mosquito_exons.flat.gz

Format of the exon flat file

By way of example, consider the entry:

>IDB60061(2441..2541)
CDS:    inCDS, p0, tf0
TYPE:   GT-AG A3
ELM:    e5(1..101 101)
NUMT:   294
GB:     D88010.1  2466..2566
FSAI:   CTTATTAATGTTTGATAATGTTAGGTCATTTTGGGTGGTTTTCTTGAATTGCACCAAATTTTATTTTTAG
FSAE:   gataaggatgctaaattccgtctgattctaatagagagccggattcaccgt..
FSDE:   ..ttggctcgatattataagaccaagcgagtcctccctcccaattggaaata
FSDI:   GTAAGTATCAACTCTTTTGTCGTTGTTATCAAGAATAGGAGTCAGCCAGTAGTAAAAGTCCTAGTAGTAA
OVIN:   i:2185..2474,  GB(2210..2499),  i4(1..256 256)e5(1..34 101)
OVEX:   e:2475..2541,  GB(2500..2566),  e5(35..101 101)
CNTX:   ~1..11,165..213,396..474,2015..2184,2441..2541,3171..~3204
CNTX:   ~1..23,165..213,396..474,2015..2184,2441..2541,3171..~3204
SSIS:   5.64, 7.41
END

Examining the fields in turn:

>IDB60061(2441..2541)

This is the first field and gives the gene identifier and position of the observed exon within the gene.
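As a minimal sketch, the header line can be split into its gene identifier and exon position with a regular expression. The function name `parse_header` and the returned keys are illustrative choices, and the length calculation assumes 1-based inclusive coordinates (which agrees with the length of 101 shown in the ELM field of the example).

```python
import re

# Matches lines such as ">IDB60061(2441..2541)".
HEADER_RE = re.compile(r"^>(?P<gene>[^(\s]+)\((?P<start>\d+)\.\.(?P<end>\d+)\)$")


def parse_header(line):
    """Split an entry header into gene identifier and exon position."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not an exon entry header: {line!r}")
    start, end = int(match["start"]), int(match["end"])
    return {
        "gene": match["gene"],
        "start": start,
        "end": end,
        # Assumption: coordinates are 1-based and inclusive.
        "length": end - start + 1,
    }


print(parse_header(">IDB60061(2441..2541)"))
```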

CDS:    inCDS, p0, tf0

Describes whether the exon is completely (inCDS), partially (partCDS), or not (notCDS) within the annotated coding sequence. If the exon is within annotated coding sequence, the start phase of the exon (the phase of the 5' flanking intron) is given as 'p0', 'p1', or 'p2', or as 'p-' where the phase cannot be determined. The phase is determined by comparing the first context given (see below) with the annotated coding sequence. Also given, with the 'tf' prefix, are the possible start phases that allow translation of the exon without introducing a stop codon within the exon.
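The CDS field could be read along the following lines. This is only a sketch: the function name `parse_cds` is hypothetical, and the way several translatable phases are written after 'tf' is not specified in this description, so the code simply assumes that each digit following 'tf' is one phase (the input 'tf012' in the second call is an invented illustration of that assumption).

```python
def parse_cds(value):
    """Parse a CDS field value such as 'inCDS, p0, tf0'."""
    parts = [p.strip() for p in value.split(",") if p.strip()]
    status = parts[0] if parts else None  # inCDS, partCDS or notCDS
    phase = None
    translatable = []
    for token in parts[1:]:
        if token.startswith("tf"):
            # Assumption: every digit after 'tf' is one possible start phase.
            translatable = [int(c) for c in token[2:] if c.isdigit()]
        elif token.startswith("p"):
            # 'p-' means the phase could not be determined.
            phase = int(token[1:]) if token[1:].isdigit() else None
    return {"status": status, "phase": phase, "translatable_phases": translatable}


print(parse_cds("inCDS, p0, tf0"))
print(parse_cds("inCDS, p-, tf012"))  # hypothetical value
```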

TYPE:   GT-AG A3

This field describes which of the six donor site groups this exon has been classified as belonging to. The value is one of 'GT-AG A3', 'GT-AG G3', 'GT-AG Y3', 'GT-AG weak', 'GC-AG' or 'AT-AC'; any 'Annotated-Non-Canonical' events (which are usually annotation errors) are also reported.

ELM:    e5(1..101 101)

This field describes how the observed exon compares with the annotated introns and exons. In this case the observed exon is an annotated form.

NUMT:   294

The number of transcripts observed to confirm this exon.

GB:     D88010.1  2466..2566

The GenBank/EMBL/DDBJ accession and exon location. In cases where the gene is on the complement strand of the annotated sequence, this is signified with 'complement(position)'.
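A possible way to read the GB field is sketched below. The function name `parse_gb` is hypothetical, and the exact layout of a complement-strand value is an assumption: the code expects the accession followed by 'complement(start..end)', and the second call uses an invented value to illustrate that case.

```python
import re

# Accession, whitespace, then either start..end or complement(start..end).
GB_RE = re.compile(
    r"^(?P<acc>\S+)\s+(?P<comp>complement\()?(?P<start>\d+)\.\.(?P<end>\d+)\)?$"
)


def parse_gb(value):
    """Parse a GB field value such as 'D88010.1  2466..2566'."""
    match = GB_RE.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognised GB field: {value!r}")
    return {
        "accession": match["acc"],
        "start": int(match["start"]),
        "end": int(match["end"]),
        # '-' when the location is wrapped in complement(...).
        "strand": "-" if match["comp"] else "+",
    }


print(parse_gb("D88010.1  2466..2566"))
print(parse_gb("D88010.1  complement(2466..2566)"))  # hypothetical value
```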

FSAI:   CTTATTAATGTTTGATAATGTTAGGTCATTTTGGGTGGTTTTCTTGAATTGCACCAAATTTTATTTTTAG

Up to 70 nts of flanking sequence from the acceptor/upstream intron.

FSAE:   gataaggatgctaaattccgtctgattctaatagagagccggattcaccgt..

Up to 70 nts of sequence from the 5' end of the exon.

FSDE:   ..ttggctcgatattataagaccaagcgagtcctccctcccaattggaaata

Up to 70 nts of sequence from the 3' end of the exon.

FSDI:   GTAAGTATCAACTCTTTTGTCGTTGTTATCAAGAATAGGAGTCAGCCAGTAGTAAAAGTCCTAGTAGTAA

Up to 70 nts of flanking sequence from the donor/downstream intron.
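The flanking intron sequences can be used to check the splice site dinucleotides against the TYPE field. The sketch below assumes, as the example entry suggests, that FSAI is oriented so that its last base is adjacent to the exon and that FSDI starts with the first base after the exon. The function name `splice_dinucleotides` is illustrative.

```python
def splice_dinucleotides(fsai, fsdi):
    """Return the donor-acceptor dinucleotide pair, e.g. 'GT-AG'.

    fsai: flanking acceptor (upstream) intron sequence, ending at the exon.
    fsdi: flanking donor (downstream) intron sequence, starting after the exon.
    Returns None when either flank is too short to give a dinucleotide.
    """
    fsai = fsai.strip().upper()
    fsdi = fsdi.strip().upper()
    if len(fsai) < 2 or len(fsdi) < 2:
        return None
    acceptor = fsai[-2:]  # last two intron bases before the exon
    donor = fsdi[:2]      # first two intron bases after the exon
    return f"{donor}-{acceptor}"


fsai = "CTTATTAATGTTTGATAATGTTAGGTCATTTTGGGTGGTTTTCTTGAATTGCACCAAATTTTATTTTTAG"
fsdi = "GTAAGTATCAACTCTTTTGTCGTTGTTATCAAGAATAGGAGTCAGCCAGTAGTAAAAGTCCTAGTAGTAA"
print(splice_dinucleotides(fsai, fsdi))
```

For the example entry this gives the same 'GT-AG' pair that begins the TYPE field.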

OVIN:   i:2185..2474,  GB(2210..2499),  i4(1..256 256)e5(1..34 101)
OVEX:   e:2475..2541,  GB(2500..2566),  e5(35..101 101)

Each of the overlapping intron (OVIN) and overlapping exon (OVEX) fields may occur zero or more times, and each occurrence describes an observed intron or exon that shares sequence with the current exon.
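An OVIN or OVEX value might be taken apart as sketched below. The meaning of the element notation is not spelled out here, so the reading used in the code is an assumption inferred from the example: 'i4(1..256 256)' is taken to mean positions 1..256 of annotated intron 4, whose total length is 256. The function name `parse_overlap` is hypothetical, and only forward-strand GB ranges are handled.

```python
import re

OV_RE = re.compile(
    r"^(?P<kind>[ie]):(?P<start>\d+)\.\.(?P<end>\d+),\s*"
    r"GB\((?P<gb_start>\d+)\.\.(?P<gb_end>\d+)\),\s*(?P<elements>.*)$"
)
# One annotated element, e.g. "i4(1..256 256)" or "e5(1..34 101)".
ELEMENT_RE = re.compile(r"([ie])(\d+)\((\d+)\.\.(\d+) (\d+)\)")


def parse_overlap(value):
    """Parse an OVIN/OVEX value into ranges and annotated elements."""
    match = OV_RE.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognised overlap field: {value!r}")
    elements = [
        {
            "type": kind,               # 'i' intron or 'e' exon
            "number": int(number),      # assumed: annotated element index
            "start": int(start),        # assumed: first position covered
            "end": int(end),            # assumed: last position covered
            "total_length": int(total)  # assumed: full element length
        }
        for kind, number, start, end, total in ELEMENT_RE.findall(match["elements"])
    ]
    return {
        "kind": match["kind"],
        "range": (int(match["start"]), int(match["end"])),
        "gb_range": (int(match["gb_start"]), int(match["gb_end"])),
        "elements": elements,
    }


ovin = parse_overlap("i:2185..2474,  GB(2210..2499),  i4(1..256 256)e5(1..34 101)")
covered = sum(e["end"] - e["start"] + 1 for e in ovin["elements"])
print(ovin["range"], covered)
```

Under this reading the covered positions of the elements add up to the length of the observed intron in the example (256 + 34 = 290 = 2474 - 2185 + 1), which is consistent with the assumption but does not prove it.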

CNTX:   ~1..11,165..213,396..474,2015..2184,2441..2541,3171..~3204
CNTX:   ~1..23,165..213,396..474,2015..2184,2441..2541,3171..~3204

The 'context' field occurs one or more times, and describes the context(s) in which this exon was observed. In this case, exon 2441..2541 was seen (first CNTX) in one or more transcripts with 4 upstream introns and 1 downstream intron. In the second CNTX, one or more transcripts show the same flanking introns and exons apart from the use of an alternative donor site for the first exon. The use of '~' indicates that this position was determined by the termination of a gene transcript match, so the exact position of the splice site has not been determined. Such a position may be supposed not to extend far past a splice site (into an intron), but may lie almost anywhere within an exon.
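The counting of upstream and downstream introns described above can be sketched as follows. The function names are illustrative, and the code assumes that a CNTX value is a comma-separated list of start..end exon ranges in which '~' may prefix either coordinate.

```python
def parse_context(value):
    """Parse a CNTX value into a list of exon ranges with fuzzy-end flags."""
    exons = []
    for part in value.strip().split(","):
        start_s, end_s = part.strip().split("..")
        exons.append({
            "start": int(start_s.lstrip("~")),
            "end": int(end_s.lstrip("~")),
            "start_fuzzy": start_s.startswith("~"),  # transcript match ended here
            "end_fuzzy": end_s.startswith("~"),
        })
    return exons


def introns(exons):
    """Introns implied by consecutive exons, as (start, end) pairs."""
    return [(a["end"] + 1, b["start"] - 1) for a, b in zip(exons, exons[1:])]


def flanking_intron_counts(exons, start, end):
    """Number of introns upstream and downstream of exon start..end."""
    for index, exon in enumerate(exons):
        if (exon["start"], exon["end"]) == (start, end):
            return index, len(exons) - 1 - index
    raise ValueError(f"exon {start}..{end} not found in context")


first = parse_context("~1..11,165..213,396..474,2015..2184,2441..2541,3171..~3204")
second = parse_context("~1..23,165..213,396..474,2015..2184,2441..2541,3171..~3204")
print(flanking_intron_counts(first, 2441, 2541))
# The two contexts differ only in the first exon.
print([i for i, (a, b) in enumerate(zip(first, second)) if a != b])
```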

SSIS:   5.64, 7.41

Splice Site Information Scores: the information scores of the 5' and 3' splice sites, which for an exon are the acceptor and donor sites respectively.
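Reading the two scores is straightforward; the short sketch below labels them by splice site. The function name `parse_ssis` is hypothetical, and because it is not stated here what appears when a score is unavailable, any non-numeric value is simply returned as None (an assumption).

```python
def parse_ssis(value):
    """Parse an SSIS value such as '5.64, 7.41' into acceptor and donor scores."""
    def to_score(token):
        try:
            return float(token)
        except ValueError:
            return None  # assumption: non-numeric means no score available

    tokens = [t.strip() for t in value.split(",")]
    tokens += [""] * (2 - len(tokens))  # pad if fewer than two values are present
    return {"acceptor": to_score(tokens[0]), "donor": to_score(tokens[1])}


print(parse_ssis("5.64, 7.41"))
```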

END

The END tag signifies the end of the entry.
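Putting the fields together, a whole file could be read entry by entry roughly as follows. This is a sketch under the assumptions that every entry starts with a '>' line, ends with END, and that every other line has the form 'TAG: value'. Values are collected in lists because fields such as OVIN, OVEX and CNTX can repeat. The name `read_entries` is illustrative.

```python
from collections import defaultdict


def read_entries(lines):
    """Yield one dict per entry from an iterable of flat-file lines."""
    entry = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(">"):
            # Header line: start a new entry.
            entry = {"header": line[1:].strip(), "fields": defaultdict(list)}
        elif line.strip() == "END":
            if entry is not None:
                entry["fields"] = dict(entry["fields"])
                yield entry
            entry = None
        elif entry is not None and ":" in line:
            # Split at the first colon only, since values may contain colons.
            tag, _, value = line.partition(":")
            entry["fields"][tag.strip()].append(value.strip())


sample = [
    ">IDB60061(2441..2541)",
    "NUMT:   294",
    "OVIN:   i:2185..2474,  GB(2210..2499),  i4(1..256 256)e5(1..34 101)",
    "CNTX:   ~1..11,165..213,396..474,2015..2184,2441..2541,3171..~3204",
    "CNTX:   ~1..23,165..213,396..474,2015..2184,2441..2541,3171..~3204",
    "END",
]
for entry in read_entries(sample):
    print(entry["header"], len(entry["fields"]["CNTX"]))

# For a downloaded file, the same function can be fed from gzip, e.g.:
#   import gzip
#   with gzip.open("ae_gb147_human_exons.flat.gz", "rt") as handle:
#       for entry in read_entries(handle):
#           ...
```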