AltExtron - gb147 - flatfiles of transcript confirmed exons
Data on all the observed exons is given in a flat file. Note that the intron flat files have a similar, but not identical, format and are described separately.
| Homo sapiens | ae_gb147_human_exons.flat.gz |
| Mus musculus | ae_gb147_mouse_exons.flat.gz |
| Rattus norvegicus | ae_gb147_rat_exons.flat.gz |
| Drosophila melanogaster | ae_gb147_dros_exons.flat.gz |
| Caenorhabditis elegans | ae_gb147_elegan_exons.flat.gz |
| Arabidopsis thaliana | ae_gb147_arab_exons.flat.gz |
| Danio rerio | ae_gb147_zfish_exons.flat.gz |
| Gallus gallus | ae_gb147_chicken_exons.flat.gz |
| Xenopus laevis | ae_gb147_frog_exons.flat.gz |
| Bos taurus | ae_gb147_cow_exons.flat.gz |
| Anopheles gambiae | ae_gb147_mosquito_exons.flat.gz |
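As a rough illustration of how one of these gzipped files might be read, the sketch below yields one entry at a time. It assumes that every entry starts with a '>' field and that the END tag is the last token on its line; how an entry is laid out across lines is not stated here, so the code collapses all whitespace. The function name iter_entries is hypothetical.

```python
import gzip


def iter_entries(path):
    """Yield each entry of a gzipped exon flat file as one space-joined string.

    Assumption: the END tag is the last whitespace-separated token on the
    line that closes an entry. Line breaks inside an entry are ignored.
    """
    tokens = []
    with gzip.open(path, "rt", encoding="ascii", errors="replace") as handle:
        for line in handle:
            parts = line.split()
            if not parts:
                continue  # skip blank lines
            tokens.extend(parts)
            if parts[-1] == "END":
                yield " ".join(tokens)
                tokens = []
    if tokens:
        raise ValueError("file ended inside an entry (no closing END tag)")


# Example use, with one of the file names listed above:
# for entry in iter_entries("ae_gb147_human_exons.flat.gz"):
#     print(entry.split()[0])
```

Reading entry by entry keeps memory use low, which may matter for the larger files.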
By way of example, consider the entry:
>IDB60061(2441..2541) CDS: inCDS, p0, tf0 TYPE: GT-AG A3 ELM: e5(1..101 101) NUMT: 294 GB: D88010.1 2466..2566 FSAI: CTTATTAATGTTTGATAATGTTAGGTCATTTTGGGTGGTTTTCTTGAATTGCACCAAATTTTATTTTTAG FSAE: gataaggatgctaaattccgtctgattctaatagagagccggattcaccgt.. FSDE: ..ttggctcgatattataagaccaagcgagtcctccctcccaattggaaata FSDI: GTAAGTATCAACTCTTTTGTCGTTGTTATCAAGAATAGGAGTCAGCCAGTAGTAAAAGTCCTAGTAGTAA OVIN: i:2185..2474, GB(2210..2499), i4(1..256 256)e5(1..34 101) OVEX: e:2475..2541, GB(2500..2566), e5(35..101 101) CNTX: ~1..11,165..213,396..474,2015..2184,2441..2541,3171..~3204 CNTX: ~1..23,165..213,396..474,2015..2184,2441..2541,3171..~3204 SSIS: 5.64, 7.41 END
Examining the fields in turn:
>IDB60061(2441..2541)
This is the first field and gives the gene identifier and position of the observed exon within the gene.
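A minimal sketch of reading this field, assuming the layout is always '>' followed by the gene identifier and a 'start..end' range in parentheses (inferred from the single example above). Treating the coordinates as inclusive gives a length of 101, which agrees with the length shown in the ELM field of the example. The name parse_header is hypothetical.

```python
import re

# Assumed layout: '>' + gene identifier + '(start..end)'.
HEADER_RE = re.compile(r"^>(?P<gene>[^\s(]+)\((?P<start>\d+)\.\.(?P<end>\d+)\)$")


def parse_header(field):
    """Return (gene_id, start, end) from the first field of an entry."""
    match = HEADER_RE.match(field)
    if match is None:
        raise ValueError(f"unrecognised header field: {field!r}")
    return match.group("gene"), int(match.group("start")), int(match.group("end"))


gene, start, end = parse_header(">IDB60061(2441..2541)")
exon_length = end - start + 1  # inclusive coordinates assumed
print(gene, start, end, exon_length)  # IDB60061 2441 2541 101
```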
CDS: inCDS, p0, tf0
Describes whether the exon is completely (inCDS), partially (partCDS), or not (notCDS) within the annotated coding sequence. If the exon is within annotated coding sequence, the start phase of the exon (the phase of the 5' flanking intron) is given as 'p0', 'p1', 'p2', or, where the phase cannot be determined, 'p-'. The phase is determined by comparing the first context given (see below) with the annotated coding sequence. Also given, with the 'tf' prefix, are the possible start phases that lead to translation of the exon without the introduction of a stop codon within the exon.
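The CDS value could be unpacked along the following lines. This is only a sketch: how several possible phases are written after 'tf' is not shown in the example, so the code assumes they appear as digits directly after the prefix, and it assumes the phase and 'tf' parts may be absent for exons outside the coding sequence. The name parse_cds is hypothetical.

```python
CDS_STATES = {"inCDS", "partCDS", "notCDS"}


def parse_cds(value):
    """Split a CDS value such as 'inCDS, p0, tf0' into its parts."""
    parts = [part.strip() for part in value.split(",")]
    status = parts[0]
    if status not in CDS_STATES:
        raise ValueError(f"unknown CDS status: {status!r}")
    phase = None   # None stands for 'p-' or a missing phase
    frames = []    # start phases translatable without an internal stop codon
    for part in parts[1:]:
        if part.startswith("tf"):
            # Assumption: phases follow 'tf' as digits, e.g. 'tf0'.
            frames = [int(ch) for ch in part[2:] if ch in "012"]
        elif part.startswith("p"):
            phase = int(part[1:]) if part[1:] in {"0", "1", "2"} else None
    return {"status": status, "phase": phase, "frames": frames}


print(parse_cds("inCDS, p0, tf0"))
# {'status': 'inCDS', 'phase': 0, 'frames': [0]}
```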
TYPE: GT-AG A3
This field describes which of the six donor site groups this exon has been classified as belonging to. This will be one of the groups 'GT-AG A3', 'GT-AG G3', 'GT-AG Y3', 'GT-AG weak', 'GC-AG' or 'AT-AC', or else an 'Annotated-Non-Canonical' event (these are usually annotation errors).
ELM: e5(1..101 101)
This field describes how the observed exon compares with the annotated introns and exons. In this case the observed exon is an annotated form.
NUMT: 294
The number of transcripts observed to confirm this exon.
GB: D88010.1 2466..2566
The GenBank/EMBL/DDBJ accession and exon location. In cases where the gene is on the complement strand of the annotated sequence, this is signified with 'complement(position)'.
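A sketch of parsing the GB value follows. The plain form is taken from the example; the complement form is assumed to be the accession followed by 'complement(start..end)', since no complement entry is shown here. The accession X00000.1 in the second call is made up purely for illustration, and parse_gb is a hypothetical name.

```python
import re

RANGE_RE = re.compile(r"(\d+)\.\.(\d+)")


def parse_gb(value):
    """Return (accession, start, end, strand) from a GB value."""
    accession, location = value.split(None, 1)
    location = location.strip()
    # Assumed complement form: 'complement(start..end)'.
    on_complement = location.startswith("complement(") and location.endswith(")")
    if on_complement:
        location = location[len("complement("):-1]
    match = RANGE_RE.fullmatch(location)
    if match is None:
        raise ValueError(f"unrecognised GB location: {value!r}")
    strand = "-" if on_complement else "+"
    return accession, int(match.group(1)), int(match.group(2)), strand


print(parse_gb("D88010.1 2466..2566"))
# ('D88010.1', 2466, 2566, '+')
print(parse_gb("X00000.1 complement(10..20)"))  # made-up accession
# ('X00000.1', 10, 20, '-')
```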
FSAI: CTTATTAATGTTTGATAATGTTAGGTCATTTTGGGTGGTTTTCTTGAATTGCACCAAATTTTATTTTTAG
Up to 70 nts of flanking sequence from the acceptor/upstream intron.
FSAE: gataaggatgctaaattccgtctgattctaatagagagccggattcaccgt..
Up to 70 nts of sequence from the 5' end of the exon.
FSDE: ..ttggctcgatattataagaccaagcgagtcctccctcccaattggaaata
Up to 70 nts of sequence from the 3' end of the exon.
FSDI: GTAAGTATCAACTCTTTTGTCGTTGTTATCAAGAATAGGAGTCAGCCAGTAGTAAAAGTCCTAGTAGTAA
Up to 70 nts of flanking sequence from the donor/downstream intron.
OVIN: i:2185..2474, GB(2210..2499), i4(1..256 256)e5(1..34 101) OVEX: e:2475..2541, GB(2500..2566), e5(35..101 101)
Each of the overlapping intron (OVIN) and overlapping exon (OVEX) fields may occur 0 or more times, and each occurrence describes an observed intron or exon that shares sequence with the current exon.
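To make "shares sequence" concrete, the sketch below computes the stretch shared between the current exon and an overlapping feature. It reads only the leading 'i:start..end' or 'e:start..end' part of the value, which is assumed to be in the same gene coordinates as the first field of the entry; the remaining parts of the value are not interpreted. The name shared_region is hypothetical.

```python
import re

# Leading part of an OVIN/OVEX value: 'i:' or 'e:' plus a 'start..end' range.
OVERLAP_RE = re.compile(r"^(?P<kind>[ie]):(?P<start>\d+)\.\.(?P<end>\d+)")


def shared_region(value, exon_start, exon_end):
    """Return the (start, end) shared with the exon, or None if disjoint."""
    match = OVERLAP_RE.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognised OVIN/OVEX value: {value!r}")
    start, end = int(match.group("start")), int(match.group("end"))
    low, high = max(start, exon_start), min(end, exon_end)
    return (low, high) if low <= high else None


ovin = "i:2185..2474, GB(2210..2499), i4(1..256 256)e5(1..34 101)"
ovex = "e:2475..2541, GB(2500..2566), e5(35..101 101)"
print(shared_region(ovin, 2441, 2541))  # (2441, 2474)
print(shared_region(ovex, 2441, 2541))  # (2475, 2541)
```

In the example, the two shared stretches are 34 and 67 nts long, which matches the e5(1..34 101) and e5(35..101 101) parts of the two values.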
CNTX: ~1..11,165..213,396..474,2015..2184,2441..2541,3171..~3204 CNTX: ~1..23,165..213,396..474,2015..2184,2441..2541,3171..~3204
The 'context' field occurs 1 or more times, and describes the context(s) in which this exon was observed. In this case, exon 2441..2541 was seen (first CNTX) in one or more transcripts with 4 upstream introns and 1 downstream intron. In the second CNTX, one or more transcripts show the same flanking introns and exons apart from the use of an alternative donor site for the first exon. The use of '~' indicates that this position has been determined by the termination of a gene transcript match, so the exact position of the splice site has not been determined. Such a position may be supposed not to extend a long way past a splice site (into an intron), but may be just about anywhere within an exon.
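The reading of the two CNTX values given above can be checked with a short sketch. It assumes a CNTX value is a comma-separated list of 'start..end' exon ranges in transcript order, with '~' marking an inexact boundary. The function names are hypothetical.

```python
def parse_context(value):
    """Parse one CNTX value into a list of exon dicts, in transcript order."""
    exons = []
    for segment in value.split(","):
        left, right = segment.strip().split("..")
        exons.append({
            "start": int(left.lstrip("~")),
            "end": int(right.lstrip("~")),
            "start_exact": not left.startswith("~"),  # False when marked '~'
            "end_exact": not right.startswith("~"),
        })
    return exons


def flanking_intron_counts(exons, start, end):
    """Return (upstream, downstream) intron counts for the exon start..end."""
    for index, exon in enumerate(exons):
        if (exon["start"], exon["end"]) == (start, end):
            return index, len(exons) - 1 - index
    raise ValueError(f"exon {start}..{end} is not in this context")


first = parse_context("~1..11,165..213,396..474,2015..2184,2441..2541,3171..~3204")
second = parse_context("~1..23,165..213,396..474,2015..2184,2441..2541,3171..~3204")

print(flanking_intron_counts(first, 2441, 2541))  # (4, 1)

# Indices of the exons that differ between the two contexts.
differing = [i for i, (a, b) in enumerate(zip(first, second)) if a != b]
print(differing)  # [0]
```

Only the first exon differs between the two contexts, and only in its end position (11 versus 23), which is the alternative donor site mentioned above.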
SSIS: 5.64, 7.41
Splice Site Information Scores. The 5' and 3' splice site information scores (acceptor, donor in the case of an exon).
END
The END tag signifies the end of the entry.
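Putting the fields together, one possible way to split a whole entry into its tagged fields is sketched below. It assumes the tag names are exactly those described on this page, that each tag is preceded by whitespace and followed by a colon, and that whitespace inside an entry is not significant. Repeated tags (OVIN, OVEX, CNTX) are collected into lists. The example string is the entry from this page with the sequence and overlap fields left out for brevity; parse_entry is a hypothetical name.

```python
import re
from collections import defaultdict

TAGS = ("CDS", "TYPE", "ELM", "NUMT", "GB", "FSAI", "FSAE", "FSDE",
        "FSDI", "OVIN", "OVEX", "CNTX", "SSIS")
# A tag at the start of the text or after whitespace, followed by ':'.
TAG_RE = re.compile(r"(?:(?<=\s)|^)(" + "|".join(TAGS) + r"):\s*")


def parse_entry(text):
    """Return (header, fields); fields maps each tag to a list of values."""
    text = " ".join(text.split())  # collapse whitespace and line breaks
    if not text.startswith(">") or not text.endswith(" END"):
        raise ValueError("entry must start with '>' and finish with END")
    body = text[:-len(" END")]
    header, _, rest = body.partition(" ")
    fields = defaultdict(list)
    matches = list(TAG_RE.finditer(rest))
    for index, match in enumerate(matches):
        stop = matches[index + 1].start() if index + 1 < len(matches) else len(rest)
        fields[match.group(1)].append(rest[match.end():stop].strip())
    return header, dict(fields)


entry = (">IDB60061(2441..2541) CDS: inCDS, p0, tf0 TYPE: GT-AG A3 "
         "ELM: e5(1..101 101) NUMT: 294 GB: D88010.1 2466..2566 "
         "CNTX: ~1..11,165..213,396..474,2015..2184,2441..2541,3171..~3204 "
         "CNTX: ~1..23,165..213,396..474,2015..2184,2441..2541,3171..~3204 "
         "SSIS: 5.64, 7.41 END")
header, fields = parse_entry(entry)
acceptor_score, donor_score = (float(x) for x in fields["SSIS"][0].split(","))
print(header, fields["TYPE"], int(fields["NUMT"][0]), len(fields["CNTX"]))
# >IDB60061(2441..2541) ['GT-AG A3'] 294 2
```

Using a fixed tag list rather than a general pattern avoids mistaking parts of a value, such as the 'i:' prefix inside OVIN, for a new field.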