AltExtron - gb147 - flatfiles of transcript confirmed introns
Data on all observed introns is provided as one gzipped flat file per species, listed below. The exon flat files have a similar, but not identical, format and are described separately.
| Homo sapiens | ae_gb147_human_introns.flat.gz |
| Mus musculus | ae_gb147_mouse_introns.flat.gz |
| Rattus norvegicus | ae_gb147_rat_introns.flat.gz |
| Drosophila melanogaster | ae_gb147_dros_introns.flat.gz |
| Caenorhabditis elegans | ae_gb147_elegan_introns.flat.gz |
| Arabidopsis thaliana | ae_gb147_arab_introns.flat.gz |
| Danio rerio | ae_gb147_zfish_introns.flat.gz |
| Gallus gallus | ae_gb147_chicken_introns.flat.gz |
| Xenopus laevis | ae_gb147_frog_introns.flat.gz |
| Bos taurus | ae_gb147_cow_introns.flat.gz |
| Anopheles gambiae | ae_gb147_mosquito_introns.flat.gz |
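The files listed above can be read without unpacking them first. The following is a minimal sketch, assuming a file has already been downloaded to the working directory and that every entry finishes with `END` as its last whitespace-delimited token (as in the example entry on this page). The helper name `iter_entries` is hypothetical and not part of any AltExtron tooling.

```python
import gzip
import os


def iter_entries(path):
    """Yield each entry of a gzipped intron flat file as one string.

    Assumption: an entry is complete once a line ends with the token END,
    whether the entry is stored on one line or spread over several.
    """
    buffer = []
    with gzip.open(path, "rt", encoding="ascii", errors="replace") as handle:
        for line in handle:
            tokens = line.split()
            if not tokens:
                continue  # skip blank lines
            buffer.append(line.strip())
            if tokens[-1] == "END":
                yield " ".join(buffer)
                buffer = []


# Assumed local copy of one of the files listed above.
path = "ae_gb147_human_introns.flat.gz"
if os.path.exists(path):
    for number, entry in enumerate(iter_entries(path)):
        print(entry[:60])
        if number == 2:
            break
```

Joining the lines of an entry with single spaces keeps the sketch independent of how the fields are wrapped in the file.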
By way of example, consider the entry:
>IDB60041(1640..2350) CDS: inCDS, p0 TYPE: GT-AG G3 ELM: i3(1..711 711) NUMT: 16 GB: AB021866.1 1682..2392 FSDE: gcggaccgtggagtcgtcacttcgggcacaagtgcccttcgagcagattctcagccttccagagctcaag FSDI: GTGCAAGCGCTCCCCTCCTTTGACACCTCTCCCACCACTCCCTCCCTGCTAGACCCCCTAACTCCATCTG.. FSAI: ..CTCTCAAGTTTCTGGTAGGCTTTAATGAGCGTGTGACCTGGGCCACGTCCTGTGGCGTTTGTTCTCCTAG FSAE: gccaaccccttcaaggagcgaatctgcagggtcttctccacatccccagccaaagacagccttagctttg OVIN: i:218..2350, GB(260..2392), i2(1..1313 1313)e3(1..109 109)i3(1..711 711) CNTX: ~1..51,183..217,1531..1639,2351..2501,2641..2759,2832..~2917 CNTX: ~1579..1639,2351..2501,2641..2759,2832..2920,3350..~3371 SSIS: 4.95, 8.71 GGCC: 0.557 END
Examining the fields in turn:
>IDB60041(1640..2350)
This is the first field and gives the gene identifier and position of the observed intron within the gene.
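As an illustration, the first field can be split into its parts with a regular expression. This is a sketch only: it assumes the gene identifier is everything between '>' and the opening parenthesis, and that the positions are 1-based and inclusive (which agrees with the length of 711 shown in the ELM field of the example). The function name `parse_header` is hypothetical.

```python
import re

# '>' + identifier + '(' + start + '..' + end + ')'
HEADER_RE = re.compile(r"^>(?P<gene_id>[^\s(]+)\((?P<start>\d+)\.\.(?P<end>\d+)\)")


def parse_header(field):
    """Return (gene_id, start, end) from a field such as '>IDB60041(1640..2350)'."""
    match = HEADER_RE.match(field.strip())
    if match is None:
        raise ValueError(f"unrecognised header field: {field!r}")
    return match["gene_id"], int(match["start"]), int(match["end"])


gene_id, start, end = parse_header(">IDB60041(1640..2350)")
length = end - start + 1  # inclusive coordinates (assumption)
print(gene_id, start, end, length)  # IDB60041 1640 2350 711
```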
CDS: inCDS, p0
Indicates whether or not the intron lies within the annotated coding sequence. If it does, the phase of the intron is given as 'p0', 'p1' or 'p2', or as 'p-' where the phase cannot be determined. The phase is determined by examining both the position of the intron in the annotated CDS and the context of the intron (see below).
TYPE: GT-AG G3
This field describes which of the six donor site groups this intron has been classified as belonging to. The groups are 'GT-AG A3', 'GT-AG G3', 'GT-AG Y3', 'GT-AG weak', 'GC-AG' and 'AT-AC'. The field may also hold 'Annotated-Non-Canonical' for events that fall outside these groups (these are usually annotation errors).
ELM: i3(1..711 711)
This field describes how the observed intron compares with the annotated introns and exons. In this case the observed intron coincides with an annotated intron ('i3'), covering all 711 nts of it.
NUMT: 16
The number of transcripts observed to confirm this intron.
GB: AB021866.1 1682..2392
The GenBank/EMBL/DDBJ accession.version and the location of the intron within that sequence. Where the gene is on the complement strand of the annotated sequence, this is signified with 'complement(position)'.
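The gene positions in the first field and the GenBank positions in this field differ by a constant offset for a forward-strand gene, which the sketch below works out for the example entry. The name `parse_gb` is hypothetical, and the exact layout of the complement form (taken here to be `accession complement(start..end)`) is an assumption based on the description above.

```python
import re

GB_RE = re.compile(
    r"^(?P<acc>\S+)\s+(?P<comp>complement\()?(?P<start>\d+)\.\.(?P<end>\d+)\)?$"
)


def parse_gb(value):
    """Return (accession_version, start, end, on_complement) from a GB field value."""
    match = GB_RE.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognised GB field: {value!r}")
    on_complement = match["comp"] is not None
    return match["acc"], int(match["start"]), int(match["end"]), on_complement


accession, gb_start, gb_end, on_complement = parse_gb("AB021866.1 1682..2392")

# Gene coordinates of the same intron, from the first field of the entry.
gene_start, gene_end = 1640, 2350

# Forward strand only: one offset maps gene positions to GenBank positions.
offset = gb_start - gene_start
print(accession, offset)           # AB021866.1 42
print(gene_end + offset == gb_end)  # True
```

The same offset of 42 also relates the two coordinate pairs given in the OVIN field of the example (218 and 260).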
FSDE: gcggaccgtggagtcgtcacttcgggcacaagtgcccttcgagcagattctcagccttccagagctcaag
Up to 70 nts of flanking sequence from the donor/upstream exon (Flanking Seq. Donor Exon).
FSDI: GTGCAAGCGCTCCCCTCCTTTGACACCTCTCCCACCACTCCCTCCCTGCTAGACCCCCTAACTCCATCTG..
Up to 70 nts of sequence from the 5' end of the intron.
FSAI: ..CTCTCAAGTTTCTGGTAGGCTTTAATGAGCGTGTGACCTGGGCCACGTCCTGTGGCGTTTGTTCTCCTAG
Up to 70 nts of sequence from the 3' end of the intron.
FSAE: gccaaccccttcaaggagcgaatctgcagggtcttctccacatccccagccaaagacagccttagctttg
Up to 70 nts of sequence from the acceptor/downstream exon.
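The two intron flanking fields are enough to recover the splice site dinucleotides and compare them with the TYPE field. The sketch below assumes that the '..' at the end of FSDI and at the start of FSAI simply marks where the intron sequence has been cut, so it is stripped before the ends are read. The function name `splice_dinucleotides` is hypothetical.

```python
def splice_dinucleotides(fsdi, fsai):
    """Return (donor, acceptor) dinucleotides from the FSDI and FSAI values."""
    donor = fsdi.strip(".")[:2].upper()      # first two nts of the intron
    acceptor = fsai.strip(".")[-2:].upper()  # last two nts of the intron
    return donor, acceptor


fsdi = "GTGCAAGCGCTCCCCTCCTTTGACACCTCTCCCACCACTCCCTCCCTGCTAGACCCCCTAACTCCATCTG.."
fsai = "..CTCTCAAGTTTCTGGTAGGCTTTAATGAGCGTGTGACCTGGGCCACGTCCTGTGGCGTTTGTTCTCCTAG"

donor, acceptor = splice_dinucleotides(fsdi, fsai)
print(f"{donor}-{acceptor}")  # GT-AG
```

For the example entry this agrees with the 'GT-AG' part of the TYPE field.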
OVIN: i:218..2350, GB(260..2392), i2(1..1313 1313)e3(1..109 109)i3(1..711 711)
This field may occur 0 or more times, and each occurrence describes another observed intron that shares sequence with the current intron. A similar field, 'OVEX', exists for exons; there are none in this example entry.
CNTX: ~1..51,183..217,1531..1639,2351..2501,2641..2759,2832..~2917 CNTX: ~1579..1639,2351..2501,2641..2759,2832..2920,3350..~3371
The 'context' field occurs 1 or more times and describes the context(s) in which this intron was observed. Each CNTX lists the matched exon segments of the transcript(s), so the gaps between consecutive segments are the introns. In this case the first CNTX shows that intron 1640..2350 was seen in one or more transcripts with two upstream introns and two downstream introns. In the second CNTX the intron is seen in one or more transcripts with no upstream introns and three downstream introns. A '~' indicates that the position was determined by the end of a gene transcript match, so the exact position of the splice site is not known. Such a position is unlikely to extend far past a splice site (into an intron), but may lie almost anywhere within the exon.
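The intron counts for a context can be derived mechanically from the CNTX value. The sketch below reads each comma-separated range as a matched exon segment in gene coordinates and treats the gap between consecutive segments as an intron; that reading is an interpretation of the example, and the function names are hypothetical.

```python
def parse_context(value):
    """Return a list of (start, end, start_inexact, end_inexact) segments."""
    segments = []
    for part in value.split(","):
        left, right = part.strip().split("..")
        segments.append(
            (
                int(left.lstrip("~")),
                int(right.lstrip("~")),
                left.startswith("~"),   # '~' marks an undetermined boundary
                right.startswith("~"),
            )
        )
    return segments


def introns_in_context(value):
    """Return the (start, end) gaps between consecutive segments."""
    segments = parse_context(value)
    return [
        (prev[1] + 1, nxt[0] - 1) for prev, nxt in zip(segments, segments[1:])
    ]


def count_neighbours(value, intron):
    """Return (upstream, downstream) intron counts around the given intron."""
    introns = introns_in_context(value)
    index = introns.index(intron)  # raises ValueError if the intron is absent
    return index, len(introns) - index - 1


first = "~1..51,183..217,1531..1639,2351..2501,2641..2759,2832..~2917"
second = "~1579..1639,2351..2501,2641..2759,2832..2920,3350..~3371"

print(count_neighbours(first, (1640, 2350)))   # (2, 2)
print(count_neighbours(second, (1640, 2350)))  # (0, 3)
```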
SSIS: 4.95, 8.71
Splice Site Information Scores. The information scores of the 5' and 3' splice sites, in that order (the donor and acceptor sites in the case of an intron).
GGCC: 0.557
The Gene G+C content.
END
The END tag signifies the end of the entry.
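Putting the fields together, a whole entry can be split on its tags. This is a minimal sketch rather than a reference parser: it assumes the only tags are the ones described on this page (plus 'OVEX'), that each tag is followed by a colon, and that the entry closes with END. It collects repeated fields such as CNTX into lists and does not depend on how the entry is wrapped over lines. The demonstration uses the example entry with the four flanking sequence fields left out for brevity; the name `parse_entry` is hypothetical.

```python
import re
from collections import defaultdict

TAGS = ("CDS", "TYPE", "ELM", "NUMT", "GB", "FSDE", "FSDI", "FSAI", "FSAE",
        "OVIN", "OVEX", "CNTX", "SSIS", "GGCC")
# A tag counts only when it starts a whitespace-delimited token and ends in ':'.
TAG_RE = re.compile(r"(?<!\S)(" + "|".join(TAGS) + r"):\s*")


def parse_entry(text):
    """Return (header, fields), where fields maps each tag to a list of values."""
    text = text.strip()
    if not text.startswith(">"):
        raise ValueError("an entry is expected to start with '>'")
    body = re.sub(r"\s+END$", "", text)  # drop the closing END tag
    pieces = TAG_RE.split(body)          # [header, tag, value, tag, value, ...]
    fields = defaultdict(list)
    for tag, value in zip(pieces[1::2], pieces[2::2]):
        fields[tag].append(value.strip())
    return pieces[0].strip(), dict(fields)


# Example entry from this page, abridged (FSDE, FSDI, FSAI and FSAE omitted).
entry = """>IDB60041(1640..2350)
CDS: inCDS, p0
TYPE: GT-AG G3
ELM: i3(1..711 711)
NUMT: 16
GB: AB021866.1 1682..2392
OVIN: i:218..2350, GB(260..2392), i2(1..1313 1313)e3(1..109 109)i3(1..711 711)
CNTX: ~1..51,183..217,1531..1639,2351..2501,2641..2759,2832..~2917
CNTX: ~1579..1639,2351..2501,2641..2759,2832..2920,3350..~3371
SSIS: 4.95, 8.71
GGCC: 0.557
END"""

header, fields = parse_entry(entry)
print(header)                  # >IDB60041(1640..2350)
print(int(fields["NUMT"][0]))  # 16
print(len(fields["CNTX"]))     # 2
```

Keeping every value as a list means that fields which may occur several times (OVIN, OVEX, CNTX) and fields which occur once are handled the same way.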