Isoforms may be identified by comparing either observed, overlapping exons or observed, overlapping introns. The trick here is to be sure that a pair of overlapping introns (or exons) is not the result of a cassette exon event, or some other type of alternative splicing. A pair of observed, overlapping introns (or exons) is considered to be isoforms of each other when, at both ends, the observed flanking exons or exon fragments (or introns) also overlap with each other.
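The overlap rule above can be sketched in a few lines of Python. This is only an illustration, not the code used to build the data set: the names Span, Observed, overlaps and are_isoforms are hypothetical, coordinates are assumed to be inclusive, and treating two identical features as "not isoforms" is our assumption.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """Inclusive coordinate range along the gene (assumed convention)."""
    start: int
    end: int


class Observed(NamedTuple):
    """An observed intron (or exon) together with its observed flanks."""
    feature: Span
    upstream: Span    # flanking exon (or intron) at the 5' end
    downstream: Span  # flanking exon (or intron) at the 3' end


def overlaps(a: Span, b: Span) -> bool:
    """True when the two inclusive ranges share at least one position."""
    return a.start <= b.end and b.start <= a.end


def are_isoforms(x: Observed, y: Observed) -> bool:
    """Features overlap, differ, and their flanks overlap at both ends."""
    return (
        x.feature != y.feature  # assumption: identical features are not a pair
        and overlaps(x.feature, y.feature)
        and overlaps(x.upstream, y.upstream)
        and overlaps(x.downstream, y.downstream)
    )


# Hypothetical coordinates, chosen only to illustrate the rule.
long_intron = Observed(Span(100, 500), Span(50, 99), Span(501, 600))
shifted = Observed(Span(100, 512), Span(50, 99), Span(513, 600))
cassette = Observed(Span(100, 200), Span(50, 99), Span(201, 250))

print(are_isoforms(long_intron, shifted))   # shifted acceptor site
print(are_isoforms(long_intron, cassette))  # downstream flanks do not overlap
```

In the second comparison the shorter intron ends at a cassette exon whose position does not overlap the exon downstream of the long intron, so the pair is rejected even though the introns themselves overlap.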
1. Isoform data as lists, tables and flatfiles
These data files contain information on all the introns and exons that are seen to have isoforms (both 'normal' and 'alternative' forms are given without distinction).
Each of the data sets above can be downloaded as a simple list of the intron / exon identifiers, as a table, or as a flatfile (see the pages on the intron / exon flat and table files for descriptions).
2. Isoform data in 'normal' / 'alternative' pairs
It is convenient to differentiate one intron (or exon) as the normal form, and the other as the alternative form. For the practical purpose of making this distinction we choose an intron (or exon) that is an annotated form as the normal form. In cases where neither intron (or exon) is an annotated form, we choose the form with the greatest transcript coverage to act as the normal form. When there are multiple isoforms of a particular intron (or exon), we only consider each alternative form in relation to the normal form, thus avoiding consideration of all the possible combinations of alternative pairs.
The two files below contain information on all the observed intron and exon isoform pairs.
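As a rough sketch of this selection rule (not the actual pipeline code): the Form record and the pick_normal and normal_alternative_pairs functions below are hypothetical names, and the use of transcript coverage to break ties between several annotated forms is an assumption, since that case is not specified here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Form:
    """One observed form of an intron (or exon); field names are hypothetical."""
    name: str
    annotated: bool
    transcripts: int  # number of transcripts showing this form


def pick_normal(forms: List[Form]) -> Form:
    """Prefer an annotated form; otherwise take the best-covered form."""
    annotated = [f for f in forms if f.annotated]
    pool = annotated or forms
    # Assumption: coverage also breaks ties between several annotated forms.
    return max(pool, key=lambda f: f.transcripts)


def normal_alternative_pairs(forms: List[Form]) -> List[Tuple[Form, Form]]:
    """Pair each alternative with the normal form only, not with each other."""
    normal = pick_normal(forms)
    return [(normal, f) for f in forms if f is not normal]


# Hypothetical forms of a single intron.
forms = [Form("x", False, 5), Form("y", False, 9), Form("z", False, 2)]
for normal, alt in normal_alternative_pairs(forms):
    print(normal.name, alt.name)
```

With n forms this gives n - 1 pairs rather than the n(n - 1)/2 pairs that would result from comparing every form with every other.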
These files contain entries of the form:
IDB64914(1031..10872) Atr p1 0 nt 12 nt altInt(1031..10884)
where the fields have the following meanings:
The first field, 'IDB64914(1031..10872)', is the gene identifier and position of the 'normal' form of the intron (or exon).
The second field contains one or two of the following abbreviations:
| Dtr | Donor site truncation |
| Dex | Donor site extension |
| Dmd | Donor site modification |
| Atr | Acceptor site truncation |
| Aex | Acceptor site extension |
| Amd | Acceptor site modification |
Where the putative 'normal' form is not an annotated form, the isoforms are described as donor and/or acceptor site modifications (rather than as truncations or extensions). ALSO NOTE THAT the categorisation of isoforms as involving 'truncation' or 'extension' REFERS TO THE EFFECT ON THE EXON.
So, in the case above the categorisation is of an Acceptor site truncation, even though the alternative intron is bigger than the normal form.
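The following sketch shows how these categories could be derived from the two intron positions. It is an illustration under stated assumptions rather than the classification code itself: the function name classify_intron_isoform is hypothetical, and coordinates are assumed to increase from 5' to 3' along the gene, so that an intron reaching further downstream shortens the following exon.

```python
from typing import List, Tuple


def classify_intron_isoform(
    normal: Tuple[int, int],
    alternative: Tuple[int, int],
    normal_is_annotated: bool = True,
) -> List[str]:
    """Return the donor/acceptor codes, named for the effect on the exon.

    Assumption: positions are (start, end) and increase 5' to 3'.
    """
    n_start, n_end = normal
    a_start, a_end = alternative
    codes = []
    if a_start != n_start:  # donor (5') site differs
        if not normal_is_annotated:
            codes.append("Dmd")
        elif a_start < n_start:
            codes.append("Dtr")  # intron starts earlier: upstream exon shorter
        else:
            codes.append("Dex")  # intron starts later: upstream exon longer
    if a_end != n_end:  # acceptor (3') site differs
        if not normal_is_annotated:
            codes.append("Amd")
        elif a_end > n_end:
            codes.append("Atr")  # intron ends later: downstream exon shorter
        else:
            codes.append("Aex")  # intron ends earlier: downstream exon longer
    return codes


# Positions taken from the example entry for IDB64914.
print(classify_intron_isoform((1031, 10872), (1031, 10884)))
```

For the example entry the alternative intron ends 12 nt further downstream, so the exon that follows loses 12 nt and the event is reported as 'Atr'.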
The phase of the intron (or the start phase for exons) is given as 'p0', 'p1', 'p2' or 'p-' meaning phase 0, phase 1, phase 2, or phase not determined.
The next two fields ('0 nt 12 nt') give the number of nucleotides that differ at each end of the isoform.
Finally, the position of the alternative intron (or exon) is given.
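For readers who want to process these entries, a minimal parsing sketch follows. It is based only on the single example entry shown above, so several details are assumptions: that fields are separated by whitespace, that two abbreviations (when present) are also separated by whitespace, and that the two 'nt' values refer to the start and the end of the position in that order. The names ENTRY_RE, IsoformPair and parse_pair are hypothetical.

```python
import re
from typing import NamedTuple, Tuple

# Pattern inferred from the one example entry; the real files may differ.
ENTRY_RE = re.compile(
    r"^(?P<gene>\S+?)\((?P<nstart>\d+)\.\.(?P<nend>\d+)\)\s+"
    r"(?P<codes>(?:[DA](?:tr|ex|md)\s+){1,2})"
    r"(?P<phase>p[012-])\s+"
    r"(?P<d1>\d+)\s+nt\s+(?P<d2>\d+)\s+nt\s+"
    r"(?P<label>\w+)\((?P<astart>\d+)\.\.(?P<aend>\d+)\)$"
)


class IsoformPair(NamedTuple):
    gene: str
    normal: Tuple[int, int]
    codes: Tuple[str, ...]
    phase: str           # 'p0', 'p1', 'p2' or 'p-'
    start_diff: int      # assumed: difference at the start of the position
    end_diff: int        # assumed: difference at the end of the position
    alt_label: str
    alternative: Tuple[int, int]


def parse_pair(line: str) -> IsoformPair:
    """Parse one normal/alternative entry, or raise ValueError."""
    m = ENTRY_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised entry: {line!r}")
    return IsoformPair(
        gene=m["gene"],
        normal=(int(m["nstart"]), int(m["nend"])),
        codes=tuple(m["codes"].split()),
        phase=m["phase"],
        start_diff=int(m["d1"]),
        end_diff=int(m["d2"]),
        alt_label=m["label"],
        alternative=(int(m["astart"]), int(m["aend"])),
    )


entry = parse_pair("IDB64914(1031..10872) Atr p1 0 nt 12 nt altInt(1031..10884)")
print(entry)
```

In the example the reported differences agree with the positions: the starts are identical (0 nt) and the ends differ by 10884 - 10872 = 12 nt.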
3. Isoform data and changes to coding sequence
For the intron isoform data, we have examined the effect of the isoform on the coding sequence (so far as is possible with the observed transcripts).
By way of example, consider the entry:
>IDB60061
8.37 9.86 193 i1(1..141 141)
-1.87 9.86 1 e1(12..23 23)i1(1..141 141)
RMHAPGK
RMK
10.56 5.64 386 i4(1..256 256)
10.56 4.90 1 i4(1..256 256)e5(1..34 101)
RKDKDAKFRLILIES RIHRLARYYKTKRVLPPNWKYESSTASALVA*
RKRA GFTVWLDIIRPSESSLPIGNMNHLQPLPWSH
This describes two intron isoforms in the gene IDB60061; one of intron 1 and the other of intron 4. For each intron the three numbers (e.g. '8.37 9.86 193') give the strength of the donor (5') splice site (in bits), the strength of the acceptor (3') splice site (also in bits), and the number of transcripts in which this intron is observed.
In the first case above, an exon truncation event removes 12 nucleotides from the transcript, resulting in the removal from the translated form of the amino acids 'HAPG'. In this case, the frame is preserved, and only the amino acids around the alternative event are shown. In the second case (acceptor site truncation of intron 4) the frame is not preserved, and so, in addition to the translation of the indel sequence, downstream translation is also given (after the space). The downstream protein sequence is determined from examination of the "context" of the intron in the transcripts in which it is observed. That is, the downstream translation is only evaluated as far as the transcripts showing this form extend. Further, translations are truncated at stop codons, when they occur.
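Whether the frame is preserved follows from the length of the indel: a multiple of three leaves the frame intact. The sketch below makes that check on the two alternative-form lines of the example. It rests on our reading of the notation, which is an assumption: that an item such as 'e1(12..23 23)' names the part of exon 1 (positions 12 to 23 of a 23 nt exon) affected by the event. The names FRAGMENT_RE, indel_length and preserves_frame are hypothetical.

```python
import re

# Assumed notation: e<exon number>(<start>..<end> <exon length>)
FRAGMENT_RE = re.compile(r"e(\d+)\((\d+)\.\.(\d+)\s+(\d+)\)")


def indel_length(description: str) -> int:
    """Total number of exon nucleotides named in an alternative-form line."""
    total = 0
    for _exon, start, end, _exon_length in FRAGMENT_RE.findall(description):
        total += int(end) - int(start) + 1  # positions are inclusive
    return total


def preserves_frame(description: str) -> bool:
    """The reading frame survives when the indel is a multiple of 3 nt."""
    return indel_length(description) % 3 == 0


first = "-1.87 9.86 1 e1(12..23 23)i1(1..141 141)"
second = "10.56 4.90 1 i4(1..256 256)e5(1..34 101)"

for line in (first, second):
    print(indel_length(line), preserves_frame(line))
```

Under this reading the first event covers 12 nt, which is four codons and matches the four amino acids 'HAPG', while the second covers 34 nt, which is not a multiple of three and so shifts the frame.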
fc - 16/8/2001