I. NOTES ON SSAHA OUTPUT:

I.1 NOTES 1: An explanation of the SSAHA hits.

For some matches SSAHA reports a pid of 75%, but the BLAST alignment of the same region shows a much higher identity, as in the following example.

SSAHA OUTPUT:

>IDB60489-736-1670-splicePos-70 5 60 10.129000001-130000000 579153 579208 FF 56 100.00 [const]
>IDB60489-736-1670-splicePos-70 75 130 10.129000001-130000000 580203 580258 FF 42 75.00 [const]

SEQUENCE:

>>IDB60489-736-1670-splicePos-70
AGCCATGGCGGTGGAAGGAGGAATGAAGTGTGTCAAGTTTTTGCTCTACGTTCTCCTGCTGGCCTTCTGC
GCCTGTGCAGTGGGATTGATTGCCATTGGTGTAGCGGTTCAGGTTGTCTTGAAGCAGGCCATTACCCATG

BLAST OUTPUT:

>10.129000001-130000000
        Length = 1,000,000

  Plus Strand HSPs:

Query:      1 AGCCATGGCGGTGGAAGGAGGAATGAAGTGTGTCAAGTTTTTGCTCTACGTTCTCCTGCT 60
              AGCCATGGCGGTGGAAGGAGGAATGAAGTGTGTCAAGTTTTTGCTCTACGTTCTCCTGCT
Sbjct: 579149 AGCCATGGCGGTGGAAGGAGGAATGAAGTGTGTCAAGTTTTTGCTCTACGTTCTCCTGCT 579208

Query:     61 GGCCTTCTGCG 71
              GGCCTTCTGCG
Sbjct: 579209 GGCCTTCTGCG 579219

Query:     71 GCCTGTGCAGTGGGATTGATTGCCATTGGTGTAGCGGTTCAGGTTGTCTTG 121
              GCCTGTGCAGTGGGATTGAT GCCATTGGTGTAGCGGTTCAGGTTGTCTTG
Sbjct: 580199 GCCTGTGCAGTGGGATTGATCGCCATTGGTGTAGCGGTTCAGGTTGTCTTG 580249

Query:    122 AAGCAGGCCATTACCCATG 140
              AAGCAGGCCATTACCCATG
Sbjct: 580250 AAGCAGGCCATTACCCATG 580268

In the BLAST alignment the 3p region (query 71-140) differs from the genome at a single base (query position 91, T versus C), although SSAHA reports 75.00 for the hit covering query 75-130.

I.2 NOTES 2: Checked that the 5p and 3p exonic regions match the same chr contig.

In the following cases they match adjacent contigs instead. The contig coordinates indicate that the gene in question crosses over the boundary between the two contigs.

>IDB63069-2524-3626-splicePos-70 1 70 3.89000001-90000000 999286 999355 FF 56 100.00 [alt]
>IDB63069-2524-3626-splicePos-70 71 139 3.90000001-91000000 846 914 FF 56 100.00 [alt]
>IDB66046-6316-16154-splicePos-70 3 58 4.104000001-105000000 989269 989324 FF 56 100.00 [const]
>IDB66046-6316-16154-splicePos-70 79 134 4.105000001-106000000 379 434 FF 56 100.00 [const]
>IDB63069-2524-3623-splicePos-70 1 70 3.89000001-90000000 999286 999355 FF 56 100.00 [const]
>IDB63069-2524-3623-splicePos-70 70 139 3.90000001-91000000 845 914 FF 56 100.00 [const]
>IDB64217-10438-18654-splicePos-70 1 70 6.5000001-6000000 939 1008 RF 70 100.00 [const]
>IDB64217-10438-18654-splicePos-70 72 127 6.4000001-5000000 989619 989674 RF 56 100.00 [const]

I.3 NOTES 3: Checked that all splice junctions from one IDB entry match the same chr region.

There are a few entries whose splice junctions come from adjacent contigs. In each of these cases the gene crosses over the boundary between the adjacent contigs.

>IDB64181-12784-17724-splicePos-70 10 65 5.14000001-15000000 991453 991508 FF 56 100.00 [const]
>IDB64181-12784-17724-splicePos-70 77 132 5.14000001-15000000 997165 997220 FF 56 100.00 [const]
>IDB64181-26620-40192-splicePos-69 7 62 5.15000001-16000000 3151 3206 FF 56 100.00 [const]
>IDB64181-26620-40192-splicePos-69 69 138 5.15000001-16000000 18495 18564 FF 70 100.00 [const]

II. NOTES ON BLAST OUTPUT:

Some of the splice junctions for which SSAHA did not show matches to both the 5p and 3p exonic regions showed good matches when BLAST was used. They are listed in the data file under a separate section. Some of the matches are not very good (> 95% or so). They may be attributed to sequence quality issues. Such entries are appropriately annotated in the data set.

III. The splice junctions that did not pass either the SSAHA or the BLAST check are listed in a separate data file.

For some of the splice junctions, either the 5p or the 3p exonic region showed a good match to the mouse genome.
For some, the 5p and 3p exonic regions matched as one long contiguous match. For some, the matches were either weak or absent. Wherever there was a match (in the 5p region, in the 3p region, or as a long contiguous match), it mapped to the same chr region as the other related splice junctions of the gene.
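
The adjacent-contig checks described in notes I.2 and I.3 can be sketched in Python as follows. This is only an illustration, not the script used to produce these notes. The column meanings (query name, query start, query end, contig, contig start, contig end, strand, score, pid, label) are inferred from the example hits above, and the contig name is assumed to encode "chr.start-end" with 1-based hit offsets relative to the contig start. The names Hit, parse_hit, contig_span, genomic_range and contigs_adjacent are hypothetical.

```python
from typing import NamedTuple, Tuple


class Hit(NamedTuple):
    """One SSAHA hit line; field meanings are inferred, not documented."""
    query: str
    q_start: int
    q_end: int
    contig: str
    s_start: int
    s_end: int
    strand: str
    score: int
    pid: float
    label: str


def parse_hit(line: str) -> Hit:
    """Split a whitespace-separated hit line such as the ones listed in I.2."""
    f = line.strip().lstrip(">").split()
    return Hit(f[0], int(f[1]), int(f[2]), f[3], int(f[4]), int(f[5]),
               f[6], int(f[7]), float(f[8]), f[9].strip("[]"))


def contig_span(contig: str) -> Tuple[str, int, int]:
    """Assumed naming scheme: '3.89000001-90000000' -> ('3', 89000001, 90000000)."""
    chrom, span = contig.split(".", 1)
    start, end = span.split("-")
    return chrom, int(start), int(end)


def genomic_range(hit: Hit) -> Tuple[str, int, int]:
    """Convert contig-relative hit coordinates to chromosome coordinates
    (assumes 1-based offsets counted from the contig start)."""
    chrom, c_start, _ = contig_span(hit.contig)
    return chrom, c_start + hit.s_start - 1, c_start + hit.s_end - 1


def contigs_adjacent(a: str, b: str) -> bool:
    """True if both contigs are on the same chr and one starts right after the other ends."""
    chrom_a, start_a, end_a = contig_span(a)
    chrom_b, start_b, end_b = contig_span(b)
    return chrom_a == chrom_b and (end_a + 1 == start_b or end_b + 1 == start_a)


# Usage with the first pair of hits from note I.2.
five_p = parse_hit(">IDB63069-2524-3626-splicePos-70 1 70 "
                   "3.89000001-90000000 999286 999355 FF 56 100.00 [alt]")
three_p = parse_hit(">IDB63069-2524-3626-splicePos-70 71 139 "
                    "3.90000001-91000000 846 914 FF 56 100.00 [alt]")
print(genomic_range(five_p), genomic_range(three_p))
print(contigs_adjacent(five_p.contig, three_p.contig))
```

Under these assumptions the 5p hit ends near the end of one contig and the 3p hit starts near the beginning of the next, which is the situation the notes describe as a gene crossing over two contigs.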