Creating the Reference Sequence Set
A set of Perl tools for creating a reference sequence set for transcriptome sequencing is available from the Cambridge Institute for Medical Research.
The script creates a reference sequence set with the following components:
- Full gene sequence
- Alternative splice junctions
- Genome with known genes masked
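As an illustration of the third component, the sketch below masks gene regions in a chromosome sequence. It is not taken from the original script: the helper name `mask_genes`, the use of `N` as the masking character, and the 1-based inclusive coordinates are all assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the original script.
# Replaces every gene region in a chromosome sequence with 'N'.
# Each gene is an array reference [start, end] in 1-based, inclusive
# coordinates that must lie within the sequence (assumed convention).
sub mask_genes {
    my ($chrom_seq, @genes) = @_;
    for my $gene (@genes) {
        my ($start, $end) = @$gene;
        my $span = $end - $start + 1;
        # Four-argument substr replaces the region in place.
        substr($chrom_seq, $start - 1, $span, 'N' x $span);
    }
    return $chrom_seq;
}

# Toy example: mask positions 3 to 5 of a 10 bp sequence.
print mask_genes('ACGTACGTAC', [3, 5]), "\n";   # prints ACNNNCGTAC
```

Overlapping gene annotations are handled naturally, because masking a base twice leaves it as `N`.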
The Perl script downloads reference sequences and gene annotations from the web and constructs the set of sequences for mRNA alignment. The process can take up to 24 hours to run because of the download time.
The script defines two parameters as constants:
use constant LENGTH => '40';
use constant FLANK => '1000';
The LENGTH parameter is the number of bases from each exon to include in the alternative splice junction sequences. Set it to 4-5 bp less than your read length, so that a read aligned entirely within a junction sequence has to cross the junction.
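The arithmetic behind this recommendation can be sketched as follows. The helpers `junction_sequence` and `min_overhang` are hypothetical and are not taken from the original script. They assume a junction sequence is the last LENGTH bases of the upstream exon joined to the first LENGTH bases of the downstream exon, and that a read aligns entirely within that sequence.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the original script.
# Joins the last $length bases of the upstream exon to the first
# $length bases of the downstream exon. An exon shorter than $length
# contributes all of its bases.
sub junction_sequence {
    my ($upstream_exon, $downstream_exon, $length) = @_;
    my $left_len = length($upstream_exon) < $length
                 ? length($upstream_exon)
                 : $length;
    my $left  = $left_len ? substr($upstream_exon, -$left_len) : '';
    my $right = substr($downstream_exon, 0, $length);
    return $left . $right;
}

# Minimum number of bases that a read aligned entirely within a
# junction sequence of 2 * $length bases must place on each side of
# the junction. Valid for $length <= $read_length <= 2 * $length.
sub min_overhang {
    my ($read_length, $length) = @_;
    return $read_length - $length;
}

my $read_length = 45;                 # hypothetical read length
my $length      = $read_length - 5;   # value to use for LENGTH, here 40

my $junction = junction_sequence('A' x 100, 'C' x 100, $length);
print length($junction), "\n";                      # prints 80
print min_overhang($read_length, $length), "\n";    # prints 5
```

With 45 bp reads and LENGTH set to 40, every read that fits inside the 80 bp junction sequence covers at least 5 bases of each exon. If LENGTH were equal to or greater than the read length, a read could align to one exon alone without providing any evidence for the junction.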
The FLANK parameter specifies the length of the gene flanking region to include in the full gene sequence.
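One way to picture the effect of FLANK is the coordinate calculation below. This is a sketch under stated assumptions, not the original script's code: the helpers `flanked_region` and `extract_region` are hypothetical, the flank is assumed to be added on both sides of the gene, and coordinates are assumed to be 1-based and inclusive.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the original script.
# Extends a gene's coordinates by $flank bases on each side (assumed),
# clamped so the region stays within the chromosome.
sub flanked_region {
    my ($gene_start, $gene_end, $chrom_length, $flank) = @_;
    my $start = $gene_start - $flank;
    $start = 1 if $start < 1;
    my $end = $gene_end + $flank;
    $end = $chrom_length if $end > $chrom_length;
    return ($start, $end);
}

# Hypothetical helper: extracts a 1-based, inclusive region from a
# chromosome sequence held in memory as a string.
sub extract_region {
    my ($chrom_seq, $start, $end) = @_;
    return substr($chrom_seq, $start - 1, $end - $start + 1);
}

# A gene at 5000-8000 on a 1,000,000 bp chromosome with FLANK = 1000.
my ($region_start, $region_end) = flanked_region(5000, 8000, 1_000_000, 1000);
print "$region_start-$region_end\n";   # prints 4000-9000
```

Clamping matters for genes near the ends of a chromosome, where the full flank is not available.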
Thanks to Vincent Plagnol of the JDRF/WT Diabetes & Inflammation Laboratory, Cambridge Institute for Medical Research, for his help and for the Perl scripts described above.