Key Features
- Supporting Illumina, PacBio, BGI and Nanopore reads
- Reference Indexing on-the-fly
- Supporting reference sequences with IUPAC base codes to reduce reference bias
- Utilizing base qualities information throughout the alignment process
- Penalizing splice junctions differently depending on annotation support and splicing signals
- Powered by splice- and quality-aware Needleman-Wunsch and Seed Chaining algorithms
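To make two of the ideas above concrete (IUPAC-aware matching and using base qualities during alignment), here is a minimal sketch of a penalty-based Needleman-Wunsch alignment. It is not novoSplice's implementation: the function name, the linear gap penalty, the penalty values and the way the mismatch cost is scaled by Phred quality are all assumptions chosen for illustration, and splice awareness is left out entirely.

```python
# Standard IUPAC nucleotide codes: each code maps to the bases it can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def quality_aware_cost(read, quals, ref, max_mismatch=6, gap=5):
    """Return the minimum global alignment cost of `read` against `ref`.

    Assumptions (illustrative only): a read base matches a reference code
    if it is one of the bases the IUPAC code allows; a mismatch costs
    `max_mismatch` scaled by the Phred quality (capped at 40), so that
    low-confidence bases are penalized less; gaps cost `gap` per base.
    """
    n, m = len(read), len(ref)
    # dp[i][j] = minimum cost of aligning read[:i] to ref[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if read[i - 1] in IUPAC.get(ref[j - 1], ""):
                sub = 0
            else:
                sub = max_mismatch * min(quals[i - 1], 40) / 40
            dp[i][j] = min(
                dp[i - 1][j - 1] + sub,  # match or mismatch
                dp[i - 1][j] + gap,      # read base aligned to a gap
                dp[i][j - 1] + gap,      # reference base aligned to a gap
            )
    return dp[n][m]
```

With a reference of `ARGT` (where `R` stands for A or G), the read `AGGT` aligns at no cost, whereas against a plain `AAGT` reference the same read would pay a mismatch. This is the sense in which IUPAC codes can reduce reference bias.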
Usage
novoSplice is very simple to run, and it doesn't require any indexing or pre-processing steps. Simply pass the genome file(s), annotation file, and the path to the output directory and novoSplice will do the rest.
novoSplice --fasta genomeFiles --gff3/--gtf annotationFile -1 mate1.fastq -2 mate2.fastq -o dir/runPrefix
- Users may pass compressed or uncompressed FASTA/GTF/GFF/FASTQ files.
- Users may pass one FASTA file or a directory of FASTA files.
- Users may inject SNVs into the genome sequences using novoutil iupac.
- Users may pass the --tune option to tune novoSplice to better suit a different platform, library or instrument.
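For scripted pipelines, the command line above can be assembled programmatically. The following is a minimal sketch that only uses the flags shown on this page; the helper name `build_novosplice_cmd`, the file names, the extension-based choice between --gtf and --gff3, and the assumption that --tune takes its value as a separate argument are all illustrative and should be checked against `novoSplice --help`.

```python
from pathlib import Path


def build_novosplice_cmd(fasta, annotation, mate1, mate2, out_prefix, tune=None):
    """Build an argument list for a paired-end novoSplice run (hypothetical helper)."""
    # Assumption: files ending in .gtf or .gtf.gz are GTF; anything else is treated as GFF3.
    name = Path(annotation).name.lower()
    ann_flag = "--gtf" if name.endswith((".gtf", ".gtf.gz")) else "--gff3"
    cmd = [
        "novoSplice",
        "--fasta", str(fasta),        # one FASTA file or a directory of FASTA files
        ann_flag, str(annotation),
        "-1", str(mate1),
        "-2", str(mate2),
        "-o", str(out_prefix),        # dir/runPrefix
    ]
    if tune is not None:
        # Assumption: the value follows the flag as its own argument.
        cmd += ["--tune", tune]
    return cmd


cmd = build_novosplice_cmd(
    "genome.fa.gz", "genes.gtf.gz", "mate1.fastq", "mate2.fastq", "out/run1"
)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where novoSplice is installed.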
Important Options
Option | Values | Description |
--ignoreErrors | | Ignore errors in FASTA/GFF3/GTF files. |
--index | [gene, exon] | Which genetic features to index in the main hash table; default is gene. |
--tune | [ONT-cDNA, ONT_Direct, PacBio] | Tune novoSplice options to better suit various platforms, instruments or protocols; default is Illumina. |
--strand | [forward, reverse] | Strandedness filter; default is unstranded. |
--nOverhangASJ | [1-32] | Minimum number of overhang bases used as an anchor to detect an annotated splice junction; default is 3. |
--intronLen | [8-99,999,999] | Minimum length of a gap to be treated as a novel intron instead of a deletion; default is 16. |
--pTSJ_GTAG | [0-DEL[ | Penalty for a splice junction between an annotated donor and acceptor that is supported by an annotated transcript and has the splicing signal GTAG; default is 0. There are similar options for the other splicing signals and for each level of annotation support. For unannotated splice junctions between annotated exons: pESJ_GTAG, pESJ_GCAG, pESJ_ATAC, pESJ_other. DEL is the penalty for deleting a gap of length --intronLen. |
--pNovelTran | [0-255] | Penalty for a novel transcript, where each individual splice junction is supported by an annotated transcript but no single transcript explains all the splice junctions in the read/pair as a whole; default is 48. |
--pPseudoGene | [0-255] | Penalty for a read/pair aligned to an untranscribed, untranslated pseudogene; default is 30. |
--stdout | | Write output alignments to standard output. |
--cigarM | | Report the CIGAR field with M to represent match/mismatch instead of =/X. |
--wiser | | Tune novoSplice internal options to improve accuracy. |
--faster | | Tune novoSplice internal options to reduce run-time. |
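To illustrate what the --cigarM option changes, the sketch below converts an extended CIGAR string (with = for match and X for mismatch) into the classic form that uses M for both. It is a stand-alone illustration of the two CIGAR conventions from the SAM format, not novoSplice code; the function name `cigar_to_m` is hypothetical.

```python
import re


def cigar_to_m(cigar: str) -> str:
    """Collapse '=' and 'X' operations into 'M', merging adjacent runs."""
    merged = []  # list of [length, operation] pairs
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        if op in "=X":
            op = "M"
        if merged and merged[-1][1] == op:
            # Same operation as the previous run: extend it.
            merged[-1][0] += int(length)
        else:
            merged.append([int(length), op])
    return "".join(f"{n}{op}" for n, op in merged)


print(cigar_to_m("5=1X4=2I3="))   # 10M2I3M
print(cigar_to_m("20=100N30="))   # 20M100N30M
```

The =/X form keeps mismatch positions visible in the CIGAR itself, while the M form is what some older downstream tools expect.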
You may check novoSplice’s manual or pass --help to see the full list of options.
Benchmarking
We used the SimBA suite of tools to simulate RNA-seq reads and benchmark RNA-seq aligners. The simulated data sets were generated by SimCT for different species, mutation rates and read lengths. BenchCT was then used to assess how well each aligner maps reads to the reference genome and discovers annotated splice junctions.
We present alignment results for five RNA-seq aligners: STAR, HISAT2, GSNAP, subJunc and novoSplice. The data sets cover five species: Homo sapiens, Mus musculus, Danio rerio, Drosophila melanogaster and Caenorhabditis elegans, with a read length of 100 bases and around 160 million paired-end reads. Details of each run can be found in novoSplice’s manual. We are also working on further benchmarking studies and are interested in collaborating with researchers on them.
nReads: Number of reads, nJunctions: Number of splice junctions, TP: Number of true positives, FP: Number of false positives, FN: Number of false negatives, TP MAPQ > 3: Number of true positives with MAPQ > 3
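The TP, FP and FN counts defined above are commonly summarized as precision, recall and F1. The sketch below shows those standard formulas; whether the benchmark tables report exactly these derived metrics is an assumption, and the counts in the example are made-up numbers for illustration, not benchmark results.

```python
def summarize(tp: int, fp: int, fn: int) -> dict:
    """Derive precision, recall and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0  # fraction of reported items that are correct
    recall = tp / (tp + fn) if tp + fn else 0.0     # fraction of true items that were found
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Illustrative counts only (not taken from the benchmark).
print(summarize(tp=90, fp=10, fn=30))
```

The same function applies to both read mapping and splice junction discovery, since both are reported as TP/FP/FN counts.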
© Novocraft Technologies Sdn Bhd