3′ adapter stripping is used where the DNA fragments being sequenced are likely to be shorter than the read length. This is the case when selecting small RNA or microRNA for sequencing, and it can also happen to a lesser degree on almost any run of the Illumina GA, as some small fragments will get through the gel selection process.
The Novoalign adapter stripping is aggressive in that it can remove very short lengths of adapter, down to as little as 1 bp. This aggressiveness is deliberate: removing the odd extra base is seen as preferable to calling mismatches for a few bp of adapter that a less aggressive approach might leave unstripped.
Adapter stripping is performed by gapped (since version 3.0) alignment of the adapter sequence to the read. The alignment uses base qualities to calculate mismatch penalties.
Command Line Options
-a | Turns on adapter stripping using the default adapter sequence, ‘GEX Adapter 2’ (“TCGTATGCCGTCTTCTGCTTG”). The GEX Adapter 2 sequence is commonly, but not always, used, so you should check that the right adapter sequence has been specified. The -a option is also used for paired-end adapter stripping; please refer to Paired End Short Fragment Detection and Adapter Stripping. |
-a sequence | Turns on 3′ adapter stripping using the specified adapter sequence. A list of Illumina/Solexa adapter sequences can be found at Solexa Library Primer Sequences and also at Seqanswers.com. |
3′ adapter stripping is normally used with small RNA, as small RNAs are in the size range 18-22 bp and the reads are usually 32 or 36 bp. This means the reads will have extended into the adapter, which needs to be removed before the read can be aligned.
Short fragments can also occur with normal DNA, RNA and on paired-end reads. Mostly this is not a problem, but for the odd paired-end run we have seen average fragment lengths around 80 bp, with quite a lot of fragments shorter than the read length. If this happens with one of your runs it may be a good idea to turn adapter stripping on.
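To make the arithmetic concrete, the following minimal Python sketch builds the bases a sequencer would report for a short insert. The insert sequence and the helper names `simulate_read` and `adapter_bases_in_read` are hypothetical and only for illustration; the adapter is the GEX Adapter 2 sequence quoted above.

```python
ADAPTER = "TCGTATGCCGTCTTCTGCTTG"  # GEX Adapter 2, the default quoted above


def simulate_read(insert, read_length, adapter=ADAPTER):
    """Bases reported for a fragment: the insert first, then 3' adapter.

    Simplification: if insert + adapter is shorter than read_length the
    result is simply shorter; a real read would continue past the adapter.
    """
    return (insert + adapter)[:read_length]


def adapter_bases_in_read(insert_length, read_length, adapter=ADAPTER):
    """Number of adapter bases that end up inside the read."""
    return min(len(adapter), max(0, read_length - insert_length))


# Hypothetical 22 bp small RNA insert sequenced with 36 bp reads.
insert = "ACGTACGTACGTACGTACGTAC"
read = simulate_read(insert, 36)
print(read)
print(adapter_bases_in_read(len(insert), 36), "adapter bases in the read")
```

With a 22 bp insert and 36 bp reads, the last 14 bases of the read are adapter, which is why the untrimmed read would not align to the reference.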
Adapter Alignment
The adapter sequence is aligned to the read (the alignment is gapped since version 3.0; earlier versions used an ungapped alignment). If the best alignment has a score > 7, the read is trimmed at the starting location of the alignment.
Scoring for the adapter alignment uses a match reward and a mismatch penalty:
match reward | 7 + 10log10(1-Perr), where Perr is calculated from the base quality. |
mismatch penalty | 7 + min(-30, 10log10(Perr/3)) |
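As a worked example, the two formulas can be evaluated for a few base qualities. This is a sketch only: it assumes Perr is derived from a standard Phred quality (Perr = 10^(-Q/10)), reads the logarithm in the mismatch penalty as base 10, and ignores quality calibration. The function names are illustrative and not part of Novoalign.

```python
import math


def phred_to_perr(q):
    """Error probability for a Phred quality q (assumed encoding)."""
    return 10 ** (-q / 10)


def match_reward(q):
    """7 + 10*log10(1 - Perr), as in the table above (requires q > 0)."""
    return 7 + 10 * math.log10(1 - phred_to_perr(q))


def mismatch_penalty(q):
    """7 + min(-30, 10*log10(Perr/3)), exactly as written in the table above."""
    return 7 + min(-30, 10 * math.log10(phred_to_perr(q) / 3))


for q in (10, 20, 30, 40):
    print(f"Q{q}: match {match_reward(q):.3f}, mismatch {mismatch_penalty(q):.3f}")
```

Taken as written, the min(-30, ...) term means a mismatch never scores better than 7 - 30 = -23, while a match on a high-quality base scores just under 7.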
Match rewards and mismatch penalties will also be affected by base quality calibration and, when using _prb.txt files with 4 base probabilities as read files, each base will have its own penalty based on the prb quality value.
With this scheme it’s quite possible that a single base could be stripped from the read.
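The procedure described in this section can be sketched in Python as follows. This is not Novoalign's implementation: it covers only the simple ungapped case, assumes standard Phred qualities, clamps Perr at 0.75 so that quality 0 does not break the logarithm (an assumption of this sketch), and ignores quality calibration and prb input, so its cut points will not necessarily match Novoalign's. The name `strip_adapter` is hypothetical.

```python
import math


def _perr(q):
    # Phred quality to error probability; the 0.75 clamp is an assumption here.
    return min(10 ** (-q / 10), 0.75)


def _score(read_base, adapter_base, q):
    # Match reward or mismatch penalty, as given in the scoring table.
    if read_base == adapter_base:
        return 7 + 10 * math.log10(1 - _perr(q))
    return 7 + min(-30, 10 * math.log10(_perr(q) / 3))


def strip_adapter(read, quals, adapter, threshold=7.0):
    """Trim read (and quals) where the best ungapped adapter alignment starts."""
    best_score, best_start = float("-inf"), None
    for start in range(len(read)):
        # Adapter base 0 is placed on read position `start`; the overlap is
        # cut off at the end of the read or the end of the adapter.
        end = min(len(read), start + len(adapter))
        score = sum(_score(read[i], adapter[i - start], quals[i])
                    for i in range(start, end))
        if score > best_score:
            best_score, best_start = score, start
    if best_start is not None and best_score > threshold:
        return read[:best_start], quals[:best_start]
    return read, quals


adapter = "TCGTATGCCGTCTTCTGCTTG"
insert = "ACGTACGTACGTACGTACGTAC"          # hypothetical 22 bp insert
read = insert + adapter[:14]               # 36 bp read running into the adapter
trimmed, trimmed_quals = strip_adapter(read, [30] * len(read), adapter)
print(trimmed)
```

Because the overlap is scored base by base with quality-dependent values, a short run of adapter at the very end of a read can be enough to clear the threshold, which is the aggressive behaviour described at the top of this page.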