Paired End Short Fragment Detection and Adapter Stripping

If a read extends into the adapter by only a few bases it may still align, but with mismatches and indels in the adapter region, which can then contribute to false positive alignments and incorrect SNP calls. As fragments get shorter and the amount of adapter increases, it becomes more likely that the read will fail to align. Novoalign V2.05 and later include an option, in licensed versions, to detect short fragments and strip the adapter sequence from them.

Command Line Options
Example specifying adapters from the SeqAnswers document:

    novoalign ... -a "AGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG" "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA"...

If in doubt about the adapters used, or if you want to check whether any reads contain adapter, you can use the following Linux shell command to print possible adapter sequences:

    grep -E 'AGATCGGAAGAGC[ACGT]{10}' readfile_1.fastq | sed "s/.*\(AGATCGGAAGAGC.*\)/\1/" | less

Do this for the read 1 and read 2 files and check whether the adapters match the defaults.
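The same check can be sketched in Python when a tally is more convenient than paging through raw matches. This is only an illustration, not part of Novoalign: the function name `candidate_adapters` is hypothetical, the adapter prefix is the one used in the grep command, and unlike grep the sketch looks only at the sequence line of each four-line FASTQ record and counts a fixed window (prefix plus 10 bases) at the first match in each read.

```python
import re
from collections import Counter

# Prefix shared by both example adapters, as used in the grep command.
ADAPTER_PREFIX = "AGATCGGAAGAGC"
PATTERN = re.compile(ADAPTER_PREFIX + "[ACGT]{10}")


def candidate_adapters(fastq_lines):
    """Tally adapter-like sequences (prefix + 10 bases) found in reads.

    Assumes plain four-line FASTQ records, so the sequence is the
    second line of each record.
    """
    counts = Counter()
    for index, line in enumerate(fastq_lines):
        if index % 4 != 1:  # skip header, '+' and quality lines
            continue
        match = PATTERN.search(line.strip())
        if match:
            counts[match.group(0)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python candidate_adapters.py readfile_1.fastq
    with open(sys.argv[1]) as handle:
        for sequence, count in candidate_adapters(handle).most_common(10):
            print(count, sequence)
```

Run it on the read 1 and read 2 files separately; the most frequent sequences should begin like the adapters passed to `-a`.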
Without Adapter Stripping

    @HWI-EAS261_4:1:1:968:1074/1 L GTTTCAGTGCATCACAGTTCATCTTCTAACCCCAGAGTCAGAAGA IIIIIIII$IIIIIIIIIIIIIIIIIII1III+9I%IIIIF>;;< U 80 71 >CD5-chr11_S_11_60625543_60652895 3369 F . . . 33A>C 43+A
    @HWI-EAS261_4:1:1:968:1074/2 R TCTGACTCTTGGGTTAGAAGATGAACTGTGATGGACTGAAACAGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII3IIIIIIIIII? NM
    @HWI-EAS261_4:1:1:1037:1480/1 L GAAAAAGACCCTGGAAGCAGTTAGCAGAATAGTGTGATAATGAGA IIIIIIII&IIIIIIIIIIIIIIIIIIIIIIIEIII8I4II>C7> NM
    @HWI-EAS261_4:1:1:1037:1480/2 R CATTATCACACTATTCTGCTAACTGCTTCCATGGTCTTTTTCCGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII@IIIIIIII':- U 58 99 >LAP3-chr4_I_2_4_17190620_17192520 1740 R . . . 1A>T 2G>C 3T>G

With Adapter Stripping

    @HWI-EAS261_4:1:1:968:1074/1 L GTTTCAGTGCATCACAGTTCATCTTCTAACCCCAGAGTCAGA IIIIIIII$IIIIIIIIIIIIIIIIIII1III+9I%IIIIF> U 95 150 >CD5-chr11_S_11_60625543_60652895 3369 F . 3369 R 33A>C
    @HWI-EAS261_4:1:1:968:1074/2 R TCTGACTCTTGGGTTAGAAGATGAACTGTGATGGACTGAAAC IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII3IIIIIIII U 0 150 >CD5-chr11_S_11_60625543_60652895 3369 R . 3369 F
    @HWI-EAS261_4:1:1:1037:1480/1 L GAAAAAGACCCTGGAAGCAGTTAGCAGAATAGTGTGATAATG IIIIIIII&IIIIIIIIIIIIIIIIIIIIIIIEIII8I4II> U 43 150 >LAP3-chr4_I_2_4_17190620_17192520 1743 F . 1743 R 11A>C
    @HWI-EAS261_4:1:1:1037:1480/2 R CATTATCACACTATTCTGCTAACTGCTTCCATGGTCTTTTTC IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII@IIIIIIII U 0 150 >LAP3-chr4_I_2_4_17190620_17192520 1743 R . 1743 F

Adapter Stripping Process

The reads are prefixed with adapter sequence and then aligned against each other using Needleman-Wunsch global alignment. The alignment allows mismatches and indels and uses quality-based scoring similar to the alignment algorithms used to align reads against the reference, with the difference that we now have base qualities for both sequences in the alignment rather than just for one.
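To make the pairing idea concrete, here is a deliberately simplified Python sketch applied to the first read pair from the example output. It is not Novoalign's implementation: it swaps the quality-aware Needleman-Wunsch alignment for an ungapped comparison that ignores base qualities, so it cannot handle indels. The names `revcomp`, `count_matches` and `detect_fragment_length` are hypothetical, the adapters are the two from the `-a` example, and the 90% identity cutoff is the one this page gives for accepting an alignment. For each candidate fragment length, read 1 is compared with the reverse complement of read 2 over the fragment, and whatever remains of each read is compared with the start of its adapter.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def count_matches(a, b):
    """Return (matching positions, positions compared) over the shared length."""
    return sum(x == y for x, y in zip(a, b)), min(len(a), len(b))


def detect_fragment_length(read1, read2, adapter1, adapter2, min_identity=0.90):
    """Return the best-supported fragment length, or None if none qualifies.

    Ungapped simplification: a fragment shorter than the reads implies
    read1 = fragment + start of adapter1 and
    read2 = revcomp(fragment) + start of adapter2.
    """
    best_len, best_identity = None, 0.0
    for frag_len in range(1, min(len(read1), len(read2))):
        parts = [
            (read1[:frag_len], revcomp(read2[:frag_len])),  # fragment vs fragment
            (read1[frag_len:], adapter1),                   # read 1 tail vs adapter
            (read2[frag_len:], adapter2),                   # read 2 tail vs adapter
        ]
        matches = compared = 0
        for a, b in parts:
            m, c = count_matches(a, b)
            matches += m
            compared += c
        identity = matches / compared
        if identity > best_identity:
            best_len, best_identity = frag_len, identity
    return best_len if best_identity > min_identity else None


ADAPTER1 = "AGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG"
ADAPTER2 = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA"
READ1 = "GTTTCAGTGCATCACAGTTCATCTTCTAACCCCAGAGTCAGAAGA"
READ2 = "TCTGACTCTTGGGTTAGAAGATGAACTGTGATGGACTGAAACAGA"

frag_len = detect_fragment_length(READ1, READ2, ADAPTER1, ADAPTER2)
if frag_len is not None:
    # Trim both reads to the same fragment length.
    print(READ1[:frag_len])
    print(READ2[:frag_len])
```

For this pair the sketch settles on a 42-base fragment, with the trailing AGA of each read matching the start of its adapter, which agrees with the trimmed reads shown in the "With Adapter Stripping" output. The two fragment mismatches it tolerates sit at low-quality bases in read 1, which is the kind of case quality-based scoring is meant to handle more gracefully.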
We then take the highest scoring alignment and, if its identity exceeds 90%, use it to determine the amount of adapter to be trimmed.

For adapter to be trimmed, both reads of a pair must align to the same amount of adapter sequence and must align to the reverse complement of each other. False positives are highly improbable.

Created by colin. Last Modification: Wednesday 13 of October, 2010 08:54:05 MYT by colin.









