Paired End Short Fragment Detection and Adapter Stripping

DNA fragments that are shorter than the read length cause reads to extend into adapter sequence. Such reads can make up a non-trivial portion of the data, especially with longer reads from the GA II.
If a read extends into the adapter by only a few bases it may still align, but with mismatches and indels in the adapter region, which can then contribute to false-positive alignments and incorrect SNP calls. As fragments get shorter and the amount of adapter increases, it becomes more likely that the read will fail to align.
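
To make the read-through effect concrete, the following is a minimal Python sketch. It is not part of Novoalign, and the function name simulate_read is hypothetical. It models a sequencer that reads a fixed number of bases from a fragment followed by its adapter. The 42bp fragment is taken from the first example read pair further down this page, and the adapter is the first default adapter sequence.

```python
def simulate_read(fragment, adapter, read_length):
    """Return (read, adapter_bases) for one end of a fragment.

    Illustrative model only: when the fragment is shorter than the read
    length, the read runs off the fragment and continues into the adapter.
    """
    template = fragment + adapter
    read = template[:read_length]
    adapter_bases = max(0, min(read_length, len(template)) - len(fragment))
    return read, adapter_bases


fragment = "GTTTCAGTGCATCACAGTTCATCTTCTAACCCCAGAGTCAGA"  # 42bp
adapter = "AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAG"
read, adapter_bases = simulate_read(fragment, adapter, 45)
print(read)           # the last 3 bases (AGA) come from the adapter
print(adapter_bases)  # 3
```

With a 45bp read and a 42bp fragment, only three adapter bases are read, which is the "few bases" case where the read may still align but with errors at its 3' end.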

Novoalign V2.05 and later include an option in licensed versions to detect short fragments and to strip the adapter sequence from them.

Command Line Options
-a
Turns on adapter stripping using the default adapter sequences, "AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAG" and "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA". These default adapter sequences have been seen on many libraries; however, we have also seen "AGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG" on the end of read 1 in some libraries. You should always check that the correct adapter sequence has been specified.
The -a option is also used for single-end adapter stripping; please refer to 3' Adapter Stripping - Single End Reads.

-a sequence1 sequence2
Turns on paired-end adapter stripping using the specified adapter sequences. The first adapter sequence is the adapter on the 3' end of fragments, which is read as part of read 1 on short fragments. The second adapter sequence is the 5' adapter, which is read as part of read 2. The second adapter sequence is optional and is only required if the two adapters differ. The full sequence does not have to be specified, but we recommend at least 12bp to ensure good specificity. A list of Illumina/Solexa adapter sequences can be found at Solexa Library Primer Sequences and also at Seqanswers.com.


Example specifying adapters from the SeqAnswers document
novoalign ... -a "AGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG" "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA"...


If you are in doubt about the adapters used, or want to check whether any reads contain adapter, you can use the following Linux shell command to print possible adapter sequences:
grep -E 'AGATCGGAAGAGC[ACGT]{10}' readfile_1.fastq | sed 's/.*\(AGATCGGAAGAGC.*\)/\1/' | less

Run this on both the read 1 and read 2 files and check whether the adapters match the defaults.
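
When the grep output is long, it can help to tally the candidates instead of paging through them. The following Python sketch does that with the same pattern. It is an illustration rather than a Novocraft tool: the function name count_adapter_candidates is hypothetical, the input is assumed to be standard 4-line FASTQ records, and the example reads are made up.

```python
import re
from collections import Counter

# Same pattern as the grep command: the common adapter start plus 10 more bases.
ADAPTER_PATTERN = re.compile(r"AGATCGGAAGAGC[ACGT]{10}")


def count_adapter_candidates(fastq_lines):
    """Tally the 23-base adapter candidates found in FASTQ sequence lines."""
    counts = Counter()
    for index, line in enumerate(fastq_lines):
        if index % 4 != 1:  # the sequence is the 2nd line of each 4-line record
            continue
        match = ADAPTER_PATTERN.search(line.strip())
        if match:
            counts[match.group(0)] += 1
    return counts


# Made-up records: two reads contain adapter, one does not.
example = [
    "@read1", "ACGTAGATCGGAAGAGCGGTTCAGCAGGAATG", "+", "I" * 32,
    "@read2", "TTAGATCGGAAGAGCGGTTCAGCAGGA", "+", "I" * 27,
    "@read3", "ACGTACGTACGTACGTACGT", "+", "I" * 20,
]
for candidate, count in count_adapter_candidates(example).most_common():
    print(count, candidate)  # 2 AGATCGGAAGAGCGGTTCAGCAG
```

For a real file, pass an open file handle, for example count_adapter_candidates(open("readfile_1.fastq")), and compare the most frequent candidates with the start of the default adapters.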

Example 1

Without Adapter Stripping
#       Paired Reads:     2500
#      Pairs Aligned:     2033
#     Read Sequences:     5000
#            Aligned:     4318
#   Unique Alignment:     3273
#   Gapped Alignment:      118
#     Quality Filter:        0
# Homopolymer Filter:        0
#       Elapsed Time: 23,860s
# Fragment Length Distribution
#	From	To	Count
#	42	47	20
#	48	53	37
#	54	59	40
#	60	65	44
...

With Adapter Stripping
#       Paired Reads:     2500
#      Pairs Aligned:     2112
#     Read Sequences:     5000
#            Aligned:     4458
#   Unique Alignment:     3376
#   Gapped Alignment:      112
#     Quality Filter:        0
# Homopolymer Filter:        0
#       Elapsed Time: 25,391s
# Fragment Length Distribution
#	From	To	Count
#	24	29	1
#	30	35	14
#	36	41	27
#	42	47	36
#	48	53	37
#	54	59	40
#	60	65	44
...
Data sample: 45bp paired-end human cDNA reads, mean fragment length of 137bp with a standard deviation of 45bp.

Example 2

Without Adapter Stripping
#       Paired Reads:     5000
#      Pairs Aligned:     3788
#     Read Sequences:    10000
#            Aligned:     8194
#   Unique Alignment:     6995
#   Gapped Alignment:       42
#     Quality Filter:       26
# Homopolymer Filter:       37
#       Elapsed Time: 494,338s
# Fragment Length Distribution
#	From	To	Count
#	42	48	75
#	49	55	154
#	56	62	187
...

With Adapter Stripping
#       Paired Reads:     5000
#      Pairs Aligned:     3906
#     Read Sequences:    10000
#            Aligned:     8398
#   Unique Alignment:     7135
#   Gapped Alignment:       34
#     Quality Filter:       59
# Homopolymer Filter:       43
#       Elapsed Time: 438,171s
# Fragment Length Distribution
#	From	To	Count
#	28	34	14
#	35	41	48
#	42	48	94
#	49	55	153
#	56	62	186
#	63	69	243
...
Data sample: 45bp reads from bisulphite-treated human DNA, mean fragment length of 94bp with a standard deviation of 30bp.
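
The effect of adapter stripping in the two examples can be summarised as the number of extra alignments and the gain in percentage points. The short Python sketch below does the arithmetic on the figures copied from the reports above. The helper name alignment_gain is hypothetical and the calculation is ours, not Novoalign output.

```python
def alignment_gain(total, aligned_without, aligned_with):
    """Return (extra alignments, gain in percentage points of the total)."""
    extra = aligned_with - aligned_without
    return extra, 100.0 * extra / total


# (total, aligned without stripping, aligned with stripping)
reports = {
    "Example 1 pairs": (2500, 2033, 2112),
    "Example 1 reads": (5000, 4318, 4458),
    "Example 2 pairs": (5000, 3788, 3906),
    "Example 2 reads": (10000, 8194, 8398),
}
for label, numbers in reports.items():
    extra, points = alignment_gain(*numbers)
    print(f"{label}: +{extra} aligned ({points:.2f} percentage points)")
```

For Example 1 this gives 79 more aligned pairs (3.16 percentage points), and for Example 2 it gives 118 more aligned pairs (2.36 percentage points). The fragment length distributions show that the additional alignments fall in the bins shorter than the 45bp read length.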


Without Adapter Stripping

@HWI-EAS261_4:1:1:968:1074/1	L	GTTTCAGTGCATCACAGTTCATCTTCTAACCCCAGAGTCAGAAGA	IIIIIIII$IIIIIIIIIIIIIIIIIII1III+9I%IIIIF>;;<	U	80	71	>CD5-chr11_S_11_60625543_60652895	3369	F	.	.	.	33A>C 43+A
@HWI-EAS261_4:1:1:968:1074/2	R	TCTGACTCTTGGGTTAGAAGATGAACTGTGATGGACTGAAACAGA	IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII3IIIIIIIIII?	NM
@HWI-EAS261_4:1:1:1037:1480/1	L	GAAAAAGACCCTGGAAGCAGTTAGCAGAATAGTGTGATAATGAGA	IIIIIIII&IIIIIIIIIIIIIIIIIIIIIIIEIII8I4II>C7>	NM
@HWI-EAS261_4:1:1:1037:1480/2	R	CATTATCACACTATTCTGCTAACTGCTTCCATGGTCTTTTTCCGA	IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII@IIIIIIII':-	U	58	99	>LAP3-chr4_I_2_4_17190620_17192520	1740	R	.	.	.	1A>T 2G>C 3T>G

With Adapter Stripping

@HWI-EAS261_4:1:1:968:1074/1	L	GTTTCAGTGCATCACAGTTCATCTTCTAACCCCAGAGTCAGA	IIIIIIII$IIIIIIIIIIIIIIIIIII1III+9I%IIIIF>	U	95	150	>CD5-chr11_S_11_60625543_60652895	3369	F	.	3369	R	33A>C
@HWI-EAS261_4:1:1:968:1074/2	R	TCTGACTCTTGGGTTAGAAGATGAACTGTGATGGACTGAAAC	IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII3IIIIIIII	U	0	150	>CD5-chr11_S_11_60625543_60652895	3369	R	.	3369	F
@HWI-EAS261_4:1:1:1037:1480/1	L	GAAAAAGACCCTGGAAGCAGTTAGCAGAATAGTGTGATAATG	IIIIIIII&IIIIIIIIIIIIIIIIIIIIIIIEIII8I4II>	U	43	150	>LAP3-chr4_I_2_4_17190620_17192520	1743	F	.	1743	R	11A>C
@HWI-EAS261_4:1:1:1037:1480/2	R	CATTATCACACTATTCTGCTAACTGCTTCCATGGTCTTTTTC	IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII@IIIIIIII	U	0	150	>LAP3-chr4_I_2_4_17190620_17192520	1743	R	.	1743	F

Adapter Stripping Process

The reads are prefixed with adapter sequence and then aligned against each other using Needleman-Wunsch global alignment. The alignment allows mismatches and indels and uses quality-based scoring similar to the algorithms used to align reads against the reference. The difference is that base qualities are now available for both sequences in the alignment rather than just one. We then take the highest-scoring alignment and, if its identity exceeds 90%, use it to determine the amount of adapter to be trimmed.

For adapter to be trimmed, the two reads of a pair must both align to the same amount of adapter sequence and must align to the reverse complement of each other. This makes false positives highly improbable.
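
The idea can be illustrated with a much simplified Python sketch. It is not Novoalign's implementation: it swaps the quality-scored Needleman-Wunsch alignment for a plain ungapped comparison (no indels, no base qualities). The function names and the way identity is computed are assumptions made for illustration. For each candidate fragment length it checks that the two reads are reverse complements of each other over the fragment and that both continue into their adapters, then keeps the best candidate above 90% identity.

```python
ADAPTER_1 = "AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAG"   # default adapter seen in read 1
ADAPTER_2 = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA"  # default adapter seen in read 2

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def count_matches(a, b):
    """Number of identical bases, compared position by position (no gaps)."""
    return sum(x == y for x, y in zip(a, b))


def find_fragment_length(read1, read2, adapter1=ADAPTER_1, adapter2=ADAPTER_2,
                         min_identity=0.9):
    """Return the fragment length of a short fragment, or None.

    Ungapped simplification: both reads must contain the same amount of
    adapter, and their fragment parts must be reverse complements.
    """
    best = None
    read_length = min(len(read1), len(read2))
    for frag_len in range(1, read_length):
        tail1, tail2 = read1[frag_len:], read2[frag_len:]
        matches = (
            count_matches(read1[:frag_len], reverse_complement(read2[:frag_len]))
            + count_matches(tail1, adapter1)
            + count_matches(tail2, adapter2)
        )
        compared = (frag_len + min(len(tail1), len(adapter1))
                    + min(len(tail2), len(adapter2)))
        identity = matches / compared
        if identity > min_identity and (best is None or identity > best[1]):
            best = (frag_len, identity)
    return best[0] if best else None


# First read pair from the "Without Adapter Stripping" records on this page.
read1 = "GTTTCAGTGCATCACAGTTCATCTTCTAACCCCAGAGTCAGAAGA"
read2 = "TCTGACTCTTGGGTTAGAAGATGAACTGTGATGGACTGAAACAGA"
frag_len = find_fragment_length(read1, read2)
print(frag_len)          # 42
print(read1[:frag_len])  # read 1 with the 3 adapter bases removed
```

For this pair the sketch finds a 42bp fragment, so 3 bases are trimmed from each read, which matches the 42bp reads shown in the "With Adapter Stripping" records. Because it ignores indels and base qualities, the sketch would miss or misjudge pairs that the real quality-aware alignment handles.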


Created by colin. Last Modification: Wednesday 13 of October, 2010 08:54:05 MYT by colin.
