
Problem with mismatch reporting in output format

In the documentation the mismatch reporting format says:

the offset is 1 based position of difference relative to the 'Aligned Offset'

It seems the offset is the position of the mismatch in the target sequence, relative to the aligned offset position. Does this mean that multiple adjacent insertions in the query sequence cannot be represented?

For instance:

target: AG--TCTCTC
query:  AGCCTCTCTC

This is impossible to represent in the mismatch format described, is it not? I tried aligning a test sequence, and it seems to me that Novoalign will not return a match with 2 inserted bases in the query.

Andrew


Hi Andrew,

In this example the mismatch is at the first base:

target: AGTCTCTC
query:  GCTCTCTC

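It will be reported as 1A>G. To get the position in the reference sequence, we take the alignment offset + 1 - 1, which is the alignment offset itself, because both the aligned offset and the mismatch offset are 1-based.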
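
As a rough sketch of that arithmetic in Python (the function name and the aligned offset of 100 are made up for illustration, they are not part of Novoalign):

```python
def reference_position(aligned_offset: int, mismatch_offset: int) -> int:
    """Convert a 1-based mismatch offset into a 1-based reference position.

    Both inputs are 1-based, so 1 is subtracted to avoid counting the
    first aligned base twice.
    """
    return aligned_offset + mismatch_offset - 1


# Hypothetical read aligned at reference position 100 with mismatch "1A>G":
print(reference_position(100, 1))  # 100 + 1 - 1
```

A mismatch with offset 1 therefore lands exactly on the aligned offset.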

In your example

target: AG--TCTCTC
query:  AGCCTCTCTC


The current version of Novoalign will report it as 3+CC. Earlier versions would report it as 3+C 3+C, which was a bit awkward.
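
If you need to read both styles, something along these lines might work. It is only a sketch based on the two token shapes shown in this thread (a substitution like 1A>G and an insertion like 3+CC); the function names are hypothetical, and any other token types the format may have are not handled:

```python
import re

# Assumed token shapes, taken only from the examples in this thread:
#   <offset><ref>><read>   e.g. 1A>G   (substitution)
#   <offset>+<bases>       e.g. 3+CC   (insertion in the read)
TOKEN = re.compile(r"^(\d+)(?:([ACGTN])>([ACGTN])|\+([ACGTN]+))$")


def parse_token(token):
    """Return (kind, offset, detail) for one mismatch token."""
    m = TOKEN.match(token)
    if m is None:
        raise ValueError(f"unrecognised mismatch token: {token!r}")
    offset = int(m.group(1))
    if m.group(4) is not None:
        return ("ins", offset, m.group(4))
    return ("sub", offset, m.group(2) + ">" + m.group(3))


def merge_insertions(field):
    """Parse a space-separated mismatch field, joining consecutive
    insertions at the same offset (old style '3+C 3+C' becomes 3+CC)."""
    merged = []
    for kind, offset, detail in map(parse_token, field.split()):
        prev = merged[-1] if merged else None
        if kind == "ins" and prev and prev[0] == "ins" and prev[1] == offset:
            merged[-1] = ("ins", offset, prev[2] + detail)
        else:
            merged.append((kind, offset, detail))
    return merged


print(merge_insertions("3+C 3+C"))  # old style
print(merge_insertions("3+CC"))     # current style
```

Both calls should give the same single insertion of CC at offset 3, so downstream code only has to deal with one representation.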

Here's an example:

@Streptococcus_suis_1787813_1787978_c/2/5bp S TCTCCATACTTGGCTCAAAAAGGAAATTATTGTGGAGGATT IIIIIIIIIIIIII,+IIIII8ID.IIIIIIIII2IIIAI( U 66 135 >Streptococcus_suis 1787813 F . . . 16+CAAAA

I hope this helps,
Colin

