support for unaligned BAM

Hi,

Does novoalign support unaligned BAM files as input?

We are interested in novoalign. However, we plan to store unaligned reads in unaligned BAM files. Compared to FASTQ, BAM can store meta information about the reads (read group, platform, etc.), and the files are smaller. So it is important that the aligner can handle this format. As far as we know, BWA can use unaligned BAM as input. Do you have any plans to implement this in novoalign?

thanks

Ying


Hi,

We haven't planned it, and yours is the first request. I'll have a look at it. Extracting the reads and aligning them is no issue; most of the work will be carrying over any BAM header info and the RG for each pair.

Colin


I'll second this request--this is a feature we would be interested in.

Thanks,

Kevin


Hi,

We've just posted a new release with support for unaligned BAM.

novoalign -d ... -f unaligned.bam -F BAMSE (or -F BAMPE for paired-end reads)

You need to use the -F option; there's no automatic detection of the BAM file type. For paired end, the two mates of each pair need to be adjacent in the file, with the first mate followed by the second mate. It definitely won't work with a BAM sorted on coordinates. Any alignment records flagged as non-primary are skipped. This shouldn't be the case in unaligned BAM files but could happen if using an aligned BAM as input.
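The ordering requirement can be checked before running the aligner. The following is a minimal sketch in Python, not novoalign's own logic: `check_pair_order` is a hypothetical helper that takes (read name, flag) tuples, which you would extract from the BAM yourself, and it assumes "non-primary" means the 0x100 flag bit from the SAM specification.

```python
# SAM flag bits as defined in the SAM specification.
FLAG_FIRST_MATE = 0x40
FLAG_SECOND_MATE = 0x80
FLAG_NOT_PRIMARY = 0x100  # assumption: this is the "non-primary" flag meant above


def check_pair_order(records):
    """Return a list of problems found in an iterable of (read_name, flag) tuples.

    An empty list means every primary record belongs to an adjacent pair
    in which the first mate is followed by the second mate.
    """
    problems = []
    # Drop non-primary records, mirroring the skipping described in the post.
    primary = [(name, flag) for name, flag in records
               if not flag & FLAG_NOT_PRIMARY]
    if len(primary) % 2:
        problems.append("odd number of primary records")
    # Walk the remaining records two at a time.
    for i in range(0, len(primary) - 1, 2):
        (name1, flag1), (name2, flag2) = primary[i], primary[i + 1]
        if name1 != name2:
            problems.append(f"records {i} and {i + 1}: names differ")
        elif not (flag1 & FLAG_FIRST_MATE and flag2 & FLAG_SECOND_MATE):
            problems.append(f"records {i} and {i + 1}: mates out of order")
    return problems
```

For example, `check_pair_order([("r1", 77), ("r1", 141)])` returns an empty list, since 77 marks an unmapped first mate and 141 an unmapped second mate.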

An @RG record in the BAM file is supported: the ID from the first @RG record is applied to all reads. An @RG on the -o SAM option will override the @RG in the BAM file. Insert size is taken from the -i option, not from the @RG header.
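To see which read group ID would be picked up, the header text (for instance, as printed by `samtools view -H`) can be scanned for its first @RG line. This is a minimal sketch under that assumption; `first_read_group_id` is a hypothetical helper, not part of novoalign.

```python
def first_read_group_id(header_text):
    """Return the ID of the first @RG line that carries an ID field, or None."""
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs after the record type.
        for field in line.split("\t")[1:]:
            if field.startswith("ID:"):
                return field[3:]
    return None
```

If the header holds several @RG lines, only the first ID is returned, which matches the behaviour described above where a single ID is applied to all reads.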

I think that's all, let me know if you have any problems or ideas to improve it.

Best, Colin


Hi Colin,

Thanks for the quick improvement! I have downloaded the new version of novoalign and tried this feature. I have some questions about it.

1) I used Picard "FastqToSam" to generate the unaligned BAM file. For paired-end data, it generates only one unaligned BAM. What should I pass to the "-f" option in novoalign? Should I use "novoalign -d ... -f unaligned.bam unaligned.bam -F BAMPE"?

The command I used to call novoalign on unaligned bam file is:
novoalign -d novoalignRef_V2.07.13 -f unalignedBam.bam -F BAMPE -o SAM -r R -i PE 250.50

Instead of running, novoalign displayed the manual page. I guess that is because I provided only one unaligned BAM file but defined the format as BAMPE. Is that right?

2) What's the default quality score format for "BAMPE/BAMSE"? Is it STDFQ or ILM1.8? The data I used is ILM1.8.
3) Just to make sure: my unaligned BAM file is sorted by name. novoalign can deal with this kind of data, right?

thanks,

Ying


Hi Ying,

1) Your command should work except for the period in 250.50.

The bam file is only specified once.

2) The default is STDFQ. I didn't check what Picard FastqToSam does, but the SAM spec requires quality to be in Sanger format, so I assume Picard will convert it.

3) Sorted on name is OK; just don't sort on coordinate. There's also no need to sort on name, and FastqToSam will be much faster if you choose unsorted.
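On the quality question in 2): Sanger format stores each base quality as the character with code Phred + 33. A minimal Python sketch of decoding a QUAL string this way follows; the helper names are hypothetical. Any character below code 64 also rules out the older Phred+64 encoding, which gives a quick sanity check on a file.

```python
def phred33_scores(qual):
    """Decode a Sanger (Phred+33) quality string into a list of integer scores."""
    return [ord(ch) - 33 for ch in qual]


def rules_out_phred64(qual):
    """Return True if the string holds a character that Phred+64 cannot produce."""
    return any(ord(ch) < 64 for ch in qual)
```

For instance, `phred33_scores("BB@C")` gives `[33, 33, 31, 34]`.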

Colin


Hi Ying,

The delimiter between mean and the standard deviation should be either a comma or a space. So -i 250,50 or -i 250 50 will do.
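When the call is scripted, assembling the argument list in code avoids the delimiter mistake. The following is a minimal Python sketch; `insert_size_args` and `novoalign_argv` are hypothetical helpers, and only options that already appear in this thread are used.

```python
def insert_size_args(mean, stddev, mode="PE"):
    """Return the -i arguments, joining mean and standard deviation with a comma."""
    return ["-i", mode, f"{mean},{stddev}"]


def novoalign_argv(index, bam, mean, stddev):
    """Build an argument list for paired-end alignment of an unaligned BAM."""
    return [
        "novoalign",
        "-d", index,      # reference index
        "-f", bam,        # the BAM file is given only once, even for pairs
        "-F", "BAMPE",    # the format must be stated explicitly
        "-o", "SAM",
        "-r", "R",
        *insert_size_args(mean, stddev),
    ]
```

The resulting list can be passed to `subprocess.run` with stdout redirected to the output SAM file.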

Cheers, Colin


Hi Colin,

Thanks. It works now. There is a small bug: novoalign changes the sequence ID by cutting off its first letter (from PCUS-319... to CUS-319...).

Here is the command I used:
novoalign -d novoalignRef_V2.07.13 -f unalignedBam.bam -F BAMPE -o SAM -r R -i PE 250,50 > alignedBam_novo.sam

The IDs in unalignedBam.bam:

PCUS-319-EAS487_0001:7:1:1000:1007#0 77 * 0 0 * * 0 0 TCTACGCCAAGGAGATTCCTGAGTACCGGAAGATCGTGCAGCGCTACTACAAGCAGATCCAGGACATGAAGCCGCT BB@CBBBB?ABAAA=BB@BA9AB>@@=B?>@@A


Whoops, missed that. The code in the SAM format reporter was removing the first character of headers. This normally removes the @ from FASTQ headers.

Fixed for SAM format in the next release.

In Native report format we will prepend the header with an @ to be compatible with FASTQ input.
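The bug pattern is easy to reproduce. The sketch below is an illustration in Python, not novoalign's actual code: `read_name_from_header` is a hypothetical helper that strips the leading @ only when it is present, so names coming from BAM input are left intact.

```python
def read_name_from_header(header):
    """Drop a leading '@' (FASTQ style) if present; otherwise return the header as is."""
    return header[1:] if header.startswith("@") else header


bam_name = "PCUS-319-EAS487_0001:7:1:1000:1007#0"

# Unconditional slicing loses the first letter of a BAM read name.
truncated = bam_name[1:]

# The conditional version handles both FASTQ headers and BAM read names.
from_fastq = read_name_from_header("@" + bam_name)
from_bam = read_name_from_header(bam_name)
```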


Hi,

I have a BAM file that contains aligned and unaligned reads, obtained with the command
samtools view -h -F 2 -b bwa-output.bam > novoalign-input.bam
and that I sorted by queryname afterwards.

My question: Is it safe to assume that novoalign considers the strand flags (0x10 and 0x20) of an aligned BAM file which is used as input? So will it use the original sequence if a read appears with its reverse complement sequence?

Christof


Hi Christof,

Yes, that's a valid assumption. If flag 0x10 is set the read is reverse complemented.

We also use the QUAL field for base qualities and do not check for an OQ:Z: tag, so you get recalibrated qualities if you've done calibration.
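As an illustration of what restoring the original read involves, here is a minimal Python sketch, not novoalign's implementation. `original_read` is a hypothetical helper; it assumes the reverse-strand bit 0x10 from the SAM specification and reverses the quality string together with the sequence.

```python
FLAG_REVERSE = 0x10  # SAM flag bit: SEQ is stored reverse complemented

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def original_read(seq, qual, flag):
    """Return (seq, qual) in the orientation the read had when it was sequenced."""
    if flag & FLAG_REVERSE:
        # Complement each base, then reverse; qualities are reversed only.
        return seq.translate(_COMPLEMENT)[::-1], qual[::-1]
    return seq, qual
```

A record with flag 83 (which includes 0x10) is flipped back, while one with flag 77 is returned unchanged.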

Colin

