We recently sequenced a substrain of C57BL/6J, the inbred strain from which the mouse reference genome was derived. We are using NovoalignCSMPI to align the reads.
I am new to NGS analysis. I have experience with MAQ and BWA, where you can specify the maximum number of gaps and mismatches allowed in the seed or across the whole read. I am not sure how this works in NovoalignCS. I would greatly appreciate it if you could answer the following questions:
Some background: we have 21 lanes of data (21 CSfasta files with their quality files, amounting to 60x coverage). All of it was generated from the same mate-pair library, made from the liver of a single male mouse. Our cluster has 12 compute nodes, not counting the head node. Each node has 16 processors, 64 GB of RAM (4 GB per processor) and around 500 GB of scratch space. MPICH2 is installed on every node.
My first question is how to run NovoalignCS on our cluster so that running time is minimized:
1) Normally I prefer to use one node per lane, splitting each lane's data across that node's 16 processors. This way I can align 12 lanes (on 12 nodes) at a time.
Usage: mpiexec -f hosts.txt -np novoalignMPI -d -f >report.novo 2>run.log
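With one lane per node, 21 lanes on 12 nodes take two rounds, and the second round uses only 9 of the 12 nodes. The bash sketch below spells out that schedule and writes a single-node hostfile for each node, so that each mpiexec run stays on one node. It assumes hypothetical node hostnames node01 to node12 and relies on the MPICH2 (Hydra) hostfile form "hostname:nprocs". It is only an illustration of the one-lane-per-node plan, not a NovoalignCS-specific recipe.

```bash
#!/usr/bin/env bash
# Sketch: plan one-lane-per-node alignment rounds and write one
# MPICH2 (Hydra) hostfile per node, using the "hostname:nprocs" form.
# ASSUMPTION: node hostnames node01..node12 are hypothetical; adjust them.

plan_lanes() {
  local lanes=$1 nodes=$2 procs=$3
  # Rounds needed = ceil(lanes / nodes), via integer arithmetic.
  local rounds=$(( (lanes + nodes - 1) / nodes ))
  echo "rounds: $rounds"
  local lane round node host
  for (( lane = 1; lane <= lanes; lane++ )); do
    round=$(( (lane - 1) / nodes + 1 ))
    node=$(( (lane - 1) % nodes + 1 ))
    host=$(printf 'node%02d' "$node")
    # One hostfile per node, so each lane's run uses only that node's processors.
    echo "${host}:${procs}" > "hosts_${host}.txt"
    echo "lane $lane -> round $round, $host"
  done
}

# Values from the post: 21 lanes, 12 compute nodes, 16 processors per node.
plan_lanes 21 12 16
```

Each lane in a round could then be launched with its own hostfile, e.g. `mpiexec -f hosts_node01.txt -np 16 ...`, keeping the novoalign arguments from the usage line above.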