Running Novoalign on HPCs
Some HPC systems, such as the SGI Altix UV, have a very large number of processors and a very large global shared memory. These systems can present performance problems when running Novoalign because normally only one copy of the index is loaded into memory and shared between all Novoalign threads. On a typical HPC system, running Novoalign with more than 16 cores is likely to overload the memory interconnect, and performance may drop as extra threads are added.
On these servers, either run Novoalign with -c8, -c12, -c16 or similar, so that Novoalign and the index reside on a single physical CPU/memory subsystem, or run NovoalignMPI to take advantage of the extra CPUs.
If you have, say, a 96-core system comprising 8 boards, each with two 6-core processors, then you can run NovoalignMPI with 8 slaves and 12 threads per slave. This will optimise performance and reduce the load on the memory interconnect.
For this to be effective, memory mapping of the index file must also be disabled with the option -mmapoff, so that each slave loads its own copy of the index.
e.g.
NSLAVES=8
mpiexec -np $((NSLAVES + 1)) -f hostsfile novoalignMPI -c 12 -mmapoff -d ....
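The slave and thread counts in the 96-core example follow directly from the board layout, so they can be derived rather than hard-coded. The sketch below is illustrative only: the variable names BOARDS, SOCKETS_PER_BOARD, CORES_PER_SOCKET, THREADS_PER_SLAVE, NP and CMD are hypothetical, and the script assumes one slave per board with one thread per core on that board. It prints the command instead of running it, and leaves the index and read arguments elided.

```shell
#!/bin/sh
# Hypothetical topology values for the 96-core example; adjust to your hardware.
BOARDS=8              # boards, each with its own local memory
SOCKETS_PER_BOARD=2   # processors per board
CORES_PER_SOCKET=6    # cores per processor

# Assumption: one slave per board, one thread per core on that board.
NSLAVES=$BOARDS
THREADS_PER_SLAVE=$((SOCKETS_PER_BOARD * CORES_PER_SOCKET))

# Process count as used in the example command: number of slaves plus one.
NP=$((NSLAVES + 1))

# Build and print the command; remaining arguments are left elided.
CMD="mpiexec -np $NP -f hostsfile novoalignMPI -c $THREADS_PER_SLAVE -mmapoff -d ...."
echo "$CMD"
```

With the values shown, this prints the same command as the example: 9 processes and 12 threads per slave.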