May 07

The secret behind fast assembly of Next Generation Sequencing data

Tag: Development, Technology
Goerlitz @ 11:58 am

With CLC Genomics Workbench we’re introducing a new assembly pipeline that is not only sophisticated and highly scalable but also very fast. In benchmark tests we assembled half a million 454 reads against the full E. coli reference genome in around 2 minutes on a dual-core laptop with 1 gigabyte of RAM, which is very fast compared to similar solutions on the market. The speed-up increases on a computer with more cores and more RAM.

A normal computer processor does one calculation at a time at a very high pace, which is ideal for a lot of work. In bioinformatics, however, you often need to repeat the same calculation across large data sets, so ideally you want to do those calculations in parallel. The software acceleration in CLC Genomics Workbench comes from a high-performance computing technology called SIMD (Single Instruction, Multiple Data), which lets a standard computer processor apply the same operation to several data items at once. So instead of doing one calculation at a time, you can do several and thereby greatly speed up the computation.
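To make the "one instruction, multiple data" idea concrete, here is a minimal sketch in plain Python. It does not use real SIMD instructions and it is not the code inside CLC Genomics Workbench; it only emulates the principle by packing eight small numbers into one 64-bit word and adding all eight with a single addition. The helper names (pack, unpack, add_scalar, add_packed) are made up for this illustration.

```python
# Illustrative sketch only: emulates the SIMD idea with ordinary integers.
# Eight 8-bit "lanes" are packed into one 64-bit word.
LANES = 8
LANE_BITS = 8
HIGH = 0x8080808080808080  # top bit of every lane
LOW = 0x7F7F7F7F7F7F7F7F   # lower seven bits of every lane


def pack(values):
    """Pack eight values (0..255) into a single 64-bit word."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & 0xFF) << (i * LANE_BITS)
    return word


def unpack(word):
    """Split a 64-bit word back into its eight lanes."""
    return [(word >> (i * LANE_BITS)) & 0xFF for i in range(LANES)]


def add_scalar(a, b):
    """One calculation at a time: eight separate additions (mod 256)."""
    return [(x + y) & 0xFF for x, y in zip(a, b)]


def add_packed(wa, wb):
    """All eight lanes at once: a single addition on the packed words.

    Only the low seven bits of each lane are added, so a carry can never
    spill into the neighbouring lane; the top bit of each lane is then
    corrected with an XOR.
    """
    low_sum = (wa & LOW) + (wb & LOW)
    return low_sum ^ ((wa ^ wb) & HIGH)


a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [10, 20, 30, 40, 50, 60, 70, 250]
# Both routes give the same lane-by-lane result.
same = unpack(add_packed(pack(a), pack(b))) == add_scalar(a, b)
```

A real SIMD unit does the lane handling in hardware, with wider registers and without the masking trick, but the payoff is the same: the number of operations drops by roughly the number of lanes.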

When doing reference assembly of Next Generation Sequencing data, you essentially need to compare a vast number of sequence reads to a single reference genome, which makes the problem ideally suited to parallel computing techniques. And that’s exactly what we have done with CLC Genomics Workbench to make Next Generation Sequencing assembly as fast as possible.
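As a rough illustration of why comparing reads to a reference parallelizes so well, the sketch below packs each base into 2 bits and compares a whole read against a window of the reference with a single XOR. This is a deliberately naive, ungapped scan written for this post, not the algorithm used in CLC Genomics Workbench; the function names and the 2-bit encoding are assumptions chosen for the example, and a real assembler would also use an index and allow insertions and deletions.

```python
# Illustrative sketch only: data-parallel read-vs-reference comparison.
# Each base takes 2 bits, so one integer holds the whole sequence.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def pack_bases(seq):
    """Pack a DNA string into an integer, 2 bits per base."""
    word = 0
    for i, base in enumerate(seq):
        word |= CODE[base] << (2 * i)
    return word


def mismatches(packed_a, packed_b, length):
    """Count differing bases between two packed sequences of equal length."""
    diff = packed_a ^ packed_b           # one XOR compares every position
    lane_mask = (4 ** length - 1) // 3   # binary 0101...01, one bit per base
    flags = (diff | (diff >> 1)) & lane_mask  # 1 where a base differs
    return bin(flags).count("1")


def best_placement(read, reference):
    """Slide the read along the reference; return (offset, mismatch count)."""
    n = len(read)
    packed_read = pack_bases(read)
    packed_ref = pack_bases(reference)
    window_mask = (1 << (2 * n)) - 1
    best = None
    for offset in range(len(reference) - n + 1):
        window = (packed_ref >> (2 * offset)) & window_mask
        score = mismatches(packed_read, window, n)
        if best is None or score < best[1]:
            best = (offset, score)
    return best


reference = "ACGTACGTTAGC"
exact = best_placement("GTTA", reference)     # read that matches exactly
one_error = best_placement("GTCA", reference) # read with one wrong base
```

Because every read is compared against the same reference with the same operation, the work splits naturally across SIMD lanes and across processor cores.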

That’s our little/fast secret!
