An article in the MIT Technology Review recently made me aware that ZS Genetics has entered the contest for the Archon X Prize for Genomics (100 human genomes in 10 days).
ZS Genetics is developing a unique approach to DNA sequencing based on electron microscopy of labeled DNA strands. The technology can potentially sequence very long, contiguous strands of DNA (kilobases) and can therefore deliver haplotype phase data and help resolve complex repeat regions.
In contrast to other novel DNA sequencing methods, it is a static method: it works from an image of a fully synthesized DNA strand, whereas other technologies record the DNA synthesis process as it occurs.
The Genome Reference Consortium (GRC) has recently been formed. The goal of this group, as stated on their website, is “to correct the small number of regions in the reference that are currently misrepresented, to close as many remaining gaps as possible and to produce alternative assemblies of structurally variant loci when necessary”. GenomeWeb has a bit more background.
From the perspective of our company, which designs assembly algorithms, this is of course exciting news. Reference assembly depends heavily on the quality of the reference genome, so any improvement on that side will make our lives easier. An open question is how to represent information such as structural variation. As of now, assembly is performed against a single reference sequence, which is clearly too simple a format to describe complex structural variation. We will follow the work of the consortium closely. They have their work cut out for them, with all the new structural variation that is continuously being discovered in the human genome.
According to CBC, scientists have found 17 living relatives of a centuries-old “iceman” whose remains were discovered in a melting glacier in northern British Columbia, Canada, nine years ago.
Chief Diane Strand of the Champagne and Aishihik First Nations led a project to search for the young man’s living relatives. She said 241 native people from British Columbia, the Yukon and Alaska gave DNA samples for testing, and the results produced 17 positive matches. All of those 17 people, and potentially their families, share a common female ancestor with Kwaday himself.
Click here to read the full article.
From The Human Genome Organisation’s abstract on the project:
Nitrogen and carbon content in whole bone and collagen-type residue extracted from both bone and muscle indicated good preservation of proteinaceous macromolecules. Restriction enzyme analysis of mitochondrial DNA (mtDNA) determined that the British Columbia frozen remains belong to haplogroup A, one of the Native American mtDNA haplogroups. Data obtained by PCR direct sequencing of the mtDNA control region, and by sequencing the clones from overlapping PCR products were duplicated by an independent laboratory. The comparison of the mtDNA sequence with those of North American, South American, Central American, East Siberian, Greenlander and Northeast Asian populations indicates that the remains share a mtDNA type consistent with different groups of Native Americans.
Click here to read the original 2002 abstract on the “iceman”.
With CLC Genomics Workbench we’re introducing a new assembly pipeline that is not only sophisticated and highly scalable, but also very fast. In benchmark tests we have assembled half a million 454 reads against the full E. coli reference genome in around 2 minutes on a dual-core laptop with 1 gigabyte of RAM, which is very fast compared to similar solutions on the market. The speed-up increases further on a computer with more cores and RAM.
A normal computer processor does one calculation at a time at a very high pace, which is ideal for a lot of work. In bioinformatics, however, you often need to repeat the same calculation across large data sets, so ideally you want to do those calculations in parallel. The software acceleration in CLC Genomics Workbench is done through a high-performance computing technology called SIMD (Single Instruction, Multiple Data), which enables a standard computer processor to execute similar calculations in parallel. So instead of doing one calculation at a time, you can do several at once and thereby greatly speed up the computation.
When doing reference assembly of Next Generation Sequencing data, you essentially need to compare a vast number of sequence reads to a single reference genome, which makes the problem ideal for parallel computing techniques. And that’s exactly what we have done with CLC Genomics Workbench to make Next Generation Sequencing assembly as fast as possible.
That’s our little/fast secret!
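To make the "one instruction, many data elements" idea concrete, here is a minimal sketch in Python. It is not the CLC Genomics Workbench implementation, which this post does not describe. It only imitates SIMD by packing each base into 2 bits of one integer, so that a single XOR compares every position of a read against a window of the reference at once. All function names are hypothetical, and the sketch assumes non-empty sequences containing only A, C, G and T.

```python
# Illustrative sketch only: imitates the SIMD idea (one operation applied to
# many data elements) with bit-packed integers. This is NOT the actual
# CLC Genomics Workbench code; all names here are made up for the example.

CODES = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def pack(seq):
    """Pack a DNA string into one integer, 2 bits per base."""
    value = 0
    for base in seq:
        value = (value << 2) | CODES[base]
    return value


def mismatches_scalar(read, window):
    """Baseline: compare one base at a time."""
    return sum(1 for a, b in zip(read, window) if a != b)


def mismatches_packed(read_bits, window_bits, length):
    """Compare all bases with a single XOR, then count the differing lanes."""
    diff = read_bits ^ window_bits
    low_mask = int("01" * length, 2)  # selects the low bit of every 2-bit lane
    # A lane differs if either of its two bits is set after the XOR.
    lanes = (diff | (diff >> 1)) & low_mask
    return bin(lanes).count("1")


def best_position(read, reference):
    """Slide the read along the reference; return (offset, mismatches) of the best hit."""
    n = len(read)
    read_bits = pack(read)
    best = None
    for offset in range(len(reference) - n + 1):
        window_bits = pack(reference[offset:offset + n])
        score = mismatches_packed(read_bits, window_bits, n)
        if best is None or score < best[1]:
            best = (offset, score)
    return best


if __name__ == "__main__":
    print(best_position("GATT", "ACGATTACA"))
```

Real SIMD instructions do the same kind of lane-wise work in hardware on fixed-width registers. The packed integer here only stands in for such a register, so the sketch shows the principle rather than the performance.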

At the recent Bio-IT World Conference in Boston, USA, we held a competition where the lucky winner could walk away with a MacBook Air and an extensive collection of CLC bio software, including CLC Genomics Workbench.
The lucky winner was Sébastien Vachenc from Laboratoires Fournier, a Solvay Pharmaceuticals company. He was of course thrilled to learn that he had won first prize in the popular competition, and he looks forward to working with CLC Genomics Workbench once it’s released!