Does Solexa have problems with amplifying A/T-rich regions? I just read a really interesting paper by Hillier et al. in Nature Methods, which claims that this might be the case.
The paper is entitled “Whole-genome sequencing and variant discovery in C. elegans” and reports the use of Solexa technology to re-sequence two C. elegans specimens for variant discovery. The paper demonstrates the use of the Solexa technology for re-assembly of the C. elegans genome, especially when paired-end information is used.
However, it points to a general lack of coverage in A/T-rich regions (see figure 2 of the supplementary material), which leaves a number of zero-size gaps in the assembly: places where reads sit shoulder to shoulder but simply do not overlap. Having found these problematic A/T-rich regions, the authors went back and looked across the genome, where they found a general correlation between A/T content and read coverage. This correlation was stronger when examining a 200 bp window than when examining a 32 bp window. 200 bp corresponds to the size of the amplicons that are amplified during the cluster generation step prior to sequencing, and 32 bp corresponds to the number of cycles in the actual sequencing-by-synthesis procedure. This finding led Hillier et al. to conclude that failure to amplify A/T-rich regions during cluster generation is the cause of the low coverage (other reasons for the bias, such as hairpin formation, were also explored but discarded).
Unfortunately, the authors did not pursue a chemical explanation for the phenomenon and did not investigate other Solexa datasets for a similar trend. It is therefore premature to say whether this is a general phenomenon of the Solexa technology, but it definitely warrants the attention of people like us who are designing assembly and variant detection algorithms.
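For anyone who wants to look for the same trend in their own data, here is a minimal sketch of the kind of window-based comparison described above. It is not the authors' actual analysis: the use of non-overlapping windows, the Pearson coefficient, and all function names are assumptions made for illustration, and the per-base coverage list is assumed to come from your own read mapping.

```python
from math import sqrt


def at_fraction(seq):
    """Fraction of bases in seq that are A or T (case-insensitive)."""
    seq = seq.upper()
    return sum(1 for base in seq if base in "AT") / len(seq)


def window_stats(genome, coverage, window):
    """Return (A/T fraction, mean coverage) for each non-overlapping window.

    genome   -- reference sequence as a string
    coverage -- per-base read depth, same length as genome
    window   -- window size in bp (e.g. 32 or 200)
    """
    stats = []
    for start in range(0, len(genome) - window + 1, window):
        end = start + window
        at = at_fraction(genome[start:end])
        mean_cov = sum(coverage[start:end]) / window
        stats.append((at, mean_cov))
    return stats


def pearson(xs, ys):
    """Pearson correlation coefficient; NaN if either series is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def at_coverage_correlation(genome, coverage, window):
    """Correlation between window A/T fraction and window mean coverage."""
    stats = window_stats(genome, coverage, window)
    at_values = [at for at, _ in stats]
    cov_values = [cov for _, cov in stats]
    return pearson(at_values, cov_values)
```

Running `at_coverage_correlation` once with `window=32` and once with `window=200` on the same genome and coverage track gives two coefficients to compare. A clearly stronger negative value at the larger window would be in line with what the paper reports.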
Our collaborators at Beijing Genomics Institute, Shenzhen recently announced the launch of the giant panda genome project. Through the use of next-generation sequencing technology, the aim is to complete the panda genome within only six months.
The researchers involved will use the results to answer a number of questions regarding the animal's biology. These include the exact phylogenetic position of the panda and the genetics underlying the panda's extraordinary metabolism. Furthermore, the results will be used in the analysis of panda population genetics and conservation biology.
I picked this up via Genome Technology. According to an article at Bloomberg.com, George Church has launched an ambitious project to sequence the coding regions of 100,000 humans, a number that may even increase to a million.
The plan is to tie the genomic information to phenotypic information and health records of the sequenced individuals to create a unique data resource from which novel links between genetic variation and disease can be learned.
Google was one of the first companies to support the project and apparently plans to make its Google Health project a front end to the collected data.
A new study in PNAS by Sugarbaker et al. describes deep sequencing of tumor cDNA using 454 sequencing technology. In the four examined patients, 15 nonsynonymous mutations were discovered: 7 were point mutations, 3 were deletions, 4 were exclusively expressed as a consequence of imputed epigenetic silencing, and 1 was putatively expressed as a consequence of RNA editing. Interestingly, each patient had a different mutation profile, and no mutated gene was previously implicated in the examined cancer type.
The Wall Street Journal has a piece on the study with some interesting perspectives.
In Sequence has a post on a new ordered array technology developed by Xiaohua Huang’s group at the University of California, San Diego.
The ordered array approach uses a magnet to direct the assembly of DNA particles into a grid-like pattern on a microfluidic chip and offers a promising alternative for a number of high-throughput sequencing platforms that currently use random arrays of DNA molecules.
Ordered arrays could alleviate the problems faced by random arrays, such as low density, low imaging efficiency, and the need for complex image analysis to recognize the shape, location, and intensity of signals on the chip. Since imaging is a severe bottleneck in most next-generation sequencing technologies, improvements in this area could dramatically increase throughput.