Mar 11

Evaluating the Illumina/Solexa Genome Analyzer for whole genome re-sequencing

Tag: Research, Technology
rforsberg @ 13:54

Does Solexa have problems amplifying A/T rich regions? I just read a really interesting paper by Hillier et al. in Nature Methods which claims that this might be the case.

The paper is entitled “Whole-genome sequencing and variant discovery in C. elegans” and reports the use of Solexa technology to re-sequence two C. elegans specimens for variant discovery. The paper demonstrates that Solexa reads can be used to re-assemble the C. elegans genome, especially when paired-end information is used.

However, it points to a general lack of coverage in A/T rich regions (see figure 2 of the supplementary material), which leaves a number of zero-size gaps in the assembly: places where reads sit shoulder to shoulder but simply do not overlap. Having found these problematic A/T rich regions, the authors went back and looked across the whole genome, where they found a general correlation between A/T content and read coverage. This correlation was stronger when examining a 200 bp window than when examining a 32 bp window. 200 bp corresponds to the size of the amplicons that are amplified during the cluster generation step prior to sequencing, and 32 bp corresponds to the number of cycles in the actual sequencing-by-synthesis procedure, and hence to the read length. This finding led Hillier et al. to conclude that failure to amplify A/T rich regions during cluster generation is the cause of the low coverage (other reasons for the bias, such as hairpin formation, were also explored but discarded).
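For anyone who wants to try this kind of check on their own data, here is a rough sketch of the window comparison in Python. It is not the procedure from the paper: I am assuming non-overlapping windows, a per-base coverage list that has already been computed from the read mapping, and a plain Pearson correlation. The function names are made up for illustration.

```python
from math import sqrt


def window_at_fraction(seq, window):
    """A/T fraction of each non-overlapping window (trailing partial window is dropped)."""
    seq = seq.upper()
    fractions = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        fractions.append(sum(base in "AT" for base in chunk) / window)
    return fractions


def window_mean_coverage(coverage, window):
    """Mean per-base read coverage of each non-overlapping window."""
    means = []
    for start in range(0, len(coverage) - window + 1, window):
        means.append(sum(coverage[start:start + window]) / window)
    return means


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def at_coverage_correlation(seq, coverage, window):
    """Correlation between windowed A/T fraction and windowed mean coverage."""
    return pearson(window_at_fraction(seq, window),
                   window_mean_coverage(coverage, window))
```

Calling `at_coverage_correlation(reference, coverage, 200)` and again with `32` on the same data gives the two numbers to compare. If A/T rich regions are under-covered, both values should be negative, and the 200 bp value should be further from zero if the amplification explanation holds.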

Unfortunately, the authors did not pursue a chemical explanation for the phenomenon and did not investigate other Solexa datasets for a similar trend. It is therefore premature to say whether this is a general property of the Solexa technology, but it definitely warrants the attention of people like us who are designing assembly and variant detection algorithms.

6 Responses to “Evaluating the Illumina/Solexa Genome Analyzer for whole genome re-sequencing”

  1. inhumataq says:

    We are seeing the same thing with some bacterial genomes…

    The problem isn’t due to different prep methods, and it isn’t machine specific.

  2. rforsberg says:

    There is a comment on this over at the Evolgen blog http://scienceblogs.com/evolgen/2008/03/not_all_nextgen_sequencing_tec.php

  3. fadista says:

    Check this website for papers dealing with Next-Gen Sequencing: http://www.genomeweb.com/newspics/InSequence_Papers_Feb08.htm
    You probably need to register to get access, but it’s free.

  4. inhumataq says:

    To expand on the previous post:

    We have resequenced a previously sequenced bacterial genome in order to figure out how to make de novo sequencing work with “next generation” sequencing technologies. We noticed that parts of the genome weren’t covered by this resequencing (shorter gaps though, not whole genes, and we know that they are there) but I don’t know if these are exactly AT rich regions. This first sequencing pass was 20x and so we tried adding in another 20x coverage…done on a different machine by different people using different DNA preps…and still we couldn’t capture the missing pieces.

  5. rforsberg says:

    inhumataq, thanks for sharing your observations. It would be really great if you could do a simple plot of the read coverage as a function of the A/T content of the reference genome.

  6. john.bull says:

    In fact this is a sample prep issue. Different libraries can show a biased GC selection, and low complexity libraries can deplete certain templates, resulting in duplicates. You have to ensure you have good representation in your libraries, and remove the thermal gel melting step (which is where the AT/GC selection occurs). There should be revised protocols bouncing around.
