Leaders of the 1000 Genomes Project announced today that three firms that have pioneered the development of new sequencing technologies have joined the international effort to build the most detailed map to date of human genetic variation, as a tool for medical research. The new participants are 454 Life Sciences/Roche, SOLiD/Applied Biosystems and Solexa/Illumina Inc.
According to GenomeWeb, each of the three companies has agreed to sequence 75 billion DNA bases in the program’s pilot phase over the coming year. That is the equivalent of about 25 human genomes per company.
Update: There is a nice piece on this over at the Genetic Future blog.
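The "25 genomes" figure is simple arithmetic. The sketch below shows it, assuming a haploid human genome of roughly 3 billion bases. The post does not state that value, and the helper name `genome_equivalents` is ours and purely illustrative:

```python
# Assumption: a haploid human genome is roughly 3 billion bases.
HUMAN_GENOME_BASES = 3_000_000_000

def genome_equivalents(total_bases: int, genome_size: int = HUMAN_GENOME_BASES) -> float:
    """Return how many genome-sized units a given amount of sequence represents."""
    return total_bases / genome_size

bases_per_company = 75_000_000_000  # pilot-phase commitment per company
print(genome_equivalents(bases_per_company))      # 25.0
print(genome_equivalents(3 * bases_per_company))  # 75.0, all three companies combined
```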
The Genome Reference Consortium (GRC) has recently been formed. As stated on its website, the group's goal is “to correct the small number of regions in the reference that are currently misrepresented, to close as many remaining gaps as possible and to produce alternative assemblies of structurally variant loci when necessary”. GenomeWeb has a bit more background.
From the perspective of our company, which designs assembly algorithms, this is of course exciting news. Reference assembly depends heavily on the quality of the reference sequence, so any improvement there will make our lives easier. An open question is how to represent information such as structural variation. Currently, reference assembly is performed against a single reference sequence, which is clearly too simple a format to describe complex structural variation. We will follow the consortium's work closely. They have their work cut out for them, given all the new structural variation that is continuously being discovered in the human genome.
To follow up on a previous post, researchers funded by the Human Microbiome Project have now launched the Human Oral Microbiome Database (HOMD). HOMD is intended as a service to researchers who are investigating the role of microbes in human health and disease, with particular emphasis on the oral environment. The database is expected to serve as a model for the Human Microbiome Project's gut, skin and vaginal databases. GenomeWeb has more.
An old Danish proverb says that when the manger is empty, the horses bite each other. Anthony Sinskey’s research group at MIT has now put this idea to use, as described in a report by MIT Technology Review. Sinskey’s group had previously sequenced the genome of the soil-dwelling bacterium Rhodococcus fascians. Looking at the genome, they were surprised to find that this organism, not known for its antibiotic-producing powers, harbored a number of genes involved in the metabolism of antibiotic-like compounds. However, none of these genes seemed to be expressed when the bacterium was grown in the lab. To bring out the worst in it, the group grew the bacterium in competition with a Streptomyces bacterium. After a series of selection experiments, one Rhodococcus strain was shown to secrete a novel antibiotic compound, dubbed rhodostreptomycin. It belongs to the same class of antibiotics as streptomycin, a tuberculosis drug.
Work to determine the exact molecular mechanisms behind the new compound is still under way. One fascinating preliminary finding, however, is that the selected Rhodococcus strain seems to have assimilated a large chunk of DNA from the competing Streptomyces strain.
This is a striking example of the new picture of bacterial genomics that is emerging as a result of improved sequencing technology. For an introduction, I recommend this review by Raskin et al.
Applied Biosystems (AB) has publicly released the data from its sequencing of the genome of a Yoruba Nigerian HapMap sample. In its press release, AB claims that the data were generated using only 7 runs of the SOLiD system, at a total sequencing cost of less than $60,000.
The data covered the genome 12-fold, and the paired-end information provided a physical coverage of 100-fold. Physical coverage is the coverage obtained when the inserted but unsequenced stretch between the two reads of each pair is counted along with the reads themselves. Millions of SNPs and a large number of structural variations were identified from the data.
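To make the distinction between sequence coverage and physical coverage concrete, here is a minimal sketch. It assumes both reads in a pair have the same length and that a pair's span is measured between the outer ends of its two reads. The class and function names are illustrative, and the numbers are a hypothetical toy run, not the parameters of AB's dataset:

```python
from dataclasses import dataclass

@dataclass
class PairedRun:
    n_pairs: int        # number of read pairs
    read_length: int    # bases sequenced at each end of a pair (assumed equal)
    fragment_span: int  # outer distance covered by a pair, reads plus insert, in bases
    genome_size: int    # genome size in bases

def sequence_coverage(run: PairedRun) -> float:
    """Average number of sequenced bases covering each genome position."""
    return run.n_pairs * 2 * run.read_length / run.genome_size

def physical_coverage(run: PairedRun) -> float:
    """Average number of pair spans (reads plus unsequenced insert) covering each position."""
    return run.n_pairs * run.fragment_span / run.genome_size

# Hypothetical toy numbers, chosen only to illustrate the two definitions.
toy = PairedRun(n_pairs=100_000, read_length=25, fragment_span=1_000, genome_size=1_000_000)
print(sequence_coverage(toy))  # 5.0
print(physical_coverage(toy))  # 100.0
```

Because the insert is usually much longer than the reads, a modest amount of sequence can span the genome many times over. That is why the physical coverage figure is so much higher than the sequence coverage figure.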
As an amusing aside, AB offered these facts about the dataset:
- If all 36 billion bases were spread out at 1 millimeter apart, they would extend 36,000 kilometers, or more than 4,000 times the height of Mt. Everest, which, at 8,848 meters above sea level, is the highest mountain on Earth.
- If all 36 billion bases were spread along the Great Wall of China at 1 millimeter apart, they would span the 5,000-kilometer wall more than 7 times.
- If a person were to proofread the 36 billion bases in this dataset at one letter per second, 24 hours per day, it would take 1,200 years to read the entire dataset.
- If each base represented one individual, the dataset would account for more than 5 times the entire world population of 6.8 billion people.
- This dataset, at 36 billion bases of DNA sequence, holds 360 times as many bases as there are visible stars (100 million) in the Earth’s galaxy.
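These comparisons are easy to recompute. The sketch below uses only the figures quoted in the list above (1 mm per base, 8,848 m, 5,000 km, 6.8 billion people, 100 million stars). It also assumes a 365.25-day year, which the press release does not specify. The function name `dataset_facts` is hypothetical. With these assumptions, reading one base per second works out to roughly 1,141 years, so AB's 1,200 appears to be a generous rounding:

```python
# Figures as quoted in AB's list; the 1 mm spacing per base is AB's premise.
TOTAL_BASES = 36_000_000_000
EVEREST_HEIGHT_M = 8_848
GREAT_WALL_KM = 5_000
WORLD_POPULATION = 6_800_000_000
VISIBLE_STARS = 100_000_000
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # assumption: 365.25-day year

def dataset_facts(total_bases: int = TOTAL_BASES) -> dict[str, float]:
    """Recompute AB's comparisons for a dataset of `total_bases` bases."""
    length_km = total_bases / 1_000_000  # 1 mm per base, 1,000,000 mm per km
    return {
        "length_km": length_km,
        "everest_heights": length_km * 1_000 / EVEREST_HEIGHT_M,
        "great_wall_spans": length_km / GREAT_WALL_KM,
        "years_to_read": total_bases / SECONDS_PER_YEAR,  # one base per second, nonstop
        "world_populations": total_bases / WORLD_POPULATION,
        "visible_star_multiples": total_bases / VISIBLE_STARS,
    }

for name, value in dataset_facts().items():
    print(f"{name}: {value:,.1f}")
```

Running it gives 36,000 km, about 4,069 Everest heights, 7.2 Great Wall spans, about 1,141 years of reading, 5.3 world populations and 360 times the quoted star count.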