Genomics – Easy As Pi: DIY Parallel Cluster Computers In Big Data Genetic Research
This book was inspired by the recent convergence of two sciences, both lifelong passions of mine, that are becoming affordable to the average person for the first time this year: genomics and cluster computing.
The field of genomics has exploded beyond belief in the last few years. The original Human Genome Project, which published its draft in 2000 and was declared complete in 2003, took 13 years and $3 billion. Today, the cost of sequencing a whole genome is approaching $800 (in bulk), and the job can be done in a couple of hours.
Genome research has been concentrated at prestigious institutions whose generous grants let them afford the newest sequencing technology. The positive side of research sponsored by public funds is that the results are also public: anyone can access genetic sequence information from Web-based databases and FTP sites. With a quick search you can get the sequences of many organisms, including common bacteria, yeast, corn, wheat, fruit flies, mice, rats, extinct mammals, monkeys, apes, Neanderthals and many humans. Sequencing a new genome now takes hours, and thousands of them are being sequenced as you read this.
For a couple of hundred dollars you can test for the presence of some interesting sequences in your own DNA using companies like 23andMe. Best of all, you can download the raw data from your test and immediately start comparing it against other genomes or gene databases.
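As a rough illustration of what "comparing raw data" can look like in practice, here is a minimal Python sketch. It assumes the raw-data export is a tab-separated text file with `#` comment lines followed by rows of marker ID, chromosome, position and genotype. That layout is an assumption to check against your own download. The marker IDs and genotypes are made up, and the helper names `parse_genotypes` and `differing_markers` are hypothetical.

```python
import io

def parse_genotypes(handle):
    """Return {marker_id: genotype} from a tab-separated raw-data export.

    Assumed layout (verify against your own file): lines starting with '#'
    are comments; data rows are marker ID, chromosome, position, genotype.
    """
    calls = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # skip malformed rows rather than guessing
        marker_id, _chromosome, _position, genotype = fields[:4]
        calls[marker_id] = genotype
    return calls

def differing_markers(a, b):
    """Markers present in both samples whose genotype calls differ."""
    return sorted(m for m in a.keys() & b.keys() if a[m] != b[m])

# Made-up sample data, for illustration only.
SAMPLE_A = (
    "# marker\tchromosome\tposition\tgenotype\n"
    "rs0001\t1\t1000\tAA\n"
    "rs0002\t1\t2000\tCT\n"
    "rs0003\t2\t3000\tGG\n"
)
SAMPLE_B = (
    "# marker\tchromosome\tposition\tgenotype\n"
    "rs0001\t1\t1000\tAG\n"
    "rs0002\t1\t2000\tCT\n"
    "rs0004\t2\t4000\tTT\n"
)

calls_a = parse_genotypes(io.StringIO(SAMPLE_A))
calls_b = parse_genotypes(io.StringIO(SAMPLE_B))
print(differing_markers(calls_a, calls_b))
```

The comparison is deliberately naive: it treats genotypes as plain strings, so "CT" and "TC" would count as different. A real tool would normalize the calls first.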
At the same time, the medical field is learning about hundreds of thousands of proteins and trying to figure out which genetic sequences code for them. Doctors are discovering genetic associations for many diseases and for individual drug interactions.
Each human genome is composed of 3.3 billion letters (base pairs), so comparing it against multiple other genomes requires some serious processing power. Other organisms, such as the loblolly pine (Pinus taeda), have 23 billion base pairs in their DNA, about 7 times as many as a human! Because of the sheer amount of data being generated every day, there is a vast opportunity for new software tools and new applications of that knowledge.
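To put those numbers in perspective, the short Python sketch below works through the arithmetic using the figures quoted above. The 2-bits-per-base packing (enough to tell A, C, G and T apart) and the even split of work across cluster nodes are illustrative assumptions, not a description of any particular tool. The helper names are hypothetical.

```python
HUMAN_BP = 3_300_000_000   # base pairs in a human genome, as quoted in the text
PINE_BP = 23_000_000_000   # base pairs in loblolly pine, as quoted in the text

def packed_size_gb(base_pairs, bits_per_base=2):
    """Size in gigabytes if each base is packed into `bits_per_base` bits.

    Two bits per base is an assumption: it distinguishes A, C, G and T
    but leaves no room for ambiguity codes or quality scores.
    """
    return base_pairs * bits_per_base / 8 / 1e9

def split_ranges(length, workers):
    """Split [0, length) into near-equal contiguous ranges, one per worker."""
    step, extra = divmod(length, workers)
    ranges, start = [], 0
    for i in range(workers):
        end = start + step + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges

ratio = PINE_BP / HUMAN_BP
print(f"pine / human genome size: {ratio:.2f}x")
print(f"human genome, 2 bits per base: {packed_size_gb(HUMAN_BP):.3f} GB")
print(f"pine genome, 2 bits per base: {packed_size_gb(PINE_BP):.3f} GB")

# Hypothetical 4-node cluster: each node gets one contiguous slice to compare.
for node, (start, end) in enumerate(split_ranges(HUMAN_BP, 4)):
    print(f"node {node}: bases {start} to {end - 1}")
```

Even tightly packed, a single human genome is most of a gigabyte, and pairwise comparison across many genomes multiplies the work, which is why splitting it into slices across several small machines is attractive.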