Cracking the Code of Life at Light Speed
The biological code of mankind—three billion pairs of chemical “bases,” twisted clockwise into DNA’s iconic double helix—was declared fully transcribed, at long last, 15 years ago. On April 14, 2003, the leaders of the Human Genome Project, an effort backed by the U.S. government, proclaimed their sequence “essentially complete.” Including the preliminary work, the task had taken 13 years and cost American taxpayers $2.7 billion.
These days, thanks to technical leaps that put Moore’s Law to shame, scientists are sequencing practically everything in sight. Consider the mystery posed by a geriatric bat. “Usually with animals,” says Mike Hunkapiller, “the bigger they are, the longer they live. There’s a bat that’s that big”—his thumb and forefinger curl into a circle about the size of a quarter—“that lives 50 years.” Studying the bat’s DNA to suss out its secret could someday, perhaps, prolong human life. “There’s all kinds of reasons why a lot of these animal models are not just for basic research,” he says. “They’re to understand what you learn from them about us.”
Mr. Hunkapiller, age 67, has been in the forefront of genomics for a long time. As a postdoc at Caltech in the 1980s, he helped invent the first automated DNA-sequencing machine. In the 1990s he led a company whose sequencers drove not just the public Human Genome Project but also its privately funded competitor. Today he is CEO of Pacific Biosciences, known as PacBio, which aims to bring heavy sequencing artillery to the lab-coated grunts in the scientific trenches.
As the price of reading DNA has plummeted, large-scale sequencing projects have geared up, pledging to get the genomes of 5,000 species of arthropods, or 10,000 kinds of birds, or one of every vertebrate on Earth. For three years running, PacBio has held a public contest to find the “World’s Most Interesting Genome,” which is then sequenced on the company’s dime. Last year’s winner, a feral Australian dog called the desert dingo, defeated a Malaysian pit viper, a “solar powered” sea slug, and the bombardier beetle, which squirts its enemies with a caustic, boiling-hot liquid.
“There’s a program in China, from one of our customers, doing 100 ants,” Mr. Hunkapiller says. “We talked about that at one of our quarterly conference calls, and people were saying, ‘Well, who cares about 100 ants?’ ” In answer, he cites two groups that care a lot: farmers in Australia, losing millions of dollars in crops, and residents of the American South, trying to extinguish infestations of fire ants. Comparing the genomes of 100 ants may reveal differences that humans can exploit. The same holds true for other pests, such as the mosquitoes that spread the Zika virus. “You want to know,” Mr. Hunkapiller says, “what targets can I go after to kill that thing and control it, without causing damage to other organisms.”
PacBio’s latest model of sequencer is about the size and shape of a standard kitchen refrigerator, albeit a $350,000 one. To imagine how the machine works, start with a hole about 1/1,000th the width of a human hair. At the bottom is an enzyme that in nature helps copy DNA. Here it does the same job, but in a way that allows researchers to eavesdrop. The enzyme is fed a loop of DNA to duplicate, along with free-floating bases—the A’s, C’s, G’s and T’s that make up the genetic code—to use as raw material. Unlike in nature, however, the four flavors of bases have each been tagged with a different-colored fluorescent dye.
Now the real trick: A laser is aimed at the hole, but the beam’s wavelength makes it too big to fit through. If this idea seems strange, think of the protective window-grate on your microwave oven. The principle is the same: The microwaves are too wide to escape, which is why the oven nukes that sad Lean Cuisine and not your hungry face.
In the PacBio machine, the laser light peeks into the hole just enough to put a spotlight on that industrious little enzyme. Whenever the enzyme grabs a fluorescent base and incorporates it into the DNA strand, the base’s dye glows with its color. The machine records the light, blinking two or three times a second, and converts that data into the DNA sequence.
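The last step, turning recorded flashes into letters, can be pictured with a toy sketch. It assumes each flash has already been reduced to a clean color label, which real instruments cannot take for granted, and the color-to-base table is purely hypothetical: the description above says only that each of the four bases carries a different dye, not which color belongs to which base. The name `call_bases` is likewise invented for illustration.

```python
# Hypothetical dye assignment (an assumption for illustration only):
# each base has its own color, but which color is not specified.
DYE_TO_BASE = {"red": "A", "green": "C", "blue": "G", "yellow": "T"}


def call_bases(pulses: list[str]) -> str:
    """Translate an ordered list of observed flash colors into base letters."""
    return "".join(DYE_TO_BASE[color] for color in pulses)


# Five flashes recorded from one hole, in the order they occurred.
pulses = ["red", "yellow", "blue", "yellow", "green"]
print(call_bases(pulses))  # ATGTC
```

A real base caller would also have to cope with dim, overlapping or missed flashes; the lookup here only shows the basic idea of one color per letter.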
Serious bandwidth comes from putting the process into massive parallel: A PacBio chip 1.25 inches square has one million holes, flashing with light. The next version, due out in 2019, is expected to have eight million holes. At that point, the company says its cost in supplies and chemicals to create a finished human genome, from scratch, should drop from roughly $15,000 to something like $1,500.
But the big competitive advantage touted by PacBio is that its machine spits out long DNA sequences. The average read has climbed to 15,000 base pairs, with some pushing up toward 100,000 before the enzyme peters out. For comparison, during the Human Genome Project, the average fragment was on the order of 500 base pairs.
That difference matters particularly when assembling a full genome for the first time, without the aid of a reference. To put the length of the human genome in perspective, if a cell’s DNA were unraveled from its 23 pairs of chromosomes, it would stretch about 6 feet. Once it’s blasted at random into minuscule segments that are then sequenced, a computer has to figure out, piece by piece, how to fit them together again.
An analogy would be to try reconstructing “War and Peace” from a couple of dozen shredded books. Yet the chore is far more tedious even than that, since the human genome reads, for three billion letters, like this: “atgtctggctctgttccccagactggagtgcggcgac . . .” The supercomputer that assembled one of the initial human sequences examined 26 million fragments and made 500 quintillion—that’s 500 million trillion—base-to-base comparisons.
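To make the puzzle concrete, here is a minimal sketch of one simple way to stitch fragments together: repeatedly merge the two pieces whose ends overlap the most. This is a toy greedy approach, not the method used by the Human Genome Project, Celera or PacBio, and the function names are invented for illustration. It ignores sequencing errors, the two strands of DNA and long repeats, all of which real assemblers must handle.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that is also a prefix of b.

    Overlaps shorter than min_len are treated as chance matches and count as 0.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Merge the best-overlapping pair of fragments until no overlaps remain."""
    # Drop exact duplicates and fragments wholly contained in another one.
    frags = list(dict.fromkeys(fragments))
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]

    while len(frags) > 1:
        best_n, best_i, best_j = 0, -1, -1
        # Compare every ordered pair: this is the costly base-to-base step.
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # nothing left that overlaps; return the separate pieces
        merged = frags[best_i] + frags[best_j][best_n:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Three overlapping pieces cut from the 37-letter snippet quoted above.
genome = "atgtctggctctgttccccagactggagtgcggcgac"
pieces = [genome[0:16], genome[10:28], genome[22:37]]
print(greedy_assemble(pieces) == [genome])  # True
```

Even in this toy, every round compares every piece against every other, which hints at why millions of short fragments demanded a supercomputer, and why fewer, longer pieces make the job easier.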
Having puzzle pieces now that are 30 times as big certainly helps. In addition, reading longer fragments allows scientists to tackle very repetitive genomes that otherwise would be difficult to assemble. The sequence for bread wheat, at 15 billion base pairs, was finished last year.
Then in January researchers reported they had cracked the biggest genome yet, the 32 billion pairs belonging to the axolotl, also called the Mexican salamander. Mr. Hunkapiller says there’s no grand theory to explain why some genomes are so much longer than others, but in amphibians the repetition may contribute to their ability to regenerate. “An axolotl is a classic example: You can cut a leg off and it grows back. You can’t do that with your leg, right?” he says. “They tend to have redundancy in their genome, and there’s some thought that that has something to do with the ability to do that.”
The same difficulty of highly repetitive DNA also exists in parts of the human genome. That’s why the 2003 announcement of its “effective completion” mentioned about 400 gaps “that cannot be reliably sequenced with current technology.” Some of these have since been filled; the official human reference sequence is up to “build 38.” Still, blank spots remain at the ends of chromosomes (called telomeres), as well as at the axis points where the two arms of a chromosome meet (called centromeres). Could those uncharted areas have hidden functions, particularly ones that implicate disease? At this stage, we just don’t know, Mr. Hunkapiller says. “Those repeats are not all identical. There are variations,” he says. “They’re clearly important. I mean, telomeres are involved in cellular aging. You can only go through so many generations of most cells because of that.”
When the human genome was first decoded, the hopes for a quick medical payoff were high. One prediction was that by 2010 every newborn baby would come home from the hospital with its DNA sequence on a DVD. Mr. Hunkapiller blames the hype on competitive jockeying as two teams raced to finish the human sequence first.
The government-funded effort, led by Francis Collins (now director of the National Institutes of Health), originally planned to complete its work by 2005. But then in 1998 Mr. Hunkapiller’s parent company, impressed by the speed of his new sequencers, decided to take on the job itself. The result was a new firm called Celera, led by J. Craig Venter. It planned to beat the government operation by four full years—and to release its data in tranches only after giving its paying subscribers, like pharmaceutical companies, a good look.
Soon the two sides were trading barbs. Mr. Collins said the Celera genome would be “the CliffsNotes or the Mad Magazine version.” Mr. Venter said the public project was “putting good money after bad.” In the end, the two men agreed to—or were forced into—a truce. At a White House ceremony in 2000, hosted by Bill Clinton and Tony Blair, they appeared jointly to announce that drafts of both sequences had been completed. Mr. Hunkapiller, though invited, was stuck at home with chickenpox.
An advantage of the public-private race was that it hurried along the sequencing. A disadvantage, in Mr. Hunkapiller’s telling, was the escalating salesmanship. “Saying that you’d learn enough about the genome to cure all diseases was nuts,” he says. Did the hyperbole really get that far? “Oh, yeah, Craig was pretty close—and so was Francis Collins, to be fair.”
Fifteen years later there’s a lot of medicine yet to be wrung out of genomics. A startling fact is that when a patient is suspected of having an unknown genetic illness, the “solve rate” is 50% or less. Mr. Hunkapiller says PacBio’s machines can help by detecting what are called “structural variants,” changes to DNA that may involve hundreds or even thousands of base pairs, making them difficult to pick up with earlier technology. Last year a group at Stanford was able to diagnose a young man whose heart had repeatedly grown benign tumors. One of his genes on Chromosome 17 was missing 2,200 base pairs.
For multifactor diseases, it appears to be a matter of drawing signal from the noise. Two years ago it was front-page news when researchers found a gene variant that significantly increased a person’s risk of schizophrenia—from 1% to 1.25%, an absolute change too small to be very meaningful. “In the case of genetic diversity, because there’s so much of it, you need statistical power, which is large numbers of samples,” Mr. Hunkapiller says. “But you have to know how to sort things out so that you’re not comparing apples to oranges to pears to axolotl.” The NIH will soon launch a program to sequence DNA from one million people living in the U.S., and Mr. Hunkapiller thinks that data could help, provided it includes decent patient histories and other ancillary material.
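The schizophrenia figures show how the same finding can sound large or small depending on how it is stated. A short calculation, using only the 1% and 1.25% figures cited above (the function name is invented for illustration), separates the absolute change from the relative one:

```python
def risk_change(baseline: float, with_variant: float) -> tuple[float, float]:
    """Return (absolute change, relative change) between two risk levels."""
    absolute = with_variant - baseline
    relative = absolute / baseline
    return absolute, relative


absolute, relative = risk_change(0.01, 0.0125)
print(f"absolute: {absolute * 100:.2f} percentage points")  # absolute: 0.25 percentage points
print(f"relative: {relative:.0%}")  # relative: 25%
```

A 25% relative increase makes a headline; a quarter of a percentage point in absolute terms is why the variant alone says little about any one person.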
Cancer is another opportunity for sequencing, given how scrambled its DNA can be. “If you look at a cancer cell line, you wonder: How is this thing alive?” Mr. Hunkapiller says. “Because you have so much jumbling of bits of chromosomes here and there, and big chunks lost, and big chunks—you’ve got dozens of copies of that region.” Some cancer variants can be targeted by specific drugs, and Mr. Hunkapiller says others can help appraise prognosis. “Is this going to be a really bad prostate cancer, or is it going to be relatively benign?” he says. “More and more, sequencing is being done to figure that out.”
An area of research to watch, Mr. Hunkapiller says, is the idea of using blood tests to detect cancer early by spotting infinitesimal amounts of DNA released when tumor cells die. “It’s not going to be trivial to do that, just because you’re looking at a tiny needle in a big haystack,” he says. “But there’s probably cases where that’s going to help, because there are some cancers that are just—you never find them early enough. The symptoms aren’t there and you can’t really go in and biopsy. Pancreatic cancer is an example of that. By the time you find out about it, unless it’s a particular type, you’re in trouble.”
And as for those newborn babies? Mr. Hunkapiller’s freshest grandson, 3-month-old Asher, received elective genetic testing for 193 different conditions. “Yeah, well, you’ve got two molecular biologists as parents in that case,” he says with a laugh. “Fortunately, they didn’t learn anything that they were scared of from that.” Scanning a couple of hundred genes, granted, isn’t quite the promised sequence on a DVD. But eventually, Mr. Hunkapiller insists, new moms and dads who want the whole 9 yards—or, rather, the whole 2 yards and three billion letters—will have that option, too. “I wouldn’t say it’s going to be in the next two years,” he says. “Twenty? Probably.”
By Kyle Peterson
Source: The Wall Street Journal