Can hypothesis‐driven research survive the sequence‐data deluge?

This satirical Opinion questions funding trends that perpetuate massive infrastructures the value of which to science has become increasingly less obvious over time. These funding trends are especially dubious in the case of Genome Centers, which (perhaps unintentionally) discourage hypothesis-driven research in favour of mindless data generation and speculative hand-waving about its relevance.

The crucial question, Whatney said in a telephone interview, is to determine whether the centaur produced in the Zucchini Lab qualifies as a model organism or as a human subject.
NHGRI awarded $1.5 million to ethicists at Case Western Reserve University 'to enhance guidelines for ethical human-animal chimera research'. 1 Whatney added that Zucchini et al., by putting the human posterior aft, created an animal that may produce less methane than a conventional centaur. 'All project proposals will be required to state their impact on climate change'.
Hal Garmisch, Ph. D., L.M.N.O.P., a colleague of Dr. Whatney, joined the call while travelling to a predatory conference in Kuala Lumpur. 'Many of the most important medical discoveries result from accidents, for example, when a lab assistant leaves the window open and a mold wafts in and kills bacteria in a petri dish'. Garmisch agreed that the new design of the half-horse half-man, although a surprise, could be beneficial. 'Surprise is often a better teacher than science in medicine'.
White Papers are now in preparation on the feasibility of creating a Minotaur (part human and part bull); an onocentaur (part human, part donkey), a harpy (part woman, part bird); a satyr (part human, part goat); and a mermaid. NHGRI anticipates funding the pushmi-pullyu (pronounced 'push-me-pull-you'), a cross between a unicorn and a gazelle with two heads (one of each) at opposite ends of its body. This genome could be used to model schizophrenia, capgras syndrome and other maladies.
Animals whose genomes have been sequenced point to the pushmi-pullyu as a promising model organism for psychiatric research. Dr. John Dolittle, Puddleby-on-the-Marsh, is at the right.
According to its website, NHGRI was created in 1989 to manage the role the National Institutes of Health (NIH) would play in the Human Genome Project (HGP). 2 The HGP, which began in 1990 at a cost of $5 billion (in inflation-adjusted dollars), was expected to take 15 years but was finished sooner and under-budget in part because microbiologist Craig Venter sequenced parts of his own genome for lunch money and carfare. 3 The HGP was declared complete in April 2003.
By sequencing parts of his own genome, Venter had circumvented the careful protocols for donor selection, anonymization, de-identification and privacy which regulate publicly funded research. He had not properly secured his informed consent as required under the Common Rule which protects human subjects of research; he had violated his own privacy. Since his project did not receive federal funds, however, he could not be prosecuted for this offense.
Starting in 2011, ethicists and others worked on a new Common Rule to prevent this sort of abuse. 4 To ensure the privacy of genomic data against all contingencies, however abstruse and far-fetched, the proposed new Rule would apply to nearly all genomic research, including projects that were not federally funded, making ethics often more challenging than science. 5 The completion of the HGP early and under-budget left NHGRI with another grand challenge: an embarras de richesse. The last thing a federal agency wants to do is close out a fiscal year with a surplus because it may be dinged for the difference the following year.
'There remains a compelling need to sequence the genomes of many more organisms', NHGRI announced at a news release dated 7 November 2003 and budgeted $459 million to establish and support through 2006 the Large-Scale Sequencing Research Network. 6 This moved the goalposts and gave the usual PIs another first-and-ten.
In November 2003, Hamish Farfel, Ph. D., A.C.G.T., who was interviewed for this article, operated a food truck on the NIH campus after his postdoc expired. He worried that his best-selling 'all-beef' bratwurst might be sequenced and exposed. He chatted up his customers. Why not sequence 'man's best friend'? According to Farfel, the lunchtime crowd loved the idea. Dogs are very popular. NHGRI got about $30 million out the door to sequence the genome of a boxer named 'Tasha'. 7 Venter had already sequenced his pet poodle Shadow, but only for a small fraction of the cost. 8 A dog is prepared for sequencing at the NHGRI Dog Genome Project.
In 2004, NHGRI financed the sequencing of the honey bee, zebra fish, chicken, rat, cow and other organisms. In August 2004, it announced that its Large-Scale Sequencing Research Network would tackle the genomes of 18 new organisms, including the 13-lined ground squirrel, the megabat, the microbat, the tree shrew, the bushbaby, the hyrax, the Lorax, the pangolin, the sloth and the Northern white-cheeked gibbon. 9 Naomi Bogglehead, Ph.D., A.S.A.P., who agreed to be interviewed under conditions of de-identification, purchased the food truck from Farfel in 2010 after rotating through several postdoc, teaching assistant, adjunct, affiliate, laboratory assistant and other academic positions at major universities. She explained that when large-scale sequencing required hundreds of millions of dollars, NHGRI supported it. She did not know where else the money could have come from. But as sequencing costs fell, it became less and less obvious where that kind of money could go.
Bogglehead saw sequencing costs plummet and surmised that her career in genome science was doomed. 'I invested in a food truck in order to meet my student debt. It was the right decision. Many of my fellow postdocs worked Illumina HiSeq 2500s and went broke because sequencing had become too cheap to meter'.
'The less research costs, the less it is worth pursuing', Farfel explained. 'What's the "overhead" on a $200 genome?' Farfel recalled that sequencing costs in 2004 were high enough that NHGRI was able to drop $30 million on Tasha. 10,11 Because costs fell, the Broad Institute received only half as much ($15 million) in 2007 to sequence the horse. 12 Agencourt Bioscience Corp., Beverly, Mass., got only $10 million to sequence the domestic cat. Sequencing costs continued to plummet. It costs about $1000 or less to sequence an animal genome today. 13 Bogglehead chatted with the NHGRI lunch crowd. She learned that even before the Zucchini centaur, NHGRI faced two new grand challenges. Several of her customers had read an article in Discover Magazine (April 2, 2010) that lamented what was called the 'Yet-Another-Genome Syndrome' 14 The author wrote, 'There's a certain kind of headline I have become sick of: "Scientists Have Sequenced the Genome of Species X!"'. A compelling need to sequence the genomes of many more organisms no longer existed, if it ever had.
The writer grumbled that his e-mail inbox was deluged daily with press releases of the following form: 'Scientists have sequenced the genome of species X. Their research, published today in the Journal of Terribly Important Studies, will lead to new insights about this important species. Maybe it will even cure cancer or eliminate world hunger!' Bogglehead was not surprised that genomic data had become more expensive to store than to produce. Researchers estimated that the genomic data with which NHGRI must now cope exceeds the exabytes managed by Facebook, Google, Twitter or YouTube. 15 There seems to be no legal way to get rid of sequence data, moreover, if they are produced with public funds.
A famous genome scientist is shown on his yacht with sequenced animals watching the Cloud where genomic data are kept.
Bogglehead said the lunch crowd described sequence data as 'exhaust' since they are generated because they can be and not to test hypotheses. As the exhaust expands, it may spew a million correlations, but no inking of causality. 'It's not the classic problem of searching for a needle in a haystack by adding more hay. It's that all the needles turn into hay', Bogglehead said. 16 Farfel believes there is still a bright future for genome sequencing. 'The company that sequences your genome for free and links you to all the information available about your variants will create an advertising empire bigger than Facebook or Google'.
The lunch crowd was apprehensive for another reason. They wanted to 'de-visit' the hype that had brought the big bucks to NHGRI. The Large-Scale Sequencing Research Network was up for another renewal in 2011. The grand challenge was to justify this. In an unsavoury survey of the lunch line, most respondents agreed with the statement, 'It is economically rational to shutter the Large-Scale Sequencing establishment as a sunk cost'.
Laboratory technicians who gathered over lunch were concerned not just about the data deluge but also about the lack of progresswhat is called 'translation'from byte to bench to bedside. Those who bought bratwurst or salami sandwiches generally thought that medical progress goes the other wayfrom surprises, successes and failures in a clinical setting to laboratory analyses and then to questions or hypotheses that genomic science might answer. 17 Science follows medicine. Those who thought the 'translation' goes from science to medicine rather than the other way around generally ordered egg salad or corned beef with mayonnaise.
NHGRI's program director for the Large-Scale program said in a 2010 interview, 'We have long understood that the ultimate aim is for genomic information to be relevant in the clinic and to individuals'. 18 Good to know. The Large-Scale Sequencing Research Network needed about $100 million for each of the next several years to keep on sequencing. The program director said that 'the main point is that we will continue our successful Large-Scale Sequencing Program'. Seq and you shall fund.
The 10 000 monkey genome project. But he also conceded 'that it is less and less possible to divorce the production of sequence from its analysis and what it is to be used for'. According to many observers, however, this divorce remains altogether possible, even irreconcilable. In a 2018 press release, NIH stated that it had 'to address a major barrier to incorporating genomic medicine into healthcare, which is a lack of evidence about the relationship between gene variants and diseases'. 19 Why this lack of evidence? 'Genomics is a way to do science, not medicine', said Harold Varmus, then president of the Memorial Sloan-Kettering Cancer Center in New York. 20 In 2011, NHGRI announced a new $416 million, fouryear plan, renamed the Genome Sequencing Program, with roughly the same resources as its predecessor, the Large-Scale Sequencing Program, and roughly the same grantees. 'With the investments made over the last decade, and the new ones that we are now announcing, a path towards the realization of genomic medicine can be envisioned', the NHGRI Director said in a statement in December of that year. 21 A path to the realization of genomic medicine envisioned.
The path towards the realization of genomic medicine had already taken a surprising turn. Garmisch, when he returned from Kuala Lumpur after he was unable to find the conference, received a joyful welcome at the food truck. He had prepared a conference paper on 23andme and other direct-to-consumer (DTC) genotyping firms. 23andme genotyped, that is sequenced the genetic variants, of customers who mailed in saliva samples and in return received updates about their levels of risk for 254 heath conditions. 22 Time Magazine in 2008 declared 'The Retail DNA Test' to be the 'Best Invention of the Year'. 23 'NHGRI and 23andme compete in the same business', Garmisch explained, 'which is to make the connection between genomics and healthcare'. A woman in a laboratory coat, while waiting for a bratwurst, opined, 'Well, 23andme does not pay "indirect costs" to universities and research centers. It competes on its own dime. This is what made Venter so hated and suspect'.
'No "indirect costs"that's unethical!' laughed a colleague. 'Next thing you know, they'll produce something that benefits people and makes a profit. Immoral!' 'The profit motive may do better than the funding motive', Garmisch said.
While NHGRI re-upped the Large-Scale Sequencing Program, the Food and Drug Administration (FDA) warned 23andme that it made unsubstantiated claims because its Personal Genome Service (PGS) linked the genotypes of individuals to health risks and health care. In 2013, FDA ordered 23andme to stop marketing the PGS. 24 Human genome sequencing, which in 2003 was described as a revolution in medicine and, in 2008, was named the Invention of the Year, in 2013 produced a cease-and-desist order. If a path towards the realization of genomic medicine could be envisioned, if evidence about the relationship between gene variants and disease could be found, FDA had not seen any sign of it. After they dissolved the tape, they purified, centrifuged and sequenced the biological sample. Later they performed the same experiment on a trip between Portland, Maine, and Moncton, New Brunswick (Trip B).
These researchers identified genome sequences from thousands of plants, insects, and bacteria and calibrated the differences between the windscreen splatter on the two trips.
People waiting for sandwiches quickly saw the funding implications. How did the two splatter communities (Trip A vs. Trip B) differ in their structure and function? If the experiment were repeated with different results, would these represent changes in the Trip A and Trip B splatter communities or entirely different communities? In other words, under what changes in its constituent microbiota does a windscreen splatter community remain the same microbiome and when does it collapse or change into a different one?
'The genius of terms like "community", "ecosystem", "biome" and "network" lies in their inscrutability', a customer commented. 'The presumptive need to understand these kinds of concepts, with all their opacity and unintelligibility, will keep us microbiologists in business for a hundred years in defining our conceptsjust like ecologists'.
'If we could base genomic science on concepts this obscure and confused, we'd be looking at all kinds of research opportunities to try to find what they mean', another enthused.

NIH from the perspective of its fundees
The first replied, 'This bratwurst is full of bacteria. When they get in my stomach, they become part of me, part of who I am, a second genome. You are what you eat'.
'That's a billion-dollar idea!' exclaimed a third customer. 'If windscreen splatter forms a functional and identifiable system or community, what about the microbiota in one's nostril or between one's toes? There's no money in calling it "toe jam". What we call "snot" is actually the human microbial nostril splatter community'. It seemed that every microbiologist looking for HMP support got on the 10:1 bandwagon. Numbers that high worked. A 2015 report released by the National Science and Technology Council (NSTC) noted that annual Federal investment into microbiome research tripled over FY 2012-2014, with more than $922 million invested during this 3-year period. 27 An article in the Boston Globe reported that during the leadup to the HMP, a microbiologist 'explained that "we are ten parts microbe, and one-part human. We are clearly outnumbered". The figure has been repeated by researchers in scientific journals, books, and TED talks. Scientists joke that it is practically illegal not to cite some version of the figure in a presentation on the human microbiome'. 28 In 2016, after journalists and others not looking for grants ridiculed boilerplate pronouncements about how 'we' are mostly microbial, scientific journals started accepting articles which argued that that the 10:1 ratio was overstated by an order of magnitude. 29 If we are not outnumbered by the microbial splatter on and in us, what to do with the humongous data exhaust the HMP had created? Different algorithms which pester the data in different ways will suggest thousands of associations and correlations between microbial data and diseases. The folks waiting for sandwiches, however, were not impressed.
Naomi's mother, Lita Bogglehead, age 67, had come in to run her daughter's food truck because Naomi had an interview that day with a commercial genomics company where she hoped to work.
The elder Bogglehead, like a great many seniors, suffered from GERD and acid reflux. 'A billion federal dollars probably has been spent on studying the gut microbiome community, whatever that is', the elder Bogglehead observed, 'but it hasn't done a dime's worth of difference for us seniors. We still belong to the Pepcid Generation'.
'Every evening after dinner, I sneak up to my room and "drop" antacid'.
Note: All but one of the illustrations were taken from, The story of Doctor Dolittle, being the history of his peculiar life at home and astonishing adventures in foreign parts, by Hugh Lofting, New York : Frederick A. Stokes Company, 1920. The book can be downloaded at https://archive.org/details/storyofdoctordol00loft/page/n3. The shmoo originally by Al Capp is easily identifiable. Any resemblance with Dr. Francis Collins, M.D., Ph.D. is purely coincidental.