Last March, scientists at Harvard’s Wyss Institute encoded data into human DNA, fitting 42 copies of the Library of Congress’ digital archive into a single gram.
DNA holds massive advantages over all other forms of storage, as we’ve learned from paleontology: just try reading a hard drive that’s been buried in a cave for 430,000 years. That’s exactly what computational biologists are doing when they read the DNA of hair, feathers and bones from the field. Researchers at ETH Zurich have predicted that DNA data coated in glass could last in excess of 1 million years, and have developed software to correct any data degradation that does occur. Best of all, DNA won’t go obsolete, unless humans do.
The March experiment at Harvard involved the largest amount of data encoded in DNA yet. Though the researchers didn’t make use of the full 215 million gigabytes (roughly 14,000 Blu-ray discs), they managed to encode 2GB, enough for “a full computer operating system, an 1895 French film, ‘Arrival of a train at La Ciotat,’ a $50 Amazon gift card, a computer virus, a Pioneer plaque and a 1948 study by information theorist Claude Shannon.”
The data was encoded by San Francisco-based startup Twist Bioscience, which sent back a small vial full of DNA dust. The Harvard team then reassembled the data with a sequencing computer. They were able to re-open the data without a single glitch, which also shows promise for making innumerable loss-free copies of data.
The tech is far from consumer-ready, with the cost of 2GB pegged at $9,000 for encoding and decoding. Price is prohibitive: Microsoft, which last year announced plans to store data in DNA within the next few years, encoded 200GB at a cost of about $800,000 (for the same cost, you could store 200GB on Apple’s iCloud for 22,000 years). But the tech is at an incredibly early stage, and past examples hold promise: the cost of DNA sequencing has dropped from the first human genome’s price tag of $2.7 billion 14 years ago to the point where startups today are competing to do it for less than $100.
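To put those prices side by side, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted above; the variable names are ours, and the iCloud price is simply what the 22,000-year comparison implies, not a quoted Apple rate.

```python
# Figures quoted in the article; all variable names are illustrative.
harvard_cost_usd, harvard_gb = 9_000, 2
microsoft_cost_usd, microsoft_gb = 800_000, 200
icloud_years = 22_000

# Cost per gigabyte for each DNA experiment.
harvard_per_gb = harvard_cost_usd / harvard_gb
microsoft_per_gb = microsoft_cost_usd / microsoft_gb

# Yearly price for 200GB of cloud storage implied by the comparison
# "$800,000 buys 22,000 years" (an inference, not a quoted price).
implied_icloud_per_year = microsoft_cost_usd / icloud_years

# How many times cheaper sequencing has become, taking $100 as the
# current price (the article says "less than $100", so this is a floor).
sequencing_drop = 2_700_000_000 / 100

print(f"Harvard:   ${harvard_per_gb:,.0f} per GB")
print(f"Microsoft: ${microsoft_per_gb:,.0f} per GB")
print(f"Implied cloud price: ${implied_icloud_per_year:,.2f} per year")
print(f"Sequencing is at least {sequencing_drop:,.0f} times cheaper")
```

By these numbers, DNA storage runs in the thousands of dollars per gigabyte, while sequencing has already fallen in price by a factor in the tens of millions.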
From 01 to GATC
The technology is simple in concept. ETH Zurich researcher Robert Grass described the fundamentals to the American Chemical Society: “A little after the discovery of the double helix architecture of DNA, people figured out that the coding language of nature is very similar to the binary language we use in computers.” He added: “On a hard drive, we use 0s and 1s to represent data, and in DNA, we have four nucleotides: A, C, T and G.”
Essentially, DNA data storage means taking those 0s and 1s of computer data and translating them into the A, C, T and G of DNA strands. These strands are assembled into a specific order and read back in sequences, using the same computers and software used to analyze human DNA for health purposes or for research in other fields.
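As a minimal sketch of that translation, the Python below maps every two bits to one nucleotide. The particular table (00 to A, 01 to C, 10 to G, 11 to T) and the function names are assumptions chosen for illustration, not the scheme used by any of the teams mentioned here; real systems also add error correction and avoid sequences that are hard to synthesize or read.

```python
# Hypothetical two-bits-per-base table, chosen only for illustration.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of nucleotides, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Turn a string of nucleotides back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)          # CAGACGGC
print(decode(strand))  # b'Hi'
```

With four letters to choose from, each nucleotide carries two bits, so one byte becomes four bases.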
Just as DNA “knows” how to create an arm instead of a foot, DNA can be “told” to arrange itself in a sequence that makes sense to that computer. Once scientists figured out how to arrange the DNA sequence and to create software to read the result, it was an easy step to encode an Amazon gift card, a short movie, or any other type of data into that sequence.
A Hacking Cough
Now that we’ve already outpaced our wildest science fiction fantasies, leave it to humankind to ask: can we hack DNA to hack a computer? The answer is yes.
Researchers at the University of Washington claim to have encoded malware into human DNA. On a logical level, this isn’t a revelation: anything you can encode can hide bad data, and you’ll never know until it’s decoded on the other end. But there’s something unsettling about a Trojan horse virus embedded into a literal horse, or a computer virus encoded into an actual virus.
In July 2017, Harvard scientists published a paper in the journal Nature describing how a film sequence of a galloping horse, shot by cinematic pioneer Eadweard Muybridge, could be encoded into living E. coli bacteria using CRISPR, a technology that essentially “remixes” DNA. Each cell was given a single pixel or set of pixels, which were then reassembled by computer with about 90% accuracy.
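The reassembly step can be pictured with a toy Python sketch: cut a frame into small chunks of pixels, tag each chunk with its position, scramble the pile, and put it back together. Everything here is hypothetical (the 20-pixel frame, the chunk size, the function names, the single lost fragment); it illustrates position-tagged fragments in general, not the method in the Nature paper.

```python
import random


def split_into_fragments(pixels, chunk_size):
    """Cut a flat list of pixels into (start_index, chunk) pairs."""
    return [(start, pixels[start:start + chunk_size])
            for start in range(0, len(pixels), chunk_size)]


def reassemble(fragments, total_length):
    """Place each chunk back at its start index; missing pixels stay None."""
    image = [None] * total_length
    for start, chunk in fragments:
        image[start:start + len(chunk)] = chunk
    return image


def fraction_recovered(original, recovered):
    """Share of pixels that came back with the right value."""
    matches = sum(1 for a, b in zip(original, recovered) if a == b)
    return matches / len(original)


pixels = [i % 4 for i in range(20)]            # hypothetical 20-pixel frame
fragments = split_into_fragments(pixels, chunk_size=2)
random.Random(0).shuffle(fragments)            # order is lost in the pile
surviving = fragments[1:]                      # pretend one fragment is lost
recovered = reassemble(surviving, len(pixels))
print(fraction_recovered(pixels, recovered))   # 0.9
```

Because every chunk carries its own address, the order in which fragments are read doesn’t matter, and a lost fragment only blanks out its own pixels.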
Poetry is in their genes
DNA hacking isn’t just for espionage, of course. Artists are also using DNA to explore some incredible things.
In 2011, Canadian poet Christian Bok announced that his collaborators at the University of Calgary had successfully encoded lines of poetry into a Deinococcus radiodurans microbe as part of his Xenotext poem. Bok has an eye toward the future in his work, once tweeting, “I am still amazed that poets insist on writing about their divorces, when robots are taking pictures of orange, ethane lakes on Titan.”
Bok had already found some success with a set of poems, each containing just one vowel. A similar approach to poetry would inform the Xenotext works, very much based on the process of data storage: transcribe a cipher to a set of words, and use those words to correspond to sequences of G, A, C, and T. Under that rubric, The Xenotext was specifically written to “constitute a set of instructions, all of which cause the organism to manufacture a viable, benign protein in response.” In 2011, the microbe responded, and a work documenting the process was published.
The astounding thing about this work is its blend of biology and poetry that borders on the cryptographic. The text makes sense, and the first line (“Any style of life is prim”) is encoded in such a way as to create a microbe that glows a deep red color and produces a protein which, decoded, generates a sequence that forms the second line: “The faery is rosy of glow.”
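One way to see the cryptographic flavor is to test whether a simple letter-for-letter swap links the two quoted lines. The Python sketch below derives such a table from the lines themselves and checks that the same table works in both directions. It is only an illustration with made-up function names: it says nothing about how the letters correspond to DNA or to the protein, and it is not a description of how Bok built the poem.

```python
line_one = "any style of life is prim"
line_two = "the faery is rosy of glow"


def derive_substitution(source, target):
    """Build a letter-to-letter table, failing if any letter maps two ways."""
    if len(source) != len(target):
        raise ValueError("lines differ in length")
    table = {}
    for a, b in zip(source, target):
        if a == " " or b == " ":
            if a != b:
                raise ValueError("word boundaries differ")
            continue
        if table.setdefault(a, b) != b:
            raise ValueError(f"{a!r} maps to two different letters")
    return table


def apply(table, text):
    """Swap every letter found in the table; leave other characters alone."""
    return "".join(table.get(ch, ch) for ch in text)


forward = derive_substitution(line_one, line_two)
backward = derive_substitution(line_two, line_one)

# Letters present in both tables must agree for one table to serve both ways.
conflicts = {c for c in forward.keys() & backward.keys()
             if forward[c] != backward[c]}
cipher = {**forward, **backward}

print(conflicts)                # set()
print(apply(cipher, line_one))  # the faery is rosy of glow
print(apply(cipher, line_two))  # any style of life is prim
```

The two lines turn out to be mirror images under a single table of swapped letter pairs (a with t, y with e, l with r, and so on), so enciphering either line produces the other.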
Deinococcus radiodurans was selected because it has proven capable of surviving nuclear radiation at levels that would destroy most other organisms. Bok hopes the poem will be stored within this hardy bacterium for eternity, but nature seems to have other ideas. Repeated attempts to ensure the mutation stays within the DNA have failed, as the microbe keeps erasing portions of the poem whenever it reproduces.
The research we’re talking about isn’t quite biological computing. It’s a form of biological storage, encoded and decoded by normal machines. Rough but still amazing biological computers have been a proven concept since at least 2012, when California and Israeli researchers encoded and read an image using nothing but DNA enzymes and adenosine triphosphate, the chemical responsible for “delivering” information to those enzymes. The resulting concoction is basically a tube of liquid. Inside that liquid, biological reactions take place to reassemble information (in this case, a JPEG file), which can be read on the other end.
That’s yielded such medical breakthroughs as living organs inside a computer chip, which allow for cruelty-free testing by providing biological replicas of organs in place of animals. Add a chemical to the chip and you can see exactly what a human organ would do in response.
The latest breakthroughs could borrow from this approach, too. The Harvard scientists working on bacteria for photographs and movies hope to someday use them as a sort of photographic surface, with the bacteria taking snapshots of themselves and their environment over time. That would allow scientists to crack open the genome and understand more about how those bacteria have changed or adapted. In that case, if microbes ever did decide to write their own poems, we’d finally be able to read them.
Photo: DNA lab by the University of Michigan School for Environment and Sustainability, CC-BY 2.0 via Flickr.