Will bacteria become the next thumb drives?

by Reed Stubbendieck (@bactereedia)

The text from the Dispilio tablet [Source]
In 1994, a wooden tablet was unearthed from the swamps near Dispilio, a neolithic settlement in modern-day Greece. The Dispilio tablet was carbon-dated to ~5200 BC and is considered to be among the world’s oldest examples of recorded information, having lasted for >7000 years (see image above). For comparison, without intervention the lifespan of most modern digital storage media ranges from 5 to 20 years (side note: have you backed up your data recently?). However, while the longevity of the Dispilio tablet is impressive, living cells have been storing information in DNA for 3.8 billion years. In my previous post, I discussed the potential for cells to store large amounts of information. Today, I want to cover some recent examples of how scientists and engineers are tapping into this immense storage potential.

My favorite example of using cells to store information comes from a paper published last year (2017) in Nature. In this paper, the authors used CRISPR-Cas technology to introduce DNA into cells and store images. Recently, CRISPR-Cas technology has gained fame for its applications in genome engineering, including a dubiously alleged ability to hide genetically modified criminals from law enforcement. However, in its natural context, the CRISPR-Cas system functions as an adaptive immune system for archaea and bacteria. It’s this feature that the authors co-opted for information storage, which I will discuss below.

Though we often think of viruses as disease-causing agents of humans and other eukaryotes, bacteria suffer from a far greater number of viral infections. In fact, viruses of bacteria, also known as bacteriophages (or simply phages), are the most abundant biological entities on Earth. Estimates place the global number of phages at 10³⁰, which collectively cause 10²³ infections of bacteria each second. For comparison, Avogadro’s number is 6.022×10²³, meaning that there is nearly one mole of phage infections globally every six seconds (or one round of combat in D&D)!
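The "one mole every six seconds" figure is easy to check. The short sketch below simply divides Avogadro's number by the estimated infection rate quoted above; the variable names are mine, and the inputs are order-of-magnitude estimates rather than measurements.

```python
# Back-of-the-envelope check using the estimates quoted in the post.
AVOGADRO = 6.022e23            # entities per mole
INFECTIONS_PER_SECOND = 1e23   # estimated global phage infections per second

# Time needed to accumulate one mole of infections.
seconds_per_mole = AVOGADRO / INFECTIONS_PER_SECOND

print(f"One mole of phage infections every {seconds_per_mole:.1f} seconds")
```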

Bacteria are not powerless to stop phage infections. One mechanism that bacteria use to prevent infections is the CRISPR-Cas system. Though the specific molecular details are beyond the scope of this article (see here, if interested), I would like to take a brief moment to explain how the CRISPR-Cas system functions in bacterial cells. During infection, a bacterial cell may capture small pieces of the phage genome and insert them into a region of the chromosome called the CRISPR array. Subsequently, if the bacterium survives, it uses these captured DNA sequences to generate an immune response against future infections from the same phage. Importantly, the cell inserts new DNA sequences into the CRISPR array in a predetermined position. Thus, the CRISPR array stores a history of infection in linear order, which is passed to both daughter cells when the bacterium divides.

By taking advantage of the ability of the CRISPR array to store new DNA sequences, one research group stored the information needed to reconstruct images inside of Escherichia coli cells. Instead of infecting E. coli cells with phages, the researchers generated large numbers of synthetic DNAs called oligonucleotide protospacers and tricked the cells into incorporating the custom DNAs into their CRISPR arrays. At the beginning of each protospacer was a 4 base pair sequence the authors called a “pixet”. The pixet defined the set of pixels described by the following 28 base pairs of the protospacer, where each of the nucleotides (A, T, G, and C) corresponded to a different shade of gray. With four possible nucleotides, each position carries 2 bits, so 112 protospacers × 28 base pairs cover all 3,136 pixels of a 56 × 56 image. By introducing these 112 protospacers into the population of E. coli cells, the authors were able to store a 56 × 56 pixel, 784-byte grayscale image of a human hand in the bacteria. To access the data, the researchers used high-throughput DNA sequencing technology and determined the DNA sequences of many different CRISPR arrays from the population of bacteria. By using a custom algorithm, the researchers were able to decode the information from the CRISPR arrays and digitally reassemble the original image (see image below).
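To make the pixet idea concrete, here is a minimal sketch of the encoding described above. It is not the authors' code. The nucleotide-to-shade mapping and the way the 4-nucleotide pixet is used (as a plain base-4 block index) are my assumptions for illustration; only the 4 + 28 layout and the image size come from the description above.

```python
# Assumed mapping: index in this string = shade of gray (0-3).
# The post does not say which nucleotide is which shade.
NUCLEOTIDES = "ACGT"
PIXET_LEN = 4      # index nucleotides per protospacer (from the post)
PAYLOAD_LEN = 28   # pixel nucleotides per protospacer (from the post)

def to_base4(value, width):
    """Write an integer as a fixed-width string of nucleotides (base 4)."""
    digits = []
    for _ in range(width):
        digits.append(NUCLEOTIDES[value % 4])
        value //= 4
    return "".join(reversed(digits))

def from_base4(seq):
    """Inverse of to_base4."""
    value = 0
    for nt in seq:
        value = value * 4 + NUCLEOTIDES.index(nt)
    return value

def encode(pixels):
    """Split a flat list of shades (0-3) into pixet + payload strings."""
    assert len(pixels) % PAYLOAD_LEN == 0
    protospacers = []
    for start in range(0, len(pixels), PAYLOAD_LEN):
        pixet = to_base4(start // PAYLOAD_LEN, PIXET_LEN)
        payload = "".join(NUCLEOTIDES[s] for s in pixels[start:start + PAYLOAD_LEN])
        protospacers.append(pixet + payload)
    return protospacers

def decode(protospacers, n_pixels):
    """Rebuild the pixel list; the order of the input sequences is irrelevant."""
    pixels = [None] * n_pixels
    for seq in protospacers:
        start = from_base4(seq[:PIXET_LEN]) * PAYLOAD_LEN
        for offset, nt in enumerate(seq[PIXET_LEN:]):
            pixels[start + offset] = NUCLEOTIDES.index(nt)
    return pixels

# A synthetic 56 x 56 test pattern with four shades (not the hand image).
image = [(x + y) % 4 for y in range(56) for x in range(56)]
spacers = encode(image)
print(len(spacers), "protospacers of length", len(spacers[0]))
```

Because every sequence carries its own address, the decoder does not care which cell a spacer came from or in what order the sequencer reports it, which is what lets the image be spread across a whole population of bacteria.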

Retrieval of an image of a hand stored in bacterial DNA [Source]
This research group was not satisfied with encoding a single image. Instead, they wanted to store a movie. Specifically, the researchers encoded five frames of Plate 626 from Animal Locomotion: An Electro-Photographic Investigation of Consecutive Phases of Animal Movements by Eadweard Muybridge (1872-1875). To store this animation, the researchers split each frame into protospacer sequences as above, but instead of introducing all of the information at once, they introduced the DNA encoding each movie frame into the population of E. coli cells one frame at a time. Recall that the CRISPR array stores a history of infection in linear order. Using this approach, each cell stored a piece of each of the five frames. By sequencing the entire CRISPR array from many cells in the population and splitting the spacer sequences by order of appearance, the authors were able to reconstruct each frame from the movie (see .gif below).
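The "splitting by order of appearance" step can be sketched with a toy simulation. This is a deliberately idealized model, not the authors' algorithm: I assume every simulated cell acquires exactly one spacer from every frame and that each array is read oldest to newest, so a spacer's position directly gives its frame. Real arrays skip frames and need more careful order inference. The function names and the placeholder spacer labels are hypothetical.

```python
import random

def simulate_population(frames, n_cells, seed=0):
    """Each cell keeps one randomly chosen protospacer per frame,
    stored in the order the frames were delivered (idealized assumption)."""
    rng = random.Random(seed)
    return [[rng.choice(frame) for frame in frames] for _ in range(n_cells)]

def split_by_position(arrays, n_frames):
    """Pool spacers from all sequenced arrays, grouped by array position."""
    recovered = [set() for _ in range(n_frames)]
    for array in arrays:
        for position, spacer in enumerate(array):
            recovered[position].add(spacer)
    return recovered

# Five frames, each represented by four placeholder spacer labels.
frames = [[f"F{f}S{s}" for s in range(4)] for f in range(5)]
arrays = simulate_population(frames, n_cells=200)
recovered = split_by_position(arrays, n_frames=5)

for index, spacers in enumerate(recovered):
    print(f"frame {index}: {sorted(spacers)}")
```

No single cell holds a whole frame, but pooling many arrays recovers every spacer, and the position within each array tells you which frame it belongs to.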

Movie of a galloping horse stored in bacterial DNA [Source].
One caveat of the above examples is that the images decoded from the E. coli genomes were not perfect reproductions, which is evident from several spurious pixels in the reconstructed movie. The authors found that the differences between the encoded and reproduced frames were most often due to changes in the protospacer sequence caused by DNA synthesis errors, DNA sequencing errors, or mutation. The last of these highlights a limitation of storing information inside of cells. In the opening, I mentioned that cells have been using DNA to store information for 3.8 billion years. But unlike the inscriptions on the Dispilio tablet, this information storage is imperfect. DNA mutates and cells evolve. This process is essential for continuing life but is inconvenient for perfect information archival.

Engineers at Microsoft have recently developed their own form of DNA storage technology. Instead of using cells, the engineers store information in isolated DNA molecules and, under special conditions, these molecules are predicted to last for >2000 years. Though etchings on preserved wood have still exceeded the current longevity estimations of DNA storage, I think we’ll find a more effective solution for perfect information archival before those DNA molecules degrade in the year 4000!