Biological Thumb Drives

by Reed Stubbendieck (@bactereedia)

“Every cell in your body contains seven hundred and fifty megs of data,” the engineer said. “For comparison, one of your fingers holds as much information as the entire internet. Of course, your information is repeated and redundant, but the fact remains that cells are capable of great storage.”

Legion: Skin Deep, Chapter 5

[Image: Legion: Skin Deep, cover]

Note: This post contains no plot spoilers for Legion: Skin Deep.

In Brandon Sanderson’s novella Legion: Skin Deep, Stephen Leeds is tasked with finding a corpse and the information it “knows”. Before his death, eccentric engineer Panos Maheras encoded crucial information in his cells. Leeds needs this information to prevent a pandemic caused by a new virus. There are a ton of fun ideas to explore in this story, but today I want to focus on the idea of storing information inside of cells. As referenced above, an engineer states that each cell contains 750 megabytes (MB) of data. So, in my first post, I want to explore the following two questions:

  1. How much information is contained within a single cell?
  2. Can we store the entire internet in a human finger?

Before we begin, let’s have a brief refresher on DNA. Inside most animal and plant cells, there is a tiny organ (an “organelle”) called the nucleus. The nucleus contains the cell’s DNA in the form of chromosomes, which together make up the nuclear genome. Each chromosome contains a double-stranded DNA molecule wrapped tightly around many different proteins. Information is encoded within a double-stranded DNA molecule via the nucleotide base pairs. A single strand of DNA is made from a sequence of the four nucleotides adenine (A), cytosine (C), guanine (G), and thymine (T), which pair with T, G, C, and A, respectively, on the complementary strand of DNA. In a human’s haploid (i.e., single) set of 23 chromosomes, there are ~3 billion base pairs of DNA. To determine the storage capacity of 3 billion base pairs, we will need to take a brief trip from the world of carbon to the world of silicon.
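The pairing rule is simple enough to write down as a lookup table. The snippet below is only an illustrative sketch: the function names and the example sequence are made up for this post, and it ignores strand direction entirely.

```python
# Base-pairing rule from the refresher: A pairs with T, and C pairs with G.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def complement_strand(strand):
    """Return the base-by-base complement of a DNA strand (same reading order)."""
    return "".join(COMPLEMENT[base] for base in strand.upper())


def base_pairs(strand):
    """List the base pairs formed by a strand and its complement, e.g. 'GC'."""
    return [base + COMPLEMENT[base] for base in strand.upper()]


example = "GATTACA"  # hypothetical sequence, chosen only for illustration
print(complement_strand(example))
print(base_pairs(example))
```

Because each base fully determines its partner, a base pair can only take four possible values, which is what makes the bit-counting in the next step work.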

In computing, the basic unit of information is called a bit, and can have a binary value of 0 or 1. If we translate the language of nucleotides into the language of bits, it takes 2 bits to encode each base pair uniquely (AT, CG, GC, and TA as 00, 01, 10, and 11, respectively). This conversion will allow us to directly determine how much information is contained in our DNA. For a single set of 23 chromosomes, we calculate:

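$$
3 \times 10^{9}\ \text{bp} \times \frac{2\ \text{bits}}{1\ \text{bp}} \times \frac{1\ \text{byte}}{8\ \text{bits}} \times \frac{1\ \text{MB}}{2^{20}\ \text{bytes}} \approx 715\ \text{MB}
$$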
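The same conversion can be sketched in a few lines of Python. The 2-bit mapping is the one given above; the function names are just labels chosen for this example, and 1 MB is taken as 2^20 bytes.

```python
# Two bits per base pair, using the mapping from the text.
PAIR_TO_BITS = {"AT": "00", "CG": "01", "GC": "10", "TA": "11"}
BITS_PER_BASE_PAIR = 2
BITS_PER_BYTE = 8
BYTES_PER_MB = 2**20  # assumption: 1 MB = 1,048,576 bytes


def encode_pairs(pairs):
    """Translate a list of base pairs into a string of bits."""
    return "".join(PAIR_TO_BITS[pair] for pair in pairs)


def genome_size_mb(base_pairs):
    """Information content of a stretch of DNA, in MB."""
    return base_pairs * BITS_PER_BASE_PAIR / BITS_PER_BYTE / BYTES_PER_MB


print(encode_pairs(["AT", "CG", "GC", "TA"]))
for n in (2.9e9, 3.0e9, 3.2e9):
    print(f"{n:.1e} bp -> {genome_size_mb(n):.0f} MB")
```

Running it with 2.9, 3.0, and 3.2 billion base pairs gives roughly 691, 715, and 763 MB.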

It is important to note that estimates of the total length of the haploid human genome vary from 2.9 to 3.2 billion base pairs. Thus, the information content of the haploid human genome is between 691 and 763 MB, a range that brackets the value of 750 MB (see update, below) given by the engineer in the novella. However, most human cells contain a diploid (i.e., double) set of chromosomes: we inherit 23 chromosomes from each of our parents, for a total of 46 chromosomes. Therefore, each cell contains ~1430 MB of information stored in the nuclear genome. In addition, human cells also contain organelles called mitochondria, which have their own genomes. The mitochondrial genome is much smaller than the nuclear genome at ~16,000 base pairs. Unlike the nucleus, of which there is only one per cell, each cell contains many mitochondria. For our calculation we’ll estimate that the average cell contains 1000 molecules of mitochondrial DNA:

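$$
1.6 \times 10^{4}\ \text{bp} \times 1000\ \text{copies} \times \frac{2\ \text{bits}}{1\ \text{bp}} \times \frac{1\ \text{byte}}{8\ \text{bits}} \times \frac{1\ \text{MB}}{2^{20}\ \text{bytes}} \approx 3.8\ \text{MB}
$$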

That’s not a lot compared to the nuclear genome. Therefore, the total information stored in each cell is ~1.4 GB. This data could be stored on a USB drive that costs < $3!
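To see how little the mitochondria add, here is a rough sketch of the per-cell total. It assumes a 3 billion base pair haploid genome, 1 MB = 2^20 bytes, and 1 GB = 1024 MB; the helper name is invented for this example.

```python
BYTES_PER_MB = 2**20  # assumption: 1 MB = 1,048,576 bytes


def dna_megabytes(base_pairs, copies=1):
    """MB of information in `copies` copies of a genome, at 2 bits per base pair."""
    return base_pairs * copies * 2 / 8 / BYTES_PER_MB


nuclear_mb = dna_megabytes(3.0e9, copies=2)          # diploid nuclear genome
mitochondrial_mb = dna_megabytes(16_000, copies=1000)  # ~1000 copies per cell
total_mb = nuclear_mb + mitochondrial_mb

print(f"nuclear:       {nuclear_mb:.1f} MB")
print(f"mitochondrial: {mitochondrial_mb:.1f} MB")
print(f"total:         {total_mb / 1024:.2f} GB")
print(f"mitochondrial share: {100 * mitochondrial_mb / total_mb:.2f} %")
```

With these inputs the mitochondrial DNA comes to less than half a percent of the total, so rounding to 1.4 GB per cell loses nothing.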

But can we store the entire internet on a finger? If we estimate the number of cells in a human finger and then use our value of 1.4 GB per cell, we can calculate a finger’s DNA information content. For a male, the average finger width and length are 20 and 110 mm, respectively. Assuming that a finger is roughly cylindrical in shape, that the density of human tissue is close to that of water (1 g/cm³), that a 70 kg human body contains 10¹³ cells, and that the distribution of cells in the body is uniform across all body sites, we can estimate the data storage of a human finger as follows:

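$$
\begin{aligned}
V_{\text{finger}} &= \pi r^{2} h = \pi\,(1\ \text{cm})^{2}\,(11\ \text{cm}) \approx 34.6\ \text{cm}^{3} \;\Rightarrow\; m_{\text{finger}} \approx 34.6\ \text{g} \\
N_{\text{cells}} &= 34.6\ \text{g} \times \frac{10^{13}\ \text{cells}}{7 \times 10^{4}\ \text{g}} \approx 4.9 \times 10^{9}\ \text{cells} \\
\text{Data} &= 4.9 \times 10^{9}\ \text{cells} \times \frac{1.4\ \text{GB}}{1\ \text{cell}} \approx 6.9 \times 10^{9}\ \text{GB} \approx 7\ \text{EB}
\end{aligned}
$$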
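For anyone who wants to play with the assumptions, the finger estimate can be sketched as below. Two choices here are my own reading rather than anything stated above: the 20 mm width is treated as the cylinder's diameter, and 1 EB is taken as 10^9 GB. The function and parameter names are invented for the example.

```python
import math


def finger_cells(width_mm=20.0, length_mm=110.0,
                 body_mass_g=70_000.0, body_cells=1e13,
                 density_g_per_cm3=1.0):
    """Estimate the number of cells in a cylindrical finger."""
    radius_cm = width_mm / 2 / 10   # assumption: width is the diameter
    length_cm = length_mm / 10
    volume_cm3 = math.pi * radius_cm**2 * length_cm
    mass_g = volume_cm3 * density_g_per_cm3
    # Uniform cell distribution: cells scale with the share of body mass.
    return mass_g * body_cells / body_mass_g


GB_PER_CELL = 1.4
cells = finger_cells()
data_gb = cells * GB_PER_CELL
data_eb = data_gb / 1e9  # assumption: 1 EB = 10^9 GB

print(f"cells in finger: {cells:.2e}")
print(f"data in finger:  {data_eb:.1f} EB")
```

Doubling the finger width quadruples the answer, since the volume scales with the square of the radius, so the estimate is quite sensitive to that one measurement.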

7 exabytes (EB) is an extremely large amount of information. In fact, a single EB can hold the contents of nearly 43 million Blu-ray discs. However, the internet contains much more information: in 2012 it was estimated that 1 EB of data was created on the internet daily. Therefore, though the potential information stored on a finger is impressive, it will not store the entire internet. If we use every cell in the human body, we can store 14 zettabytes (ZB), or the contents of 615 billion Blu-ray discs. With this much storage, we can hold the entire internet. At least, for a little while longer, because global internet traffic is estimated to reach 3.3 ZB per year in 2021!
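A quick back-of-the-envelope sketch puts these numbers side by side. It assumes decimal prefixes (1 ZB = 10^12 GB) and, like the paragraph above, loosely compares stored data with data created or transferred, so treat the output as a rough sense of scale only. The variable names are made up for this example.

```python
GB_PER_CELL = 1.4
BODY_CELLS = 1e13
GB_PER_ZB = 1e12  # assumption: decimal prefixes

FINGER_EB = 7.0             # finger estimate from above
CREATED_EB_PER_DAY = 1.0    # 2012 estimate quoted above
TRAFFIC_ZB_PER_YEAR = 3.3   # 2021 projection quoted above

body_zb = BODY_CELLS * GB_PER_CELL / GB_PER_ZB
finger_days = FINGER_EB / CREATED_EB_PER_DAY
body_years = body_zb / TRAFFIC_ZB_PER_YEAR

print(f"whole body: {body_zb:.0f} ZB")
print(f"a finger fills up after ~{finger_days:.0f} days of 2012-era data creation")
print(f"a body fills up after ~{body_years:.1f} years of 2021-era traffic")
```

On these assumptions a finger would be full in about a week, while a whole body would last a little over four years.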

In conclusion, even though the engineer was a little enthusiastic about the storage potential of a human “thumb drive”, he was correct that cells have a great capacity to store large amounts of data! In a future post, I will explore how scientists have already engineered cells and DNA to store music, movies, and books.

Update 05/22/2018: In the calculations for this post, I used the classic definition of 1 MB as 2²⁰ (1,048,576) bytes instead of 10⁶ (1,000,000) bytes. In modern parlance, the former is now defined as a mebibyte (MiB) and the latter is a MB (see here for a discussion of the historical differences). If we use the modern definition of a MB, then the calculation matches the 750 MB value given in the novella. Note, this value is still in reference to a haploid genome.
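The difference between the two definitions is easy to check. This is a small sketch assuming a 3 billion base pair haploid genome at 2 bits per base pair; the function name is invented for the example.

```python
def haploid_genome_bytes(base_pairs=3.0e9):
    """Bytes needed to store a haploid genome at 2 bits per base pair."""
    return base_pairs * 2 / 8


size_bytes = haploid_genome_bytes()
size_mb = size_bytes / 10**6   # modern megabyte: 1,000,000 bytes
size_mib = size_bytes / 2**20  # mebibyte: 1,048,576 bytes

print(f"{size_mb:.0f} MB")
print(f"{size_mib:.0f} MiB")
```

The same 750 million bytes read as 750 MB under the modern definition and about 715 MiB under the classic one.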