what are genomes

What is a genome?

A genome is an organism’s complete set of genetic instructions. Each genome contains all of the information needed to build that organism and allow it to grow and develop. 

Our bodies are made up of trillions of cells (roughly 100,000,000,000,000), each with its own complete set of instructions for making us, like a recipe book for the body. This set of instructions is known as our genome and is made up of DNA. Each cell in the body, for example, a skin cell or a liver cell, contains this same set of instructions:

  • The instructions in our genome are made up of DNA.
  • Within DNA is a unique chemical code that guides our growth, development and health.
  • This code is determined by the order of the four nucleotide bases that make up DNA: adenine, cytosine, guanine and thymine (A, C, G and T for short).
  • DNA has a twisted structure in the shape of a double helix.
  • Long molecules of DNA are coiled up into structures called chromosomes.
  • Your chromosomes are located in the nucleus within each cell.
  • Within our chromosomes, sections of DNA are “read” together to form genes.
  • Genes control different characteristics such as eye colour and height.
  • All living things have a unique genome.
  • The human genome is made of 3.2 billion bases of DNA, but other organisms have different genome sizes.

If printed out, the 3.2 billion letters in your genome would:

  • Fill a stack of paperback books 200 feet (61 m) high
  • Fill 200 500-page telephone directories
  • Take a century to recite, if we recited at one letter per second for 24 hours a day
  • Extend 3,000 km (1,864 miles); that’s about the distance from London to the Canary Islands, Washington to Guatemala, or New Delhi to Hanoi.
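
The recitation figure in the list above can be checked with a little arithmetic. The sketch below is only a back-of-the-envelope calculation; it assumes an average year of 365.25 days, and the constant and function names are illustrative rather than taken from the source.

```python
# Back-of-the-envelope check of the recitation time quoted in the list above.
GENOME_BASES = 3_200_000_000               # 3.2 billion letters, as stated in the text
LETTERS_PER_SECOND = 1                     # recitation rate stated in the text
SECONDS_PER_YEAR = 60 * 60 * 24 * 365.25   # assumption: average year length


def years_to_recite(bases: int, rate: float = LETTERS_PER_SECOND) -> float:
    """Return the years needed to recite `bases` letters nonstop at `rate` letters per second."""
    return bases / rate / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(f"{years_to_recite(GENOME_BASES):.1f} years")
```

Under these assumptions the result comes out at a little over 100 years, which agrees with the "century" quoted above.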


The term genome was coined in 1920 to describe “the haploid chromosome set, which, together with the pertinent protoplasm, specifies the material foundations of the species” [1]. The term did not catch on immediately (Fig 1). Though Mendelian genetics was rediscovered in 1900, and chromosomes were identified as the carriers of genetic information in 1902 [2], it was not known in 1920 whether the genetic information was carried by the DNA or protein component of the chromosomes [3]. Furthermore, the mechanism by which the cell copies information into new cells [4] and converts that information into functions [5] was unknown for several decades after the term “genome” was coined.

Today, however, we are awash in genomic data. A recent release of the GenBank database [7], version 210.0 (released on October 15, 2015), contains over 621 billion base pairs from 2,557 eukaryal genomes, 432 archaeal genomes, and 7,474 bacterial genomes, as well as tens of thousands of viral genomes, organellar genomes, and plasmid sequences (accessed on December 13, 2015). We also now have much broader and more detailed understandings of how the genome is expressed and how different biological and environmental factors contribute to that process. Even so, almost a century after coining the term, the standard definition of the genome remains very similar to its 1920 predecessor. For example, on its Genetics Home Reference website, the National Institutes of Health (NIH) definition reads: “An organism’s complete set of DNA, including all of the genes, makes up the genome. Each genome contains all of the information needed to build and maintain that organism” (accessed on February 1, 2016).

With a greater understanding of genomic content, diversity, and expression, we can now reassess our basic understanding of the genome and its role in the cell. For example, closer scrutiny of the NIH definition reveals that its two halves are mutually exclusive; that is, the “complete set of DNA” cannot be “all of the information needed to build and maintain (an) organism.” Of course, this was probably meant to be a simplified definition for both scientists and nonscientists. While it is useful to continue thinking of the genome as a physical entity encoding the information required to maintain and replicate an organism, our present understanding shows that this definition is incomplete.

Examples of Physical Transience in Genomes

Many diverse genetic systems challenge the material definition of the genome as “the complete set of chromosomes” [1] or “an organism’s complete set of DNA.” Perhaps the most familiar and straightforward example of a genome’s physical impermanence occurs in the retroviral infection cycle. Upon infection, retroviruses convert their single-stranded RNA genomes into double-stranded DNA. These intermediate DNA copies of the genome are integrated into the host cell’s genome and, thus, no longer constitute a separate physical entity from it. Transcription of the integrated DNA sequence can both express retroviral genes and reconstitute the original single-stranded (ss)RNA genome. Other types of viruses share similar features. Many temperate phages and viruses integrate into the host’s genome, removing themselves and lysing the host cell only after certain conditions are met. The hepadnaviruses, including Hepatitis B, infect the cell as double-stranded DNA, but are transcribed after infection into single-stranded RNA and subsequently follow a similar course as the retroviruses, wherein they are reverse transcribed back into DNA [8].
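
The point that sequence information can survive a change of chemical carrier can be illustrated with a toy round trip from RNA to DNA and back. This is a deliberately simplified sketch, not a model of real retroviral replication: it ignores integration, the duplication of terminal repeats, and copying errors, and the function names and the example sequence are hypothetical.

```python
# Simplified sketch: sequence information survives an RNA -> DNA -> RNA conversion.
# Assumption: integration, terminal-repeat duplication, and copying errors are ignored.
DNA_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_transcribe(rna: str) -> str:
    """Return the complementary DNA strand (written 5'->3') for an RNA sequence."""
    dna_alphabet = rna.upper().replace("U", "T")
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna_alphabet))


def transcribe(template_dna: str) -> str:
    """Return the RNA (5'->3') transcribed from a DNA template strand (5'->3')."""
    coding = "".join(DNA_COMPLEMENT[base] for base in reversed(template_dna.upper()))
    return coding.replace("T", "U")


if __name__ == "__main__":
    genome_rna = "GGUCUCUCUGGUUAGACCAG"  # hypothetical toy sequence, not a real viral genome
    dna_copy = reverse_transcribe(genome_rna)
    # The molecule changes from RNA to DNA and back, but the encoded sequence is recovered.
    assert transcribe(dna_copy) == genome_rna
    print(dna_copy)
```

In this sketch the intermediate DNA string looks nothing like the starting RNA string, yet transcribing it returns the original sequence exactly, which is the sense in which the genome persists as information rather than as a particular molecule.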

The chemical conversions of these genomes between different nucleic acids offer cogent examples that challenge our assumption of the physical permanence of genomes. It is tempting to explain this physical transience as another eccentric quirk of viruses. Many viruses, after all, do not have genomes composed of double-stranded DNA, a feature that already flouts the NIH definition given earlier. But an equally cogent example of the physical impermanence of a genome is found in the eukaryotic genus Oxytricha [9–11], a group of ciliates that are distantly related to Tetrahymena and Paramecium [12].

Like other ciliates, Oxytricha possesses two distinct versions of its genome, a germline version and a somatic version. Oxytricha’s germline genome is an archive of approximately 1 Gb of DNA sequence containing approximately one-quarter million embedded gene segments. These DNA pieces assemble following sexual recombination to form the somatic, expressed chromosomes (Fig 2). Thousands of these gene segments are present within the germline chromosomes in a scrambled order or reverse orientation, such that their reassembly requires translocation and/or inversion with respect to one another [13]. The resulting somatic genome, containing protein-coding sequences in the correct order, contains just 5%–10% of the original sequence of the germline genome. This somatic genome resides on over 16,000 unique “nanochromosomes” that typically bear single genes and have an average size of just 3.2 kb [14]. These nanochromosomes also exist in high copy number, averaging approximately 2,000 copies per unique chromosome [14,15].
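
The figures in this paragraph can be cross-checked against one another. The sketch below uses only the rounded values quoted above (16,000 nanochromosomes, 3.2 kb average size, a 1 Gb germline, 2,000 copies); because the text says "over 16,000" and "approximately", the results should be read as rough lower-bound estimates, and the variable names are illustrative.

```python
# Rough consistency check of the Oxytricha numbers quoted in the text.
NANOCHROMOSOMES = 16_000       # "over 16,000" unique nanochromosomes (lower bound)
MEAN_SIZE_BP = 3_200           # average nanochromosome size of 3.2 kb
GERMLINE_BP = 1_000_000_000    # germline genome of approximately 1 Gb
MEAN_COPIES = 2_000            # approximately 2,000 copies per unique chromosome

# Unique sequence in the somatic genome, counting each nanochromosome once.
unique_somatic_bp = NANOCHROMOSOMES * MEAN_SIZE_BP

# Share of the germline sequence that is retained in the somatic genome.
retained_fraction = unique_somatic_bp / GERMLINE_BP

# Total somatic DNA once copy number is taken into account.
total_somatic_bp = unique_somatic_bp * MEAN_COPIES

if __name__ == "__main__":
    print(f"unique somatic sequence: {unique_somatic_bp / 1e6:.1f} Mb")
    print(f"fraction of germline retained: {retained_fraction:.1%}")
    print(f"total somatic DNA with copies: {total_somatic_bp / 1e9:.1f} Gb")
```

With these rounded inputs the retained fraction lands at about 5%, the lower end of the 5%–10% range given above, while the high copy number means the somatic nucleus holds far more total DNA than its unique sequence content suggests.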

Fig 2. The transfer of genomic information from DNA to RNA in Oxytricha trifallax.

The physical transition of genomic information from DNA to RNA and back to DNA occurs after mating in the ciliate, Oxytricha trifallax. RNA templates (wavy green line) and piRNAs (green dashes) derive from RNA transcripts of the previous generation’s somatic DNA nanochromosomes before the old somatic nucleus degrades. A mitotic copy of the new, zygotic germline genome provides precursor DNA segments (numbers 1–4) that are retained in the developing somatic nucleus through piRNA associations and rearranged according to the inherited RNA templates. This step sometimes reorders or inverts precursor segments to build the mature DNA molecule. The number of copies of each new nanochromosome is also influenced by the concentration of RNA templates supplied by the previous somatic genome during development. Red rectangles represent telomeres added to the ends of somatic chromosomes. Only one representative nanochromosome (of over 16,000 in Oxytricha) is shown for simplicity, and it derives from a representative locus containing 4 scrambled precursor segments in the germline genome.

Much of the information required to reproduce the somatic genome derives from RNA rather than DNA. Long, RNA-cached copies of somatic chromosomes from the previous generation provide templates to guide chromosome rearrangement [16]. Germline transposases participate in the whole process, probably by facilitating DNA cleavage events [17,18] that allow genomic regions to rearrange in the order according to the RNA templates [16]. Experimental introduction of long artificial RNAs can reprogram a developing Oxytricha cell to follow the order of gene segments specified by the artificial RNA templates, rather than the wild-type chromosome.
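
The template-guided rearrangement described above can be sketched as a toy program. This is an illustrative simplification, not the biological mechanism: real templates are RNA sequences rather than lists of labels, and piRNAs, transposases, and telomere addition are left out. The `Segment` type, the `assemble` function, and the example sequences are all hypothetical.

```python
from typing import NamedTuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


class Segment(NamedTuple):
    label: int       # position of this segment in the wild-type somatic gene
    sequence: str    # DNA as stored in the germline
    inverted: bool   # True if the germline stores it in reverse orientation


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def assemble(germline_locus: list[Segment], template_order: list[int]) -> str:
    """Join germline segments in the order given by the template, fixing inversions."""
    by_label = {segment.label: segment for segment in germline_locus}
    pieces = []
    for label in template_order:
        segment = by_label[label]
        pieces.append(
            reverse_complement(segment.sequence) if segment.inverted else segment.sequence
        )
    return "".join(pieces)


if __name__ == "__main__":
    # A scrambled germline locus: segments are out of order and one is inverted.
    locus = [
        Segment(3, "GGG", False),
        Segment(1, "CAT", True),   # stored inverted: reverse complement of "ATG"
        Segment(4, "TTT", False),
        Segment(2, "CCA", False),
    ]
    wild_type = assemble(locus, [1, 2, 3, 4])      # template inherited from the parent
    reprogrammed = assemble(locus, [1, 3, 2, 4])   # artificial template with a new order
    print(wild_type, reprogrammed)
```

The same germline locus yields different somatic molecules depending on the template supplied, which mirrors the reprogramming experiment described above: the order of the product is specified by the template rather than by the germline DNA alone.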

RNA performs other essential roles in building Oxytricha’s somatic genome. Millions of small, 27-nt piRNAs, which also derive from the previous generation’s somatic genome, mark and protect the retained DNA regions in the new zygotic germline that assemble (according to the RNA template) to form the new somatic genome [19,20]. In addition, the relative abundance of the long template RNAs also establishes chromosome copy number in the daughter cells [17]. Because these RNA templates derive from the previous generation’s somatic genome, this means that both the genomic sequence and chromosome ploidy are inherited from the old somatic nucleus to the new somatic nucleus through information transfer from DNA to RNA and back again to DNA.

These examples of physical transience in genomes show that a genome’s chemical composition and stability are not necessarily fixed requirements at all times in every organism. Synthetic biologists have further demonstrated this point through the chemical synthesis of viral [21,22] and bacterial [23] genomes. Prior to the chemical synthesis of these DNA chromosomes, the genomes existed in a purely informational state as nucleotide sequences in a computer file. In these cases, the genome of the virus or cell is not transferred from one type of nucleic acid to another, but from a physical DNA molecule to a non-physical nucleotide sequence and back again to a physical DNA molecule. Though this example is not a naturally occurring phenomenon, it provides a straightforward demonstration that the information content of the genome is more important than its physical permanence. Therefore, the concept of informational supremacy that is used to define genomes, e.g., “all of the information needed to build and maintain that organism,” also deserves further scrutiny.

Extra-Genomic Information

Information is both an essential concept that underpins our understanding of a genome’s function and a notoriously difficult concept to define. The genome contains information, but so do other constituents of the cell. A typical and uncontroversial view is that the genome carries information but requires the presence of proteins, ribosomal RNAs, and transfer RNAs in the cell for the meaningful conversion of genomic information to molecular function. Indeed, the construction of synthetic genomes mentioned earlier required transplantation of the chemically synthesized genome into a pre-existing cell [23]. Evidence for heritable information beyond the genome has also been known since the 1960s [24]. A greater understanding of molecular biology has revealed that extra-genomic sources of information are not only required to read the genome but can influence the information encoded within the genome [25].

Epigenetic control of gene regulation provides a subtler—but in many ways more cogent—example of extra-genomic information. DNA methylation [26,27], histone modification encoding chromatin [28,29], and certain proteins (e.g., [30,31]) and noncoding RNAs [32,33], including Oxytricha’s noncoding RNAs described in the previous section [17,18,20], all offer platforms that permit information transfer across generations, while seeming to bypass the DNA genome. It has not yet been shown whether epigenetic information can persist over scales of evolutionary time, but it is clear that many if not most genomes have evolved a capacity for epigenetic control. This makes such genomes sensitive to external information that they do not encode, which, in turn, should influence their ability to adapt to changing environments while, in some cases, preserving the ability to revert to the former wild-type genome. This is epitomized by the genome duality in Oxytricha, in which millions of small and long noncoding RNAs sculpt and decrypt the information in its somatic epigenome, while the germline genome provides a more stable archive.

A second example of extra-genomic information has come by way of genome-wide association studies, which have identified correlations between many phenotypic traits and genetic variants [34]. In doing so, such studies have also revealed the so-called “missing heritability” problem, that genetic variation does not always account for 100% of the measured heritability, let alone the observed phenotypic variance, in many complex traits. In many cases, this missing heritability can be explained as a lack of statistical power due to low phenotypic impact of the genetic variation or low frequency in the population [35]. The missing heritability can also be explained, however, by a gene–environment interaction, such that the genes may only encode a trait that is expressed under certain environmental conditions [36,37]. In this example, genomes do not necessarily encode all of the information of the cell, but rather a set of potential states that may be realized through interaction with different environments.

As these examples demonstrate, the way in which the information content of the genome becomes realized as functions and phenotypes depends on other cellular constituents as well as the environment. The ability of genomes to be affected by this external information is, itself, encoded on the genome. In this way, genomes are not a sole source of cellular information, but rather a more expansive archive of possible states that can be generated through interactions with internal and external factors.


Many biologists already know that the genome is not always best defined as “all of the information needed to build and maintain” a cell or an organism. While this definition is useful in the context of an online glossary for the public, it is, by necessity, an oversimplification. But if a genome is not a complete set of DNA containing all of the information needed to build and maintain the organism, then what is it?

We have demonstrated through examples from retroviruses, the microbial eukaryote Oxytricha, and synthetic biology that the genome can change its physical character while still maintaining the necessary information encoded within it. We also describe examples in which non-genomic factors can alter the way in which the information within the genome translates to molecular functions and phenotypes. These examples suggest a more expansive definition of the genome as an informational entity, often but not always manifest as DNA, encoding a broad set of functional possibilities that, together with other sources of information, produce and maintain the organism. Whether or not even this definition stands up to future discoveries remains to be seen.


We thank Ford Doolittle and Susan Rosenberg for organizing this series of papers on “How Microbes ‘Jeopardize’ the Modern Synthesis.”

Funding Statement

The authors received no specific funding for this work.


1. Lederberg J, McCray AT. ‘Ome Sweet ‘Omics: A Genealogical Treasury of Words. The Scientist. 2001;15:8.
2. Sutton WS. On the morphology of the chromosome group in Brachystola magna. Biol Bull. 1902;4:24–39.
3. Avery OT, MacLeod CM, McCarty M. Studies on the chemical nature of the substance inducing transformation of Pneumococcal types. J Exp Med. 1944;79:137–159.
4. Watson JD, Crick FHC. Genetical Implications of the structure of Deoxyribonucleic Acid. Nature. 1953;171:964–967.
5. Crick FHC. On protein synthesis. Symp Soc Exp Biol. 1958;12:138–163.
6. Michel JB, Shen YK, Aiden AP, Veres A, Gray MK, et al. Quantitative analysis of culture using millions of digitized books. Science. 2011;331:176–182. doi:10.1126/science.1199644
7. Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. GenBank. Nucleic Acids Res. 2005;33:D34–D38.
8. Nassal M, Schaller H. Hepatitis B virus replication. Trends Microbiol. 1993;1:221–228.
9. Nowacki M, Shetty K, Landweber LF. RNA-Mediated Epigenetic Programming of Genome Rearrangements. Annu Rev Genomics Hum Genet. 2011;12:367–389. doi:10.1146/annurev-genom-082410-101420
10. Goldman AD, Landweber LF. Oxytricha as a modern analog of ancient genome evolution. Trends Genet. 2012;28:382–388. doi:10.1016/j.tig.2012.03.010
11. Bracht JR, Fang W, Goldman AD, Dolzhenko E, Stein EM, Landweber LF. Genomes on the edge: programmed genome instability in ciliates. Cell. 2013;152:406–416. doi:10.1016/j.cell.2013.01.005
12. Zoller SD, Hammersmith RL, Swart EC, Higgins BP, Doak TG, et al. Characterization and taxonomic validity of the ciliate Oxytricha trifallax (Class Spirotrichea) based on multiple gene sequences: limitations in identifying genera solely by morphology. Protist. 2012;163:643–657.
13. Chen X, Bracht JR, Goldman AD, Dolzhenko E, Clay DM, et al. The architecture of a scrambled genome reveals massive levels of genomic rearrangement during development. Cell. 2014;158:1187–1198. doi:10.1016/j.cell.2014.07.034
14. Swart EC, Bracht JR, Magrini V, Minx P, Chen X, et al. The Oxytricha trifallax macronuclear genome: a complex eukaryotic genome with 16,000 tiny chromosomes. PLoS Biol. 2013;11:e1001473. doi:10.1371/journal.pbio.1001473
15. Prescott DM. The DNA of ciliated protozoa. Microbiol Mol Biol Rev. 1994;58:233–267.
16. Nowacki M, Vijayan V, Zhou Y, Schotanus K, Doak TG, Landweber LF. RNA-mediated epigenetic programming of a genome-rearrangement pathway. Nature. 2008;451:153–158.
17. Nowacki M, Haye JE, Fang W, Vijayan V, Landweber LF. RNA-mediated epigenetic regulation of DNA copy number. Proc Natl Acad Sci U S A. 2010;107:22140–22144. doi:10.1073/pnas.1012236107
18. Vogt A, Goldman AD, Mochizuki K, Landweber LF. Transposon domestication versus mutualism in ciliate genome rearrangements. PLoS Genet. 2013;9:e1003659. doi:10.1371/journal.pgen.1003659
19. Fang W, Wang X, Bracht JR, Nowacki M, Landweber LF. Piwi-interacting RNAs protect DNA against loss during Oxytricha genome rearrangement. Cell. 2012;151:1243–1255. doi:10.1016/j.cell.2012.10.045
20. Zahler AM, Neeb ZT, Lin A, Katzman S. Mating of the stichotrichous ciliate Oxytricha trifallax induces production of a class of 27 nt small RNAs derived from the parental macronucleus. PLoS ONE. 2012;7:e42371. doi:10.1371/journal.pone.0042371
21. Cello J, Paul AV, Wimmer E. Chemical synthesis of poliovirus cDNA: generation of infectious virus in the absence of natural template. Science. 2002;297:1016–1018.
22. Smith HO, Hutchison CA 3rd, Pfannkoch C, Venter JC. Generating a synthetic genome by whole genome assembly: phiX174 bacteriophage from synthetic oligonucleotides. Proc Natl Acad Sci U S A. 2003;100:15440–15445.
23. Gibson DG, Glass JI, Lartigue C, Noskov VN, Chuang RY, et al. Creation of a bacterial cell controlled by a chemically synthesized genome. Science. 2010;329:52–56. doi:10.1126/science.1190719
24. Nanney DL. Corticotype transmission in Tetrahymena. Genetics. 1966;54:955–968.
25. Walker SI. Top-down causation and the rise of information in the Emergence of Life. Information. 2014;5:424–439.
26. Riggs AD. X inactivation, differentiation, and DNA methylation. Cytogenet Cell Genet. 1975;14:9–25.
27. Jaenisch R, Bird A. Epigenetic regulation of gene expression: how the genome integrates intrinsic and environmental signals. Nat Genet. 2003;33:245–254.
28. D’Urso A, Brickner JH. Mechanisms of epigenetic memory. Trends Genet. 2014;30:230–236. doi:10.1016/j.tig.2014.04.004
29. Siklenka K, Erkek S, Godmann M, Lambrot R, McGraw S, et al. Disruption of histone methylation in developing sperm impairs offspring health transgenerationally. Science. 2015;350:aab2006. doi:10.1126/science.aab2006
30. Zordan R, Miller M, Galgoczy D, Tuch B, Johnson A. Interlocking transcriptional feedback loops control white-opaque switching in Candida albicans. PLoS Biol. 2007;5:1–11.
31. Zacharioudakis I, Gligoris T, Tzamarias D. A yeast catabolic enzyme controls transcriptional memory. Curr Biol. 2007;17:2041–2046.
32. Rassoulzadegan M, Grandjean V, Gounon P, Vincent S, Gillot I, Cuzin F. RNA-mediated non-Mendelian inheritance of an epigenetic change in the mouse. Nature. 2006;441:469–474.
33. Rodgers AB, Morgan CP, Leu NA, Bale TL. Transgenerational epigenetic programming via sperm microRNA recapitulates effects of paternal stress. Proc Natl Acad Sci U S A. 2015;112:13699–13704. doi:10.1073/pnas.1508347112
34. Bush WS, Moore JH. Chapter 11: Genome-Wide Association Studies. PLoS Comput Biol. 2012;8:e1002822. doi:10.1371/journal.pcbi.1002822
35. Gibson G. Rare and common variants: twenty arguments. Nat Rev Genet. 2012;13:135–145. doi:10.1038/nrg3118
36. Smith EN, Kruglyak L. Gene-environment interaction in yeast gene expression. PLoS Biol. 2008;6:e83. doi:10.1371/journal.pbio.0060083
37. Manuck SB, McCaffery JM. Gene-environment interaction. Annu Rev Psychol. 2014;65:41–70. doi:10.1146/annurev-psych-010213-115100
