Intro to gene expression (central dogma)

How genes in DNA can provide instructions for proteins. The central dogma of molecular biology: DNA → RNA → protein.

Overview: Gene expression

DNA is the genetic material of all organisms on Earth. When DNA is transmitted from parents to children, it can determine some of the children's characteristics (such as their eye color or hair color). But how does the sequence of a DNA molecule actually affect a human or other organism's features? For example, how did the sequence of nucleotides (As, Ts, Cs, and Gs) in the DNA of Mendel's pea plants determine the color of their flowers?

Genes specify functional products (such as proteins)

A DNA molecule isn't just a long, boring string of nucleotides. Instead, it's divided up into functional units called genes. Each gene provides instructions for a functional product, that is, a molecule needed to perform a job in the cell. In many cases, the functional product of a gene is a protein. For example, Mendel's flower color gene provides instructions for a protein that helps make colored molecules (pigments) in flower petals.
Diagram of how a gene can dictate a phenotype (observable feature) of an organism. The flower color gene that Mendel studied consists of a stretch of DNA found on a chromosome. The DNA has a particular sequence; part of it, shown in this diagram, is 5'-GTAAATCG-3' (upper strand), paired with the complementary sequence 3'-CATTTAGC-5' (lower strand). The DNA of the gene specifies production of a protein that helps make pigments. When the protein is present and functional, pigments are produced, and the flowers of a plant have a purple color.
Image based on experimental data reported by Hellens et al.¹ and on a similar figure in Reece et al.²
The functional products of most known genes are proteins, or, more accurately, polypeptides. Polypeptide is just another word for a chain of amino acids. Although many proteins consist of a single polypeptide, some are made up of multiple polypeptides. Genes that specify polypeptides are called protein-coding genes.
Not all genes specify polypeptides. Instead, some provide instructions to build functional RNA molecules, such as the transfer RNAs and ribosomal RNAs that play roles in translation.
As mentioned above, an organism's DNA can be divided into functional units called genes. Each gene consists of a sequence of DNA, and that sequence provides instructions to build a product needed by the cell. Some products are polypeptides, while others are functional RNAs.
Examples of different functional products that genes can specify.
In this example, there is a stretch of DNA that contains three different genes:
  • Gene 1 encodes an mRNA, which is then translated to make a polypeptide (protein or protein subunit).
  • Gene 2 encodes a regulatory RNA. This RNA is not translated into a polypeptide, but rather, carries out a job in the cell itself (regulating expression of other genes).
  • Gene 3 encodes a transfer RNA (tRNA). This RNA is also not translated into a polypeptide. Instead, it folds into a complex cloverleaf shape and will play a key role in the synthesis of proteins.
_Image modified from "Central dogma of molecular biochemistry with enzymes," by Daniel Horspool (CC BY-SA 3.0). The modified image is licensed under a CC BY-SA 3.0 license._
The idea that genes encode polypeptides has been around for many years (tracing its roots back to experiments by Beadle and Tatum in the 1940s).
The idea that genes commonly encode functional RNAs is newer. Certain types of functional RNAs (such as transfer RNAs and ribosomal RNAs) have been known for many years. However, scientists have only recently discovered many other genes that encode regulatory RNAs, non-protein-coding RNAs that change the expression of other genes. How these RNAs work is an active area of research.

How does the DNA sequence of a gene specify a particular protein?

Many genes provide instructions for building polypeptides. How, exactly, does DNA direct the construction of a polypeptide? This process involves two major steps: transcription and translation.
  • In transcription, the DNA sequence of a gene is copied to make an RNA molecule. This step is called transcription because it involves rewriting, or transcribing, the DNA sequence in a similar RNA "alphabet." In eukaryotes, the RNA molecule must undergo processing to become a mature messenger RNA (mRNA).
  • In translation, the sequence of the mRNA is decoded to specify the amino acid sequence of a polypeptide. The name translation reflects that the nucleotide sequence of the mRNA must be translated into the completely different "language" of amino acids.
Simplified schematic of central dogma, showing the sequences of the molecules involved.
The two strands of DNA have the following sequences:
5'-ATGATCTCGTAA-3'
3'-TACTAGAGCATT-5'
Transcription of one of the strands of DNA produces an mRNA that nearly matches the other strand of DNA in sequence. However, due to a biochemical difference between DNA and RNA, the Ts of DNA are replaced with Us in the mRNA. The mRNA sequence is:
5'-AUGAUCUCGUAA-3'
Translation involves reading the mRNA nucleotides in groups of three; each group specifies an amino acid (or provides a stop signal indicating that translation is finished).
5'-AUG AUC UCG UAA-3'
AUG → Methionine
AUC → Isoleucine
UCG → Serine
UAA → "Stop"
Polypeptide sequence: (N-terminus) Methionine-Isoleucine-Serine (C-terminus)
Thus, during expression of a protein-coding gene, information flows from DNA → RNA → protein. This directional flow of information is known as the central dogma of molecular biology. Non-protein-coding genes (genes that specify functional RNAs) are still transcribed to produce an RNA, but this RNA is not translated into a polypeptide. For either type of gene, the process of going from DNA to a functional product is known as gene expression.
You may wonder why gene expression has a transcription step. Why isn't a DNA sequence translated directly into the amino acid sequence of a polypeptide?
That's a great question, and we can't give a definitive answer. To some extent, that's simply how the gene expression system evolved, and we are speculating when we address the "why" of transcription. If we found life on another planet, it could possibly express its genes through a different process that did not involve transcription.
In known organisms, however, transcription is an essential part of gene expression. Even if cells somehow had a way to directly read a DNA sequence and use it to build a protein (which they don't), there are reasons why transcription would still be a necessary step:
  • One reason simply relates to location. In a eukaryotic cell, the DNA is locked up in the nucleus, while the ribosomes – molecular machines used to make proteins – are found in the cytosol. Thus, a "messenger" is needed to carry information from DNA out of the nucleus to the waiting ribosomes. Messenger RNAs (mRNAs) fill this role.
  • Transcription also provides an important control point at which cells regulate how much of a polypeptide is produced. Although other stages of gene expression can also be regulated, control of transcription is the most common form of gene regulation. If the transcription stage were somehow removed, cells would lose much of their control over which polypeptides were produced and when.

Transcription

In transcription, one strand of the DNA that makes up a gene, called the template strand, acts as a template for the synthesis of a matching (complementary) RNA strand by an enzyme called RNA polymerase. This RNA strand is the primary transcript.
The two strands of DNA have the following sequences:
5'-ATGATCTCGTAA-3'
3'-TACTAGAGCATT-5'
The DNA opens up to form a bubble, and the lower strand serves as a template for the synthesis of a complementary RNA strand. This strand is called the template strand. Transcription of the template strand produces an mRNA that nearly matches the other strand (coding strand) of DNA in sequence. However, due to a biochemical difference between DNA and RNA, the Ts of DNA are replaced with Us in the mRNA. The mRNA sequence is:
5'-AUGAUCUCGUAA-3'
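The base-pairing rule in this figure can be sketched in a few lines of Python. This is only an illustration, not a model of how RNA polymerase works. The names `RNA_PARTNER` and `transcribe` are invented for this sketch, and the function assumes the template strand is written 3' to 5', as in the caption.

```python
# Base-pairing rule used when RNA is built on a DNA template:
# each DNA base on the template pairs with one RNA base (A pairs with U, not T).
RNA_PARTNER = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5):
    """Return the RNA (written 5' to 3') that is complementary to a DNA
    template strand written 3' to 5'."""
    return "".join(RNA_PARTNER[base] for base in template_3_to_5)


coding_strand = "ATGATCTCGTAA"    # upper strand, written 5' to 3'
template_strand = "TACTAGAGCATT"  # lower strand, written 3' to 5'

mrna = transcribe(template_strand)
print(mrna)  # AUGAUCUCGUAA

# The transcript matches the coding strand except that T is replaced by U.
print(mrna == coding_strand.replace("T", "U"))  # True
```

Building the RNA from the template strand, rather than simply swapping T for U in the coding strand, mirrors the text: the RNA "nearly matches" the coding strand precisely because it is complementary to the template.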
The primary transcript carries the same sequence information as the non-transcribed strand of DNA, sometimes called the coding strand. However, the primary transcript and the coding strand of DNA are not identical, thanks to some biochemical differences between DNA and RNA. One important difference is that RNA molecules do not include the base thymine (T). Instead, they have the similar base uracil (U). Like thymine, uracil pairs with adenine.
Identity of the sugars. The sugar in an RNA nucleotide is ribose, while the sugar in DNA is deoxyribose. They are very similar, but ribose has a hydroxyl (-OH) group that's missing in deoxyribose.
DNA nucleotide: lacks a hydroxyl group on the 2' carbon of the sugar (i.e., sugar is deoxyribose). Bears a thymine base that has a methyl group attached to its ring.
RNA nucleotide: has a hydroxyl group on the 2' carbon of the sugar (i.e., sugar is ribose). Bears a uracil base that is very similar in structure to thymine, but does not have a methyl group attached to the ring.
Image based on similar image from CyberBridge³.
Number of strands. Transcription produces a single-stranded RNA molecule, while the starting DNA molecule was double-stranded.
Although RNA transcripts are not made up of two separate strands, RNA can sometimes fold back on itself to form double-stranded regions and complex 3D structures. We will see examples of RNA folding when we look at transfer RNA (tRNA) and protein translation. In addition, some viruses have genomes made of double-stranded RNA.
See the nucleic acids article for more information on DNA and RNA.

Transcription and RNA processing: Eukaryotes vs. bacteria

In bacteria, the primary RNA transcript can directly serve as a messenger RNA, or mRNA. Messenger RNAs get their name because they act as messengers between DNA and ribosomes. Ribosomes are RNA-and-protein structures in the cytosol where proteins are actually made.
In eukaryotes (such as humans), a primary transcript has to go through some extra processing steps in order to become a mature mRNA. During processing, a cap is added to the 5' end of the RNA and a poly-A tail to the 3' end, and some pieces of the RNA may be carefully removed in a process called splicing. These steps do not happen in bacteria.
Eukaryotic cell: Transcription takes place in the nucleus. The primary transcript also undergoes processing steps in the nucleus in order to become a mature mRNA. It is then exported to the cytosol, where it can associate with a ribosome and direct synthesis of a polypeptide in the process of translation.
Bacterium: Transcription takes place in the cytosol. Because of this, the mRNA doesn't have to travel anywhere before it can be translated by a ribosome. In fact, a ribosome may begin translating an mRNA before it is even fully transcribed (while transcription is still going on).
The location of transcription is also different between prokaryotes and eukaryotes. Eukaryotic transcription takes place in the nucleus, where the DNA is stored, while protein synthesis takes place in the cytosol. Because of this, a eukaryotic mRNA must be exported from the nucleus before it can be translated into a polypeptide. Prokaryotic cells, on the other hand, don't have a nucleus, so they carry out both transcription and translation in the cytosol.

Translation

After transcription (and, in eukaryotes, after processing), an mRNA molecule is ready to direct protein synthesis. The process of using information in an mRNA to build a polypeptide is called translation.

The genetic code

During translation, the nucleotide sequence of an mRNA is translated into the amino acid sequence of a polypeptide. Specifically, the nucleotides of the mRNA are read in triplets (groups of three) called codons. There are 61 codons that specify amino acids. One of these, AUG, is a "start" codon that indicates where to start translation. The start codon specifies the amino acid methionine, so most polypeptides begin with this amino acid. Three other "stop" codons signal the end of a polypeptide. These relationships between codons and amino acids are called the genetic code.
Genetic code table. Each three-letter sequence of mRNA nucleotides corresponds to a specific amino acid, or to a stop codon. UGA, UAA, and UAG are stop codons. AUG is the codon for methionine, and is also the start codon.
_Image credit: "The genetic code," by OpenStax College, Biology (CC BY 3.0)._
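The codon counts follow from simple arithmetic: four bases read three at a time give 4 × 4 × 4 = 64 possible codons, and removing the three stop codons leaves 61. The short Python sketch below enumerates them to confirm this. The variable names are illustrative only.

```python
from itertools import product

BASES = "UCAG"
STOP_CODONS = {"UAA", "UAG", "UGA"}  # the three stop codons named in the caption

# Every possible three-base combination of U, C, A, and G.
all_codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]

# Codons that specify an amino acid are all codons except the stop codons.
amino_acid_codons = [codon for codon in all_codons if codon not in STOP_CODONS]

print(len(all_codons))             # 64
print(len(amino_acid_codons))      # 61
print("AUG" in amino_acid_codons)  # True: the start codon also codes for methionine
```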
The mRNA sequence is:
5'-AUGAUCUCGUAA-3'
Translation involves reading the mRNA nucleotides in groups of three; each group specifies an amino acid (or provides a stop signal indicating that translation is finished).
5'-AUG AUC UCG UAA-3'
AUG → Methionine
AUC → Isoleucine
UCG → Serine
UAA → "Stop"
Polypeptide sequence: (N-terminus) Methionine-Isoleucine-Serine (C-terminus)
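The decoding shown in this example can be sketched as a table lookup. The Python below is a simplified illustration under two assumptions: it uses the standard genetic code with one-letter amino acid abbreviations (M = methionine, I = isoleucine, S = serine), and it treats the first AUG in the sequence as the start codon, which ignores how real ribosomes locate a start site. The names `CODON_TABLE` and `translate` are made up for this sketch.

```python
BASES = "UCAG"

# One-letter amino acid codes for all 64 codons of the standard genetic code,
# listed in the order UUU, UUC, UUA, UUG, UCU, ..., GGG. "*" marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(mrna):
    """Read an mRNA (written 5' to 3') codon by codon, starting at the first
    AUG and stopping at the first stop codon in that reading frame."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so nothing is translated
    polypeptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break  # stop codon: translation is finished
        polypeptide.append(amino_acid)
    return "".join(polypeptide)


print(translate("AUGAUCUCGUAA"))  # MIS (methionine-isoleucine-serine)
```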

Steps of translation

Translation takes place inside of structures known as ribosomes. Ribosomes are molecular machines whose job is to build polypeptides. Once a ribosome latches on to an mRNA and finds the "start" codon, it will travel rapidly down the mRNA, one codon at a time. As it goes, it will gradually build a chain of amino acids that exactly mirrors the sequence of codons in the mRNA.
How does the ribosome "know" which amino acid to add for each codon? As it turns out, this matching is not done by the ribosome itself. Instead, it depends on a group of specialized RNA molecules called transfer RNAs (tRNAs). Each tRNA has three nucleotides sticking out at one end (the anticodon), which can recognize (base-pair with) just one or a few particular codons. At the other end, the tRNA carries an amino acid – specifically, the amino acid that matches those codons.
Translation occurring in a ribosome. The mRNA is bound to the ribosome, where it can interact with tRNA molecules.
In this image, the mRNA has a sequence of:
5'-...AUG UAC AUC UCG GAU...-3'
A tRNA bound to the third codon (5'-AUC-3') has a complementary sequence of 3'-UAG-5'. It bears a short chain of amino acids consisting of methionine and isoleucine, which is attached to the tRNA by the isoleucine. To the right of this tRNA, another tRNA is binding to the next codon (5'-UCG-3'). This tRNA again has a complementary sequence of nucleotides (3'-AGC-5') and bears the amino acid serine, which is the amino acid specified by the mRNA codon. The serine carried by this tRNA will be added to the growing polypeptide chain.
Other tRNAs carrying other amino acids are floating around in the background. One carries Glu (glutamic acid) and has a sequence of nucleotides at its end that reads 3'-CUU-5'. The other carries Asp (aspartic acid) and has a sequence of nucleotides at its end that reads 3'-CUA-5'.
There are many tRNAs floating around in a cell, but only a tRNA that matches (base-pairs with) the codon that's currently being read can bind and deliver its amino acid cargo. Once a tRNA is snugly bound to its matching codon in the ribosome, its amino acid will be added to the end of the polypeptide chain.
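The matching rule can be illustrated with the four tRNAs described in the figure caption. The Python sketch below is a simplification: it requires strict base-pairing at all three positions, whereas a real tRNA can sometimes recognize more than one codon. The names `TRNA_POOL` and `matching_amino_acid` are hypothetical.

```python
# RNA base-pairing rule: A pairs with U, and G pairs with C.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# tRNAs from the figure: three-nucleotide end (written 3' to 5') -> amino acid carried.
TRNA_POOL = {"UAG": "Ile", "AGC": "Ser", "CUU": "Glu", "CUA": "Asp"}


def matching_amino_acid(codon_5_to_3):
    """Return the amino acid carried by the tRNA whose three-nucleotide end
    base-pairs with the codon, or None if no tRNA in the pool matches."""
    needed_end_3_to_5 = "".join(RNA_COMPLEMENT[base] for base in codon_5_to_3)
    return TRNA_POOL.get(needed_end_3_to_5)


print(matching_amino_acid("AUC"))  # Ile
print(matching_amino_acid("UCG"))  # Ser
print(matching_amino_acid("GGG"))  # None (no tRNA in this small pool matches)
```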
  1. Matching tRNA binds to exposed codon in rightmost slot of ribosome.
  2. Chain of amino acids is transferred from tRNA in middle slot of ribosome onto the amino acid of the tRNA in the rightmost slot. This has the effect of adding the amino acid to the end of the amino acid chain.
  3. The ribosome shifts one codon over. The tRNA formerly in the middle slot moves to the leftmost slot and exits the ribosome. The tRNA formerly in the right slot moves into the middle slot and continues to hold the amino acid chain. A new codon is exposed in the rightmost slot for a new tRNA to bind to.
This process repeats many times, with the ribosome moving down the mRNA one codon at a time. A chain of amino acids is built up one by one, with an amino acid sequence that matches the sequence of codons found in the mRNA. Translation ends when the ribosome reaches a stop codon and releases the polypeptide.
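The three-step cycle above can be written as a loop. The Python sketch below is a toy model with several assumptions: the mRNA begins at the start codon, the first tRNA (carrying methionine) starts out in the middle slot, and only the four codons from this article's example are included in the lookup table. The function name `elongate` is invented for illustration.

```python
# Only the codons used in this article's example; None marks the stop codon.
CODON_TO_AMINO_ACID = {"AUG": "Met", "AUC": "Ile", "UCG": "Ser", "UAA": None}


def elongate(mrna):
    """Walk a ribosome along an mRNA that begins with the start codon and
    return the finished chain of amino acids."""
    codons = [mrna[i:i + 3] for i in range(0, len(mrna) - 2, 3)]

    # Simplified start: the first tRNA, carrying Met, sits in the middle slot.
    middle_chain = [CODON_TO_AMINO_ACID[codons[0]]]

    for codon in codons[1:]:
        amino_acid = CODON_TO_AMINO_ACID[codon]
        if amino_acid is None:
            break  # stop codon reached: the polypeptide is released

        # Step 1: a matching tRNA carrying its amino acid binds in the rightmost slot.
        right_chain = [amino_acid]

        # Step 2: the chain is transferred from the middle tRNA onto the
        # amino acid held by the tRNA in the rightmost slot.
        right_chain = middle_chain + right_chain

        # Step 3: the ribosome shifts one codon. The now-empty middle tRNA moves
        # left and exits; the right tRNA, holding the chain, moves to the middle.
        middle_chain = right_chain

    return middle_chain


print("-".join(elongate("AUGAUCUCGUAA")))  # Met-Ile-Ser
```

Note that each new amino acid is appended to the end of the chain, so the order of amino acids follows the order of codons in the mRNA.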

What happens next?

Once the polypeptide is finished, it may be processed or modified, combine with other polypeptides, or be shipped to a specific destination inside or outside the cell. Ultimately, it will perform a specific job needed by the cell or organism – perhaps as a signaling molecule, structural element, or enzyme!

Summary:

  • DNA is divided up into functional units called genes, which may specify polypeptides (proteins and protein subunits) or functional RNAs (such as tRNAs and rRNAs).
  • Information from a gene is used to build a functional product in a process called gene expression.
  • A gene that encodes a polypeptide is expressed in two steps. In this process, information flows from DNA → RNA → protein, a directional relationship known as the central dogma of molecular biology.
    • Transcription: One strand of the gene's DNA is copied into RNA. In eukaryotes, the RNA transcript must undergo additional processing steps in order to become a mature messenger RNA (mRNA).
    • Translation: The nucleotide sequence of the mRNA is decoded to specify the amino acid sequence of a polypeptide. This process occurs inside a ribosome and requires adapter molecules called tRNAs.
  • During translation, the nucleotides of the mRNA are read in groups of three called codons. Each codon specifies a particular amino acid or a stop signal. This set of relationships is known as the genetic code.
This article is licensed under a CC BY-NC-SA 4.0 license.

Works cited:

  1. Hellens, R. P., Moreau, C., Lin-Wang, K., Schwinn, K. E., Thomson, S. J., Fiers, M. W. E. J., . . . Noel Ellis, T. H. (2010, October 11). Identification of Mendel's white flower character. PLOS ONE. http://dx.doi.org/10.1371/journal.pone.0013230.
  2. Reece, J. B., Urry, L. A., Cain, M. L., Wasserman, S. A., Minorsky, P. V., and Jackson, R. B. (2011). Figure 14.4. Alleles, alternative versions of a gene. In Campbell biology (10th ed., p. 271). San Francisco, CA: Pearson.
  3. CyberBridge. (2007). RNA structure. In Structure of DNA. Retrieved from http://cyberbridge.mcb.harvard.edu/dna_3.html.

References:

Hellens, R. P., Moreau, C., Lin-Wang, K., Schwinn, K. E., Thomson, S. J., Fiers, M. W. E. J., . . . Noel Ellis, T. H. (2010, October 11). Identification of Mendel's white flower character. PLOS ONE. http://dx.doi.org/10.1371/journal.pone.0013230.
OpenStax College, Biology. (2015, December 29). The genetic code. In OpenStax CNX. Retrieved from http://cnx.org/contents/GFy_h8cu@9.87:QEibhJMi@8/The-Genetic-Code.
Purves, W. K., Sadava, D. E., Orians, G. H., and Heller, H.C. (2004). DNA, RNA, and the flow of information. In Life: the science of biology (7th ed., pp. 236-237). Sunderland, MA: Sinauer Associates.
Reece, J. B., Urry, L. A., Cain, M. L., Wasserman, S. A., Minorsky, P. V., and Jackson, R. B. (2011). Genes specify proteins via transcription and translation. In Campbell biology (10th ed., pp. 334-340). San Francisco, CA: Pearson.

Acknowledgements:

Many thanks to Willy McAllister for helpful comments on this article.