5' cap and poly-A tail. Splicing, introns, and exons.

Key points:

  • When an RNA transcript is first made in a eukaryotic cell, it is considered a pre-mRNA and must be processed into a messenger RNA (mRNA).
  • A 5' cap is added to the beginning of the RNA transcript, and a 3' poly-A tail is added to the end.
  • In splicing, some sections of the RNA transcript (introns) are removed, and the remaining sections (exons) are stuck back together.
  • Some genes can be alternatively spliced, leading to the production of different mature mRNA molecules from the same initial transcript.


Imagine that you run a book-making factory, and you've just printed up all the pages of your favorite book. Now that you have the pages, is the book ready to go? Well...books usually have front and back covers. So you might want to put those on. Also, were there any blank or messed-up pages made during printing? You should probably check for those and remove them before selling your books, or you might end up with some unhappy customers.
The steps we just talked about are pretty similar to what happens to RNA transcripts in the cells of your body. In humans and other eukaryotes, a freshly made RNA transcript (hot off the RNA polymerase "presses") is not quite ready to go. Instead, it's called a pre-mRNA and has to go through some processing steps to become a mature messenger RNA (mRNA) that can be translated into a protein. These include:
  • Addition of cap and tail molecules to the two ends of the transcript. These play a protective role, like a book's front and back covers.
  • Removal of "junk" sequences called introns. Introns are sort of like blank or messed-up pages made during a book's printing, which have to be removed in order for the book to be readable.
In this article, we'll take a closer look at the cap, tail, and splicing modifications that eukaryotic RNA transcripts receive, seeing how they're carried out and why they are important for making sure we get the right protein from our RNA.

Overview of pre-mRNA processing in eukaryotes

As a quick review, gene expression (the "reading out" of a gene to make a protein, or chunk of a protein) happens a little bit differently in bacteria and eukaryotes such as humans.
Left panel: eukaryotic cell. In the nucleus, a pre-mRNA is produced through transcription of a region of DNA from a linear chromosome. This transcript must undergo processing (splicing and addition of 5' cap and poly-A tail) while it is still in the nucleus in order to become a mature mRNA. The mature mRNA is exported from the nucleus to the cytosol, where it is translated at a ribosome to make a polypeptide.
Right panel: bacterium. The DNA takes the form of a circular chromosome and is located in the cytosol. While the DNA is being transcribed to make an RNA, the RNA (which is already considered an mRNA at this point) can associate with a ribosome and start being translated to make a polypeptide.
In bacteria, RNA transcripts are ready to act as messenger RNAs and get translated into proteins right away. In eukaryotes, things are a little more complex, though in a pretty interesting way. The molecule that's directly made by transcription in one of your (eukaryotic) cells is called a pre-mRNA, reflecting that it needs to go through a few more steps to become an actual messenger RNA (mRNA). These are:
  • Addition of a 5' cap to the beginning of the RNA
  • Addition of a poly-A tail (tail of A nucleotides) to the end of the RNA
  • Chopping out of introns, or "junk" sequences, and pasting together of the remaining, good sequences (exons)
Once it's completed these steps, the RNA is a mature mRNA. It can travel out of the nucleus and be used to make a protein.

5' cap and poly-A tail

Both ends of a pre-mRNA are modified by the addition of chemical groups. The group at the beginning (5' end) is called a cap, while the group at the end (3' end) is called a tail. Both the cap and the tail protect the transcript and help it get exported from the nucleus and translated on the ribosomes (protein-making "machines") found in the cytosol¹.
The 5’ cap is added to the first nucleotide in the transcript during transcription. The cap is a modified guanine (G) nucleotide, and it protects the transcript from being broken down. It also helps the ribosome attach to the mRNA and start reading it to make a protein.
Image of a pre-mRNA with a 5' cap and 3' poly-A tail. The 5' cap is on the 5' end of the pre-mRNA and is a modified G nucleotide. The poly-A tail is on the 3' end of the pre-mRNA and consists of a long string of A nucleotides (only a few of which are shown).
How is the poly-A tail added? The 3' end of the RNA forms in kind of a bizarre way. When a sequence called a polyadenylation signal shows up in an RNA molecule during transcription, an enzyme chops the RNA in two a short distance past that site. Another enzyme adds about 100 to 200 adenine (A) nucleotides to the cut end, forming a poly-A tail. The tail makes the transcript more stable and helps it get exported from the nucleus to the cytosol.
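The cut-then-add logic can be sketched as a toy string operation in Python. This is only a model under stated assumptions, not how the enzymes work: it assumes the signal is AAUAAA (the most common polyadenylation signal in mammals), assumes a hypothetical fixed cut 20 nucleotides past the signal (real cut sites vary, roughly 10 to 30 nucleotides downstream), and adds the whole tail in one step. The names `POLYA_SIGNAL`, `CUT_OFFSET`, and `add_poly_a_tail` are made up for illustration.

```python
# Toy model of 3'-end formation: find the signal, cut downstream of it, add A's.
# Assumptions (not from the article): the signal is the common AAUAAA motif,
# the cut falls a fixed 20 nt past it (real cut sites vary, roughly 10-30 nt
# downstream), and the whole tail is added in one step.

POLYA_SIGNAL = "AAUAAA"  # assumed signal sequence
CUT_OFFSET = 20          # hypothetical fixed distance from the end of the signal


def add_poly_a_tail(rna: str, tail_length: int = 150) -> str:
    """Return the RNA cut downstream of its first polyadenylation signal, plus a poly-A tail."""
    signal_start = rna.find(POLYA_SIGNAL)
    if signal_start == -1:
        raise ValueError("no polyadenylation signal found")
    cut_site = signal_start + len(POLYA_SIGNAL) + CUT_OFFSET
    # The piece after the cut is discarded; only the upstream piece gets the tail.
    return rna[:cut_site] + "A" * tail_length


# Made-up transcript: 6 nt, the signal, then 30 nt of downstream RNA.
transcript = "AUGCCC" + POLYA_SIGNAL + "G" * 30
mature_end = add_poly_a_tail(transcript, tail_length=5)
print(mature_end)
```

The default `tail_length` of 150 is just a value inside the 100 to 200 range given above; a short tail is used in the example so the output stays readable.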

RNA splicing

The third big RNA processing event that happens in your cells is RNA splicing. In RNA splicing, specific parts of the pre-mRNA, called introns, are recognized and removed by a protein-and-RNA complex called the spliceosome. Introns can be viewed as "junk" sequences that must be cut out so the "good parts version" of the RNA molecule can be assembled.
What are the "good parts"? The pieces of the RNA that are not chopped out are called exons. The exons are pasted together by the spliceosome to make the final, mature mRNA that is shipped out of the nucleus.
Diagram of a pre-mRNA showing exons and introns. Along the length of the mRNA, there is an alternating pattern of exons and introns: Exon 1 - Intron 1 - Exon 2 - Intron 2 - Exon 3. Each consists of a stretch of RNA nucleotides. During splicing, the introns are removed from the pre-mRNA, and the exons are stuck together to form a mature mRNA that does not contain the intron sequences.
A key point here is that it's only the exons of a gene that encode a protein. Not only do the introns not carry information to build a protein, they actually have to be removed in order for the mRNA to encode a protein with the right sequence. If the spliceosome fails to remove an intron, an mRNA with extra "junk" in it will be made, and a wrong protein will get produced during translation.
Splicing needs to be precise and consistent. This careful cutting and pasting is performed by the spliceosome, an enzyme complex made of protein and small RNAs. Most introns contain marker sequences at both of their ends, which are recognized by the small RNAs and direct the spliceosome to remove the intron. Once the intron has been cut out, the spliceosome will "glue" (ligate) the flanking exons together.
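The cut-and-paste step can be pictured as a small Python function. This sketch assumes the intron positions are already known (given as hypothetical 0-based start/end pairs, end excluded), so it skips the part the spliceosome actually has to do: finding the marker sequences at each intron's ends. The function name `splice` and the toy sequence are illustrative only.

```python
# Toy splicing: cut out introns at known positions and join ("ligate") the exons.
# Assumption: intron positions are given as 0-based (start, end) pairs with the
# end excluded. In a cell, the spliceosome finds them by recognizing marker
# sequences at each intron's ends; that recognition step is not modeled here.


def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Return the mature sequence: every part of pre_mrna that is not inside an intron."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        if not (position <= start < end <= len(pre_mrna)):
            raise ValueError(f"overlapping or out-of-range intron: {(start, end)}")
        exons.append(pre_mrna[position:start])  # exon before this intron
        position = end                          # jump over the intron
    exons.append(pre_mrna[position:])           # last exon
    return "".join(exons)                       # stick the exons back together


print(splice("EXON1intronEXON2", [(5, 11)]))  # EXON1EXON2
```

Sorting the intron list first means the caller can pass introns in any order; overlapping or out-of-range positions are rejected rather than silently producing a scrambled sequence.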

Alternative splicing

Why splice? We don't know for sure why splicing exists, and in some ways, it seems like a wasteful system. However, splicing does allow for a process called alternative splicing, in which more than one mRNA can be made from the same gene. Through alternative splicing, we (and other eukaryotes) can sneakily encode more different proteins than we have genes in our DNA.
In alternative splicing, one pre-mRNA may be spliced in two (or sometimes many more than two!) different ways. For example, in the diagram below, the same pre-mRNA can be spliced in three different ways, depending on which exons are kept. This results in three different mature mRNAs, each of which translates into a protein with a different structure.
Diagram of alternative splicing.
A sequence of DNA encodes a pre-mRNA transcript that contains five regions that may potentially be used as exons: Exon 1, Exon 2, Exon 3, Exon 4, and Exon 5. The exons are arranged in linear order along the pre-mRNA and have introns in between them.
In splicing event #1, all five exons are retained in the mature mRNA. It consists of Exon 1 - Exon 2 - Exon 3 - Exon 4 - Exon 5. When it is translated, it specifies Protein A, a protein with five domains: Coil 1 (specified by Exon 1), Coil 2 (specified by Exon 2), Loop 3 (specified by Exon 3), Loop 4 (specified by Exon 4), and Coil 5 (specified by Exon 5).
In splicing event #2, Exon 3 is not included in the mature mRNA. It consists of Exon 1 - Exon 2 - Exon 4 - Exon 5. When it is translated, it specifies Protein B, a protein with four domains: Coil 1 (specified by Exon 1), Coil 2 (specified by Exon 2), Loop 4 (specified by Exon 4), and Coil 5 (specified by Exon 5). It does not contain Loop 3 because Exon 3 is not present in the mRNA.
In splicing event #3, Exon 4 is not included in the mature mRNA. It consists of Exon 1 - Exon 2 - Exon 3 - Exon 5. When it is translated, it specifies Protein C, a protein with four domains: Coil 1 (specified by Exon 1), Coil 2 (specified by Exon 2), Loop 3 (specified by Exon 3), and Coil 5 (specified by Exon 5). It does not contain Loop 4 because Exon 4 is not present in the mRNA.
_Image credit: "DNA, alternative splicing," by the National Human Genome Research Institute (public domain)._
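The three splicing events in the figure amount to one pre-mRNA plus three different choices of which exons to keep. The hypothetical Python model below writes that down as data, using the domain each exon specifies in place of a real RNA sequence; `EXON_DOMAINS`, `SPLICING_EVENTS`, and `isoform_domains` are made-up names for illustration.

```python
# Toy version of the alternative splicing figure. Each exon is labeled by the
# protein domain it specifies; real exons would be RNA sequences.
EXON_DOMAINS = {1: "Coil 1", 2: "Coil 2", 3: "Loop 3", 4: "Loop 4", 5: "Coil 5"}

# Which exons each splicing event keeps (from the figure description).
SPLICING_EVENTS = {
    "Protein A": [1, 2, 3, 4, 5],  # event #1: all five exons
    "Protein B": [1, 2, 4, 5],     # event #2: Exon 3 skipped
    "Protein C": [1, 2, 3, 5],     # event #3: Exon 4 skipped
}


def isoform_domains(kept_exons: list[int]) -> list[str]:
    """List the domains of the protein made from a mature mRNA with these exons."""
    # In this model, as in the figure, kept exons stay in their original order.
    if kept_exons != sorted(kept_exons):
        raise ValueError("exons must stay in their pre-mRNA order")
    return [EXON_DOMAINS[n] for n in kept_exons]


for protein, kept in SPLICING_EVENTS.items():
    print(protein, "->", " - ".join(isoform_domains(kept)))
```

Running the loop prints one line per protein, for example `Protein B -> Coil 1 - Coil 2 - Loop 4 - Coil 5`, which matches the figure: the same starting transcript yields three different domain layouts.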
So why do introns exist at all? Professional biologists would like to know the answer too. It's not totally clear why introns and splicing are so widespread in eukaryotes.
One possibility is that splicing in general permits alternative splicing, which appears to be important for eukaryotes (by allowing them to make more different proteins from a single gene). In addition, introns sometimes contain regulatory sequences that control the expression of a gene (and so aren't really "junk")². Sometimes, small RNAs that regulate gene expression come from introns that are spliced out³. Finally, the existence of exons and introns may make it easier for new genes and alleles to form through mutation events that mix and match gene segments, which could have evolutionary advantages²,⁴.

Try it yourself: Splice the message

Your mission, should you choose to accept it: decode the following top-secret message. First, remove the "junk" letters, colored in purple and underlined. Second, put the remaining letters into groups of three, starting at the beginning.
Have you given it a try?
  • If you remove the purple sequences, you should get this series of letters:
    THEDOGRANANDATETHEHAT
  • If you group the remaining letters into sets of three, you should get this message:
    THE DOG RAN AND ATE THE HAT
    Okay, so maybe it's not the most top-secret message in the world. Hopefully that hat was edible!
The process you just went through is basically what your cells must do when they express a gene. As we discussed earlier in the article, most eukaryotic pre-mRNAs contain "junk" sequences called introns, which are like the purple letters in the message. These sequences must be removed, and the meaningful sequences (exons), equivalent to the maroon letters in the message above, must be stuck back together to make a mature mRNA.
During translation, the mRNA sequence is read in groups of three nucleotides. Each three-letter "word" corresponds to an amino acid that's added to a polypeptide (protein or protein subunit). If an RNA hasn't been spliced, it will contain extra nucleotides that it shouldn't, leading to an incorrect protein "message." Something similar happens if we try to decode the message above without removing the purple letters:
THE DOG RAM APQ ANA NDA ZAP TQM TET HEH AT
Just as removing the purple letters from the sentence is key to ending up with the right message, so splicing is key to ensuring that an mRNA carries the right information (and directs production of the correct polypeptide).
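The message game can be replayed in a few lines of Python. The segment boundaries below are reconstructed from the two readings of the message given above (the spliced answer and the unspliced reading); `SEGMENTS` and `group_in_threes` are illustrative names, not part of any library.

```python
# The "top-secret message" as alternating kept (exon-like) and junk (intron-like) pieces.
# Boundaries reconstructed from the spliced and unspliced readings in the article.
SEGMENTS = [
    ("THEDOGR", True),
    ("AMAPQ", False),    # junk
    ("ANANDA", True),
    ("ZAPTQM", False),   # junk
    ("TETHEHAT", True),
]


def group_in_threes(letters: str) -> str:
    """Split a run of letters into three-letter 'words', like codons."""
    return " ".join(letters[i:i + 3] for i in range(0, len(letters), 3))


spliced = "".join(piece for piece, keep in SEGMENTS if keep)  # junk removed
unspliced = "".join(piece for piece, _ in SEGMENTS)           # junk left in

print(group_in_threes(spliced))    # THE DOG RAN AND ATE THE HAT
print(group_in_threes(unspliced))  # THE DOG RAM APQ ANA NDA ZAP TQM TET HEH AT
```

Because the first junk piece is 5 letters long (not a multiple of 3), leaving it in shifts every later group, which is why the unspliced reading turns to nonsense right after "DOG".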
Here is the pre-mRNA sequence, with exons in bold maroon letters and one intron in underlined purple letters:
If the pre-mRNA is correctly spliced, the nucleotides of the mature mRNA are read in groups of three as follows, producing the polypeptide shown below:
5'-...AUG CAC UAU GAA GCG UAA...-3'
Met - His - Tyr - Glu - Ala - STOP
If the intron is not removed from the pre-mRNA, the resulting abnormal mRNA contains extra nucleotides. When its nucleotides are read in groups of three, as shown below, a polypeptide with an incorrect amino acid sequence is produced. There are too many amino acids, and the amino acids specified by the second exon are not produced (because the reading frame is shifted):
5'-...AUG CAC UAU GUA UGU GCC UUU UUG UGC UGU GGA AGC GUA A...-3'
Met - His - Tyr - Val - Cys - Ala - Phe - Leu - Cys - Cys - Gly - Ser - Val ...
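Both readings above can be reproduced with a short Python sketch. It assumes the reading frame starts at the AUG shown, leaves out the unknown "..." flanking sequence, and uses a hand-written codon table that covers only the codons in this example (a small subset of the genetic code table). `CODON_TABLE`, `translate`, and the sequence names are hypothetical.

```python
# Minimal translation sketch for the worked example.
# CODON_TABLE is a small, hand-picked subset of the standard genetic code;
# a codon missing from it raises KeyError.
CODON_TABLE = {
    "AUG": "Met", "CAC": "His", "UAU": "Tyr", "GAA": "Glu", "GCG": "Ala",
    "GUA": "Val", "UGU": "Cys", "GCC": "Ala", "UUU": "Phe", "UUG": "Leu",
    "UGC": "Cys", "GGA": "Gly", "AGC": "Ser",
    "UAA": "STOP",
}


def translate(mrna: str) -> list[str]:
    """Read codons from the first nucleotide until a stop codon or the end of the sequence."""
    residues = []
    # Step through whole codons only; 1-2 leftover nucleotides at the end are ignored.
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        residues.append(amino_acid)
        if amino_acid == "STOP":
            break
    return residues


EXON_1 = "AUGCACUAU"
INTRON = "GUAUGUGCCUUUUUGUGCUGUG"  # 22 nt, so it shifts the reading frame by one
EXON_2 = "GAAGCGUAA"

print(" - ".join(translate(EXON_1 + EXON_2)))           # correctly spliced
print(" - ".join(translate(EXON_1 + INTRON + EXON_2)))  # intron retained
```

The intron is 22 nucleotides long, one more than a multiple of three, so keeping it both adds extra amino acids and pushes the second exon out of frame: its stop codon UAA is never read as a codon.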
All of the above relationships between groups of nucleotides and the amino acids they encode can be determined using the genetic code table:
_Image credit: "The genetic code: Figure 3," by OpenStax College, Biology (CC BY 3.0)._
Intron sequence modified from Bourdin et al.⁵
This article is licensed under a CC BY-NC-SA 4.0 license.

Works cited:

  1. Reece, J. B., Urry, L. A., Cain, M. L., Wasserman, S. A., Minorsky, P. V., and Jackson, R. B. (2011). Alteration of mRNA ends. In Campbell biology (10th ed., pp. 342-343). San Francisco, CA: Pearson.
  2. Reece, J. B., Urry, L. A., Cain, M. L., Wasserman, S. A., Minorsky, P. V., and Jackson, R. B. (2011). The functional and evolutionary importance of introns. In Campbell biology (10th ed., p. 344). San Francisco, CA: Pearson.
  3. Keren, H., Lev-Maor, G., and Ast, G. (2010). Alternative splicing and evolution: diversification, exon definition and function. Nature Reviews Genetics, 11, 345-355. http://dx.doi.org/10.1038/nrg2776. Retrieved from https://www.tau.ac.il/~gilast/PAPERS/nrg2776.pdf.
  4. Standage, D. (2012, April 8). Why do eukaryotic organisms have introns in their DNA? [Answer] In Biology stack exchange. Retrieved from http://biology.stackexchange.com/questions/1724/why-do-eukaryotic-organisms-have-introns-in-their-dna.
  5. Bourdin, C. M., Moignot, B., Wang, L., Murillo, L., Juchaux, M., Quinchard, S., Lapied, B., Guérineau, N. C., Dong, K., and Legros, C. (2013). Intron retention in mRNA encoding ancillary subunit of insect voltage-gated sodium channel modulates channel expression, gating regulation and drug sensitivity. PLoS ONE, 8(8), e67290. http://dx.doi.org/10.1371/journal.pone.0067290.

Additional references:

3'-end cleavage and polyadenylation. (2016). In Nobelprize.org. Retrieved from http://www.nobelprize.org/educational/medicine/dna/a/splicing/splicing_endformation.html.
Alberts, B., Johnson, A., Lewis, J., Raff, M., Roberts, K., and Walter, P. (2002). Posttranscriptional controls. In Molecular biology of the cell (4th ed.). New York, NY: Garland Science. Retrieved from http://www.ncbi.nlm.nih.gov/books/NBK26890/.
Intron/introns. (2014). In Scitable. Retrieved from http://www.nature.com/scitable/definition/intron-introns-67.
Lodish, H., Berk, A., Zipursky, S. L., Matsudaira, P., Baltimore, D., and Darnell, J. (2000). Processing of eukaryotic mRNA. In Molecular cell biology (4th ed., section 11.2). New York, NY: W. H. Freeman. Retrieved from http://www.ncbi.nlm.nih.gov/books/NBK21563/.
Lodish, H., Berk, A., Zipursky, S. L., Matsudaira, P., Baltimore, D., and Darnell, J. (2000). Transcription termination. In Molecular cell biology (4th ed., section 11.1). New York, NY: W. H. Freeman. Retrieved from http://www.ncbi.nlm.nih.gov/books/NBK21601/.
Polyadenylation. (2016, January 24). Retrieved February 11, 2016 from Wikipedia: https://en.wikipedia.org/wiki/Polyadenylation.
Raven, P. H., Johnson, G. B., Mason, K. A., Losos, J. B., and Singer, S. R. (2014). Genes and how they work. In Biology (10th ed., AP ed., pp. 278-303). New York, NY: McGraw-Hill.
Reece, J. B., Urry, L. A., Cain, M. L., Wasserman, S. A., Minorsky, P. V., and Jackson, R. B. (2011). Eukaryotic cells modify RNA after transcription. In Campbell biology (10th ed., pp. 342-345). San Francisco, CA: Pearson.