Eukaryotic pre-mRNA processing

AP.BIO: IST‑1 (EU), IST‑1.N (LO), IST‑1.N.1 (EK), IST‑1.N.6 (EK)
5' cap and poly-A tail. Splicing, introns, and exons.

Key points:

  • When an RNA transcript is first made in a eukaryotic cell, it is considered a pre-mRNA and must be processed into a messenger RNA (mRNA).
  • A 5' cap is added to the beginning of the RNA transcript, and a 3' poly-A tail is added to the end.
  • In splicing, some sections of the RNA transcript (introns) are removed, and the remaining sections (exons) are stuck back together.
  • Some genes can be alternatively spliced, leading to the production of different mature mRNA molecules from the same initial transcript.

Introduction

Imagine that you run a book-making factory, and you've just printed up all the pages of your favorite book. Now that you have the pages, is the book ready to go? Well...books usually have front and back covers. So you might want to put those on. Also, were there any blank or messed-up pages made during printing? You should probably check for those and remove them before selling your books, or you might end up with some unhappy customers.
The steps we just talked about are pretty similar to what happens to RNA transcripts in the cells of your body. In humans and other eukaryotes, a freshly made RNA transcript (hot off the RNA polymerase "presses") is not quite ready to go. Instead, it's called a pre-mRNA and has to go through some processing steps to become a mature messenger RNA (mRNA) that can be translated into a protein. These include:
  • Addition of cap and tail molecules to the two ends of the transcript. These play a protective role, like a book's front and back covers.
  • Removal of "junk" sequences called introns. Introns are sort of like blank or messed-up pages made during a book's printing, which have to be removed in order for the book to be readable.
In this article, we'll take a closer look at the cap, tail, and splicing modifications that eukaryotic RNA transcripts receive, seeing how they're carried out and why they are important for making sure we get the right protein from our RNA.

Overview of pre-mRNA processing in eukaryotes

As a quick review, gene expression (the "reading out" of a gene to make a protein, or chunk of a protein) happens a little bit differently in bacteria and eukaryotes such as humans.
Left panel: eukaryotic cell. In the nucleus, a pre-mRNA is produced through transcription of a region of DNA from a linear chromosome. This transcript must undergo processing (splicing and addition of 5' cap and poly-A tail) while it is still in the nucleus in order to become a mature mRNA. The mature mRNA is exported from the nucleus to the cytosol, where it is translated at a ribosome to make a polypeptide.
Right panel: bacterium. The DNA takes the form of a circular chromosome and is located in the cytosol. While the DNA is being transcribed to make an RNA, the RNA (which is already considered an mRNA at this point) can associate with a ribosome and start being translated to make a polypeptide.
In bacteria, RNA transcripts are ready to act as messenger RNAs and get translated into proteins right away. In eukaryotes, things are a little more complex, though in a pretty interesting way. The molecule that's directly made by transcription in one of your (eukaryotic) cells is called a pre-mRNA, reflecting that it needs to go through a few more steps to become an actual messenger RNA (mRNA). These are:
  • Addition of a 5' cap to the beginning of the RNA
  • Addition of a poly-A tail (tail of A nucleotides) to the end of the RNA
  • Chopping out of introns, or "junk" sequences, and pasting together of the remaining, good sequences (exons)
Once it's completed these steps, the RNA is a mature mRNA. It can travel out of the nucleus and be used to make a protein.

5' cap and poly-A tail

Both ends of a pre-mRNA are modified by the addition of chemical groups. The group at the beginning (5' end) is called a cap, while the group at the end (3' end) is called a tail. Both the cap and the tail protect the transcript and help it get exported from the nucleus and translated on the ribosomes (protein-making "machines") found in the cytosol¹.
The 5’ cap is added to the first nucleotide in the transcript during transcription. The cap is a modified guanine (G) nucleotide, and it protects the transcript from being broken down. It also helps the ribosome attach to the mRNA and start reading it to make a protein.
Image of a pre-mRNA with a 5' cap and 3' poly-A tail. The 5' cap is on the 5' end of the pre-mRNA and is a modified G nucleotide. The poly-A tail is on the 3' end of the pre-mRNA and consists of a long string of A nucleotides (only a few of which are shown).
How is the poly-A tail added? The 3' end of the RNA forms in kind of a bizarre way. When a sequence called a polyadenylation signal shows up in an RNA molecule during transcription, an enzyme chops the RNA in two at that site. Another enzyme adds about 100 to 200 adenine (A) nucleotides to the cut end, forming a poly-A tail. The tail makes the transcript more stable and helps it get exported from the nucleus to the cytosol.
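To make the two end modifications concrete, here is a minimal Python sketch that treats a transcript as a plain string of letters. Several things in it are assumptions for illustration only: the signal sequence `AAUAAA` (a commonly cited polyadenylation signal, not named in this article), the choice to cut immediately after the signal (in real cells the cut falls a short distance downstream), the text label `m7G-` standing in for the modified G cap, and the function name `add_cap_and_tail`.

```python
# Toy model: a transcript is a string of RNA letters written 5' to 3'.
POLY_A_SIGNAL = "AAUAAA"  # assumed signal sequence, used here for illustration


def add_cap_and_tail(transcript: str, tail_length: int = 150) -> str:
    """Cut the transcript at the poly-A signal, then add a cap label and an A tail."""
    signal_start = transcript.find(POLY_A_SIGNAL)
    if signal_start == -1:
        raise ValueError("no polyadenylation signal found")

    # Simplification: cut right after the signal and discard everything downstream.
    cut_site = signal_start + len(POLY_A_SIGNAL)
    body = transcript[:cut_site]

    # "m7G-" is only a text label for the modified G cap at the 5' end.
    return "m7G-" + body + "A" * tail_length


if __name__ == "__main__":
    # A short tail keeps the printed result readable; the article cites about 100 to 200 A's.
    print(add_cap_and_tail("AUGGCCUAAGGAAUAAACCGGUU", tail_length=5))
```

The letters after the signal (`CCGGUU` in the example) are dropped, mirroring how the RNA is chopped in two before the tail is added.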

RNA splicing

The third big RNA processing event that happens in your cells is RNA splicing. In RNA splicing, specific parts of the pre-mRNA, called introns, are recognized and removed by a protein-and-RNA complex called the spliceosome. Introns can be viewed as "junk" sequences that must be cut out so the "good parts version" of the RNA molecule can be assembled.
What are the "good parts"? The pieces of the RNA that are not chopped out are called exons. The exons are pasted together by the spliceosome to make the final, mature mRNA that is shipped out of the nucleus.
Diagram of a pre-mRNA showing exons and introns. Along the length of the pre-mRNA, there is an alternating pattern of exons and introns: Exon 1 - Intron 1 - Exon 2 - Intron 2 - Exon 3. Each consists of a stretch of RNA nucleotides. During splicing, the introns are removed from the pre-mRNA, and the exons are stuck together to form a mature mRNA that does not contain the intron sequences.
A key point here is that it's only the exons of a gene that encode a protein. Not only do the introns not carry information to build a protein, they actually have to be removed in order for the mRNA to encode a protein with the right sequence. If the spliceosome fails to remove an intron, an mRNA with extra "junk" in it will be made, and a wrong protein will get produced during translation.
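The cut-and-paste logic of splicing can be sketched in a few lines of Python. This is only a toy model: the sequence, the intron position, and the function names `splice` and `codons` are made up for illustration, and a real spliceosome finds introns from sequence signals rather than being handed coordinates.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove each (start, end) intron span (end exclusive) and join the exons."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[position:start])  # keep the exon before this intron
        position = end                          # skip over the intron itself
    exons.append(pre_mrna[position:])           # keep the final exon
    return "".join(exons)


def codons(mrna: str) -> list[str]:
    """Split a sequence into groups of three letters, starting at the beginning."""
    return [mrna[i:i + 3] for i in range(0, len(mrna), 3)]


if __name__ == "__main__":
    # Hypothetical pre-mRNA: exon "AUGGCC", intron "GUCAG", exon "UUUUAA".
    pre_mrna = "AUGGCC" + "GUCAG" + "UUUUAA"
    mature = splice(pre_mrna, [(6, 11)])

    print(codons(mature))    # ['AUG', 'GCC', 'UUU', 'UAA']
    print(codons(pre_mrna))  # ['AUG', 'GCC', 'GUC', 'AGU', 'UUU', 'AA']
```

In the unspliced version, the intron letters are read as if they were part of the message, and every three-letter group after them is shifted, which is one way a retained intron leads to a wrong protein.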

Alternative splicing

Why splice? We don't know for sure why splicing exists, and in some ways, it seems like a wasteful system. However, splicing does allow for a process called alternative splicing, in which more than one mRNA can be made from the same gene. Through alternative splicing, we (and other eukaryotes) can sneakily encode more different proteins than we have genes in our DNA.
In alternative splicing, one pre-mRNA may be spliced in either of two (or sometimes many more than two!) different ways. For example, in the diagram below, the same pre-mRNA can be spliced in three different ways, depending on which exons are kept. This results in three different mature mRNAs, each of which translates into a protein with a different structure.
Diagram of alternative splicing.
A sequence of DNA encodes a pre-mRNA transcript that contains five regions that may potentially be used as exons: Exon 1, Exon 2, Exon 3, Exon 4, and Exon 5. The exons are arranged in linear order along the pre-mRNA and have introns in between them.
In splicing event #1, all five exons are retained in the mature mRNA. It consists of Exon 1 - Exon 2 - Exon 3 - Exon 4 - Exon 5. When it is translated, it specifies Protein A, a protein with five domains: Coil 1 (specified by Exon 1), Coil 2 (specified by Exon 2), Loop 3 (specified by Exon 3), Loop 4 (specified by Exon 4), and Coil 5 (specified by Exon 5).
In splicing event #2, Exon 3 is not included in the mature mRNA. It consists of Exon 1 - Exon 2 - Exon 4 - Exon 5. When it is translated, it specifies Protein B, a protein with four domains: Coil 1 (specified by Exon 1), Coil 2 (specified by Exon 2), Loop 4 (specified by Exon 4), and Coil 5 (specified by Exon 5). It does not contain Loop 3 because Exon 3 is not present in the mRNA.
In splicing event #3, Exon 4 is not included in the mature mRNA. It consists of Exon 1 - Exon 2 - Exon 3 - Exon 5. When it is translated, it specifies Protein C, a protein with four domains: Coil 1 (specified by Exon 1), Coil 2 (specified by Exon 2), Loop 3 (specified by Exon 3), and Coil 5 (specified by Exon 5). It does not contain Loop 4 because Exon 4 is not present in the mRNA.
_Image credit: "DNA, alternative splicing," by the National Human Genome Research Institute (public domain)._
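The three splicing events in the diagram can be written out as a small Python sketch. The exon-to-domain mapping comes straight from the diagram description above; the names `EXON_DOMAINS`, `protein_domains`, and `SPLICING_EVENTS` are just illustrative choices.

```python
# Which protein domain each exon specifies, as described for the diagram.
EXON_DOMAINS = {
    1: "Coil 1",
    2: "Coil 2",
    3: "Loop 3",
    4: "Loop 4",
    5: "Coil 5",
}

# Which exons are kept in each splicing event.
SPLICING_EVENTS = {
    "Protein A": [1, 2, 3, 4, 5],
    "Protein B": [1, 2, 4, 5],
    "Protein C": [1, 2, 3, 5],
}


def protein_domains(kept_exons: list[int]) -> list[str]:
    """Return the domains of the protein made from the kept exons, in exon order."""
    return [EXON_DOMAINS[exon] for exon in sorted(kept_exons)]


if __name__ == "__main__":
    for protein, exons in SPLICING_EVENTS.items():
        print(protein, "->", ", ".join(protein_domains(exons)))
```

Sorting the kept exons reflects that, in this example, exons stay in their original order along the transcript; only the choice of which ones to keep changes.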

Try it yourself: Splice the message

Your mission, should you choose to accept it: decode the following top-secret message. First, remove the "junk" letters, colored in purple and underlined (shown here in square brackets). Second, put the remaining letters into groups of three, starting at the beginning.
THEDOGR[AMAPQ]ANANDA[ZAPTQM]TETHEHAT
Have you given it a try?
  • If you remove the purple sequences, you should get this series of letters: THEDOGRANANDATETHEHAT
  • If you group the remaining letters into sets of three, you should get this message: THE DOG RAN AND ATE THE HAT
The process you just went through is basically what your cells must do when they express a gene. As we discussed earlier in the article, most eukaryotic pre-mRNAs contain "junk" sequences called introns, which are like the purple letters in the message. These sequences must be removed, and the meaningful sequences (exons), equivalent to the maroon letters in the message above, must be stuck back together to make a mature mRNA.
During translation, the mRNA sequence is read in groups of three nucleotides. Each three-letter "word" corresponds to an amino acid that's added to a polypeptide (protein or protein subunit). If an RNA hasn't been spliced, it will contain extra nucleotides that it shouldn't, leading to an incorrect protein "message." Something similar happens if we try to decode the message above without removing the purple letters:
THE DOG RAM APQ ANA NDA ZAP TQM TET HEH AT
Just as removing the purple letters from the sentence is key to ending up with the right message, so splicing is key to ensuring that an mRNA carries the right information (and directs production of the correct polypeptide).
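If you'd like to check the decoding by computer, the short Python sketch below repeats both versions of the exercise. The way the message is stored (a list of segments, each flagged as junk or not) and the names `SEGMENTS` and `group_in_threes` are just one convenient choice for illustration.

```python
# The message, split into segments; True marks a purple "junk" segment.
SEGMENTS = [
    ("THEDOGR", False),
    ("AMAPQ", True),
    ("ANANDA", False),
    ("ZAPTQM", True),
    ("TETHEHAT", False),
]


def group_in_threes(letters: str) -> str:
    """Break a string into three-letter groups, starting at the beginning."""
    return " ".join(letters[i:i + 3] for i in range(0, len(letters), 3))


# "Spliced": junk segments removed before reading.
spliced = "".join(text for text, is_junk in SEGMENTS if not is_junk)

# "Unspliced": junk segments left in place.
unspliced = "".join(text for text, _ in SEGMENTS)

if __name__ == "__main__":
    print(group_in_threes(spliced))    # THE DOG RAN AND ATE THE HAT
    print(group_in_threes(unspliced))  # THE DOG RAM APQ ANA NDA ZAP TQM TET HEH AT
```

Notice that the junk letters do more than add nonsense words: they also shift where every later three-letter group begins.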

Want to join the conversation?

  • isaacboardman
    Why in the last example, using an actual RNA molecule, is methionine coded by the codon AUC? Methionine is referenced as being coded by the codon AUG in the table provided.
    (21 votes)
  • Lim Pin Seng
    Why do introns exist in the first place, just waiting to be spliced?
    (3 votes)
    • Kevin D. Fettel
      Lim Pin Seng,

      Introns allow for alternative splicing, generating multiple proteins from a single gene. This adds a layer of complexity to an organism without having to drastically extend the genome length. In return, it may also save energy, as the cell does not have to replicate as long a genome, which is a reasonable explanation as to why introns may be favored.

      Further, introns may possess regulatory processes or code for functional RNA products. In addition, introns may also be mobile elements, contributing to the overall variation of the genetic pool.

      Hope this helps!
      (23 votes)
  • fiacl
    Why are the introns referred to as "junk" (RNA splicing section)? Don't they play a role in gene expression regulation? Thank you.
    (5 votes)
    • wrharris93
      I think they are only considered "junk" in terms of what they contribute to the resultant protein. They do likely play a role in regulation, but because they are spliced out before translation, they will not affect the protein that results from translating the mature mRNA sequence.
      (7 votes)
  • Maria Pikoula
    Is it possible that DNA introns/splicing exist so that bacteria can't copy eukaryotes' DNA and express the same proteins? They are known to steal DNA that floats around in general.
    (4 votes)
  • ar05181
    What happens if a new splice acceptor site is created on the 1st intron? What happens to the new messenger RNA created?
    (3 votes)
    • Ivana - Science trainee
      Usually every intron has a donor site (the splice site at the beginning of the intron, 5') and an acceptor site (the splice site at the end of the intron, 3'). Splicing occurs at those specific sites, just as labelled on the photo. At the 5' end the DNA nucleotides are GT [GU in the pre-messenger RNA (pre-mRNA)]; at the 3' end they are AG. These nucleotides are part of the splicing sites. (1)(2)



      So what would happen if a new acceptor site appears on the first intron? First of all, the first intron has to get excised. Second, what do you mean by a 'new' site appearing? A new one cannot appear once the existing donor and acceptor sites have been used, because that means the intron is excised already and does not bother the mRNA anymore.

      I'd rephrase the question: what would happen if a splice acceptor site appeared in the middle of the first intron instead of at its 3' end? Imagine, what if AG-GU is in the middle of the intron?

      Well, what is in between would be excised. As for the 'sticky ends' left hanging in the processed mRNA, they would be translated and result in a faulty protein again. That's how mutations cause diseases.

      Hmm, but isn't AG-GU a kind of marker used to determine the beginning and end of an intron? It can be: if a mutation happened and, let's say, changed AA into AG and CU into GU, it would accidentally create a shortened intron that could be recognized by the spliceosome, while part of the original noncoding region would still be left behind. If the original DNA sequence does not have GU in the middle of the intron, then it is a mutation. A splicing mutation may occur in both introns and exons and disrupt existing splice sites or splicing regulatory sequences (intronic and exonic splicing silencers and enhancers), create new ones, or activate cryptic ones. (2) (3) (4)



      But I have to spice it up. Did you know that 98.7% of exon/intron sequences contain AG-GU? That means 1.3% have a different, non-canonical sequence.





      http://www.imgt.org/IMGTeducation/Aide-memoire/_UK/splicing/

      https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6060985/

      https://www.ncbi.nlm.nih.gov/pubmed/1556139

      https://link.springer.com/article/10.1007/BF00216459
      (2 votes)
  • Seaj1
    In the little drop-section explaining more about spliceosomes, it states "Once the intron has been cut out, the spliceosome will "glue" (ligate) the flanking exons together." How would this work with alternative splicing? Additionally for alternative splicing, can only one exon be removed? More alternatives could be created through removing two exons or switching the exons around. Ex: 145 and 14235
    (3 votes)
  • Shashank Shekhar
    Why do prokaryotes not require these post-transcriptional mechanisms, as eukaryotes do? Does it mean that Eukaryotes' transcripts are free of introns?
    (3 votes)
    • tyersome
      Prokaryotes do have some post-transcriptional modifications, but introns are much less common and as far as I know are always self-splicing — i.e. don't require a spliceosome.

      Mature mRNAs in eukaryotes generally lack introns, but note that alternative splicing means some sequences can act as either introns or exons.
      (2 votes)
  • draawt
    Why is there only a poly-A tail? What would be the possible problems with poly-G, poly-U, or poly-C?
    (3 votes)
  • Priyanka
    With so many mRNA molecules being manufactured all the time, shouldn't all that splicing create a buildup of spliced-out introns in the nucleus?
    Evolution wouldn't waste resources. What happens to them?
    (2 votes)
  • 😊
    RNA SPLICING: 3rd PARAGRAPH:
    If introns fail to be spliced out, how can a wrong protein be translated? Shouldn't the protein be translated as usual, since introns are non-coding?
    (2 votes)
    • Noah Pettinari
      What is meant by "non-coding" is that the RNA found in introns doesn't code for any specific part of the protein. Introns are made of the same RNA that exons are made of, and so if they are not spliced before the mature mRNA leaves the nucleus, there will be additional amino acids coded from the mRNA sequence that will result in an erroneous protein.
      (1 vote)