The genetic code

AP.BIO: IST‑1 (EU), IST‑1.N (LO), IST‑1.N.1 (EK), IST‑1.N.2 (EK)
The genetic code links groups of nucleotides in an mRNA to amino acids in a protein. Start codons, stop codons, reading frame.

Introduction

Have you ever written a secret message to one of your friends? If so, you may have used a code to keep the message hidden. For instance, you may have replaced the letters of the word with numbers or symbols, following a particular set of rules. In order for your friend to understand the message, they would need to know the code and apply the same set of rules, in reverse, to decode it.
Decoding messages is also a key step in gene expression, in which information from a gene is read out to build a protein. In this article, we'll take a closer look at the genetic code, which allows DNA and RNA sequences to be "decoded" into the amino acids of a protein.

Background: Making a protein

Genes that provide instructions for proteins are expressed in a two-step process.
  • In transcription, the DNA sequence of a gene is "rewritten" in RNA. In eukaryotes, the RNA must go through additional processing steps to become a messenger RNA, or mRNA.
  • In translation, the sequence of nucleotides in the mRNA is "translated" into a sequence of amino acids in a polypeptide (protein chain).
If this is a new concept for you, you may want to learn more by watching Sal's video on transcription and translation.

Codons

Cells decode mRNAs by reading their nucleotides in groups of three, called codons. Here are some features of codons:
  • Most codons specify an amino acid
  • Three "stop" codons mark the end of a protein
  • One "start" codon, AUG, marks the beginning of a protein and also encodes the amino acid methionine
Codons in an mRNA are read during translation, beginning with a start codon and continuing until a stop codon is reached. mRNA codons are read from 5' to 3', and they specify the order of amino acids in a protein from N-terminus (methionine) to C-terminus.
The mRNA sequence is:
5'-AUGAUCUCGUAA-3'
Translation involves reading the mRNA nucleotides in groups of three; each group specifies an amino acid (or provides a stop signal indicating that translation is finished).
5'-AUG AUC UCG UAA-3'
AUG → Methionine (Start)
AUC → Isoleucine
UCG → Serine
UAA → "Stop"
Polypeptide sequence: (N-terminus) Methionine-Isoleucine-Serine (C-terminus)
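The decoding steps above can be sketched in a few lines of Python. This is a minimal illustration, not a full translator: the `CODON_TABLE` dictionary is deliberately partial (it holds only the four codons from the example), and the function name `translate` is our own choice.

```python
# Partial codon table: only the codons that appear in the example mRNA.
CODON_TABLE = {"AUG": "Met", "AUC": "Ile", "UCG": "Ser", "UAA": "Stop"}

def translate(mrna):
    """Read an mRNA 5'->3' from the first AUG, three bases at a time,
    and stop when a stop codon is reached."""
    start = mrna.find("AUG")
    if start == -1:
        return []  # no start codon, so nothing is translated
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "Stop":
            break
        peptide.append(residue)
    return peptide

print("-".join(translate("AUGAUCUCGUAA")))  # Met-Ile-Ser
```

The list is built in the order the codons are read, so its first element corresponds to the N-terminus and its last to the C-terminus.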

The genetic code table

The full set of relationships between codons and amino acids (or stop signals) is called the genetic code. The genetic code is often summarized in a table.
Genetic code table. Each three-letter sequence of mRNA nucleotides corresponds to a specific amino acid, or to a stop codon. UGA, UAA, and UAG are stop codons. AUG is the codon for methionine, and is also the start codon.
Image credit: "The genetic code," by OpenStax College, Biology (CC BY 3.0).
Notice that many amino acids are represented in the table by more than one codon. For instance, there are six different ways to "write" leucine in the language of mRNA (see if you can find all six).
An important point about the genetic code is that it's universal. That is, with minor exceptions, virtually all species (from bacteria to you!) use the genetic code shown above for protein synthesis.
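To check the redundancy of the code for yourself, here is a small Python sketch that builds the standard codon table and groups codons by the amino acid they encode. It assumes the usual compact layout of the standard code (bases ordered U, C, A, G, one-letter amino acid symbols, `*` for stop); the variable names are our own.

```python
from collections import defaultdict
from itertools import product

BASES = "UCAG"
# One-letter amino acid symbols for UUU, UUC, UUA, UUG, UCU, ... GGG ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Group codons by the amino acid (or stop signal) they specify.
codons_for = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    codons_for[aa].append(codon)

print(len(CODON_TABLE))         # 64 codons in total
print(len(codons_for["L"]))     # 6 codons for leucine (L)
print(sorted(codons_for["*"]))  # ['UAA', 'UAG', 'UGA']
print(codons_for["M"])          # ['AUG']
```

Printing `codons_for["L"]` would list the six leucine codons, if you want to check your answer against the table.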

Reading frame

To reliably get from an mRNA to a protein, we need one more concept: that of reading frame. Reading frame determines how the mRNA sequence is divided up into codons during translation.
That's a pretty abstract concept, so let's look at an example to understand it better. The mRNA below can encode three totally different proteins, depending on the frame in which it's read:
mRNA sequence: 5'-UCAUGAUCUCGUAAGA-3'
Read in Frame 1:
5'-UCA UGA UCU CGU AAG A-3'
Ser-STOP-Ser-Arg-Lys
Read in Frame 2:
5'-U CAU GAU CUC GUA AGA-3'
His-Asp-Leu-Val-Arg
Read in Frame 3:
5'-UC AUG AUC UCG UAA GA-3'
Met(Start)-Ile-Ser-STOP
The start codon's position ensures that Frame 3 is chosen for translation of the mRNA.
So, how does a cell know which of these proteins to make? The start codon is the key signal. Because translation begins at the start codon and continues in successive groups of three, the position of the start codon ensures that the mRNA is read in the correct frame (in the example above, in Frame 3).
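The three frames of the example mRNA can be reproduced with a short Python sketch. The helper name `split_into_codons` is hypothetical, and leftover bases that do not fill a complete codon are simply dropped.

```python
def split_into_codons(mrna, frame):
    """Split an mRNA into complete codons, starting in frame 1, 2, or 3."""
    offset = frame - 1
    return [mrna[i:i + 3] for i in range(offset, len(mrna) - 2, 3)]

mrna = "UCAUGAUCUCGUAAGA"

for frame in (1, 2, 3):
    print(frame, split_into_codons(mrna, frame))
# 1 ['UCA', 'UGA', 'UCU', 'CGU', 'AAG']
# 2 ['CAU', 'GAU', 'CUC', 'GUA', 'AGA']
# 3 ['AUG', 'AUC', 'UCG', 'UAA']

# The position of the first AUG picks out the frame used for translation.
start_index = mrna.find("AUG")
print(start_index % 3 + 1)  # 3
```

Here the start codon begins at the third nucleotide, so only Frame 3 contains AUG as an intact codon.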
Mutations (changes in DNA) that insert or delete one or two nucleotides can change the reading frame, causing an incorrect protein to be produced "downstream" of the mutation site:
Illustration shows a frameshift mutation in which the reading frame is altered by the deletion of two nucleotides.
Image credit: "The genetic code: Figure 3," by OpenStax College, Biology (CC BY 4.0).
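A frameshift can also be illustrated in code. The sketch below is a simplified model under stated assumptions: the deletions are hypothetical, they are applied directly to the short mRNA from the Codons section rather than to DNA, and the codon table uses one-letter amino acid symbols (M = Met, I = Ile, S = Ser, R = Arg, `*` = stop).

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one-letter symbols, codons ordered UUU, UUC, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna):
    """Translate from the first AUG until a stop codon or the end of the mRNA."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

original = "AUGAUCUCGUAA"
one_deleted = original[:3] + original[4:]    # hypothetical: delete 1 nucleotide
three_deleted = original[:3] + original[6:]  # hypothetical: delete 1 whole codon

print(translate(original))       # MIS
print(translate(one_deleted))    # MSR (frame shifted, stop codon no longer read)
print(translate(three_deleted))  # MS  (frame kept, one amino acid missing)
```

Deleting a single nucleotide changes every codon downstream of the deletion, whereas deleting three nucleotides removes one amino acid but leaves the rest of the reading frame intact.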

How was the genetic code discovered?

The story of how the genetic code was discovered is a pretty cool and epic one. We've stashed our version in the pop-up below, so as not to distract you if you're in a hurry. However, if you have some time, it's definitely interesting reading.
I always like to imagine how cool it would have been to be one of the people who discovered the basic molecular code of life. Although we now know the code, there are many other biological mysteries still waiting to be solved (perhaps by you!).

Want to join the conversation?

  • Andres Cantu
    Are Glutamate (Glu) and Glutamine (Gln) interchangeable? or there is something wrong with the example on reading the codon table, because CAG codes for Gln, not Glu.
    (9 votes)
  • SeekerAtFarnham
    When does the tRNA know when to use AUG as a start codon and when to code Methionine? Are there other influencers
    (6 votes)
    • tyersome
      Excellent question!

      Translation is quite a bit more complicated than this introductory material can cover.

      The sequence of the mRNA around a potential start codon influences whether or not it will be used§. These sequences are bound by proteins that help guide the ribosome to assemble at the correct place to start translation.
      (In fact, codons other than AUG are sometimes used as start codons!)

      This is covered in a bit more detail in another article:
      https://www.khanacademy.org/science/biology/gene-expression-central-dogma/translation-polypeptides/a/the-stages-of-translation

      I also encourage you to look at some of the references for that section, which will help give you more detail on this highly complex process that is still being actively studied.


      §Note: The mechanisms are very different in prokaryotic and eukaryotic organisms — they can also vary between different species and even for different genes!
      (5 votes)
  • cwdean592
    would it be possible to use the "coding language" of RNA to synthesize chemicals?
    (5 votes)
  • Arki🖤
    Why is AUG a start codon and UAA , UGA and UAG stop codons?
    (5 votes)
  • Pelekanos
    I have heard that the 3' end of mRNA is rich in stop codons so that, in case of a mutation, the peptide still gets released, but I am unable to find an article about that. Can someone confirm whether this is true?
    (4 votes)
    • Ivana - Science trainee
      You are correct.

      Usually, nucleotides in the mRNA channel downstream of the A site help determine how efficiently translation terminates. From the paper linked below:
      "The expected hierarchy in the intrinsic fidelity of the stop codons (UAA>UAG>>UGA) was observed, with highly influential effects on termination readthrough mediated by nucleotides at position +4 and position +8."


      https://watermark.silverchair.com/gkx1315.pdf?token=AQECAHi208BE49Ooan9kkhW_Ercy7Dm3ZL_9Cf3qfKAc485ysgAAAlcwggJTBgkqhkiG9w0BBwagggJEMIICQAIBADCCAjkGCSqGSIb3DQEHATAeBglghkgBZQMEAS4wEQQMiuKY1yQvGdfscjqKAgEQgIICCmJx0U3b-ecY2oxn1QYcqc6a6QXYNlX9FCUngn9zbbPC6TWDitK20Cl65tVkDb6ARGxakyB0TEEbjl2c5gg6rg2qBTOI7x9Vy8585GIls0cxO0YkUJjM5nl4tIHHoOTo9GSTyGAW827IJoH0xMHIBZC6tWuwCiR6jqOaN1HrKwsQVlraRvdQyJb9eCxJcVkE-No67IraffHateNr-8Xin1lgr4vGQAfQXU9PjGDIReo41KpdTVC4ROs0BWMsX5SiIrOq0CT2I_d8aPe3BoxnnN5Vwdb-tIzNAmBaBiIlyQa2NBwBvWioTTqoTIlkqhVX4USGtnaevTT72XcMrlPPZm-hY4KtVOzqRFEiJZvumj8GsYH5VL8XA-vT_ZHLfZxscDuS2AaEIts5h3YNsYXoB_VtpESmnQzfU8QXfocNOamKdN2HvESBttG-e1DGLH7er75hfzVjy99742-LR77NeJApSW8uphwYIJGkdiRMkKm33yLfYQi2FH7UjzzmPuBukRAYG9gDCtTozVMKGh25SeJhmtQ2ASplMszMGS0eHfdOEFXsP3xM7Y_qNU8Bp3Er0_1f-3QzZrvK4R0HBzKUFaBhBxzm36nDFx7kMyvupiurNRcLbGuj65jWL5ezK4Rel-eplBH3Zv087GDxgvSEss9ZFntFfyS1O0Ra3yW8F6OFRZNJY86-N0puzw

      There are also cases where a mutation changes a stop codon into a non-stop (sense) codon, so translation cannot stop there.

      https://www.cell.com/cell/pdf/S0092-8674(16)30788-7.pdf
      (2 votes)
  • Priyanka
    In the section, Reading Frame, frameshift mutations are mentioned.

    Point mutations will shift the frame of reference.
    The insertion or deletion of three (or a multiple of three) bases would insert or delete one or more codons, and hence amino acids, without shifting the reading frame. But adding or removing amino acids from a polypeptide would change it. How is this dealt with?
    (3 votes)
    • tyersome
      How small "in frame" indels (insertions and deletions) are dealt with depends on many factors including where in the gene the indel happens — so the short answer is "it depends".

      For example, if you disrupt the catalytic site of an enzyme the effect will probably be the same as if the protein was never produced at all — this is likely to lead to a complete loss (assuming the mutation is homozygous) of that enzyme activity — the effect on the cell could be anything from fatal to unnoticeable (depending on how critical that enzyme activity is in that cell).

      On the other hand, some proteins have loops of amino acid sequences on their surfaces that do not appear to be critically important and making those loops a little longer or shorter might have little or no effect on the protein function.

      (Note that we only use "point mutation" to refer to mutations that change a base — not for deletions of a single base pair.)
      (2 votes)
  • Dana Alkudsi
    So the genetic code is the mRNA sequence of bases and it starts from the 5' to the 3' and it is the coding strand. Now if we want to find the tRNA sequence, which is the template or the non-coding, for ACU, for example, we start at 3' to 5' and we write it as TGA? Is that the correct way or am I missing something?
    (2 votes)
  • David Afang
    How many alleles are expressed when a B cell carries two alleles encoding immunoglobulin heavy and light chains?
    (2 votes)
  • Juanita Havelaar
    Are proteins made at the same time as new DNA? Does DNA unwind when it makes proteins?
    (1 vote)
    • skilfoy
      The DNA that isn't being utilized is very tightly packaged, whereas the DNA that is being utilized is unwound, so yes, in a sense, but your choice of words is slightly off. DNA unwinds to be transcribed into RNA, which eventually makes its way to a ribosome, where it is translated into protein. Don't forget the central dogma: DNA->RNA->protein. That middle molecule is essential.
      (3 votes)
  • genesis101705
    How do mutations occur in the genetic code?
    (2 votes)