Molecular mechanism of DNA replication

AP.BIO: IST‑1 (EU), IST‑1.M (LO), IST‑1.M.1 (EK)
Roles of DNA polymerases and other replication enzymes. Leading and lagging strands and Okazaki fragments.

Key points:

  • DNA replication is semiconservative. Each strand in the double helix acts as a template for synthesis of a new, complementary strand.
  • New DNA is made by enzymes called DNA polymerases, which require a template and a primer (starter) and synthesize DNA in the 5' to 3' direction.
  • During DNA replication, one new strand (the leading strand) is made as a continuous piece. The other (the lagging strand) is made in small pieces.
  • DNA replication requires other enzymes in addition to DNA polymerase, including DNA primase, DNA helicase, DNA ligase, and topoisomerase.

Introduction

DNA replication, or the copying of a cell's DNA, is no simple task! There are about 3 billion base pairs of DNA in your genome, all of which must be accurately copied when any one of your trillions of cells divides¹.
The basic mechanisms of DNA replication are similar across organisms. In this article, we'll focus on DNA replication as it takes place in the bacterium E. coli, but the mechanisms of replication are similar in humans and other eukaryotes.
Let's take a look at the proteins and enzymes that carry out replication, seeing how they work together to ensure accurate and complete replication of DNA.

The basic idea

DNA replication is semiconservative, meaning that each strand in the DNA double helix acts as a template for the synthesis of a new, complementary strand.
This process takes us from one starting molecule to two "daughter" molecules, with each newly formed double helix containing one new and one old strand.
Schematic of Watson and Crick's basic model of DNA replication.
  1. DNA double helix.
  2. Hydrogen bonds break and helix opens.
  3. Each strand of DNA acts as a template for synthesis of a new, complementary strand.
  4. Replication produces two identical DNA double helices, each with one new and one old strand.
In a sense, that's all there is to DNA replication! But what's actually most interesting about this process is how it's carried out in a cell.
Cells need to copy their DNA very quickly, and with very few errors (or risk problems such as cancer). To do so, they use a variety of enzymes and proteins, which work together to make sure DNA replication is performed smoothly and accurately.
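The template idea can be made concrete with a short Python sketch. It is only an illustration: the names `complement_strand` and `replicate` and the example sequence are made up here, and each strand is written as a plain string in the 5' to 3' direction.

```python
# Base-pairing rules: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement_strand(template: str) -> str:
    """Return the strand complementary to `template`.

    Both strands are written 5' to 3'. The two strands of a double
    helix are antiparallel, so the template is read in reverse.
    """
    return "".join(PAIRS[base] for base in reversed(template))


def replicate(helix: tuple[str, str]) -> list[tuple[str, str]]:
    """Semiconservative replication of an (old strand, old strand) pair.

    Each old strand acts as a template, so every daughter helix holds
    one old strand and one newly built strand.
    """
    old_a, old_b = helix
    return [
        (old_a, complement_strand(old_a)),  # old strand A + new partner
        (old_b, complement_strand(old_b)),  # old strand B + new partner
    ]


parent = ("ATGCCGTT", complement_strand("ATGCCGTT"))
daughters = replicate(parent)
for old, new in daughters:
    print("old:", old, "new:", new)
```

In this toy model, both daughter helices carry the same pair of sequences as the parent, which is the point of using each strand as a template.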

DNA polymerase

One of the key molecules in DNA replication is the enzyme DNA polymerase. DNA polymerases are responsible for synthesizing DNA: they add nucleotides one by one to the growing DNA chain, incorporating only those that are complementary to the template.
Here are some key features of DNA polymerases:
  • They always need a template
  • They can only add nucleotides to the 3' end of a DNA strand
  • They can't start making a DNA chain from scratch, but require a pre-existing chain or short stretch of nucleotides called a primer
  • They proofread, or check their work, removing the vast majority of "wrong" nucleotides that are accidentally added to the chain
The addition of nucleotides requires energy. This energy comes from the nucleotides themselves, which have three phosphates attached to them (much like the energy-carrying molecule ATP). When the bond between phosphates is broken, the energy released is used to form a bond between the incoming nucleotide and the growing chain.
In prokaryotes such as E. coli, there are two main DNA polymerases involved in DNA replication: DNA pol III (the major DNA-maker), and DNA pol I, which plays a crucial supporting role we'll examine later.

Starting DNA replication

How do DNA polymerases and other replication factors know where to begin? Replication always starts at specific locations on the DNA, which are called origins of replication and are recognized by their sequence.
E. coli, like most bacteria, has a single origin of replication on its chromosome. The origin is about 245 base pairs long and has mostly A/T base pairs (which are held together by fewer hydrogen bonds than G/C base pairs), making the DNA strands easier to separate.
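To see why an A/T-rich stretch is easier to pull apart, we can count hydrogen bonds: an A/T pair is held by two, a G/C pair by three. The sketch below is a rough illustration only. The function name `hydrogen_bonds` and both example sequences are invented for this purpose and are not taken from the real E. coli origin.

```python
# Hydrogen bonds per base pair: A/T pairs have 2, G/C pairs have 3.
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def hydrogen_bonds(strand: str) -> int:
    """Count the hydrogen bonds holding `strand` to its complement."""
    return sum(H_BONDS[base] for base in strand)


at_rich = "ATTATAGAT"   # made-up A/T-rich stretch
gc_rich = "GGCGCCACG"   # made-up G/C-rich stretch of the same length

print("A/T-rich:", hydrogen_bonds(at_rich))
print("G/C-rich:", hydrogen_bonds(gc_rich))
```

For two stretches of equal length, the A/T-rich one has fewer bonds to break, which fits the idea that the origin is a good place to start opening the helix.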
Specialized proteins recognize the origin, bind to this site, and open up the DNA. As the DNA opens, two Y-shaped structures called replication forks are formed, together making up what's called a replication bubble. The replication forks will move in opposite directions as replication proceeds.
Bacterial chromosome. The double-stranded DNA of the circular bacterial chromosome is opened at the origin of replication, forming a replication bubble. Each end of the bubble is a replication fork, a Y-shaped junction where double-stranded DNA is separated into two single strands. New DNA complementary to each single strand is synthesized at each replication fork. The two forks move in opposite directions around the circumference of the bacterial chromosome, creating a larger and larger replication bubble that grows at both ends.
Diagram based on similar illustration in Reece et al.²
How does replication actually get going at the forks? Helicase is the first replication enzyme to load on at the origin of replication³. Helicase's job is to move the replication forks forward by "unwinding" the DNA (breaking the hydrogen bonds between the nitrogenous base pairs).
Proteins called single-strand binding proteins coat the separated strands of DNA near the replication fork, keeping them from coming back together into a double helix.

Primers and primase

DNA polymerases can only add nucleotides to the 3' end of an existing DNA strand. (They use the free -OH group found at the 3' end as a "hook," adding a nucleotide to this group in the polymerization reaction.) How, then, does DNA polymerase add the first nucleotide at a new replication fork?
Alone, it can't! The problem is solved with the help of an enzyme called primase. Primase makes an RNA primer, or short stretch of nucleic acid complementary to the template, that provides a 3' end for DNA polymerase to work on. A typical primer is about five to ten nucleotides long. The primer primes DNA synthesis, i.e., gets it started.
Once the RNA primer is in place, DNA polymerase "extends" it, adding nucleotides one by one to make a new DNA strand that's complementary to the template strand.
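Here is a minimal sketch of primer extension, under some simplifying assumptions. The function name `extend_primer` and the sequences are hypothetical. The template is written 3' to 5' so that the new strand grows left to right in its own 5' to 3' direction, and RNA nucleotides are shown in lowercase to set the primer apart from the DNA that follows it.

```python
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def extend_primer(template_3to5: str, primer: str) -> str:
    """Extend `primer` along the template, one nucleotide at a time.

    `template_3to5` is the template strand read 3' to 5'. The primer is
    assumed to be already base-paired with the start of the template.
    """
    if not primer:
        # No free 3' -OH group to build on: polymerase can't start from scratch.
        raise ValueError("DNA polymerase needs a primer")
    new_strand = primer
    for base in template_3to5[len(primer):]:
        # Each new nucleotide is added to the 3' end of the growing chain.
        new_strand += DNA_PAIR[base]
    return new_strand


template = "TACGGATCCA"   # made-up template, read 3' to 5'
rna_primer = "augcc"      # made-up 5-nucleotide RNA primer (lowercase = RNA)

print(extend_primer(template, rna_primer))
```

Calling `extend_primer(template, "")` raises an error, mirroring the fact that DNA polymerase alone cannot add the first nucleotide.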

Leading and lagging strands

In E. coli, the DNA polymerase that handles most of the synthesis is DNA polymerase III. There are two molecules of DNA polymerase III at a replication fork, each of them hard at work on one of the two new DNA strands.
DNA polymerases can only make DNA in the 5' to 3' direction, and this poses a problem during replication. A DNA double helix is always antiparallel; in other words, one strand runs in the 5' to 3' direction, while the other runs in the 3' to 5' direction. This makes it necessary for the two new strands, which are also antiparallel to their templates, to be made in slightly different ways.
One new strand, which runs 5' to 3' towards the replication fork, is the easy one. This strand is made continuously, because the DNA polymerase is moving in the same direction as the replication fork. This continuously synthesized strand is called the leading strand.
The other new strand, which runs 5' to 3' away from the fork, is trickier. This strand is made in fragments because, as the fork moves forward, the DNA polymerase (which is moving away from the fork) must come off and reattach on the newly exposed DNA. This tricky strand, which is made in fragments, is called the lagging strand.
The small fragments are called Okazaki fragments, named for the Japanese scientist who discovered them. The leading strand can be extended from one primer alone, whereas the lagging strand needs a new primer for each of the short Okazaki fragments.
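The difference in primer use between the two strands can be shown with a toy model of a single fork. Everything here is an assumption made for illustration: the function name `replicate_fork`, the idea that the fork opens a fixed number of bases per step, and the tiny numbers in the example.

```python
def replicate_fork(template_length: int, unwound_per_step: int) -> dict:
    """Toy model of leading- and lagging-strand synthesis at one fork."""
    if unwound_per_step <= 0:
        raise ValueError("the fork must open at least one base per step")

    leading_end = 0          # how far the leading strand has grown
    okazaki_fragments = []   # (start, end) of each lagging-strand piece
    position = 0
    while position < template_length:
        # Helicase exposes the next stretch of template.
        end = min(position + unwound_per_step, template_length)
        # Leading strand: polymerase follows the fork and keeps going.
        leading_end = end
        # Lagging strand: polymerase moves away from the fork, so the newly
        # exposed stretch gets a fresh primer and becomes its own fragment.
        okazaki_fragments.append((position, end))
        position = end

    return {
        "leading_primers": 1,
        "leading_length": leading_end,
        "lagging_primers": len(okazaki_fragments),
        "okazaki_fragments": okazaki_fragments,
    }


result = replicate_fork(template_length=10, unwound_per_step=4)
print(result)
```

In this model the leading strand always needs one primer, while the number of lagging-strand primers grows with the length of DNA copied.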

The maintenance and cleanup crew

Some other proteins and enzymes, in addition to the main ones above, are needed to keep DNA replication running smoothly. One is a protein called the sliding clamp, which holds DNA polymerase III molecules in place as they synthesize DNA. The sliding clamp is a ring-shaped protein and keeps the DNA polymerase of the lagging strand from floating off when it re-starts at a new Okazaki fragment⁴.
Topoisomerase also plays an important maintenance role during DNA replication. This enzyme prevents the DNA double helix ahead of the replication fork from getting too tightly wound as the DNA is opened up. It acts by making temporary nicks in the helix to release the tension, then sealing the nicks to avoid permanent damage.
Finally, there is a little cleanup work to do if we want DNA that doesn't contain any RNA or gaps. The RNA primers are removed and replaced by DNA through the activity of DNA polymerase I, the other polymerase involved in replication. The nicks that remain after the primers are replaced get sealed by the enzyme DNA ligase.
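The cleanup steps can be sketched in a few lines of Python. This is a simplified picture with made-up names (`replace_primers`, `ligate`) and made-up fragments. Lowercase letters stand for the RNA primer at the start of each fragment, and the fragments are assumed to be listed in the order they appear in the finished strand.

```python
# RNA nucleotide -> the DNA nucleotide that takes its place (U becomes T).
RNA_TO_DNA = {"a": "A", "u": "T", "g": "G", "c": "C"}


def replace_primers(fragment: str) -> str:
    """Swap every RNA nucleotide for DNA, as DNA polymerase I does."""
    return "".join(RNA_TO_DNA.get(nt, nt) for nt in fragment)


def ligate(fragments: list[str]) -> str:
    """Join neighboring fragments into one strand, as DNA ligase does."""
    return "".join(fragments)


okazaki = ["augGCTTA", "ccaTGGAC", "guaACGTT"]   # made-up fragments
dna_only = [replace_primers(f) for f in okazaki]
finished = ligate(dna_only)
print(finished)
```

After both steps, the strand is a single piece that contains no RNA and no nicks.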

Summary of DNA replication in E. coli

Let's zoom out and see how the enzymes and proteins involved in replication work together to synthesize new DNA.
Illustration shows the replication fork. Helicase unwinds the helix, and single-strand binding proteins prevent the helix from re-forming. Topoisomerase prevents the DNA from getting too tightly coiled ahead of the replication fork. DNA primase forms an RNA primer, and DNA polymerase extends the DNA strand from the RNA primer. DNA synthesis occurs only in the 5' to 3' direction. On the leading strand, DNA synthesis occurs continuously. On the lagging strand, DNA synthesis restarts many times as the helix unwinds, resulting in many short fragments called “Okazaki fragments.” DNA ligase joins the Okazaki fragments together into a single DNA molecule.
  • Helicase opens up the DNA at the replication fork.
  • Single-strand binding proteins coat the DNA around the replication fork to prevent rewinding of the DNA.
  • Topoisomerase works at the region ahead of the replication fork to prevent supercoiling.
  • Primase synthesizes RNA primers complementary to the DNA strand.
  • DNA polymerase III extends the primers, adding on to the 3' end, to make the bulk of the new DNA.
  • RNA primers are removed and replaced with DNA by DNA polymerase I.
  • The gaps between DNA fragments are sealed by DNA ligase.

DNA replication in eukaryotes

The basics of DNA replication are similar between bacteria and eukaryotes such as humans, but there are also some differences:
  • Eukaryotes usually have multiple linear chromosomes, each with multiple origins of replication. Humans can have up to 100,000 origins of replication⁵!
  • Most of the E. coli enzymes have counterparts in eukaryotic DNA replication, but a single enzyme in E. coli may be represented by multiple enzymes in eukaryotes. For instance, there are five human DNA polymerases with important roles in replication⁵.
  • Most eukaryotic chromosomes are linear. Because of the way the lagging strand is made, some DNA is lost from the ends of linear chromosomes (the telomeres) in each round of replication.
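To get a feel for what many origins buy a human cell, here is a back-of-the-envelope calculation using the two figures from this article: about 3 billion base pairs and up to 100,000 origins. The even spacing of origins, and the idea that all of them are used, are simplifying assumptions made only for this sketch.

```python
genome_bp = 3_000_000_000   # about 3 billion base pairs (from the introduction)
max_origins = 100_000       # upper figure for human origins of replication

# Average stretch of DNA each origin is responsible for, if evenly spaced.
bp_per_origin = genome_bp // max_origins

# Each origin opens into two forks that move in opposite directions.
bp_per_fork = bp_per_origin // 2

print("base pairs per origin:", bp_per_origin)
print("base pairs per fork:", bp_per_fork)
```

Under these assumptions, each fork only has to copy a small slice of the genome, rather than a single pair of forks copying all of it.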

Explore outside of Khan Academy

Do you want to learn more about DNA replication? Check out this scrollable interactive from LabXchange.
LabXchange is a free online science education platform created at Harvard’s Faculty of Arts and Sciences and supported by the Amgen Foundation.

Want to join the conversation?

  • Fairuz Nawar
    Is topoisomerase the same as DNA gyrase?
    (2 votes)
  • natureforever.care
    How are the histone proteins taken care of during eukaryotic DNA replication?
    (13 votes)
    • J
      The DNA is first unwound at origins of replication, and the displaced histone proteins move on to other parts of the DNA that haven't been unwound so that those parts can maintain their chromatin structure.
      (7 votes)
  • Michelle Verstraaten
    "Many DNA polymerases have proofreading activity" mentions: "In most cases, the correct nucleotide is indeed added, because the DNA polymerization reaction won't usually occur unless the incoming nucleotide base-pairs correctly with the template." If the reaction cannot occur unless there is correct base matching, how then can the DNA polymerase still make an error?
    (6 votes)
    • emilyabrash
      The key word is the "usually." The reaction won't occur with a mis-paired base in most cases. However, about 1 in 10^5 base pairs will involve an incorrect pairing. This may not seem like a high rate of errors, but it is high enough to cause a lot of mutations in a cell. The role of the proofreading is to fix these occasional but still problematic errors.
      (19 votes)
  • Isaac D. Cohen
    In the last section, "DNA replication in eukaryotes," it says that in eukaryotic cells a little DNA at the ends of the chromosomes gets lost. If this is the case, will we eventually lose enough DNA to stop functioning properly?

    You might say that this is indeed why we die eventually. Each time our cells divide and our DNA gets copied, some of it gets lost, placing a limit on how many times our cells could divide and still function properly. However, consider that DNA is also copied before meiosis. This means that the DNA that was lost when the ancestors of my cells (in my parents) divided was never passed on to me. And the DNA that got lost when my cells divided to form my germ cells will never get passed on to my sperm cells. Will humanity eventually lose its entire gene pool?
    (12 votes)
    • Charles LaCour
      At the ends of DNA strands there is a section of non-coding nucleotides that we call a telomere. The telomere is what gets shorter every time a cell divides, and when the telomere is gone, the cell spontaneously dies. There is no loss of coding DNA in this process, so there is no loss of genetic information between generations.
      (2 votes)
  • gregattac
    The part of the article that deals with the Okazaki-fragments states that:

    "DNA polymerase I and DNA ligase are also needed (more infrequently) for the leading strand. DNA polymerase I removes the primer at the very beginning of the strand, and DNA polymerase seals the remaining gap."

    Shouldn't the gap between the primer replacement and the new nucleotide chain be sealed by DNA ligase instead?
    (7 votes)
  • Lilah Bleich
    Why are the DNA polymerases numbered here (I/II/III)? I thought that eukaryotic polymerases were named alpha, beta, delta, etc.
    (4 votes)
  • Hans Kristian Pedersen
    In the paragraph 'DNA polymerases' it says that polymerase II has a DNA repair function, but in 'Many DNA polymerases have proofreading activity', it is stated that DNA pol. I and II have proofreading activity. Does DNA pol. II aid in a different repair mechanism than proofreading?
    (4 votes)
  • Shreyas Pai
    'A DNA molecule “unzips” as the hydrogen bonds between bases are broken, separating the two strands.' What makes this happen?
    (2 votes)
  • Faiza Salah
    Topoisomerase works at the region ahead of the replication fork to prevent supercoiling.

    What does it mean? I can not understand supercoiling.
    (2 votes)
  • Priyanka
    Genome refers to the haploid content of DNA in a cell, so how can it consist of 3 billion base PAIRS? Or is it the diploid content in any cell?
    (2 votes)
    • tyersome
      Most DNA exists as a double-stranded DNA in a double helix — the strands are held together by base pairs and we usually think of this as a single molecule (even though there are no covalent bonds between the two strands).

      So, each haploid chromosome has at its core a (mostly) double-stranded DNA "molecule" and a human haploid genome contains ~3.2 billion base pairs.

      A diploid cell has two of each haploid chromosome (each of which contains a double-stranded DNA "molecule"), so a diploid human genome contains ~6.4 billion base pairs.

      Does that help?
      (3 votes)