DNA sequencing

How the sequence of nucleotide bases (As, Ts, Cs, and Gs) in a piece of DNA is determined.

Key points:

  • DNA sequencing is the process of determining the sequence of nucleotides (As, Ts, Cs, and Gs) in a piece of DNA.
  • In Sanger sequencing, the target DNA is copied many times, making fragments of different lengths. Fluorescent “chain terminator” nucleotides mark the ends of the fragments and allow the sequence to be determined.
  • Next-generation sequencing techniques are new, large-scale approaches that increase the speed and reduce the cost of DNA sequencing.

What is sequencing?

You may have heard of genomes being sequenced. For instance, the human genome was completed in 2003, after a multiyear international effort. But what does it mean to sequence a genome, or even a small fragment of DNA?
DNA sequencing is the process of determining the sequence of nucleotide bases (As, Ts, Cs, and Gs) in a piece of DNA. Today, with the right equipment and materials, sequencing a short piece of DNA is relatively straightforward.
Sequencing an entire genome (all of an organism’s DNA) remains a complex task. It requires breaking the DNA of the genome into many smaller pieces, sequencing the pieces, and assembling the sequences into a single long "consensus." However, thanks to new methods that have been developed over the past two decades, genome sequencing is now much faster and less expensive than it was during the Human Genome Project [1].
In this article, we’ll take a look at methods used for DNA sequencing. We'll focus on one well-established method, Sanger sequencing, but we'll also discuss new ("next-generation") methods that have reduced the cost and accelerated the speed of large-scale sequencing.

Sanger sequencing: The chain termination method

Regions of DNA up to about 900 base pairs in length are routinely sequenced using a method called Sanger sequencing or the chain termination method. Sanger sequencing was developed by the British biochemist Fred Sanger and his colleagues in 1977.
In the Human Genome Project, Sanger sequencing was used to determine the sequences of many relatively small fragments of human DNA. (These fragments weren't necessarily 900 bp or less, but researchers were able to "walk" along each fragment using multiple rounds of Sanger sequencing.) The fragments were aligned based on overlapping portions to assemble the sequences of larger regions of DNA and, eventually, entire chromosomes.
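The idea of aligning fragments by their overlapping portions can be sketched in a few lines of Python. This is a toy greedy merge, not the assembly software used in the Human Genome Project; the function names, the example reads, and the minimum overlap of 3 bases are all assumptions made for illustration, and real assemblers must also cope with sequencing errors and repeated regions.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> str:
    """Repeatedly merge the pair of reads that share the largest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # nothing overlaps any more; stop merging
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return max(reads, key=len)


# Three hypothetical reads that overlap by four bases each.
reads = ["GGTCATAG", "ATAGCTTA", "CTTACGGA"]
print(greedy_assemble(reads))  # GGTCATAGCTTACGGA
```

Each merge keeps the shared bases only once, so three 8-base reads with two 4-base overlaps give a 16-base sequence.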
Although genomes are now typically sequenced using other methods that are faster and less expensive, Sanger sequencing is still in wide use for the sequencing of individual pieces of DNA, such as fragments used in DNA cloning or generated through polymerase chain reaction (PCR).

Ingredients for Sanger sequencing

Sanger sequencing involves making many copies of a target DNA region. Its ingredients are similar to those needed for DNA replication in an organism, or for polymerase chain reaction (PCR), which copies DNA in vitro. They include:
  • A DNA polymerase enzyme
  • A primer, which is a short piece of single-stranded DNA that binds to the template DNA and acts as a "starter" for the polymerase
  • The four DNA nucleotides (dATP, dTTP, dCTP, dGTP)
  • The template DNA to be sequenced
However, a Sanger sequencing reaction also contains a unique ingredient:
  • Dideoxy, or chain-terminating, versions of all four nucleotides (ddATP, ddTTP, ddCTP, ddGTP), each labeled with a different color of dye
Two chemical structures of a nucleotide are shown. Both images have a five-carbon, ring-shaped molecule with a base attached to the first carbon and a phosphate group attached to the fifth carbon. The top image has two hydrogens attached to the number 3 carbon and is labeled Dideoxynucleotide, ddNTP. The bottom image has one hydrogen and an OH attached to the number 3 carbon and is labeled Deoxynucleotide, dNTP.
_Image credit: "Whole-genome sequencing: Figure 1," by OpenStax College, Biology (CC BY 4.0)._
Dideoxy nucleotides are similar to regular, or deoxy, nucleotides, but with one key difference: they lack a hydroxyl group on the 3’ carbon of the sugar ring. In a regular nucleotide, the 3’ hydroxyl group acts as a “hook,” allowing a new nucleotide to be added to an existing chain.
Once a dideoxy nucleotide has been added to the chain, there is no hydroxyl available and no further nucleotides can be added. The chain ends with the dideoxy nucleotide, which is marked with a particular color of dye depending on the base (A, T, C or G) that it carries.

Method of Sanger sequencing

The DNA sample to be sequenced is combined in a tube with primer, DNA polymerase, and DNA nucleotides (dATP, dTTP, dGTP, and dCTP). The four dye-labeled, chain-terminating dideoxy nucleotides are added as well, but in much smaller amounts than the ordinary nucleotides.
The mixture is first heated to denature the template DNA (separate the strands), then cooled so that the primer can bind to the single-stranded template. Once the primer has bound, the temperature is raised again, allowing DNA polymerase to synthesize new DNA starting from the primer. DNA polymerase will continue adding nucleotides to the chain until it happens to add a dideoxy nucleotide instead of a normal one. At that point, no further nucleotides can be added, so the strand will end with the dideoxy nucleotide.
This process is repeated in a number of cycles. By the time the cycling is complete, it’s virtually guaranteed that a dideoxy nucleotide will have been incorporated at every single position of the target DNA in at least one reaction. That is, the tube will contain fragments of different lengths, ending at each of the nucleotide positions in the original DNA (see figure below). The ends of the fragments will be labeled with dyes that indicate their final nucleotide.
A diagram with images showing the Sanger DNA sequencing method. At the top of the diagram is a horizontal green line with 17 small vertical lines extending from the horizontal line. The left side of the line is labeled 3 prime, the right side of the line is labeled 5 prime, and the line is titled template. Above the green line is a horizontal gray line with 9 small vertical lines extending from it; the left side of the gray line is labeled 5 prime, the right side is labeled 3 prime, and the line is titled primer. An arrow points from the primer to a series of horizontal lines that increase in length. The arrow is joined by an arrow from a key that reads dNTPs. The symbols for the key are: ddTTP is a red circle, ddCTP is a blue circle, ddATP is a green circle, and ddGTP is a purple circle. At the point where the arrow from the key joins the arrow from the primer, the arrow is labeled primer extension and chain termination. At the tip of the arrow are 9 parallel lines, each increasing in length. Each line begins with the gray primer and then is joined by a red line segment with a vertical line attached that ends with a colored circle. The first line has one vertical line segment and a purple circle. The second line has 2 red vertical line segments and a purple circle. Each horizontal line continues to extend by 1 vertical line segment added to the 3 prime end of the segment, and at the end of each horizontal line segment is either a red, blue, green, or purple circle. All 9 horizontal lines are enclosed within a bracket that has an arrow pointing to a tube-shaped structure that has different colored bands on it and is labeled capillary gel electrophoresis. The tube has an arrow pointing to a box labeled laser that shows a laser line running through the tube and connected to a box labeled detector. The detector is hooked up to a computer.
From the computer there is an arrow pointing to a box that has an image of purple, red, green, and blue lines on a graph, and the box is labeled chromatogram. Under the box is a label, sequence, that reads G, G, T, C, A, T, A, G, C. The Gs are colored purple, the Ts are colored red, the Cs are colored blue, and the As are colored green.
Image modified from "Sanger sequencing," by Estevezj (CC BY-SA 3.0). The modified image is licensed under a (CC BY-SA 3.0) license.
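To see why many rounds of copying leave fragments ending at every position, the reaction can be imitated with a small simulation. This is only a sketch under simplifying assumptions: every added base has the same chance (`dd_fraction`, a made-up value) of being a dideoxy nucleotide, the template string is written in the order the polymerase reads it, and the function names are hypothetical.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def extend_once(template: str, dd_fraction: float, rng: random.Random) -> str:
    """Copy the template base by base until a ddNTP happens to be added.

    Returns the terminated fragment, or "" if the end of the template
    was reached without any ddNTP being incorporated.
    """
    fragment = ""
    for base in template:
        fragment += COMPLEMENT[base]
        # Assumption: the base just added was a ddNTP with this probability.
        if rng.random() < dd_fraction:
            return fragment  # no 3' hydroxyl, so the chain ends here
    return ""


def run_reaction(template: str, dd_fraction: float = 0.1,
                 n_molecules: int = 5000, seed: int = 0) -> set[str]:
    """Collect the distinct terminated fragments from many extension events."""
    rng = random.Random(seed)
    fragments = set()
    for _ in range(n_molecules):
        fragment = extend_once(template, dd_fraction, rng)
        if fragment:
            fragments.add(fragment)
    return fragments


# Template chosen so that the new strand matches the figure: GGTCATAGC.
fragments = run_reaction("CCAGTATCG")
print(sorted(fragments, key=len))
```

Keeping `dd_fraction` small mirrors the text: if dideoxy nucleotides were as common as ordinary ones, almost every chain would stop after a few bases and the longer fragments would be rare.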
After the reaction is done, the fragments are run through a long, thin tube containing a gel matrix in a process called capillary gel electrophoresis. Short fragments move quickly through the pores of the gel, while long fragments move more slowly. As each fragment crosses the “finish line” at the end of the tube, it’s illuminated by a laser, allowing the attached dye to be detected.
The smallest fragment (ending just one nucleotide after the primer) crosses the finish line first, followed by the next-smallest fragment (ending two nucleotides after the primer), and so forth. Thus, from the colors of dyes registered one after another on the detector, the sequence of the original piece of DNA can be built up one nucleotide at a time. The data recorded by the detector consist of a series of peaks in fluorescence intensity, as shown in the chromatogram above. The DNA sequence is read from the peaks in the chromatogram.
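The read-out step described above amounts to sorting fragments by length and noting the dye color on each one's final base. A minimal sketch, assuming the dye colors from the figure's key and using hypothetical function names:

```python
# Dye colors as given in the figure's key.
DYE_COLOR = {"G": "purple", "T": "red", "C": "blue", "A": "green"}
BASE_FOR_COLOR = {color: base for base, color in DYE_COLOR.items()}


def detector_signal(fragments: list[str]) -> list[str]:
    """Colors seen at the detector: shortest fragment arrives first."""
    return [DYE_COLOR[fragment[-1]] for fragment in sorted(fragments, key=len)]


def call_bases(colors: list[str]) -> str:
    """Translate the ordered dye colors back into a base sequence."""
    return "".join(BASE_FOR_COLOR[color] for color in colors)


# Fragments in arbitrary order, as they would be mixed together in the tube.
fragments = ["GGTCA", "G", "GGTCATAGC", "GGT", "GG",
             "GGTCATA", "GGTC", "GGTCAT", "GGTCATAG"]
colors = detector_signal(fragments)
print(colors[:3])          # ['purple', 'purple', 'red']
print(call_bases(colors))  # GGTCATAGC
```

The detector never sees a whole fragment, only one color per arrival, which is why the order of arrival carries all the positional information.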

Uses and limitations

Sanger sequencing gives high-quality sequence for relatively long stretches of DNA (up to about 900 base pairs). It's typically used to sequence individual pieces of DNA, such as bacterial plasmids or DNA copied in PCR.
However, Sanger sequencing is expensive and inefficient for larger-scale projects, such as the sequencing of an entire genome or metagenome (the “collective genome” of a microbial community). For tasks such as these, new, large-scale sequencing techniques are faster and less expensive.

Next-generation sequencing

The name may sound like Star Trek, but that’s really what it’s called! The most recent set of DNA sequencing technologies is collectively referred to as next-generation sequencing.
There are a variety of next-generation sequencing techniques that use different technologies. However, most share a common set of features that distinguish them from Sanger sequencing:
  • Highly parallel: many sequencing reactions take place at the same time
  • Micro scale: reactions are tiny and many can be done at once on a chip
  • Fast: because reactions are done in parallel, results are ready much faster
  • Low-cost: sequencing a genome is cheaper than with Sanger sequencing
  • Shorter length: reads typically range from 50 to 700 nucleotides in length
Conceptually, next-generation sequencing is kind of like running a very large number of tiny Sanger sequencing reactions in parallel. Thanks to this parallelization and small scale, large quantities of DNA can be sequenced much more quickly and cheaply with next-generation methods than with Sanger sequencing. For example, in 2001, the cost of sequencing a human genome was almost $100 million. In 2015, it was just $1,245 [2]!
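A rough calculation shows why running reactions in parallel matters so much. The sketch below counts how many reads would be needed to span a genome once if the reads were laid end to end. The genome size of about 3.1 billion base pairs is an assumed round figure for the human genome, and the function name and `depth` parameter are hypothetical; real projects need overlapping reads and read each position several times, so actual counts are higher.

```python
def reads_to_span(genome_size: int, read_length: int, depth: int = 1) -> int:
    """Fewest reads whose combined length reaches depth x genome_size."""
    total_bases = genome_size * depth
    return -(-total_bases // read_length)  # ceiling division on integers


GENOME_SIZE = 3_100_000_000  # assumption: approximate human genome size in bp

for label, read_length in [("900 bp (Sanger-length)", 900),
                           ("700 bp", 700),
                           ("50 bp", 50)]:
    count = reads_to_span(GENOME_SIZE, read_length)
    print(f"{label}: {count:,} reads")
```

Even at the longest read lengths this comes to millions of separate reads, which is practical only when very many of them are produced at the same time.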
Why does fast and inexpensive sequencing matter? The ability to routinely sequence genomes opens new possibilities for biology research and biomedical applications. For example, low-cost sequencing is a step towards personalized medicine – that is, medical treatment tailored to an individual's needs, based on the gene variants in his or her genome.

Want to join the conversation?

  • iristabak
    This might be a bit off topic, but I am a chemistry student and I want to get a tattoo of my father's DNA. Now, I know human DNA contains too many base pairs to fit in a tattoo. But if a small segment of the sequence is used, how likely is it that that sequence also belongs to a different person?
    (23 votes)
    • Laila87
      Technically speaking, you could use the sequence from DNA fingerprinting (the method used to identify a person), but it would still be a lot of material for a tattoo: it's typically thirteen sequences of varying length... I think it would be a pretty big tattoo.
      Another option would be getting a tattoo of a "DNA ladder" (the DNA fingerprinting pattern seen on electrophoresis); this is also unique to a person and DNA-related. You could then add only two or three nice, detailed base pairs next to it.
      (20 votes)
  • thejeremiahbender
    Why can't the dye molecule be attached to a regular nucleotide, so that the entire DNA chain could be read as a single item?
    (7 votes)
    • Yuvraj Chaudhry
      You cannot read the entire DNA chain as a single item, even if each base were dyed, because you would get all four colours at once due to the small size of the molecule. You use ddNTPs to stop the synthesis of a strand and get fragments of all possible lengths, which move in ascending order of length. This lets you know exactly which base comes at which position.
      (5 votes)
  • Avi Benshtein
    Are there two systems for sequencing?
    One with light (laser) and the second with an electric field?
    (4 votes)
    • emilyabrash
      In traditional (Sanger) sequencing, both electric fields and lasers are required for slightly different, but interdependent, purposes. The electric field is applied to the DNA in the capillary tube, and it pulls the DNA pieces through in order from shortest to longest. As the pieces pass the laser (moving through because of the electric field), they are excited and detected by the detector.

      Some next-generation sequencing does not use lasers at all. Instead, it uses H+ ion fluxes to determine whether bases have been added. I'm not super familiar with this technique, but you can read more in this Wikipedia article: https://en.wikipedia.org/wiki/Ion_semiconductor_sequencing.

      Hope one of those answers addresses your question!
      (11 votes)
  • mekhanaanilkumar
    Does a 2000 bp or a 500 bp fragment migrate faster through this agarose gel?
    (2 votes)
  • Manar Al-Masri
    How do we know which primer to use if we don’t already know the sequence of the DNA fragment?
    (4 votes)
    • tyersome
      If we want to amplify a fragment of DNA but don't know the sequence, a technique known as "ligation-mediated PCR" can be used. This technique starts by adding (ligating) the primer sequences to the ends of the DNA fragment.
      (3 votes)
  • Rida
    "For instance, the human genome was completed in 2003"

    I don't understand this line. Isn't the genome unique to each and every individual? So what does this mean? Did they create one entire human genome and then assume that the differences among humans are due to mutations (point mutations?) in this genome?
    (2 votes)
    • ++§ Αλεκσανδαρ
      Hey, Rida.

      You're correct: we all have unique DNA in our cells. Even identical twins have different sequences of nucleotides, due to mutations that accumulate over time (although those differences are only minute).
      However, our DNA isn't entirely composed of genes. In fact, protein-coding genes make up only about 2% of our DNA, and this is the part in which individual people differ very little.

      There were two human genome sequencing projects happening simultaneously. One was the Human Genome Project, and the other was run by Celera Genomics. The latter was led by J. Craig Venter, and they were sequencing his own genome. As for the Human Genome Project, I don't think they ever publicly announced whose DNA they were working with.

      As you can probably tell, sequencing a genome is only one part of the work, and "completing the human genome" isn't the end of the work. We are still looking for differences and variations among both genes and non-coding parts of DNA. Also, we still haven't figured out what all those non-coding parts of DNA do.
      Speaking of which, the human genome wasn't exactly completed in 2003. Most of it was, but some problematic regions took almost two more decades to complete. The final assembly was published this year, in January 2022.

      If you're interested, you can take a look at the results published in 2004 by the Human Genome Project. Here's the link:
      https://www.gutenberg.org/ebooks/author/856
      There you can find entire sequences, nucleotide by nucleotide, in a plain text format.

      Hope this answers your question. Feel free to ask more in case I left anything out.

      Take care,

      Alex
      (7 votes)
  • snoozy
    Why do the nucleotides used here have 3 phosphate groups instead of 1?
    (2 votes)
  • toadere17
    In a chromatogram, why are the peaks of different heights?
    (3 votes)
  • osaintfleurubstu
    What molecule can be added to a first-generation or second-generation sequencing reaction that will halt the reaction but will not affect third-generation sequencing?
    (3 votes)
  • Andrew
    This may go a bit further than what the article specified, but here we go. In next-generation sequencing, how would you prevent misreading a double nucleotide? If you add multiple dNTPs to a single cluster, and the next sequence is AA, how would you be able to tell the difference between this and just A?
    (3 votes)
    • bozunlu92
      Depending on the sequencing method, the answer may change, but let me give you one example I know.

      In methods that detect dNTP incorporation by light, AA emits more light than a single A, and AAA emits even more. With runs of more than four (AAAA and longer), though, it becomes a problem, because the differences between emissions can no longer be distinguished. You can watch a short video on YouTube about next-generation sequencing; I would recommend one about Ion Torrent, which answers your question specifically.
      (1 vote)