DNA sequencing

DNA sequencing involves three main steps: 1) using PCR to amplify DNA fragments, 2) introducing dideoxynucleotides that halt DNA strand elongation, and 3) employing a computer to analyze the fluorescent labels on the DNA fragments to determine the sequence. Created by Ronald Sahyouni.

Want to join the conversation?

  • claire hamel
    There's a small error: the base + ribose was called a nucleotide (there, and several times afterwards). It's a nucleoside; nucleotides have the phosphate group attached. Just mentioning it so that there isn't any confusion.
    (29 votes)
  • Brendan
    Regarding the overall process: the ddNTPs are inserted randomly when carrying out PCR, correct? So how would one know the order in which to place the strands once you have them? Are they just arranged in order of size?
    (6 votes)
    • Chris Saffran
      The PCR is run long enough that elongation is terminated at each position in the sequence thousands of times over. That means for a sequence of length n amplified with a primer of length p, you have thousands of fragments of every length from p+1 to n, each one labeled with a fluorescent marker at its 3' end. For example, suppose your actual DNA sequence is 5'-ATGGCGATGT-3', but you're only sure about the last 5 bases, so you design your primer: 5'-ACATC-3' (which is complementary to those last 5 bases when read 3' to 5'). At the end of your PCR, you'll have a few thousand fragments of 5'-ACATCg-3', a few thousand 5'-ACATCGc-3', a few thousand 5'-ACATCGCc-3', a few thousand 5'-ACATCGCCa-3' and a few thousand 5'-ACATCGCCAt-3'. Note that I used lower case for the 3' bases to indicate that each is a labeled dideoxynucleotide. So what will your gel look like when you run electrophoresis? Well, here p+1 is 6 and n is 10, so assuming you've used up all your primer, you'll have a band corresponding to 6 bp, and all the fragments in that band will have the G label. You'll have another one at 7 bp, all the fragments in which will have the C label, one at 8 bp also labeled C, one at 9 bp labeled A, and finally one at 10 bp labeled T. If you use easily distinguishable fluorescent labels, you can actually read the result without a computer (though in practice a computer is pretty much always used): GCCAT from bottom to top, all lined up nicely. Remember that's 5' to 3', but it's the reverse complement of our mystery sequence, so the final result is 5'-ATGGC-3'. Very elegant. A guy named Sanger came up with the technique in the 1970s, so it's called "Sanger sequencing."
      (31 votes)
  • Abid Ali
    How does the computer determine what the first nucleotide in the sequence is?
    (15 votes)
    • Susmita Sarkar
      I'm not sure if my understanding is correct, but I think it might be because you assume that you can't start a new strand with a ddNTP. We're then assuming that the labeled ddNTP end determines the length of that strand, and the fragment starts from the other end (which is the same for all strands).
      (3 votes)
  • Priyanka
    Technically, as far as I understand, knowing the details of the ddNTPs is unnecessary for the MCAT because the Recombinant DNA and Biotechnology section is listed under the "BIO" umbrella on the AAMC outline. The structure of nucleotides isn't taught in such detail in Bio as it is in Biochem. I think it's more important to understand the general concept of DNA sequencing here.
    (4 votes)
  • banoubr
    My question is: will the iterative process of this procedure give you the "complementary strand" of Gene A? Is that correct?
    (2 votes)
  • EliasLeavitt
    At , wouldn't the ddNTP have phosphate groups on its 5' carbon? Or is that not important?
    (4 votes)
  • Danielle Jones
    Are microbial DNA sequencing methods similar to this?
    (2 votes)
    • Becca
      Provided you can cultivate isolated colonies of the microbe in a lab setting, the same techniques can be used. Although prokaryotic DNA is circular, that isn't a concern, as the DNA is chopped into fragments. In some ways, it is easier to sequence bacterial DNA than eukaryotic DNA, as it has fewer repeat sequences and such.
      (2 votes)
  • SULAGNA NANDI
    What does the NTP in ddNTP stand for?
    (2 votes)
  • Theultimatespammer123
    Is this the only way to find out the sequence of DNA? It seems very long and error-prone... Is there a way where the DNA sequence can just be read off a single DNA strand or something?
    (2 votes)
  • amrutarajarajan
    Referring to step one: wouldn't we need to know something about the DNA sequence if we want to PCR-amplify the sample in the first place? How are the primers for this PCR designed?
    (2 votes)
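
The worked example in Chris Saffran's reply can be checked with a few lines of Python. This is only a sketch: the helper names (`reverse_complement`, `terminated_fragments`) are made up for illustration, and it assumes an idealized reaction in which termination happens at every position after the primer.

```python
# Watson-Crick pairing for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def terminated_fragments(template, primer):
    """List every chain-terminated product, shortest first (hypothetical helper).

    Both arguments are written 5'->3'. The primer must pair with the
    3' end of the template, as in the reply above.
    """
    if not template.endswith(reverse_complement(primer)):
        raise ValueError("primer does not anneal to the 3' end of the template")
    full_product = reverse_complement(template)  # begins with the primer
    # One fragment for each length from p+1 to n.
    return [full_product[:n] for n in range(len(primer) + 1, len(template) + 1)]


template = "ATGGCGATGT"  # the "mystery" sequence from the reply
primer = "ACATC"         # pairs with its last 5 bases

fragments = terminated_fragments(template, primer)
# The last base of each fragment is the labeled ddNTP; reading the labels
# from the shortest fragment to the longest gives the newly made strand.
read = "".join(fragment[-1] for fragment in fragments)

print(fragments)
print(read)                      # GCCAT
print(reverse_complement(read))  # ATGGC
```

The fragment list runs from 6 to 10 bases, matching p+1 = 6 and n = 10 in the reply, and the final line recovers the unknown part of the original strand.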

Video transcript

- [Voiceover] Have you ever wondered how we sequence DNA? Well, let's take a quick look at DNA sequencing. We're going to break DNA sequencing down into three different steps.

The first step is to take the sample of DNA that you're interested in sequencing and use PCR to amplify it. By using PCR to amplify the sample, you're able to generate lots and lots of DNA fragments.

The next thing: normally in PCR you have to add nucleotides, to give the growing strand the substrate from which it can grow. Normally you add in regular deoxynucleotides, and those look something like this. You've got an OH group here. You've got an H here. You have a base... And then you've got a carbon group... And oxygen-hydrogen. So, this is what a normal nucleotide looks like... But interspersed in the PCR, you also want to add in something known as a dideoxynucleotide. A dideoxynucleotide looks something like this. It's basically exactly the same thing, but it only has a hydrogen here, so this oxygen is removed. What that does is, if this dideoxynucleotide, which we can abbreviate ddNTP, incorporates into the growing strand, then since there's no OH group here for the next nucleotide to attach to, the strand can no longer elongate. You basically have termination of strand elongation as soon as this ddNTP incorporates.

What you can do is fluorescently label the different dideoxynucleotides. For example, we have four different options. We can label all the G's blue, all the A's red, all the T's green, and all the C's orange. So you have these dideoxynucleotides with different fluorescent labels getting incorporated into the growing strand, and since PCR is able to amplify, creating millions and millions of DNA fragments, you'll have strands of different lengths. Let's look at an example.
Let's imagine that we've got a nucleotide being incorporated here, a regular nucleotide, and then another one incorporated here, and then another one, and then just randomly, all of a sudden, we have a dideoxynucleotide being incorporated here, and this stops the elongation of the strand. So, you would have a DNA strand that's just four nucleotides long. After another round of PCR, we might have one, two, three, four, five, six, it's growing, it's growing, it's growing, and all of a sudden, whoa, what happened? You got a dideoxynucleotide being incorporated. You just do this, and after you've got millions of samples, you will eventually have something that looks like this. You'll have maybe just one regular nucleotide and then a dideoxynucleotide incorporated, or you might have, let's say, two of them, so you'll have two and then you've got a... Let's use this color suite we've got here. What you can see is that you have strands that are elongating, and different strands are terminated at different points by a dideoxynucleotide.

The next step is to use gel electrophoresis in order to separate the strands by size. When you run all the different fragments on a gel, it will separate them by size, and then you can have a computer go in and analyze all the fluorescent labels. If it sees here that you've got this blue fluorescent light, then it knows that the second nucleotide in the sequence is a G, so it'll say G. Then it'll look here and say, okay, this is a C. It'll look here and say we have another G, and so on and so forth. By reading these fluorescent labels, these fluorescent tags, the computer is able to give you a DNA sequence. And so this is basically an overview of how DNA sequencing works.
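
The three steps described in the transcript can be mimicked with a small simulation. The sketch below is a toy model, not a description of real lab software: the function names, the 20% chance of picking up a ddNTP at each position, the example template, and the choice to ignore the primer are all assumptions made for illustration. The dye colors are the ones used in the video.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
# Dye colors from the video: G blue, A red, T green, C orange.
DYE_COLOR = {"G": "blue", "A": "red", "T": "green", "C": "orange"}
BASE_FOR_COLOR = {color: base for base, color in DYE_COLOR.items()}


def synthesize_strand(template_3to5, dd_fraction, rng):
    """Copy one strand; return (length, dye color), or (length, None) if unlabeled.

    template_3to5 is the template in the order the polymerase reads it.
    At each position the incoming base is a ddNTP with probability
    dd_fraction (an assumed value), which ends elongation.
    """
    for position, template_base in enumerate(template_3to5, start=1):
        new_base = COMPLEMENT[template_base]
        if rng.random() < dd_fraction:
            return position, DYE_COLOR[new_base]  # chain terminated here
    return len(template_3to5), None  # reached the end without a ddNTP


def read_gel(fragments):
    """Separate labeled fragments by size and read one color per length."""
    color_by_length = {}
    for length, color in fragments:
        if color is not None:  # unlabeled strands give no signal
            color_by_length[length] = color
    ordered = sorted(color_by_length)  # shortest fragment first
    return "".join(BASE_FOR_COLOR[color_by_length[n]] for n in ordered)


rng = random.Random(0)        # fixed seed so the run is repeatable
template = "TACCGCTACA"       # hypothetical template, read 3'->5'
fragments = [synthesize_strand(template, 0.2, rng) for _ in range(10_000)]

sequence = read_gel(fragments)
# With this many strands every length from 1 to 10 is represented,
# so the full new strand is recovered.
print(sequence)  # ATGGCGATGT
```

Each individual strand stops at a random position, but across thousands of strands every length shows up, which is why sorting by size and reading the colors in order is enough to recover the sequence.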