DNA libraries & generating cDNA

Discover the process of creating a DNA library from proteins. Learn how scientists use messenger RNA, reverse transcriptase, and DNA polymerase to generate a complementary DNA sequence. Explore how this sequence is amplified, sequenced, and added to an online database, enabling easy access to the DNA sequence of any protein.
Visit us (http://www.khanacademy.org/science/healthcare-and-medicine) for health and medicine content or (http://www.khanacademy.org/test-prep/mcat) for MCAT related content. These videos do not provide medical advice and are for informational purposes only. The videos are not intended to be a substitute for professional medical advice, diagnosis or treatment. Always seek the advice of a qualified health provider with any questions you may have regarding a medical condition. Never disregard professional medical advice or delay in seeking it because of something you have read or seen in any Khan Academy video.
Created by Ronald Sahyouni.

Want to join the conversation?

  • cassandraacairns
    At :28 he says that we can determine the mRNA sequence by looking at the codons used to form each amino acid. If multiple codons code for the same amino acid, how can we be sure which one is used in each case?
    (36 votes)
    • walter yip
      You can't go from an amino acid sequence to mRNA, but you can extract mRNA from a cell (through column purification), convert it into DNA with reverse transcriptase and DNA polymerase, sequence it, and then insert the DNA into a bacterium to determine the protein (find the protein through the use of translational fusion). This is a lot of extra information, but it's nice to know and clears things up.
      (22 votes)
  • SaraEBarnes
    I am confused. The video says, "It's not possible to infer mRNA sequence from protein sequence, because multiple codons can code for the same amino acid. mRNA sequence can be inferred from DNA sequence." Isn't the whole point of this process to determine the DNA sequence of the protein? If you already know the DNA and thus can infer the mRNA sequence, why do you need to do this? Wouldn't attempting to create a cDNA library this way give you multiple different, possibly incorrect, DNA sequences, since multiple RNA codons can code for the same amino acid? It doesn't seem like this would work accurately if you are trying to determine the true DNA sequence of your protein.
    (3 votes)
    • Chris Saffran
      This is a good point, and the video vastly oversimplifies the process of deriving the mRNA sequence from the amino acid sequence. It actually can be done, but it's a bit of a pain. In order to do it you will need to generate a probe that is complementary to some portion of the target mRNA sequence, and then do an RNA extraction on cells actively expressing your protein. How do you select the right probe without knowing which codons are actually coding for the amino acids? By making several. Take a small section of the amino acid sequence (say 7 residues), infer all the different 21-base sequences that would code for that 7-residue sequence, and generate a probe for each. Then conduct Northern blot analyses using each probe and the RNA extract, and see if any bands correspond to mRNA of the right length (3 times the length of the amino acid sequence, plus a couple hundred extra bases for the poly-A tail). If so, you isolate the mRNA and can then carry on with the rest of the steps described in the video. It's labor intensive, but not strictly impossible.
      (12 votes)
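The probe-design step described in this reply can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 7-residue peptide `MWFKDEC` is a made-up example, and `candidate_mrnas` and `probe_for` are hypothetical helper names, not part of any library. The codon table is the standard genetic code written in the conventional UCAG order.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order (first base varies slowest,
# third base fastest); '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each one-letter amino acid code to every codon that encodes it.
CODONS_FOR = {}
for codon, aa in zip(("".join(c) for c in product(BASES, repeat=3)), AMINO_ACIDS):
    CODONS_FOR.setdefault(aa, []).append(codon)


def candidate_mrnas(peptide):
    """Return every mRNA sequence that could encode `peptide` (one-letter codes)."""
    choices = [CODONS_FOR[aa] for aa in peptide]
    return ["".join(codons) for codons in product(*choices)]


def probe_for(mrna):
    """Return the DNA probe, written 5'->3', that is complementary to `mrna`."""
    pair = {"A": "T", "U": "A", "G": "C", "C": "G"}
    return "".join(pair[base] for base in reversed(mrna))


peptide = "MWFKDEC"  # hypothetical 7-residue stretch, chosen for illustration
targets = candidate_mrnas(peptide)
probes = [probe_for(target) for target in targets]
print(len(targets), "candidate 21-base targets for", peptide)
```

The number of candidates is the product of the codon counts for each residue, so it grows quickly when the stretch contains leucine, serine, or arginine (six codons each) and stays small when it contains methionine or tryptophan (one codon each).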
  • kiilava.julius
    But aren't there many possible codons that can code for the same amino acid? So then you couldn't really state the exact order of the DNA's nucleotides, only a few possibilities for it?
    (6 votes)
  • meenakshig097
    He talks about how double-stranded DNA can be injected into a cloning vector (plasmid/virus), which can then be injected into a bacterium to make the desired protein. But wouldn't bacterial restriction enzymes recognize the foreign DNA (a virus has injected its DNA into the bacterium again!) and cut it all out (as mentioned in the last video)? If someone could explain this to me quickly, that would be great!
    (6 votes)
    • fatemeh<3
      I believe, as he mentioned in the last video, that when we add the DNA fragment (opening the circular plasmid and then adding the portion of DNA that we want), the cell recognizes it as self and does not attack (restrict) it. If it were a whole set of unique DNA floating around by itself without any methyl groups, then yes, definitely. But now it is part of the cell's DNA, and newly synthesized DNA will be methylated just like the cell's DNA as the replication process proceeds. Hope that helped!
      (4 votes)
  • dnadora
    How can we get the sequence after amplification?
    (4 votes)
  • ky6490
    Are we using PCR or bacteria to amplify our sequence? Which one would be better, and why?
    (1 vote)
    • tyersome
      Typically libraries will be made in bacteria.

      While PCR is very useful for rapidly making many copies of a sequence, there are a number of reasons why you might want to clone into a vector rather than just doing PCR:

      0) Vectors can be used to do different things with the DNA. A common example of this would be an expression vector — this causes the DNA to be transcribed and translated and would allow you to examine the protein encoded in the cloned DNA.

      1) PCR is somewhat error prone, so if you need a lot of DNA having a bacterium make copies of it for you is much less likely to introduce mutations.

      2) PCR is also more expensive if you want to make large amounts of DNA.

      3) DNA is more stable as a circular plasmid, rather than a linear piece of DNA — this means to store DNA for any length of time you probably want to clone it into a vector anyway.

      4) Long term storage — you can make a frozen (-80°C) stock of the bacteria containing the plasmid bearing the piece of DNA and you or someone else who wants to reuse that DNA can access it decades later by simply adding the bacteria to media and growing up as much as you want.


      In fact, we will often use PCR to create enough of a specific fragment to put into a vector, which then gets transformed into Escherichia coli (often abbreviated to E. coli). We will then often sequence the cloned fragment to make sure we got what we really wanted and that it doesn't have any mutations. This cloned DNA can now be stored and manipulated in whatever ways are necessary for our experiments.

      Does that help?
      (6 votes)
  • eman
    If you already have the dsDNA, why do you need to amplify it to get the correct sequence? Can you not determine the sequence from the dsDNA itself, however small the sample?
    (3 votes)
    • Taylor
      Well, first of all, you want a large sample of DNA to work with. It's very difficult to work with only a few copies. Secondly, if you watched the video on PCR, you'll see that by amplifying the DNA sequence, you will end up with exactly the sequence you want in abundance. The use of primers in PCR allows the amplification of only the target sequence, so the majority of the DNA in your sample will contain only your target sequence.
      (1 vote)
  • adam kim
    If we can look at an mRNA table to find out the sequence of the DNA, then why are we making all these proteins just to find the same information (the DNA sequence)?
    (2 votes)
  • K Banerjee
    And what about a genomic DNA library?
    (2 votes)
  • K Banerjee
    How did that mRNA come from that amino acid sequence?
    (1 vote)

Video transcript

- [Voiceover] Alright, so let's say that you've got this little guy over here and he's got his shoes and he's just happy, smilin'. So, this guy right here is our protein. So, let's look at how this protein was created. So, in order to make protein we have to start out with our base, and in this case our base is DNA. From DNA we generate messenger RNA, then that messenger RNA eventually leads to the formation of a protein. And protein is this happy guy over here. This is pretty straightforward, but what if we wanted to go in reverse? What if we started out with a protein and we wanted to figure out what its DNA sequence was? So, if we wanted to go in this direction. So let's look at how this is done.
Now scientists thought it would be nice to basically be able to type in the name of any protein that they're interested in and automatically it would pop up with the DNA sequence of that protein. Now that is known as a DNA library. And a DNA library would be beneficial for researchers, and scientists, and clinicians. So, let's look at how this is done.
So, we'll start out with our protein and our protein is basically a chain of amino acids. So, amino acids basically are formed from messenger RNA. So, if we know the amino acid sequence of our protein we know what the messenger RNA sequence is based on the codon table that we are all too familiar with. So, if we have the messenger RNA sequence, what we do is we add an enzyme known as reverse transcriptase, and when we add reverse transcriptase, it basically takes this messenger RNA and makes a complementary DNA sequence to the messenger RNA. And that's known as cDNA, the 'c' stands for complementary. One thing to keep in mind is that complementary DNA is single-stranded DNA. So, normally DNA in our cells is double-stranded DNA, but complementary DNA is single-stranded. So, in order to generate double-stranded DNA we need to add another enzyme known as DNA polymerase. DNA polymerase basically generates double-stranded DNA.
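As a rough illustration of step one, the base pairing that reverse transcriptase and DNA polymerase carry out can be mimicked with simple string operations. This is a minimal sketch, not a model of the enzymes themselves: the function names `reverse_transcribe` and `second_strand` and the short example mRNA are assumptions made up for this example, and all sequences are written 5' to 3'.

```python
# Base-pairing rules: RNA template -> DNA, and DNA template -> DNA.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_transcribe(mrna):
    """Return the single-stranded cDNA (5'->3') complementary to `mrna`."""
    # The new strand is antiparallel to its template, so read the
    # template backwards while pairing each base.
    return "".join(RNA_TO_DNA[base] for base in reversed(mrna))


def second_strand(cdna):
    """Return the strand (5'->3') that pairs with `cdna` to give double-stranded DNA."""
    return "".join(DNA_TO_DNA[base] for base in reversed(cdna))


mrna = "AUGGCCUAA"  # hypothetical toy mRNA: start codon, one codon, stop codon
cdna = reverse_transcribe(mrna)
partner = second_strand(cdna)

print("mRNA          5'-" + mrna + "-3'")
print("cDNA          5'-" + cdna + "-3'")
print("second strand 5'-" + partner + "-3'")
```

Note that the second strand has the same sequence as the original mRNA with T in place of U, which is why sequencing the double-stranded DNA recovers the coding sequence.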
So, this is basically step one of the process of creating a DNA library. So this is step one, now let's look at step two. So, now that we have our double-stranded DNA what we need to do is sequence it. So, in order to sequence it we'll start out with our double-stranded DNA and we'll basically inject it into some sort of cloning vector, such as a plasmid or a virus. Then you can take that cloning vector and add it to some bacteria. And it'll basically infect the bacteria and the bacteria will basically produce lots and lots of this DNA, this double-stranded DNA, so that's a process known as amplification. And once we have lots of double-stranded DNA, we'll go and sequence that double-stranded DNA and basically once we have that sequence we'll put the sequence into a large database that's readily accessible online and that database will basically populate the DNA library. So, now anybody that is interested in the DNA sequence of a particular protein can just go into this library, and pull up the genetic sequence of the protein of interest.
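The "type in a protein name, get back its DNA sequence" idea can be pictured as a simple lookup table. The sketch below is purely illustrative and assumes nothing about how any real sequence database is built: the class `DnaLibrary`, its methods, and the record "toy protein" with its sequence are all made up for this example.

```python
class DnaLibrary:
    """A toy in-memory DNA library: protein name -> DNA sequence."""

    def __init__(self):
        self._records = {}

    def add(self, protein_name, dna_sequence):
        """Store a sequenced cDNA under the protein's name."""
        sequence = dna_sequence.upper()
        # Reject empty input and anything that is not a DNA base.
        if not sequence or set(sequence) - set("ACGT"):
            raise ValueError("sequence must contain only A, C, G, and T")
        self._records[protein_name.lower()] = sequence

    def lookup(self, protein_name):
        """Return the stored sequence; raises KeyError if the protein is unknown."""
        return self._records[protein_name.lower()]


library = DnaLibrary()
library.add("Toy protein", "atggcctaa")  # hypothetical record, not real data
print(library.lookup("toy protein"))
```

Names are matched case-insensitively here only to keep the example forgiving; a real database would need far more careful identifiers.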