DNA structure and replication

Hank introduces us to that wondrous molecule deoxyribonucleic acid - also known as DNA - and explains how it replicates itself in our cells. Created by EcoGeek.

Video transcript

Hank: It's just beautiful, isn't it? It's mesmerizing. It's double-helixciting. You really can tell just by looking at it how, sort of, important and amazing it is. It's pretty much the most complicated molecule that exists, and potentially, the most important one. It's so complex that we didn't even know for sure what it looked like until about 60 years ago; and so multifariously awesome that if you took all of it from just one of our cells and untangled it, it would be taller than me. Now, considering that there are probably 50 trillion cells in my body right now, laid end to end, the DNA in those cells would stretch to the sun, not once, but 600 times. Mind blown yet? Hey, you wanna make one? (upbeat music with whistling) Of course you know, I'm talking about deoxyribonucleic acid, known to its friends as DNA. DNA is what stores our genetic instructions, the information that programs all of our cells' activities. It's a 6 billion letter code that provides the assembly instructions for everything that you are. And it does the same thing for pretty much every other living thing. I'm gonna go out on a limb here, and assume that you are human, in which case, every body cell that you have, or somatic cell in you, has 46 chromosomes, each containing 1 big DNA molecule. These chromosomes are packed together tightly with proteins in the nucleus of the cell. DNA is a nucleic acid, and so is its cousin, which we'll also be talking about, ribonucleic acid, or RNA. Now, if you can make your mind do this, remember all the way back to episode 3, where we talked about all of the important biological molecules, carbohydrates, lipids and proteins. That ring a bell? Well, nucleic acids are the 4th major group of biological molecules, and for my money, they have the most complicated job of all. Structurally, they're polymers, which means that each one is made up of many small repeating molecular units.
In DNA, these small units are called nucleotides; link them together and you have yourself a polynucleotide. Now, before we actually put these tiny parts together to build a DNA molecule like some microscopic piece of IKEA furniture, let's first take a look at what makes up each nucleotide. We're gonna need 3 things: 1. A 5-carbon sugar molecule, 2. A phosphate group, and 3. 1 of 4 nitrogenous bases. DNA gets the first part of its name from our first ingredient, the sugar molecule, which is called deoxyribose; but all the really significant stuff, the genetic coding that makes you you, is found among the 4 nitrogenous bases, adenine, thymine, cytosine, and guanine. It's important to note that in living organisms, DNA doesn't exist as a single polynucleotide molecule, but rather a pair of molecules that are held tightly together. They're like an intertwined, microscopic, double-spiral staircase; basically, just a ladder, but twisted. The famous double-helix. And like any good structure, we have to have a main support. In DNA, the sugars and phosphates bond together to form twin backbones. These sugar-phosphate bonds run down each side of the helix, but chemically, in opposite directions. In other words, if you look at each of the sugar-phosphate backbones, you'll see that 1 appears to be upside down in relation to the other. One strand begins at the top with the first phosphate connected to the sugar molecule's 5th carbon, and ends where the next phosphate would go, with a free end at the sugar's 3rd carbon. This creates a pattern called 5-prime and 3-prime. I've always thought of the deoxyribose as an arrow, with the oxygen as the point; it always points from 3-prime to 5-prime.
Now, the other strand is exactly the opposite; it begins up top with a free end at the sugar's 3rd carbon, and the phosphates connect to the sugar's 5th carbons all the way down, and ends at the bottom with a phosphate, and you've probably figured this out already, but this is called the 3-prime to 5-prime direction. Now, it is time to make ourselves one of these famous double-helices. These 2 long chains are linked together by the nitrogenous bases via relatively weak hydrogen bonds. But they can't just be any pair of nitrogenous bases. Thankfully, when it comes to figuring out what part goes where, all you have to do is remember that if 1 nucleotide has an adenine base, only thymine can be its counterpart; likewise, guanine can only bond with cytosine. These bonded nitrogenous bases are called base pairs. GC pairing has 3 hydrogen bonds, making it slightly stronger than the AT base pair, which only has 2. It's the order of these 4 nucleobases, or the base sequence, that allows your DNA to create you. So, AGGTCCATG means something completely different as a base sequence than, say, TTCAGTCG. Human chromosome 1, the largest of all of our chromosomes, contains a single molecule of DNA with 247 million base pairs. If you printed all of the letters of chromosome 1 into a book, it would be about 200,000 pages long. And each of your somatic cells has 46 DNA molecules tightly packed into its nucleus; that's 1 for each of your chromosomes. Put all 46 molecules together, and we're talking about roughly 6 billion base pairs in every cell. This is the longest book that I have ever read. It's about 1,000 pages long. If we were to fill it with our DNA sequence, we'd need about 10,000 of them to fit our entire genome. Pop quiz! Let's test your skills using a very short strand of DNA. I'll give you 1 base sequence, you give me the base sequence that appears on the other strand. Okay, here goes. So, we've got 5-prime AGGTCCG 3-prime. And ...
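For anyone who would rather check pairing puzzles like this with code, here is a minimal Python sketch of the rule Hank describes (A pairs with T, G pairs with C). The names PAIRS, complement, and reverse_complement are made up for this illustration and are not part of the video. The example uses the sequence AGGTCCATG mentioned earlier rather than the quiz sequence.

```python
# Pairing rule from the transcript: adenine pairs with thymine,
# guanine pairs with cytosine. (Names here are illustrative only.)
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the partner strand, base by base.

    If the input is written 5-prime to 3-prime, the result lines up
    underneath it and therefore reads 3-prime to 5-prime.
    """
    return "".join(PAIRS[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand rewritten in its own 5-prime to 3-prime order."""
    return complement(strand)[::-1]


if __name__ == "__main__":
    top = "AGGTCCATG"  # written 5-prime to 3-prime
    print("5'", top, "3'")
    print("3'", complement(top), "5'")
```

Because each base has exactly one counterpart, a plain lookup table is enough; a letter that is not A, T, G, or C raises a KeyError rather than being silently skipped.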
Time's up, the answer is 3-prime TCCAGGC 5-prime. See how that works? It's not super complicated; since each nitrogenous base only has 1 counterpart, you can use 1 base sequence to predict what its matching sequence is going to look like. So, could I make the same base sequence with a strand of that other nucleic acid, RNA? No, you could not. RNA is certainly similar to its cousin, DNA; it has a sugar-phosphate backbone with nucleotide bases attached to it, but there are 3 major differences: 1. RNA is a single-stranded molecule, no double-helix here, 2. The sugar in RNA is ribose, which has 1 more oxygen atom than deoxyribose, hence the whole starting with an R instead of a D thing, and finally, 3. RNA does not contain thymine. Its 4th base is uracil, which bonds with adenine instead. RNA is super important to the production of our proteins, and you'll see later that it has a crucial role in the replication of DNA, but first ... (joyful piano) Biolographies! Yes, plural this week, because when you start talking about something as multitudinously awesome and elegant as DNA, you have to wonder just who figured all this stuff out and how big was their brain? Well, unsurprisingly, it actually took a lot of different brains in a lot of different countries and nearly 100 years of thinking to do it. The names you usually hear when someone asks, "Who discovered DNA?" are James Watson and Francis Crick, but that's bunk; they did not discover DNA, nor did they discover that DNA contains genetic information. DNA itself was discovered in 1869, by a Swiss biologist named Friedrich Miescher. His deal was studying white blood cells, and he got those white blood cells in the most horrible way you could possibly imagine, from collecting used bandages from a nearby hospital (laughs). God. For science he did it!
He bathed the cells in warm alcohol to remove the lipids, and then he set enzymes loose on them to digest the proteins, and what was left after all of that was this snotty, grey stuff that he knew must be some new kind of biological substance. He called it nuclein, what was later to become known as nucleic acid. But Miescher didn't know what its role was or what it looked like. One of the scientists who helped figure that out was Rosalind Franklin, a young biophysicist in London nearly 100 years later. Using a technique called X-ray diffraction, Franklin may have been the first to confirm the helical structure of DNA. She also figured out that the sugar-phosphate backbone existed on the outside of the structure. So, why is Rosalind Franklin not exactly a household name? Well, 2 reasons: 1. Unlike Watson and Crick, Franklin was happy to share data with her rivals; it was Franklin who informed Watson and Crick that an earlier theory of a triple-helix structure was not possible, and in doing so, she indicated that DNA may indeed be a double helix. Later, her [evidence] confirming a helical structure of DNA was shown to Watson without her knowledge. Her work was eventually published in Nature, but not until after 2 papers by Watson and Crick had already appeared, in which the duo only hinted at her contribution. Even worse than that, the Nobel Prize committee couldn't even consider her for the prize that they awarded in 1962 because of how dead she was. The really tragic thing is that it's totally possible that her scientific work may have led to her early death of ovarian cancer at the age of 37. At the time, the X-ray diffraction technology that she was using to photograph DNA required dangerous amounts of radiation exposure, and Franklin rarely took precautions to protect herself. Nobel Prizes cannot be awarded posthumously. Many believe that she would have shared Watson and Crick's medal if she had been alive to receive it.
Now that we know the basics of DNA structure, we need to understand how it copies itself, because cells are constantly dividing, and that requires a complete copy of all that DNA information. It turns out that our cells are extremely good at this. Our cells can create the equivalent of 10,000 copies of this book in just a few hours; that, my friends, is called replication. Every cell in your body has a copy of the same DNA. It started from an original copy and will copy itself trillions of times over the course of a lifetime, each time using half of the original DNA strand as a template to build a new molecule. So, how is a teenage boy like the enzyme helicase? They both want to unzip your genes. Helicase is marvelous, unwinding the double helix at breakneck speed, slicing open those loose hydrogen bonds between the base pairs. The point where the splitting starts is known as the replication fork. It has a top strand called the leading strand, or the good-guy strand as I call it, and another bottom strand, called the lagging strand, which I like to call the scumbag strand, because it is a pain in the butt to deal with. These unwound sections can now be used as templates to create 2 complementary DNA strands, but remember, the 2 strands go in opposite directions in terms of their chemical structure, which means that making a new DNA strand for the leading strand is going to be much, much easier than for the lagging strand. For the leading, good-guy strand, an enzyme called DNA polymerase just adds matching nucleotides onto the main stem, all the way down the molecule. But, before it can do that, it needs a selection of nucleotides that fill in the section that's just been unzipped. They get started at the very beginning of the DNA molecule. DNA polymerase needs a bit of a primer, just a little thing for it to hook onto so that it can start building the new DNA chain, and for that little primer, you can thank the enzyme RNA primase.
The leading strand only needs this RNA primer once at the very beginning, then DNA polymerase is all, "I got this," and it just follows the unzipping, adding new nucleotides to the chain continuously all the way down the molecule. Copying the lagging, or scumbag, strand is, well, it's a frickin' scumbag. This is because DNA polymerase can only copy strands in the 5-prime to 3-prime direction, and the lagging strand is 3-prime to 5-prime. So, DNA polymerase can only add new nucleotides to the free 3-prime end of a primer, so maybe the real scumbag here is the DNA polymerase. Since the lagging strand runs in the opposite direction, it has to be copied in a series of segments; and here, that awesome little enzyme, RNA primase, does its thing again, laying down an occasional short little RNA primer that gives the DNA polymerase a starting point to then work backwards along the strand. This is done in a ton of individual segments, each 1,000 to 2,000 base pairs long, each starting with an RNA primer. These are called Okazaki fragments, after the couple of married scientists who discovered this step in the process in the 1960s, and thank goodness they were married, so that we could just call them Okazaki fragments, instead of Okazaki-someone's-someone fragments. These allow the strands to be synthesized in short bursts, and then another kind of DNA polymerase has to go back over and replace all of those RNA primers, and then, all the little fragments get joined up by a final enzyme called DNA ligase. And that is why I say that the lagging strand is such a scumbag. DNA replication gets it wrong in about 1 in every 10 billion nucleotides, but don't think your body doesn't have an app for that. It turns out that DNA polymerases can also proofread, in a sense, removing nucleotides from the end of a strand whenever they discover a mismatched base, because the last thing we want is an A when it should have been a G.
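The difference between the two strands can be sketched in a few lines of Python. This is a toy model under loud assumptions, not real biochemistry: the function names are invented for illustration, the fragment length of 4 is a stand-in for the much longer Okazaki fragments Hank mentions, and strand direction is abstracted away (each new strand is simply written lined up against its template). RNA primers and proofreading are left out.

```python
# Toy model of leading vs. lagging strand synthesis.
# All names and the tiny fragment length are assumptions for illustration.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def copy_template(template: str) -> str:
    """Stand-in for DNA polymerase: add the matching base for each template base."""
    return "".join(PAIRS[base] for base in template)


def replicate_leading(template: str) -> str:
    """Leading strand: one start, then continuous synthesis along the whole template."""
    return copy_template(template)


def replicate_lagging(template: str, fragment_len: int = 4):
    """Lagging strand: synthesize short fragments, then join them.

    Returns the list of fragments (the Okazaki-fragment analogue) and the
    joined strand (the DNA ligase step).
    """
    fragments = [
        copy_template(template[start:start + fragment_len])
        for start in range(0, len(template), fragment_len)
    ]
    joined = "".join(fragments)  # ligase analogue: stitch fragments together
    return fragments, joined


if __name__ == "__main__":
    template = "AGGTCCATG"
    fragments, joined = replicate_lagging(template)
    print("leading :", replicate_leading(template))
    print("lagging :", fragments, "->", joined)
```

The point of the sketch is only that both routes end with the same complementary sequence; the lagging route just takes more separate steps to get there.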
Considering how tightly packed DNA is into each one of our cells, it's honestly amazing that more mistakes don't happen. Remember, we're talking about millions of miles worth of this stuff inside us, and this, my friends, is why scientists are not exaggerating when they call DNA the most celebrated molecule of all time.