Transcription and mRNA processing

Transcription involves rewriting genetic information from DNA to mRNA, with RNA polymerase playing a crucial role. In eukaryotic cells, DNA to mRNA transcription occurs within the nucleus, producing pre-mRNA. This pre-mRNA undergoes processing, including the addition of a 5' cap, a poly-A tail, and splicing out introns, resulting in mature mRNA, which then leaves the nucleus for protein translation. Created by Sal Khan.

Video transcript

- [Voiceover] What we're going to do in this video is a little bit of a deep dive on transcription. And just as a bit of a review, we touched on it in the video on replication, transcription and translation. Transcription in everyday language just means to rewrite something, or to rewrite some information in another form. And that's essentially what's happening here. Transcription is when we take the information encoded in a gene in DNA and encode essentially that same information in mRNA. So in transcription we are going from DNA to messenger RNA, and in this video we're going to focus on genes that code for proteins. So this first step is the transcription, the DNA to messenger RNA, and then in a future video we'll dig a little bit deeper into translation, where we translate that information into an actual protein. But these diagrams give a little bit of an overview of it. It's a little bit simpler in bacteria. You have the DNA just floating around in the cytosol, and that is where the transcription takes place. You start with that protein-coding gene in the DNA, and from that you make the messenger RNA, which you see in that purple color right over here, and then that messenger RNA can associate with a ribosome, and that's the translation process that actually produces the polypeptide, the protein. In eukaryotic cells, and we're going to get into a little bit more depth in this video, the transcription, the DNA to mRNA, happens inside of the nucleus. There are essentially two steps here. You go from DNA to what we would call pre-mRNA, which is depicted right over there, and then it needs to be processed to turn into what we would call mRNA, which can then leave the nucleus to be translated into a protein. So now that we have that overview, let's dig a little bit deeper into this and understand the different actors, and understand, if we're talking about a eukaryotic cell, what type of processing might actually go on.
So right over here, we are going to start with the protein-coding gene inside of the DNA, and the primary actor that's not the DNA or the mRNA here is going to be RNA polymerase. It's used to create the nucleotide sequence that will become the messenger RNA. So this RNA polymerase needs to know where to start. The way it knows where to start is that it attaches to a sequence of the DNA known as a promoter. Every gene is going to have a promoter associated with it, especially if we're talking about eukaryotic cells. Sometimes you might have a promoter associated with a collection of genes as well. But in general, if you've got a gene, you're going to have a promoter. That's how the RNA polymerase knows to attach right over there. Once it attaches, it is able to separate the strands. And it's pretty interesting, because when we went deep into replication, you saw all of these actors, the helicase and whatever else, but this RNA polymerase complex is actually quite capable. Not only does it separate the strands, it's also able to build the RNA. It does that the same way we saw when we studied DNA polymerase: it works in only one direction. It can only add more nucleotides on the three prime end, so it builds the RNA in the five prime to three prime direction. Notice this arrow here: we're extending it on the three prime end of the RNA. So as you can see here, when it does this, it's only interacting with, I guess you could say, or coding complementary information to, one side of the DNA. But let's think about this a little bit. The side that it is interacting with, you can call the template strand, because that side of the DNA is acting as the template for forming that RNA.
But if you think about the information that the RNA is actually going to encode, well, it's going to contain the same information as the coding strand of DNA, the other strand of DNA, because this nucleotide right over here is going to be complementary to this one over here, just as this nucleotide was complementary to that one over there. And you can see it in a little bit more depth if we actually add the nucleotides. So this is the template strand. If you have a thymine, well, on the RNA you'd have an adenine. And look, on the coding strand of DNA, the one up here, you would also have an adenine. The coding strand and the RNA essentially end up being the same sequence, but the one difference is that you won't find thymine in the RNA. Instead you'll find a similar nitrogenous base, and that is uracil. But uracil plays the role of thymine, so you're essentially coding the same information. So once again, this bottom strand is acting as the template, but the resulting RNA is essentially going to have the same information that we had in the coding strand. Just to get an appreciation for what this "looks" like, and I put "looks" in quotation marks, I even did little quote things with my fingers when I said that, because it's hard to really visualize what these things look like, you can see here that the RNA polymerase complex, and this is for a specific organism, can be very, very complex and involved, and it's fascinating how these things interact. Every time you're studying biology and someone like me gives you these nice clean narratives of how these enzymes interact with the different macromolecules, like the DNA or the RNA, you should always remember that this is amazing. These are molecules interacting with each other, bouncing into each other. It's happening incredibly fast inside of the cell. You should be in awe of this.
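To make the template strand and coding strand relationship concrete, here is a minimal sketch in Python that treats the strands as plain strings. It is only an illustration: the function names and the example sequence are made up for this purpose and do not come from the video, and the template is assumed to be written in the 3' to 5' direction so that reading it left to right gives the RNA in the 5' to 3' direction.

```python
# Illustrative sketch only: transcription modeled as string manipulation.
# Function names and the example sequence are hypothetical.

# Each DNA base on the template pairs with this RNA base (uracil replaces thymine).
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
# Base pairing between the two DNA strands.
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Return the RNA (5'->3') that is complementary to the template strand."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5)


def complement_dna(strand: str) -> str:
    """Return the complementary DNA strand, base by base."""
    return "".join(DNA_TO_DNA[base] for base in strand)


template = "TACGGCATT"              # template strand, written 3'->5'
coding = complement_dna(template)   # coding strand, written 5'->3'
rna = transcribe(template)          # RNA, written 5'->3'

print(coding)  # ATGCCGTAA
print(rna)     # AUGCCGUAA
```

Comparing the two printed lines shows the point made above: the RNA matches the coding strand letter for letter, except that every T has become a U.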
It's happening in all of your cells as we speak. This is pretty incredible stuff. So the next thing you have to think about is, right over here, we are extending the RNA, but when does this thing actually stop? This RNA polymerase is going to keep going until it reaches this blue region, which we've labeled a terminator. So this area is a terminator, and there are multiple ways that it signals to the RNA polymerase that "Hey, it's time to stop." More particularly, it somehow creates a structure that makes the polymerase let go. One mechanism, depicted right over here, and this could happen in bacteria, is that the mRNA that's being made forms a hairpin. So it has to have the right complementary base pairs right over here to form this hairpin. This hairpin, along with the things around the hairpin, essentially makes it hard for the polymerase to keep on going. So the complex kind of changes a little bit, and it lets go, or at least that's how people believe it works. There are other ways the terminator can act. It might have sequences that parts of the polymerase complex recognize, which cause a conformational change so that the RNA polymerase lets go. If we're talking about a prokaryote, we're done. This would be our messenger RNA, which can then go to a ribosome and be translated into a protein. But if we're talking about a eukaryote, then we have to do a little bit of processing. If this is a prokaryote right over here, this would be our mRNA. If this is a eukaryote, then this is our pre-mRNA, which now has to be processed. And you might say, "Well, how is that going to be processed?" Well, there are a couple of things that are going to be done. Some things are going to be added at the beginning and the end of the mRNA.
There's the five prime cap, which is a modified guanine right over here, and it's going to help in the translation process as the ribosomes attach onto it. And then you have this poly-A tail, and it's called a poly-A tail because it has a bunch of adenines at the end, right over here. These not only help in the translation process, they also make the information more robust: they make it less likely that the ends of the mRNA are going to become damaged. Now the other thing that needs to be processed, and this is one of those fascinating things in evolutionary biology, is that in this pre-mRNA sequence you're going to have parts of the sequence which we currently consider to be "nonsense" sequences, and we call them introns. I'm going to put "nonsense" in quotes because in general in evolution it's seldom that things have absolutely no purpose, but these are not coding for the protein that is going to be encoded by our initial gene. And so these are actually processed out, they are spliced out. I'm not going to go into the details of the actors that cause the splicing, but as part of this eukaryotic processing, you add the cap, you add the tail, and then you splice out the introns, and once you've spliced out the introns all you have left are the exons. So you have this exon, connected to that one, connected to that one. And so this is the result: in a eukaryote, you will have this mature mRNA. And that's what we saw right over here, let me underline that in a color you can see, which then migrates out of the nucleus to a ribosome where it can be translated.
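The three processing steps described above (cap, tail, splicing) can also be sketched in Python as operations on a string. This is a toy model under stated assumptions, not a description of the real machinery: the function name, the "m7G" text label standing in for the cap, the tail length, the example sequence, and the intron positions are all invented for illustration. In this string model the order of the three steps does not change the result.

```python
# Illustrative sketch only: the "m7G" cap label, the tail length, the sequence,
# and the intron coordinates are made-up placeholders, not real biological data.

def process_pre_mrna(pre_mrna: str,
                     introns: list[tuple[int, int]],
                     tail_length: int = 20) -> str:
    """Splice out introns, then add a 5' cap label and a poly-A tail.

    Each intron is a half-open index range (start, end) into pre_mrna.
    """
    exons = []
    position = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[position:start])  # keep the exon before this intron
        position = end                          # skip over the intron itself
    exons.append(pre_mrna[position:])           # keep the final exon
    spliced = "".join(exons)                    # exons joined end to end
    return "m7G-" + spliced + "-" + "A" * tail_length


# Three exons separated by two introns (hypothetical sequences).
pre_mrna = "AUGGCC" + "GUAAGUCCAG" + "UUUGAC" + "GUGAGUAAAG" + "CCCUAA"
introns = [(6, 16), (22, 32)]

mature = process_pre_mrna(pre_mrna, introns, tail_length=10)
print(mature)  # m7G-AUGGCCUUUGACCCCUAA-AAAAAAAAAA
```

The printed string is the stand-in for the mature mRNA: only the three exons remain in the middle, with the cap label at the 5' end and the run of adenines at the 3' end.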