Overview of transcription

In transcription, the DNA sequence of a gene is transcribed (copied out) to make an RNA molecule.

Key points:

  • Transcription is the first step in gene expression. It involves copying a gene's DNA sequence to make an RNA molecule.
  • Transcription is performed by enzymes called RNA polymerases, which link nucleotides to form an RNA strand (using a DNA strand as a template).
  • Transcription has three stages: initiation, elongation, and termination.
  • In eukaryotes, RNA molecules must be processed after transcription: they are spliced and have a 5' cap and poly-A tail put on their ends.
  • Transcription is controlled separately for each gene in your genome.

Introduction

Have you ever had to transcribe something? Maybe someone left a message on your voicemail, and you had to write it down on paper. Or maybe you took notes in class, then rewrote them neatly to help you review.
As these examples show, transcription is a process in which information is rewritten. Transcription is something we do in our everyday lives, and it's also something our cells must do, in a more specialized and narrowly defined way. In biology, transcription is the process of copying out the DNA sequence of a gene in the similar alphabet of RNA.

Overview of transcription

Transcription is the first step in gene expression, in which information from a gene is used to construct a functional product such as a protein. The goal of transcription is to make an RNA copy of a gene's DNA sequence. For a protein-coding gene, the RNA copy, or transcript, carries the information needed to build a polypeptide (protein or protein subunit). Eukaryotic transcripts need to go through some processing steps before translation into proteins.
In transcription, a region of DNA opens up. One strand, the template strand, serves as a template for synthesis of a complementary RNA transcript. The other strand, the coding strand, is identical to the RNA transcript in sequence, except that the RNA has uracil (U) bases in place of thymine (T) bases.
Example:
Coding strand: 5'-ATGATCTCGTAA-3'
Template strand: 3'-TACTAGAGCATT-5'
RNA transcript: 5'-AUGAUCUCGUAA-3'
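The relationship between the three sequences in this example can be checked with a short Python sketch. The function name transcribe and the lookup table are hypothetical helpers written for illustration, and the sketch assumes the template strand is supplied in the 3' to 5' orientation, as it is written above.

```python
# Minimal sketch: pair each template base with its RNA complement.
# The names below are illustrative assumptions, not part of the article.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3_to_5):
    """Return the RNA (written 5'->3') complementary to a template strand
    that is written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5)

coding = "ATGATCTCGTAA"    # 5'->3'
template = "TACTAGAGCATT"  # 3'->5'

rna = transcribe(template)
print(rna)  # AUGAUCUCGUAA

# The transcript matches the coding strand, with U in place of T.
assert rna == coding.replace("T", "U")
```

Because the template is read 3' to 5' while the RNA is written 5' to 3', no reversal step is needed here: the two strands are antiparallel, so position by position they line up as written.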
For a protein-coding gene, the RNA transcript contains the information needed to synthesize a polypeptide (protein or protein subunit) with a particular amino acid sequence. In this case:
RNA transcript (acting as messenger RNA): 5'-AUGAUCUCGUAA-3'
Polypeptide: Met-Ile-Ser-STOP
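As a rough illustration of how the transcript above maps to Met-Ile-Ser-STOP, the Python sketch below reads the mRNA three nucleotides at a time. The codon table is deliberately partial: it holds only the four codons that appear in this example (the full genetic code has 64), and the function name translate is a hypothetical helper.

```python
# Partial codon table (assumption: only the codons used in this example).
CODON_TABLE = {"AUG": "Met", "AUC": "Ile", "UCG": "Ser", "UAA": "STOP"}

def translate(mrna):
    """Read an mRNA (5'->3') one codon at a time until a stop codon."""
    residues = []
    for i in range(0, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        residues.append(residue)
        if residue == "STOP":
            break
    return residues

print("-".join(translate("AUGAUCUCGUAA")))  # Met-Ile-Ser-STOP
```

The sketch starts reading at the first nucleotide, which works only because this example transcript begins with the start codon AUG.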

RNA polymerase

The main enzyme involved in transcription is RNA polymerase, which uses a single-stranded DNA template to synthesize a complementary strand of RNA. Specifically, RNA polymerase builds an RNA strand in the 5' to 3' direction, adding each new nucleotide to the 3' end of the strand.
RNA polymerase synthesizes an RNA strand complementary to a template DNA strand. It synthesizes the RNA strand in the 5' to 3' direction, while reading the template DNA strand in the 3' to 5' direction. The template DNA strand and RNA strand are antiparallel.
RNA transcript: 5'-UGGUAGU...-3' (dots indicate where nucleotides are still being added at 3' end)
DNA template: 3'-ACCATCAGTC-5'
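The directionality described above can be sketched as a loop that reads the template one base at a time and appends each new nucleotide to the 3' end of the growing RNA. This is a simplified model with hypothetical names (elongate, RNA_COMPLEMENT); it ignores the enzyme itself and tracks only the sequences.

```python
RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}

def elongate(template_3_to_5):
    """Yield the growing RNA (written 5'->3') after each nucleotide is added."""
    rna = ""
    for base in template_3_to_5:     # the template is read 3'->5'
        rna += RNA_COMPLEMENT[base]  # the new nucleotide joins the 3' end
        yield rna

steps = list(elongate("ACCATCAGTC"))
print(steps[6])   # UGGUAGU  (after seven additions, as in the example above)
print(steps[-1])  # UGGUAGUCAG  (after the whole template has been read)
```

Appending to the right-hand end of the string mirrors the rule that the 5' end of the RNA is made first and never changes, while the 3' end keeps growing.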

Stages of transcription

Transcription of a gene takes place in three stages: initiation, elongation, and termination. Here, we will briefly see how these steps happen in bacteria. You can learn more about the details of each stage (and about how eukaryotic transcription is different) in the stages of transcription article.
  1. Initiation. RNA polymerase binds to a sequence of DNA called the promoter, found near the beginning of a gene. Each gene (or group of co-transcribed genes, in bacteria) has its own promoter. Once bound, RNA polymerase separates the DNA strands, providing the single-stranded template needed for transcription.
    The promoter region comes before (and slightly overlaps with) the transcribed region whose transcription it specifies. It contains recognition sites for RNA polymerase or its helper proteins to bind to. The DNA opens up in the promoter region so that RNA polymerase can begin transcription.
  2. Elongation. One strand of DNA, the template strand, acts as a template for RNA polymerase. As it "reads" this template one base at a time, the polymerase builds an RNA molecule out of complementary nucleotides, making a chain that grows from 5' to 3'. The RNA transcript carries the same information as the non-template (coding) strand of DNA, but it contains the base uracil (U) instead of thymine (T).
    RNA polymerase synthesizes an RNA transcript complementary to the DNA template strand in the 5' to 3' direction. It moves forward along the template strand in the 3' to 5' direction, opening the DNA double helix as it goes. The synthesized RNA only remains bound to the template strand for a short while, then exits the polymerase as a dangling string, allowing the DNA to close back up and form a double helix.
    In this example, the sequences of the coding strand, template strand, and RNA transcript are:
    Coding strand: 5'-ATGATCTCGTAA-3'
    Template strand: 3'-TACTAGAGCATT-5'
    RNA: 5'-AUGAUC...-3' (the dots indicate where nucleotides are still being added to the RNA strand at its 3' end)
  3. Termination. Sequences called terminators signal that the RNA transcript is complete. Once they are transcribed, they cause the transcript to be released from the RNA polymerase. An example of a termination mechanism involving formation of a hairpin in the RNA is shown below.
    The terminator DNA encodes a region of RNA that forms a hairpin structure followed by a string of U nucleotides. The hairpin structure in the transcript causes the RNA polymerase to stall. The U nucleotides that come after the hairpin form weak bonds with the A nucleotides of the DNA template, allowing the transcript to separate from the template and ending transcription.
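The hairpin terminator described in the termination step can be pictured as a sequence pattern: a stretch of RNA, a short loop, the reverse complement of the first stretch (so the two can base-pair into a stem), and then a run of U nucleotides. The Python sketch below searches for that pattern. It is a toy model under loud assumptions: the stem, loop, and U-run lengths are fixed, arbitrary values, the test sequence is made up, and real terminators vary in all of these. The function names are hypothetical.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    """Return the sequence that would base-pair with rna, read 5'->3'."""
    return "".join(PAIR[base] for base in reversed(rna))

def find_terminator(rna, stem_len=5, loop_len=4, u_run=4):
    """Return the index where a stem-loop followed by a U run begins, or -1.
    The lengths are illustrative assumptions, not biological constants."""
    span = 2 * stem_len + loop_len
    for i in range(len(rna) - span - u_run + 1):
        left = rna[i:i + stem_len]
        right = rna[i + stem_len + loop_len:i + span]
        tail = rna[i + span:i + span + u_run]
        if right == reverse_complement(left) and tail == "U" * u_run:
            return i
    return -1

# Made-up transcript: AA + stem GCCGC + loop GAAA + stem GCGGC + UUUU + A
print(find_terminator("AAGCCGCGAAAGCGGCUUUUA"))  # 2
print(find_terminator("AUGAUCUCGUAAAUGAUCUCG"))  # -1
```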

Eukaryotic RNA modifications

In bacteria, RNA transcripts can act as messenger RNAs (mRNAs) right away. In eukaryotes, the transcript of a protein-coding gene is called a pre-mRNA and must go through extra processing before it can direct translation.
  • Eukaryotic pre-mRNAs must have their ends modified, by addition of a 5' cap (at the beginning) and 3' poly-A tail (at the end).
  • Many eukaryotic pre-mRNAs undergo splicing. In this process, parts of the pre-mRNA (called introns) are chopped out, and the remaining pieces (called exons) are stuck back together.
    Top of image: Diagram of a pre-mRNA with a 5' cap and 3' poly-A tail. The 5' cap is on the 5' end of the pre-mRNA and is a modified G nucleotide. The poly-A tail is on the 3' end of the pre-mRNA and consists of a long string of A nucleotides (only a few of which are shown).
    The pre-mRNA still contains both exons and introns. Along the length of the pre-mRNA, there is an alternating pattern of exons and introns: Exon 1 - Intron 1 - Exon 2 - Intron 2 - Exon 3. Each consists of a stretch of RNA nucleotides.
    During splicing, the introns are removed from the pre-mRNA, and the exons are stuck together to form a mature mRNA.
    Bottom of image: Mature mRNA that does not contain the intron sequences (Exon 1 - Exon 2 - Exon 3 only).
End modifications increase the stability of the mRNA, while splicing gives the mRNA its correct sequence. (If the introns are not removed, they'll be translated along with the exons, producing a "gibberish" polypeptide.)
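The processing steps above can be summarized in a small Python sketch: drop the introns, join the exons in order, and mark the two end modifications. Everything here is illustrative. The segment sequences are made up, the cap is represented by the placeholder text "cap" rather than a real modified G nucleotide, the tail length of 10 is arbitrary, and the order of operations in the code is not meant to reflect the order in which these steps happen in a cell.

```python
def process_pre_mrna(segments, poly_a_length=10):
    """segments: list of (kind, sequence) pairs, kind is "exon" or "intron".
    Returns a text picture of the mature mRNA: cap, joined exons, poly-A tail."""
    spliced = "".join(seq for kind, seq in segments if kind == "exon")
    return "cap-" + spliced + "-" + "A" * poly_a_length

# Made-up pre-mRNA: Exon 1 - Intron 1 - Exon 2 - Intron 2 - Exon 3
pre_mrna = [
    ("exon", "AUGGCU"),
    ("intron", "GUAAGUCAG"),
    ("exon", "UUCGAA"),
    ("intron", "GUGAGUAAG"),
    ("exon", "UGGUAA"),
]

print(process_pre_mrna(pre_mrna))  # cap-AUGGCUUUCGAAUGGUAA-AAAAAAAAAA
```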
To learn more about pre-mRNA modifications in eukaryotes, check out the article on pre-mRNA processing.

Transcription happens for individual genes

Not all genes are transcribed all the time. Instead, transcription is controlled individually for each gene (or, in bacteria, for small groups of genes that are transcribed together). Cells carefully regulate transcription, transcribing just the genes whose products are needed at a particular moment.
For example, the diagram below shows a "snapshot" of an imaginary cell's RNAs at a given moment in time. In this cell, genes 1, 2, and 3 are transcribed, while gene 4 is not. Also, genes 1, 2, and 3 are transcribed at different levels, meaning that different numbers of RNA molecules are made for each.
Diagram showing that individual genes are transcribed in different amounts.
A region of DNA containing four genes is shown, with the transcribed region of each gene highlighted in dark blue. The number of transcripts of each gene is indicated above the DNA (on a Y-axis). There are six transcripts of gene 1, one transcript of gene 2, twelve transcripts of gene 3, and no transcripts of gene 4.
This is not an illustration of any actual set of genes and their transcription levels, but rather, illustrates that transcription is controlled individually for genes and other transcription units.
In the following articles, we'll take a more in-depth look at RNA polymerase, the stages of transcription, and the process of RNA modification in eukaryotes. We'll also consider some important differences between bacterial and eukaryotic transcription.

Want to join the conversation?

  • Anson Chan
    The hairpin somewhat appears to look like a tRNA molecule. Am I wrong in saying that tRNA is formed from these hairpin structures?
    (23 votes)
    • emilyabrash
      No, you're not wrong. A tRNA contains hairpins as well, though the hairpins play different roles in the two cases. In transcription termination, the hairpin causes the RNA polymerase to stall and the transcript to separate from the DNA. In a tRNA, multiple hairpins form and give the tRNA molecule the 3D shape it needs to perform its job of delivering amino acids.
      (36 votes)
  • Sukidhar9
    If introns are not important, why are introns formed?
    (14 votes)
    • tyersome
      Good question!

      Introns have multiple roles in biology including the regulation of gene expression.

      Other introns have functions after they are spliced out from the transcript and can act as signaling or regulatory molecules.

      Some relatively rare types of introns appear to be parasitic DNA molecules — they insert copies of themselves into genes and then splice themselves out from the RNA presumably to keep the host cell alive. It is possible that the more typical introns originated from such parasitic DNA elements.

      This is still an area of active research and it is quite likely that more functions for introns will be uncovered in the future.

      If you wish to know more, you could start with this section of the wikipedia article on introns:
      https://en.wikipedia.org/wiki/Intron#Biological_functions_and_evolution
      (22 votes)
  • sreelakshmi.s
    Does the presence of introns indicate something related to evolution?
    (6 votes)
    • Meggie Lund
      Not really. Introns enable one gene to produce multiple polypeptide sequences, thereby creating a more efficient genome. This will make more sense if you look at the examples in the pre-mRNA processing article. I think you're thinking of pseudogenes, which are non-coding regions remaining in an organism's DNA from ancestral roots. You're correct in your conclusion that introns are non-coding, but just because a sequence is an intron in one pre-mRNA sequence doesn't mean that it can't be included in the exon sequence in another.
      (11 votes)
  • Priyanka
    Hi, this isn't mentioned in this article, but I would like to ask,
    What is the difference between a gene and a cistron? Why do we need the term "cistron" in the first place?
    And what do the terms monocistronic and polycistronic mean?
    (5 votes)
  • Megan Sullivan
    Does the hairpin structure come into play in transcription?
    (1 vote)
    • SpinosaurusRex
      A hairpin loop is an unpaired loop of messenger RNA (mRNA) that is created when an mRNA strand folds and forms base pairs with another section of the same strand. The resulting structure looks like a loop or a U-shape.

      Hairpins are a common type of secondary structure in RNA molecules. In RNA, the secondary structure is the basic shape that the sequence of A, C, U, and G nucleotides forms after they are linked in series, such as a folding or curling of the nucleic acid strand. mRNA hairpins can be formed when two complementary sequences in a single mRNA molecule meet and bind together, after a folding or wrinkling of the molecule. Hairpin loops can also form in DNA molecules, but are most commonly observed in mRNA.

      There are many instances of the hairpin loop phenomenon among nucleic acid strands. One example of a hairpin loop is the termination sequence for transcription in some prokaryotes. Once a polymerase meets this loop, it falls off and transcription ends. Another more general example is tRNA, a central player in protein synthesis, which is partially formed by hairpin loops. The tRNA molecule actually contains three hairpin loops that form the shape of a three-leafed clover. One of these hairpin loops contains a sequence called the anticodon, which recognizes and decodes the mRNA molecule three nucleotides (one codon) at a time during translation. This clover-leaf structure supports the eventual connection between every codon, anti-codon and amino acid.

      http://www.nature.com/scitable/definition/hairpin-loop-mrna-314
      (11 votes)
  • Kaitlin DeJesus
    I thought helicase was the enzyme that separates the DNA helix for the SSB to keep the DNA strands separated?
    (4 votes)
  • will.butacu
    What I don't understand is: if the promoter is located at the 5' end of a gene, how does RNA polymerase start there if it reads from 3' to 5' and synthesizes RNA from 5' to 3'?
    (4 votes)
  • aryan0904
    Are there other ways that the mRNA strand could detach from the DNA strand instead of the hairpin turn? And what would happen if an mRNA nucleotide accidentally gets changed instead of the normal one, i.e., a mutation?
    (3 votes)
    • tyersome
      This is briefly covered in the next article — short answer: yes, but transcription termination is still being actively studied and is not completely understood.

      Additional reading:
      https://en.wikipedia.org/wiki/Eukaryotic_transcription#Termination
      https://www.nature.com/scitable/topicpage/dna-transcription-426


      I'm not completely sure I understand your second question — are you asking what would happen if the "wrong" base was incorporated into an mRNA?

      If so, probably not much since each gene typically will make multiple transcripts and most mRNAs have a very short lifetime. (Note that this is almost certainly something that happens all the time since all biological processes make errors.)

      While I've never seen any evidence that any of this ever actually happens, it seems possible that in rare cases the change might make an mRNA encode a toxic protein that could kill a cell or worse yet trigger cancer formation. I suppose if you were spectacularly unlucky it might even promote prion formation (a contagious toxic protein structure).
      (5 votes)
  • Tammie Derpine
    Won't the RNA have the wrong sequence if the introns are spliced, or is it predetermined to omit the codons in the introns in order to have the "perfect" code in the mature RNA?
    (3 votes)
    • Jen
      Introns are actually noncoding DNA segments (in other words, they do not code for proteins), so splicing them out actually helps produce a functional protein rather than potentially disrupt protein function. However, this doesn't mean introns are useless either; in fact, they are actually very important for regulating gene expression.

      We've learned a lot about introns since their discovery but many questions about them and their functions still remain unresolved. You can learn more about them in the link below. Hope that helps!

      https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3325483/
      (5 votes)
  • Tzviofen ✡
    Does the transcribed region always start with bases TAC, so that the RNA will start with bases AUG, which codes for methionine?
    (3 votes)
    • RowanH
      No, transcription starts upstream of the AUG, so the mRNA contains a 5' untranslated region. Then ribosomes translate starting from the AUG in the mRNA. The details of how they find the AUG are different in eukaryotes and prokaryotes.
      (3 votes)