Genetic linkage & mapping

What it means for genes to be linked. How to determine recombination frequency for a pair of genes.

Key points:

  • When genes are found on different chromosomes or far apart on the same chromosome, they assort independently and are said to be unlinked.
  • When genes are close together on the same chromosome, they are said to be linked. That means the alleles, or gene versions, already together on one chromosome will be inherited as a unit more frequently than not.
  • We can see if two genes are linked, and how tightly, by using data from genetic crosses to calculate the recombination frequency.
  • By finding recombination frequencies for many gene pairs, we can make linkage maps that show the order and relative distances of the genes on the chromosome.

Introduction

In general, organisms have a lot more genes than chromosomes. For instance, we humans have roughly 19,000 genes on 23 chromosomes (present in two sets) [1]. Similarly, the humble fruit fly, a favorite subject of study for geneticists, has around 13,000 genes on 4 chromosomes (also present in two sets) [2].
The consequence? Each gene isn't going to get its own chromosome. In fact, not even close! Quite a few genes are going to be lined up in a row on each chromosome, and some of them are going to be squished very close together.
Does this affect how genes are inherited? In some cases, the answer is yes. Genes that are sufficiently close together on a chromosome will tend to "stick together," and the versions (alleles) of those genes that are together on a chromosome will tend to be inherited as a pair more often than not.
This phenomenon is called genetic linkage. When genes are linked, genetic crosses involving those genes will lead to ratios of gametes (egg and sperm) and offspring types that are not what we'd predict from Mendel's law of independent assortment. Let's take a closer look at why this is the case.

What is genetic linkage?

When genes are on separate chromosomes, or very far apart on the same chromosome, they assort independently. That is, when the genes go into gametes, the allele received for one gene doesn't affect the allele received for the other. In a double heterozygous organism (AaBb), this results in the formation of all 4 possible types of gametes with equal, or 25%, frequency.
Two diagrams show the gametes made by an AaBb organism. In the left diagram, titled "genes on different chromosomes," the cell contains a long chromosome pair (blue A, red a) and a shorter pair (blue B, red b). In the right diagram, titled "genes far apart on the same chromosome," the cell contains one blue chromosome carrying A at the top and B at the bottom, and one red chromosome carrying a at the top and b at the bottom. In both diagrams, four types of gametes are made, each at 25 percent: AB, ab, aB, and Ab. In the right diagram, the AB gamete carries an all-blue chromosome and the ab gamete an all-red one, while the aB and Ab gametes each carry a chromosome that is part red and part blue.
Why is this the case? Genes on separate chromosomes assort independently because of the random orientation of homologous chromosome pairs during meiosis. Homologous chromosomes are paired chromosomes that carry the same genes, but may have different alleles of those genes. One member of each homologous pair comes from an organism's mom, the other from its dad.
On the left of the diagram there is a blue line labeled from dad, and the line has a black line at the top labeled capital A. Next to the blue line there is a red line labeled from mom, and the line has a black line at the top labeled lower-case a. At the bottom of the 2 lines is the caption homologous pair. There is an arrow pointing from the homologous pair with the caption DNA replication for meiosis. On the right of the arrow, the blue line and the red line each have an identical copy, and the paired blue and red lines are joined at the center. There is a line under one of the blue lines and one of the red lines labeled duplicated homologues.
As illustrated in the diagram below, the homologues of each pair separate in the first stage of meiosis. In this process, which side the "dad" and "mom" chromosomes of each pair go to is random. When we are following two genes, this results in four types of gametes that are produced with equal frequency.
The diagram shows the chromosome combinations that gametes can receive, depending on how the chromosomes are arranged at metaphase I of meiosis. The original cell has two pairs of chromosomes: a blue A and a red a, and a blue B and a red b. After DNA replication, the duplicated homologous pairs can line up in two ways. In the first arrangement, the blue A and blue B chromosomes are on the left of the cell and the red a and red b chromosomes are on the right. After meiosis I and II, this arrangement gives two AB gametes and two ab gametes. In the second arrangement, the red a and blue B chromosomes are on the left and the blue A and red b chromosomes are on the right. After meiosis I and II, this arrangement gives two aB gametes and two Ab gametes. Overall, there is a 25 percent chance of each gamete type being produced.
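The random orientation described above can be sketched in a few lines of Python. This is a minimal illustration, not a model of meiosis: the function name `independent_assortment` and the allele labels are assumptions chosen for the example. It simply treats every pairing of one homologue from each chromosome pair as equally likely.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def independent_assortment(pair1=("A", "a"), pair2=("B", "b")):
    """Gamete frequencies for two genes on different chromosome pairs."""
    counts = Counter()
    # Each homologous pair orients independently at metaphase I, so every
    # combination of one homologue from pair1 and one from pair2 is
    # equally likely to end up in a gamete.
    for allele1, allele2 in product(pair1, pair2):
        counts[allele1 + allele2] += 1
    total = sum(counts.values())
    return {gamete: Fraction(n, total) for gamete, n in counts.items()}


frequencies = independent_assortment()
for gamete, freq in frequencies.items():
    print(gamete, float(freq))
```

Each of the four gamete types (AB, Ab, aB, ab) comes out at a frequency of 1/4, matching the 25% figure in the text.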
When genes are on the same chromosome but very far apart, they assort independently due to crossing over (homologous recombination). This is a process that happens at the very beginning of meiosis, in which homologous chromosomes randomly exchange matching fragments. Crossing over can put new alleles together in combination on the same chromosome, causing them to go into the same gamete. When genes are far apart, crossing over happens often enough that all types of gametes are produced with 25% frequency.
Homologous chromosomes are shown as one X-shaped blue structure and one X-shaped red structure. At the top of the blue X are 2 small lines on each arm of the X labeled with a capital A, and at the top of the red X are 2 small lines on each arm of the X labeled with a lowercase a. At the bottom of the blue X are 2 small lines on each arm of the X labeled with a capital B, and at the bottom of the red X are 2 small lines on each arm of the X labeled with a lowercase b. There is an arrow pointing from the homologous pair of chromosomes to an image of the blue X and the red X with the lower portion of one arm of the blue X crossing the lower portion of one arm of the red X, and under the image is the label crossing over. From the crossing over, an arrow points to the homologous chromosomes that are restructured. On the blue X, there is now a small red segment on one of the lower arms of the X with a line that is labeled lowercase b, and on the red X there is a small blue segment on one of the lower arms of the X with a line that is labeled uppercase B.
When genes are very close together on the same chromosome, crossing over still occurs, but the outcome (in terms of gamete types produced) is different. Instead of assorting independently, the genes tend to "stick together" during meiosis. That is, the alleles of the genes that are already together on a chromosome will tend to be passed as a unit to gametes. In this case, the genes are linked. For example, two linked genes might behave like this:
The diagram is titled genes close together on the same chromosome. There is a circle with a blue and a red line in it. At the top of the blue line are two horizontal lines, with one line right below the second line. The top line is labeled uppercase A and the second line is labeled uppercase B. The top of the red line has the same two horizontal lines as the blue line; the top line is labeled lowercase a and the second line is labeled lowercase b. Below the image are 4 boxes with the title gametes made. Box 1 contains a blue line with an uppercase A and uppercase B at the top of the line. Box 1 is labeled uppercase A uppercase B and below the box is the label 48 percent. Box 2 contains a line that is both blue and red. The top section of the line that is blue is marked with an uppercase A, and the remainder of the line is red with the top of the red portion of the line marked with a lowercase b. Box 2 is labeled uppercase A lowercase b and below the box is the label 2 percent. Box 3 contains a line that is both red and blue. The top section of the line that is red is marked with a lowercase a, and the remainder of the line is blue with the top of the blue portion of the line marked with an uppercase B. Box 3 is labeled lowercase a uppercase B and below the box is the label 2 percent. Box 4 contains a red line with a lowercase a and a lowercase b at the top of the line. Box 4 is labeled lowercase a lowercase b and below the box is the label 48 percent. Below the gamete chart are 2 different arrows. One arrow connects box 1 with box 4 and is labeled parental, and the second arrow connects box 2 and box 3 and is labeled recombinant.
Now, we see gamete types that are present in very unequal proportions. The common types of gametes contain parental configurations of alleles, that is, the ones that were already together on the chromosome in the organism before meiosis (i.e., on the chromosome it got from its parents). The rare types of gametes contain recombinant configurations of alleles, that is, ones that can only form if a recombination event (crossover) occurs in between the genes.
Why are the recombinant gamete types rare? The basic reason is that crossovers between two genes that are close together are not very common. Crossovers during meiosis happen at more or less random positions along the chromosome, so the frequency of crossovers between two genes depends on the distance between them. A very short distance is, effectively, a very small "target" for crossover events, meaning that few such events will take place (as compared to the number of events between two further-apart genes).
Homologous chromosomes are shown as 1 X shaped blue structure and one X shaped red structure. At the top of the blue X are 2 small lines on each arm of the X, the top lines are labeled with a capital A, and the other lines are labeled with a capital B. At the top of the red X are 2 small lines on each arm of the X, the top lines are labeled with a lowercase a and the other lines are labeled with lowercase bs. Next to the homologous chromosomes is a small bracket beside the lines labeled a and b with the caption only crossovers happening in this small region can produce uppercase A lowercase b or lowercase a uppercase B chromosomes. From that caption there is an arrow pointing to a box with the caption Recombinant chromosomes do form, but not very often. There is an arrow pointing from the box to recombinant homologous chromosomes. The upper segment of one arm of the blue X is red and the line on the red segment is labeled lowercase a. The upper segment of one arm of the red X is blue and the line is labeled uppercase A.
Thanks to this relationship, we can use the frequency of recombination events between two genes (i.e., their degree of genetic linkage) to estimate their relative distance apart on the chromosome. Two very close-together genes will have very few recombination events and be tightly linked, while two genes that are slightly further apart will have more recombination events and be less tightly linked. In the next section, we'll see how to calculate the recombination frequency between two genes, using information from genetic crosses.
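To make the link between linkage and gamete proportions concrete, here is a small sketch that computes expected gamete frequencies from a recombination frequency. The function name `gamete_frequencies` and its parameters are assumptions made for this example. It relies only on the idea from the text that the two parental types share the non-recombinant fraction equally, and the two recombinant types share the recombinant fraction equally.

```python
def gamete_frequencies(rf, parental=("AB", "ab"), recombinant=("Ab", "aB")):
    """Expected gamete frequencies for a double heterozygote.

    rf is the recombination frequency as a fraction between 0 and 0.5.
    """
    if not 0 <= rf <= 0.5:
        raise ValueError("recombination frequency must lie between 0 and 0.5")
    freqs = {}
    for gamete in parental:
        freqs[gamete] = (1 - rf) / 2  # parental types split the remainder
    for gamete in recombinant:
        freqs[gamete] = rf / 2        # recombinant types split rf
    return freqs


tight = gamete_frequencies(0.04)     # tightly linked genes
unlinked = gamete_frequencies(0.5)   # independently assorting genes
print(tight)
print(unlinked)
```

With a recombination frequency of 4%, the parental gametes each come out at about 48% and the recombinant gametes at about 2%, as in the linked-gene diagram above. At 50%, all four types are equally frequent, which is the independent assortment case.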

Finding recombination frequency

Let's suppose we are interested in seeing whether two genes in the fruit fly (Drosophila) are linked to each other, and if so, how tightly linked they are. In our example, the genes are [3]:
  • The purple gene, with a dominant pr+ allele that specifies normal, red eyes and a recessive pr allele that specifies purple eyes.
  • The vestigial gene, with a dominant vg+ allele that specifies normal, long wings and a recessive vg allele that specifies short, "vestigial" wings.
If we want to measure recombination frequency between these genes, we first need to construct a fly in which we can observe recombination. That is, we need to make a fly that is not just heterozygous for both genes, but where we know exactly which alleles are together on each chromosome. To do so, we can start by crossing two homozygous flies as shown below:
The P generation shows an image of a female fruit fly with the label red eyes normal wings crossed with a male fruit fly with the label purple eyes vestigial wings. Next to the female fruit fly are 2 parallel red lines with markings on the left and the right on each line. The markings on the left are both labeled pr positive symbol, and the markings on the right are both labeled vg positive symbol. Next to the male fruit fly are 2 parallel blue lines with markings on the left and the right on each line. The markings on the left are both labeled pr and the markings on the right are both labeled vg. The arrow from the parental cross points to a fruit fly in the F1 generation. Next to the fruit fly are 2 parallel lines; the top line is red and the bottom line is blue. Both lines have marks on the left and on the right. The mark on the left side of the red line is pr positive symbol, and the mark on the left side of the blue line is pr. The mark on the right side of the red line is vg positive symbol, and the mark on the right side of the blue line is vg. There is an arrow pointing to the red and blue lines with the caption double heterozygote-in which we know which alleles are together on the chromosome.
_Image modified from "Drosophila melanogaster," by Madboy74 (CC0/public domain)._
This cross gives us exactly what we need to observe recombination: a fly that's heterozygous for the purple and vestigial genes, in which we know clearly which alleles are together on a single chromosome.
Now, we need a way to "see" recombination events. The most direct approach would be to look into the gametes made by the heterozygous fly and see what alleles they had on their chromosomes. Practically, though, it's much simpler to use those gametes in a cross and see what the offspring look like!
To do so, we can cross a double heterozygous fly with a tester, a fly that's homozygous recessive for all the genes of interest (in this case, the pr and vg alleles). The purpose of using a tester is to ensure that the alleles provided by the non-tester parent fully determine the phenotype, or appearance, of the offspring. When we cross our fly of interest to a tester, we can directly "read" the genotype of each gamete from the physical appearance of the offspring.
An F1 cross of 2 fruit flies is shown. Next to the female are 2 parallel lines; the top line is red and the bottom line is blue. Both lines have marks on the left and on the right. The mark on the left side of the red line is pr positive symbol, and the mark on the left side of the blue line is pr. The mark on the right side of the red line is vg positive symbol, and the mark on the right side of the blue line is vg. There is an arrow pointing to the red and blue lines with the caption double heterozygote-in which we know which alleles are together on the chromosome. Next to the male fruit fly are 2 parallel purple lines with marks on the left of the lines labeled pr and marks on the right of the line labeled vg. An arrow pointing to the lines has the caption Tester-homozygous recessive for both genes so we can see exactly what alleles the heterozygous female puts into each egg.
_Image modified from "Drosophila melanogaster," by Madboy74 (CC0/public domain)._
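The logic of the tester cross can be sketched in Python. This is an illustrative toy, and the names `TESTER_SPERM`, `PHENOTYPES`, and `offspring_phenotype` are assumptions made for the example. Because the tester contributes only recessive alleles, the offspring shows a dominant phenotype only when the egg supplies the dominant allele.

```python
TESTER_SPERM = ("pr", "vg")  # the tester contributes only recessive alleles

# For each gene (named after its recessive allele):
# (phenotype with a dominant allele, phenotype without one)
PHENOTYPES = {
    "pr": ("red eyes", "purple eyes"),
    "vg": ("long wings", "vestigial wings"),
}


def offspring_phenotype(egg):
    """Phenotype of the offspring from one egg fertilized by tester sperm.

    egg is a pair such as ("pr+", "vg"); a trailing "+" marks the
    dominant allele.
    """
    traits = []
    for egg_allele, sperm_allele in zip(egg, TESTER_SPERM):
        dominant, recessive = PHENOTYPES[sperm_allele]
        # One dominant allele is enough to show the dominant phenotype,
        # and in this cross only the egg can supply it.
        shows_dominant = egg_allele.endswith("+")
        traits.append(dominant if shows_dominant else recessive)
    return ", ".join(traits)


for egg in [("pr+", "vg+"), ("pr", "vg"), ("pr+", "vg"), ("pr", "vg+")]:
    print(egg, "->", offspring_phenotype(egg))
```

Each of the four possible eggs gives a distinct phenotype, which is why the offspring's appearance can be used to "read" the genotype of the gamete from the heterozygous parent.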
Below, we can see a modified Punnett square showing the results of the cross between our double heterozygous fly and the tester fly. Four different types of eggs are produced by a double heterozygous female fly, each of which combines with a sperm from the male tester fly. Four different phenotypic (appearance-based) classes of offspring are produced in this cross, each corresponding to a particular gamete from the female parent:
The top of the diagram shows an F1 cross of 2 fruit flies. Next to the female are 2 parallel lines; the top line is red and the bottom line is blue. Both lines have marks on the left and on the right. The mark on the left side of the red line is pr positive symbol, and the mark on the left side of the blue line is pr. The mark on the right side of the red line is vg positive symbol, and the mark on the right side of the blue line is vg. There is an arrow pointing to the red and blue lines with the caption double heterozygote-in which we know which alleles are together on the chromosome. Next to the male fruit fly are 2 parallel purple lines with marks on the left of the lines labeled pr and marks on the right of the line labeled vg. An arrow pointing to the lines has the caption Tester-homozygous recessive for both genes so we can see exactly what alleles the heterozygous female puts into each egg. An arrow points from the F 1 cross to a row of 4 boxes. To the left of the row of boxes is a drawing of a sperm cell with a purple line with a pr marked on the left side of the line and vg marked on the right side of the line. Above the boxes is the label egg cells. Above box 1 the egg cell has a red line marked pr positive symbol vg positive symbol. Inside box 1 is an image of a fruit fly and a red line marked pr positive symbol vg positive symbol and a purple line marked pr vg. Below box 1 is the number 1339. Above box 2 the egg cell has a blue line marked with a pr and a vg. Inside box 2 is an image of fruit fly and a blue line and a purple line and both lines are marked with a pr and a vg. Below box 2 is the number 1195. Above box 3 the egg cell has a line that is half red and half blue. The red portion of the line is marked pr positive sign and the blue portion of the line is marked vg. Inside box 3 is an image of a fruit fly with a half red and half blue line and a purple line. 
The red portion of the red and blue line is marked pr positive symbol and the blue portion of the line is marked vg. The purple line is marked pr and vg. Below box 3 is the number 151. Above box 4 the egg cell has a line that is half blue and half red. The blue portion of the line is marked pr and the red portion of the line is marked vg positive symbol. Inside box 4 is an image of a fruit fly with a half blue and half red line and a purple line. The blue portion of the blue and red line is marked pr and the red portion of the line is marked vg positive sign. The purple line is marked with a pr and a vg. Below box 4 is the number 154. Underneath the boxes are 2 brackets. The first bracket is between box 1 and box 2 and is labeled parental, and the second bracket is between box 3 and box 4 and is labeled recombinant.
_Image modified from "Drosophila melanogaster," by Madboy74 (CC0/public domain)._
The four classes of offspring are not produced in equal numbers, which tells us that the purple and vestigial genes are linked. As we expect for linked genes, the parental chromosome configurations are over-represented in the offspring, while the recombinant chromosome configurations are under-represented. To measure linkage quantitatively, we can calculate the recombination frequency (RF) between the purple and vestigial genes:
$$\text{Recombination frequency (RF)} = \frac{\text{Recombinants}}{\text{Total offspring}} \times 100\%$$
In our case, the recombinant progeny classes are the red-eyed, vestigial-winged flies and the purple-eyed, long-winged flies. We can identify these flies as the recombinant classes for two reasons: one, we know from the series of crosses we performed that they must have inherited a chromosome from their mother that had undergone a recombination event; and two, they are the underrepresented classes (relative to the overrepresented, parental classes).
So, for the cross above, we can write our equation as follows:
$$\text{RF} = \frac{151 + 154}{1339 + 1195 + 151 + 154} \times 100\% = \frac{305}{2839} \times 100\% \approx 10.7\%$$
The recombination frequency between the purple and vestigial genes is 10.7%.
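The same calculation can be written as a short Python function. The function name `recombination_frequency` and the dictionary keys are assumptions made for this sketch; the offspring counts are the ones from the cross above.

```python
def recombination_frequency(parental_counts, recombinant_counts):
    """Recombination frequency, in percent, from offspring counts."""
    recombinants = sum(recombinant_counts)
    total = recombinants + sum(parental_counts)
    return 100 * recombinants / total


# Offspring counts from the tester cross described above
offspring = {
    "red eyes, long wings": 1339,         # parental
    "purple eyes, vestigial wings": 1195, # parental
    "red eyes, vestigial wings": 151,     # recombinant
    "purple eyes, long wings": 154,       # recombinant
}

rf = recombination_frequency(
    parental_counts=[
        offspring["red eyes, long wings"],
        offspring["purple eyes, vestigial wings"],
    ],
    recombinant_counts=[
        offspring["red eyes, vestigial wings"],
        offspring["purple eyes, long wings"],
    ],
)
print(f"RF = {rf:.1f}%")  # RF = 10.7%
```

Note that the function has to be told which classes are recombinant. As in the text, that comes from knowing the parental chromosomes, not from the counts alone.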

Recombination frequency and linkage maps

What is the benefit of calculating recombination frequency? One way that recombination frequencies have been used historically is to build linkage maps, chromosomal maps based on recombination frequencies. In fact, studying linkage helped early geneticists establish that chromosomes were linear, and that each gene had its own specific place on a chromosome.
Recombination frequency is not a direct measure of how physically far apart genes are on chromosomes. However, it provides an estimate or approximation of physical distance. So, we can say that a pair of genes with a larger recombination frequency are likely farther apart, while a pair with a smaller recombination frequency are likely closer together.
Importantly, recombination frequency "maxes out" at 50% (which corresponds to genes being unlinked, or assorting independently). That is, 50% is the largest recombination frequency we'll ever directly measure between genes. So, if we want to figure out the map distance between genes further apart than this, we must do so by adding the recombination frequencies of multiple pairs of genes, "building up" a map that extends between the two distant genes.
Comparison of recombination frequencies can also be used to figure out the order of genes on a chromosome. For example, let's suppose we have three genes, A, B, and C, and we want to know their order on the chromosome (ABC? ACB? CAB?). If we look at recombination frequencies among all three possible pairs of genes (AC, AB, BC), we can figure out which genes lie furthest apart, and which other gene lies in the middle. Specifically, the pair of genes with the largest recombination frequency must flank the third gene:
The diagram has a horizontal line with three vertical marks on the line labeled A, B and C. Above the line the distance between A and B is labeled as 13.2 percent and the distance between B and C is labeled 6.4 percent. Beneath the line the distance between A and C is labeled 18.5 percent. Above the diagram are 3 formulas: RF for A-B equals 13.2 percent, RF for B-C equals 6.4 percent, and RF for A-C equals 18.5 percent. There is an arrow pointing at 18.5 percent with the caption largest RF equals outermost genes of trio.
Recombination frequencies are based on those for fly genes v, cv, and ct, as given in D. C. Bergmann [4].
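The ordering rule above (the pair with the largest recombination frequency flanks the third gene) can be sketched as follows. The function name `gene_order` and the input format are assumptions made for this example.

```python
def gene_order(pairwise_rf):
    """Order three genes from their pairwise recombination frequencies.

    pairwise_rf maps a pair of gene names, e.g. ("A", "B"), to the
    recombination frequency (in percent) measured for that pair.
    """
    # The pair with the largest RF are the outermost genes.
    outer = max(pairwise_rf, key=pairwise_rf.get)
    genes = {gene for pair in pairwise_rf for gene in pair}
    # Whichever gene is not in the outer pair lies in the middle.
    (middle,) = genes - set(outer)
    return outer[0] + middle + outer[1]


order = gene_order({("A", "B"): 13.2, ("B", "C"): 6.4, ("A", "C"): 18.5})
print(order)  # ABC
```

The result ABC is equivalent to CBA, since a linkage map read from either end describes the same order.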
By doing this type of analysis with more and more genes (e.g., adding in genes D, E, and F and figuring out their relationships to A, B, and C) we can build up linkage maps of entire chromosomes. In linkage maps, you may see distances expressed as centimorgans or map units rather than recombination frequencies. Luckily, there's a direct relationship among these values: a 1% recombination frequency is equivalent to 1 centimorgan or 1 map unit.
Is map distance always the same as recombination frequency? Sometimes, the directly measured recombination frequency between two genes is not the most accurate measure of their map distance. That's because, in addition to the single crossovers we've discussed in this article, double crossovers (two separate crossovers between the two genes) can also occur:
At the top of the diagram are two parallel lines; the top line is red and the line beneath it is blue. The red line has a vertical mark at the beginning of the line labeled uppercase A and a vertical mark at the end of the line labeled uppercase C. The blue line also has a vertical mark at the beginning of the line and it is labeled lowercase a and a vertical line at the end of the line that is labeled lowercase c. Below the parallel lines is an arrow pointing to another pair of parallel lines and the arrow has the caption double crossover. The two parallel lines at the bottom of the diagram have alternating segments of red and blue. The beginning segment of the top line is red and has a mark at the beginning of the line labeled uppercase A. The middle segment of the top line is blue, and the end segment of the top line is red, and at the end of the line is a mark labeled uppercase C. The beginning segment of the bottom line is blue and has a mark at the beginning of the line labeled lowercase a. The middle segment of the bottom line is red, and the end segment of the bottom line is blue and at the end of the line is a mark labeled lowercase c.
Double crossovers are "invisible" if we're only monitoring two genes, in that they put the original two genes back on the same chromosome (but with a swapped-out bit in the middle). For example, the double crossover shown above wouldn't be detectable if we were just looking at genes A and C, since these genes end up back in their original configuration.
Because of this, double crossovers are not counted in the directly measured recombination frequency, resulting in a slight underestimate of the actual number of recombination events. This is why, in the example below, the recombination frequency directly measured between A and C is a bit smaller than the sum of the recombination frequencies between A-B and B-C. When B is included, double crossovers between A and C can be detected and accounted for.
A horizontal line with 3 markers labeled A, B and C. The distance between mark A and mark B is 13.2 centimorgans and the distance between mark B and mark C is 6.4 centimorgans. Above the measured lines is the formula 13.2 centimorgans plus 6.4 centimorgans equals 19.6 centimorgans. Beneath the marked line is a line between mark A and mark C labeled 18.5 centimorgans. There is an arrow pointing to 18.5 with the caption the directly measured A to C RF is smaller than the map distance calculated from the A to B and B to C RFs due to double crossovers.
By measuring recombination frequencies for closer-together gene pairs and adding them up, we can minimize "invisible" double crossovers and get more accurate map distances.
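The arithmetic behind this example can be sketched in Python. The variable names are assumptions made for the example. The last step also assumes that all three recombination frequencies come from the same three-gene cross, so that each double crossover is counted once in the A-B interval and once in the B-C interval but not at all in the direct A-C measurement.

```python
rf_ab = 13.2  # percent, A-B
rf_bc = 6.4   # percent, B-C
rf_ac = 18.5  # percent, A-C measured directly

# Map distance built up from the two shorter intervals
map_distance_ac = rf_ab + rf_bc

# How much the direct measurement falls short of the summed distance
shortfall = map_distance_ac - rf_ac

# Under the stated assumption, each double crossover contributes twice
# to the sum and not at all to the direct measurement.
double_crossover_freq = shortfall / 2

print(f"Summed map distance A-C: {map_distance_ac:.1f} map units")
print(f"Shortfall of direct RF:  {shortfall:.1f}")
print(f"Implied double-crossover frequency: {double_crossover_freq:.2f}%")
```

Under that assumption, the 1.1-point gap between 19.6 and 18.5 corresponds to double crossovers in roughly 0.55% of offspring.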
