Proof of the Cauchy-Schwarz inequality

Proof of the Cauchy-Schwarz Inequality. Created by Sal Khan.

Want to join the conversation?

  • Usaid Khan
    Why does he bring up an artificial function out of thin air? [p(t) = ||ty - x||^2]
    (58 votes)
  • blahdee327
    This reminds me of the uncertainty principle... Anyone know if there's a relation?
    (22 votes)
  • vanshreebhalotia
    why is |c| ||y|| = || cy || ?
    (6 votes)
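
A short worked derivation may help with the question above. It is only a sketch and uses nothing beyond the definition of length from the video, with c a real scalar and y a vector in Rn:

```latex
\begin{aligned}
\|c\,\vec{y}\| &= \sqrt{(c\,y_1)^2 + (c\,y_2)^2 + \dots + (c\,y_n)^2} \\
               &= \sqrt{c^2\,\bigl(y_1^2 + y_2^2 + \dots + y_n^2\bigr)} \\
               &= \sqrt{c^2}\;\sqrt{y_1^2 + y_2^2 + \dots + y_n^2} \\
               &= |c|\,\|\vec{y}\|
\end{aligned}
```

The absolute value appears because the principal square root of c^2 is |c|, not c, when c is negative.
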
  • Mark Walle
    I understand the outcome of this proof, but can anybody please explain what insight compels the decision to evaluate p(t) at b/2a? It seems like if you didn't already know to do this, from previous study of the Cauchy-Schwarz inequality, you wouldn't easily come to this substitution. Is there a naive way to arrive at b/2a?
    (17 votes)
    • Kyler Kathan
      b/2a is something that comes up a lot when solving quadratic equations. All quadratic equations form a parabola which is given by the equation:
      ax² + bx +c = y
      All parabolas have a single vertex. The left and right sides of the graph are symmetric about the vertex, so if we can find the value at the vertex, we can easily find other information about the parabola and easily solve for x given any value for y (and a, b, and c obviously). If we change our equation into the form:
      ax²+bx = y-c
      Then we can factor out an x:
      x(ax+b) = y-c
      Since y-c only shifts the parabola up or down, it's unimportant for finding the x-value of the vertex. Because of this, I'll simply replace it with 0:
      x(ax+b) = 0
      Now, we just solve for x:
      x = 0 and
      ax+b = 0
      x = -b/a
      This gives us 2 values of x that are an equal distance away from the vertex point. So, the vertex point is the value perfectly in between them (or the average). This gives:
      vx = (0+(-b/a))/2 or
      vx = -b/2a (vx is the x-value of the vertex)
      If you have any function, you can shift it left or right by changing the input:
      f(x-h) shifts the graph f(x) to the right by h units.
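      So, when Sal inputs b/2a into the equation, what he's doing is evaluating the quadratic at its vertex. His quadratic is written as at² - bt + c, so the sign of the middle term is flipped and the vertex sits at t = b/2a instead of -b/2a. Since a is positive, the vertex is where the quadratic takes its smallest value. It's somewhat complex, but hopefully this helps.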
      Here's a website that talks more about the vertex of a parabola:
      http://hotmath.com/hotmath_help/topics/vertex-of-a-parabola.html
      (32 votes)
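
As a hedged illustration of the vertex argument in this thread, here is a small Python sketch. The function names `vertex_x` and `quadratic` and the sample coefficients are made up for this example and do not come from the video. It checks that the midpoint of the two roots of x(ax+b) = 0 is -b/(2a), and that with a > 0 the vertex gives the smallest value.

```python
def vertex_x(a, b):
    """x-coordinate of the vertex of y = a*x**2 + b*x + c (requires a != 0)."""
    return -b / (2 * a)


def quadratic(a, b, c, x):
    """Value of a*x**2 + b*x + c at x."""
    return a * x * x + b * x + c


# Sample coefficients, chosen only for illustration.
a, b, c = 2.0, -8.0, 3.0

# Midpoint of the two roots of x*(a*x + b) = 0, as in the comment above.
roots = (0.0, -b / a)
midpoint = sum(roots) / 2
assert midpoint == vertex_x(a, b)

# With a > 0 the vertex is the minimum: nearby inputs give larger values.
v = vertex_x(a, b)
assert quadratic(a, b, c, v) <= quadratic(a, b, c, v - 0.5)
assert quadratic(a, b, c, v) <= quadratic(a, b, c, v + 0.5)
```

In the video the quadratic is at² - bt + c, so its linear coefficient is -b and the same formula gives t = b/(2a).
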
  • Billy Buchholz
    when he had 4ac >= b^2, is that just a coincidence that b^2 - 4ac is what is under the radical in the quadratic formula?
    (14 votes)
    • ArDeeJ
      Of course not, it comes from completing the square.

      ax^2 + bx + c = 0
      4a^2x^2 + 4abx + 4ac = 0
      (2ax)^2 + 4abx + b^2 + 4ac - b^2 = 0
      (2ax + b)^2 + 4ac - b^2 = 0
      (2ax + b)^2 = b^2 - 4ac
      2ax + b = ±√(b^2 - 4ac)
      2ax = -b ±√(b^2 - 4ac)
      x = (-b ±√(b^2 - 4ac)) / (2a)
      (36 votes)
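
To connect this thread back to the proof, here is a sketch of the same step phrased with the discriminant. It assumes the video's definitions a = y·y, b = 2(x·y), c = x·x and the fact that a > 0 for a nonzero y. A quadratic with a > 0 that never dips below zero has at most one real root, so its discriminant cannot be positive:

```latex
\begin{aligned}
p(t) = a t^2 - b t + c \ge 0 \ \text{for all real } t
  &\;\Longrightarrow\; (-b)^2 - 4ac \le 0 \\
  &\;\Longleftrightarrow\; 4\,(\vec{x}\cdot\vec{y})^2 \le 4\,\|\vec{y}\|^2\,\|\vec{x}\|^2 \\
  &\;\Longleftrightarrow\; |\vec{x}\cdot\vec{y}| \le \|\vec{x}\|\,\|\vec{y}\|
\end{aligned}
```

This is the same inequality 4ac >= b^2 that the video reaches by evaluating p at b/(2a).
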
  • Keving
    is there a difference between |x| and ||x||
    edit: thanks for your help
    (5 votes)
    • Derek M.
      |c| where c is a scalar is the absolute value of c.
      |x| where x is a vector (note the bold letters) is equal to ||x||. There are multiple names for this quantity: absolute value, norm, length, and/or magnitude of the vector.
      The difference is that ||c|| where c is a scalar doesn't really make sense, I have never seen that notation used for calculating the absolute value of a scalar.

      Notice that in Sal's video here, he has |x dot y|. Remember that a dot product produces a scalar, so he is taking the absolute value of the scalar that comes from that particular dot product. In other words
      ||x dot y|| would not make sense, or maybe is just uncommon notation, as you don't usually see ||c|| (I will ask my professor if ||c|| exists to clarify later).

      Edit: Just talked to my professor about ||c|| where c is a scalar. He said it is fine to write that, but uncommon. It still means the absolute value of the scalar.
      (30 votes)
  • Niels-Oliver Walkowski
    What irritates me a lot is the strategy for the proof. In former videos the strategy is clear. He wants to prove a certain relationship, for instance that n linearly independent vectors span R^n, so he represents this relationship in an equation and checks whether it can be solved.

    In this video it all appears so random. The proof starts with "Well, let's take some function..." Why a function? Why this function? "And now, let's substitute..." Yes, but why? I can follow the whole process, but it is not understandable why he does one thing at the moment he does it. Can anyone explain the strategy of the proof a little more concretely? Thanks so much.
    (14 votes)
  • Upeksha Amarasinghe
    Where is the equation p(t) = ||ty-x||^2 from? How did you choose it?
    (11 votes)
    • Josh Froelich
      Look at the earlier videos introducing vectors and recall the insight about what happens when two vectors are collinear, that one is just a scalar multiple of the other, and then think about what it takes to come up with a simple formula when it is not guaranteed that the two vectors are collinear.

      Also, watch the early videos in the Geometry section regarding the lengths of the sides of a triangle and the conclusions you can draw about the relations between those lengths for a group of 3 lines (or intervals) to form a triangle. Can you form a triangular shape if one side is the sum of the lengths of the other two sides? Or can you only form a line? Don't even worry about the right answer if you find it difficult, just try and think about what concepts you are using in your head to prove it to yourself.
      (3 votes)
  • RaBBit
    Oh my god this solution is genius! But what if x y ∈ C rather than R?
    (9 votes)
    • Lucas Van Meter
      Great question,

      What facts did we use about the real numbers in this proof? It seems like all we used was that |x| = sqrt(x_1^2+x_2^2+...+x_n^2), right? So we need to define a "norm" on C. If z=a+bi what should |z| be? The answer is |z|=sqrt(a^2+b^2). We can extend this to a norm on a vector by writing |z|=sqrt(|z_1|^2+|z_2|^2+...+|z_n|^2). Then the proof follows essentially as in the video from here.

      If you want a stranger way to think about it you might see that if z=a+bi we can think of C as actually being a copy of R^2 since they both have real dimension 2. In this sense we have the correspondence z = (a,b). This gives us |z| = sqrt(a^2+b^2), which agrees with the definition we chose above. Then C^n is a copy of R^(2n) and we are done.
      (7 votes)
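
A small numeric check of the complex case, offered as a sketch and not as part of the original answer. One detail the thread leaves implicit is that the dot product on C^n is usually defined with a complex conjugate on one factor; the helper names `inner` and `norm` and the sample vectors below are assumptions made for this example.

```python
import math


def inner(z, w):
    """Standard inner product on C^n: sum of z_k times the conjugate of w_k."""
    return sum(zk * wk.conjugate() for zk, wk in zip(z, w))


def norm(z):
    """Length of a complex vector: sqrt(|z_1|^2 + ... + |z_n|^2)."""
    return math.sqrt(sum(abs(zk) ** 2 for zk in z))


# Sample vectors in C^3, chosen only for illustration.
z = [1 + 2j, 3 - 1j, -2j]
w = [2 - 1j, 1j, 4 + 4j]

lhs = abs(inner(z, w))     # |<z, w>|
rhs = norm(z) * norm(w)    # ||z|| ||w||
assert lhs <= rhs

# |a + bi| = sqrt(a^2 + b^2), matching the R^2 picture of C.
assert math.isclose(abs(3 + 4j), 5.0)
```

Without the conjugate, z·z would not be a nonnegative real number in general, so it could not serve as a squared length.
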
  • Sek Meng Sow
    Let a and b be any vectors and x be the angle between a and b. Since we know that a.b = ||a|| ||b|| cos x, can we prove the Cauchy-Schwarz inequality as follows:

    a . b = ||a|| ||b|| cos x
    |a . b| = | ||a|| ||b|| cos x |
    |a . b| = ||a|| ||b|| |cos x|
    |a . b| <= ||a|| ||b|| since |cos x| <= 1 (proved; this way is much easier to understand from my point of view). Please comment!
    (9 votes)
    • Qeeko
      Well, this would be fine, but the thing is that we usually define the angle between two nonzero vectors a and b in n-space to be the number x for which cos x = a · b / (||a|| ||b||), and the Cauchy-Schwarz inequality shows us that there is a unique such x in the interval [0, π].
      (4 votes)
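
The point in the reply above can be sketched in Python. The helper names `dot`, `length`, and `angle_between` are hypothetical, introduced only for this example. The angle is computed from the ratio a · b / (||a|| ||b||), and it is the Cauchy-Schwarz inequality that guarantees this ratio lies in [-1, 1], so that the inverse cosine is defined.

```python
import math


def dot(a, b):
    """Dot product of two real vectors given as lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def length(a):
    """Length of a real vector."""
    return math.sqrt(dot(a, a))


def angle_between(a, b):
    """Angle in [0, pi] between two nonzero real vectors."""
    ratio = dot(a, b) / (length(a) * length(b))
    # Cauchy-Schwarz guarantees -1 <= ratio <= 1 in exact arithmetic;
    # the clamp only absorbs floating-point rounding.
    ratio = max(-1.0, min(1.0, ratio))
    return math.acos(ratio)


assert math.isclose(angle_between([1, 0], [0, 3]), math.pi / 2)
assert math.isclose(angle_between([1, 0], [-2, 0]), math.pi)
```

This is why the cosine formula cannot be used to prove the inequality without first defining the angle some other way.
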

Video transcript

Let's say that I have two nonzero vectors. Let's say the first vector is x, the second vector is y. They're both in the set Rn and they're nonzero. It turns out that the absolute value of their-- let me do it in a different color. This color's nice. The absolute value of the dot product of the two vectors-- and remember, this is just a scalar quantity-- is less than or equal to the product of their lengths. And we've defined the dot product and we've defined lengths already. And just to push it even further, the only time that the two sides are equal-- the equal part of the less than or equal applies only in the situation-- let me write that down-- where one of these vectors is a scalar multiple of the other. Or they're collinear. You know, one's just kind of the longer or shorter version of the other one. So only in the situation where, let's just say, x is equal to some scalar multiple of y. This inequality, together with the condition for equality, is called the Cauchy-Schwarz Inequality. So let's prove it because you can't take something like this just at face value. You shouldn't just accept that. So let me just construct a somewhat artificial function. Let me construct a function of some scalar variable t. Let me define p of t to be equal to the length of the vector-- some scalar t times the vector y minus the vector x. It's the length of this vector. This is going to be a vector now. That squared. Now before I move forward I want to make one little point here. If I take the length of any vector, I'll do it here. Let's say I take the length of some vector v. I want you to accept that this is going to be a positive number, or it's at least greater than or equal to 0. Because this is just going to be the square root of the sum of each of its terms squared. v1 squared plus v2 squared all the way to vn squared. 
All of these are real numbers. When you square a real number, you get something greater than or equal to 0. When you sum them up, you're going to have something greater than or equal to 0. And you take the square root of it, the principal square root, the positive square root, you're going to have something greater than or equal to 0. So the length of any real vector is going to be greater than or equal to 0. So this is the length of a real vector. So this is going to be greater than or equal to 0. Now, in the previous video, I think it was two videos ago, I also showed that the magnitude or the length of a vector squared can also be rewritten as the dot product of that vector with itself. So let's rewrite this vector that way. The length of this vector squared is equal to the dot product of that vector with itself. So it's ty minus x dot ty minus x. In the last video, I showed you that you can treat a multiplication or you can treat the dot product very similar to regular multiplication when it comes to the associative, distributive and commutative properties. So when you multiplied these, you know, you could kind of view this as multiplying these two binomials. You can do it the same way as you would just multiply two regular algebraic binomials. You're essentially just using the distributive property. But remember, this isn't just regular multiplication. This is the dot product we're doing. This is vector multiplication or one version of vector multiplication. So if we distribute it out, this will become ty dot ty. So let me write that out. That'll be ty dot ty. And then we'll get a minus-- let me do it this way. Then we get the minus x times this ty. Instead of saying times, I should be very careful to say dot. So minus x dot ty. And then you have this ty times this minus x. So then you have minus ty dot x. And then finally, you have the x's dot with each other. And you can view them as minus 1x dot minus 1x. You could say plus minus 1x. 
I could just view this as plus minus 1 or plus minus 1. So this is minus 1x dot minus 1x. So let's see. So this is what my whole expression simplified to or expanded to. I can't really call this a simplification. But we can use the fact that this is commutative and that scalars factor out to rewrite this expression right here. This is equal to y dot y times t squared. t is just a scalar. Minus-- and actually, this is 2. These two things are equivalent. They're just rearrangements of the same thing, because we saw that the dot product is commutative and that you can pull the scalar t out. So this is just equal to 2 times x dot y times t. And I should do that in maybe a different color. So these two terms result in that term right there. And then if you just rearrange these you have a minus 1 times a minus 1. They cancel out, so those will become plus and you're just left with plus x dot x. And I should do that in a different color as well. I'll do that in an orange color. So those terms end up with that term. Then of course, that term results in that term. And remember, all I did is I rewrote this thing and said, look. This has got to be greater than or equal to 0. So I could rewrite that here. This thing is still just the same thing. I've just rewritten it. So this is all going to be greater than or equal to 0. Now let's make a little bit of a substitution just to clean up our expression a little bit. And we'll later back substitute into this. Let's define this, y dot y, as a. Let's define this piece right here as b. So b is 2 times x dot y. I'll leave the minus sign and the t outside. And let's define this or let me just define this right here as c. X dot x as c. So then, what does our expression become? It becomes a times t squared minus-- I want to be careful with the colors-- b times t plus c. And of course, we know that it's going to be greater than or equal to 0. It's the same thing as this up here, greater than or equal to 0. I could write p of t here. Now this is greater than or equal to 0 for any t that I put in here. 
For any real t that I put in there. Let me evaluate our function at b over 2a. And I can definitely do this because what was a? I just have to make sure I'm not dividing by 0 any place. So a was this vector dotted with itself. And we said this was a nonzero vector. So this is the square of its length. It's a nonzero vector, so some of these terms up here would end up being positive when you take its length. So this thing right here is nonzero. This is a nonzero vector. Then 2 times the dot product with itself is also going to be nonzero. So we can do this. We don't worry about dividing by 0, whatever else. But what will this be equal to? This'll be equal to-- and I'll just stick to the green. It takes too long to keep switching between colors. This is equal to a times this expression squared. So it's b squared over 4a squared. I just squared 2a to get the 4a squared. Minus b times this. So b times-- this is just regular multiplication. b times b over 2a. Just write regular multiplication there. Plus c. And we know all of that is greater than or equal to 0. Now if we simplify this a little bit, what do we get? Well this a cancels out with this exponent there and you end up with a b squared right there. So we get b squared over 4a minus b squared over 2a. That's that term over there. Plus c is greater than or equal to 0. Let me rewrite this. If I multiply the numerator and denominator of this by 2, what do I get? I get 2b squared over 4a. And the whole reason I did that is to get a common denominator here. So what do you get? You get b squared over 4a minus 2b squared over 4a. So what do these two terms simplify to? Well the numerator is b squared minus 2b squared. So that just becomes minus b squared over 4a plus c is greater than or equal to 0. These two terms add up to this one right here. Now if we add b squared over 4a to both sides of the inequality, we get c is greater than or equal to b squared over 4a. It was a negative on the left-hand side. 
If I add it to both sides it's going to be a positive on the right-hand side. We're approaching something that looks like our inequality, so let's back substitute our original substitutions to see what we have now. So where were the original substitutions that I made? They were right here. And actually, just to simplify more, let me multiply both sides by 4a. I said a, not only is it nonzero, it's going to be positive. This is the square of its length. And I already showed you that the length of any nonzero real vector's going to be positive. And the reason why I'm taking great pains to show that a is positive is because if I multiply both sides by it I don't want to change the inequality sign. So let me multiply both sides of this by 4a before I substitute. So we get 4ac is greater than or equal to b squared. There you go. And remember, I took great pains. I just said a is definitely a positive number because it is essentially the square of the length. y dot y is the square of the length of y, and that's a positive value. It has to be positive. We're dealing with real vectors. Now let's back substitute this. So 4 times a, 4 times y dot y. y dot y is also-- I might as well just write it there. y dot y is the same thing as the magnitude of y squared. That's y dot y. This is a. y dot y, I showed you that in the previous video. Times c. c is x dot x. Well x dot x is the same thing as the length of vector x squared. So this was c. So 4 times a times c is going to be greater than or equal to b squared. Now what was b? b was this thing here. So b squared would be 2 times x dot y, squared. So we've gotten to this result so far. And so what can we do with this? Oh sorry, and this whole thing is squared. This whole thing right here is b. So let's see if we can simplify this. So we get-- let me switch to a different color. 4 times the length of y squared times the length of x squared is greater than or equal to-- if we squared this quantity right here, we get 4 times x dot y. 
4 times x dot y times x dot y. Actually, even better, let me just write it like this. Let me just write 4 times x dot y squared. Now we can divide both sides by 4. That won't change our inequality. So that just cancels out there. And now let's take the square root of both sides of this inequality. So the square roots of both sides-- these are positive values, so the square root of this side is the square root of each of its terms. That's just an exponent property. So if you take the square root of both sides you get the length of y times the length of x is greater than or equal to the square root of this. And we're going to take the positive square root. We're going to take the positive square root on both sides of this inequality. That keeps us from having to mess with anything on the inequality or anything like that. So the positive square root is going to be the absolute value of x dot y. And I want to be very careful to say this is the absolute value because it's possible that this thing right here is a negative value. But when you square it, you want to be careful that when you take the square root of it that you stay a positive value. Because otherwise when we take the principal square root, we might mess with the inequality. We're taking the positive square root, which will be-- so if you take the absolute value, you're ensuring that it's going to be positive. But this is our result. The absolute value of the dot product of our vectors is less than or equal to the product of the two vectors' lengths. So we got our Cauchy-Schwarz inequality. Now the last thing I said is look, what happens if x is equal to some scalar multiple of y? Well in that case, what's the absolute value? The absolute value of x dot y? Well that equals-- that equals what? If we make the substitution, that equals the absolute value of c times y, dot y. That's just x dot y with x replaced by c times y, which is equal to, just from the associative property. 
It's equal to the absolute value of c times-- we want to make sure our absolute value, keep everything positive. y dot y. Well this is just equal to the absolute value of c times the magnitude of y-- the length of y squared. Well that just is equal to the magnitude of c times-- or the absolute value of our scalar c times our length of y, times our length of y again. Well this right here, I can rewrite this. I mean you can prove this to yourself if you don't believe it, but this-- we could put the c inside of the magnitude and that could be a good exercise for you to prove. But it's pretty straightforward. You just do the definition of length. And you multiply it by c. This is equal to the magnitude of cy times-- let me say the length of cy times the length of y. I've lost my vector notation someplace over here. There you go. Now, this is x. So this is equal to the length of x times the length of y. So I showed you kind of the second part of the Cauchy-Schwarz Inequality, that the two sides are equal to each other when one of the vectors is a scalar multiple of the other. If you're a little uncomfortable with some of these steps I took, it might be a good exercise to actually prove it. For example, to prove that the absolute value of c times the length of the vector y is the same thing as the length of c times y. Anyway, hopefully you found this pretty useful. The Cauchy-Schwarz Inequality we'll use a lot when we prove other results in linear algebra. And in a future video, I'll give you a little more intuition about why this makes a lot of sense relative to the dot product.
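
The steps of the proof in the transcript can be checked numerically. The following is a minimal sketch, not Sal's own material: the helper names `dot`, `length`, and `p` and the sample vectors are assumptions made for this example. It builds p(t) = ||ty - x||^2, forms a, b, and c as defined in the video, evaluates at t = b/(2a), and confirms both the inequality and the equality case.

```python
import math


def dot(u, v):
    """Dot product of two real vectors given as lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def length(u):
    """Length of a real vector."""
    return math.sqrt(dot(u, u))


def p(t, x, y):
    """p(t) = ||t*y - x||^2, the function constructed in the video."""
    diff = [t * yi - xi for xi, yi in zip(x, y)]
    return dot(diff, diff)


# Sample nonzero vectors in R^4, chosen only for illustration.
x = [1.0, -2.0, 3.0, 0.5]
y = [4.0, 0.0, -1.0, 2.0]

# The substitutions from the video: p(t) = a*t^2 - b*t + c.
a, b, c = dot(y, y), 2 * dot(x, y), dot(x, x)
t_star = b / (2 * a)  # safe because y is nonzero, so a > 0

# p agrees with the quadratic form and is never negative.
assert math.isclose(p(t_star, x, y), a * t_star**2 - b * t_star + c)
assert p(t_star, x, y) >= 0

# The result of evaluating at b/(2a), and the inequality itself.
assert 4 * a * c >= b**2
assert abs(dot(x, y)) <= length(x) * length(y)

# Equality case: x is a scalar multiple of y.
x_parallel = [-3 * yi for yi in y]
assert math.isclose(abs(dot(x_parallel, y)), length(x_parallel) * length(y))
```

Choosing t = b/(2a) is the value that makes p(t) as small as possible, which is why it gives the tightest bound on b.
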