© 2023 Khan Academy

# Proof of the Cauchy-Schwarz inequality

Proof of the Cauchy-Schwarz Inequality. Created by Sal Khan.

## Want to join the conversation?

- Why does he bring up an artificial function out of thin air? [p(t) = ||ty - x||^2] (57 votes)
- It is extremely common to define auxiliary functions in calculus/analysis in order to prove things. (17 votes)

- This reminds me of the uncertainty principle... Anyone know if there's a relation?(22 votes)
- I'm studying quantum mechanics now and it is indeed one way to derive the uncertainty principle.(37 votes)

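- Why is |c| ||y|| = ||cy||? (5 votes)
- `||cy|| = sqrt((c*y1)^2 + (c*y2)^2 + ... + (c*yn)^2)`

`= sqrt(c^2 * (y1^2 + y2^2 + ... + yn^2))`

`= sqrt(c^2) * sqrt(y1^2 + y2^2 + ... + yn^2)`

`= |c| ||y||`, because `sqrt(c^2) = |c|` for any real c (not c itself, which could be negative). (56 votes)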
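A quick numerical check of the identity above, as a minimal sketch. The helper names `norm` and `scale` and the sample values are illustrative assumptions, not anything from the video. A negative scalar is used on purpose, to show why the absolute value is needed.

```python
import math

def norm(v):
    """Euclidean length of a real vector given as a list of numbers."""
    return math.sqrt(sum(vi * vi for vi in v))

def scale(c, v):
    """Multiply every component of v by the scalar c."""
    return [c * vi for vi in v]

y = [3.0, -4.0, 12.0]   # illustrative vector
c = -2.5                # negative scalar, chosen on purpose

lhs = norm(scale(c, y))   # ||cy||
rhs = abs(c) * norm(y)    # |c| ||y||

print(lhs, rhs)           # the two values agree
print(c * norm(y))        # without the absolute value the result is negative
```

Since a length can never be negative, `c * norm(y)` cannot be the right formula when c is negative; `abs(c) * norm(y)` is.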

- I understand the outcome of this proof; but can anybody please explain what insight compels the decision to evaluate P(t) as P(b/2a)? It seems like if you didn't just inherently know to do this, by previous research into the Cauchy-Schwarz inequality, you wouldn't easily come to this substitution. Is there a naive way to come to b/2a? (16 votes)
- b/2a is something that comes up a lot when working with quadratics. Every quadratic function graphs as a parabola, given by the equation `ax² + bx + c = y`.

Every parabola has a single vertex, and the left and right sides of the graph are symmetric about it. The vertex is where the function takes its smallest value (when a > 0) or its largest value (when a < 0).

To find the x-value of the vertex, first change the equation into the form `ax² + bx = y - c`.

Then we can factor out an x: `x(ax + b) = y - c`.

Since c only shifts the parabola up or down, it's unimportant for finding the x-value of the vertex. Because of this, I'll look at the points where the right-hand side is 0: `x(ax + b) = 0`.

Now, we just solve for x: `x = 0` and `ax + b = 0`, which gives `x = -b/a`.

These 2 values of x sit at the same height on the parabola, so they are an equal distance away from the vertex. So, the vertex is the value perfectly in between them (their average). This gives `vx = (0 + (-b/a))/2`, or `vx = -b/(2a)` (vx is the x-value of the vertex).

In Sal's proof the quadratic is written as `at² - bt + c`, with a minus sign on the middle term, so its vertex is at `t = b/(2a)` instead. Because a = y·y is positive, the parabola opens upward and the vertex is its lowest point. So, when Sal inputs `b/(2a)` into the function, what he's doing is evaluating p(t) at its minimum. If even the smallest value of p(t) is greater than or equal to 0, that is the strongest statement we can get out of p(t) >= 0. It's somewhat complex, but hopefully this helps.

Here's a website that talks more about the vertex of a parabola:

http://hotmath.com/hotmath_help/topics/vertex-of-a-parabola.html (30 votes)
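The choice of b/(2a) can also be seen by completing the square on the quadratic from the video. This is a sketch using the video's own substitutions (a = y·y, b = 2 x·y, c = x·x) and assuming a > 0:

```latex
\begin{align*}
p(t) &= a t^2 - b t + c \\
     &= a\left(t^2 - \frac{b}{a}\,t + \frac{b^2}{4a^2}\right) + c - \frac{b^2}{4a} \\
     &= a\left(t - \frac{b}{2a}\right)^2 + c - \frac{b^2}{4a}.
\end{align*}
```

The squared term is never negative and vanishes exactly at t = b/(2a), so the smallest value p(t) can take is c - b²/(4a). Requiring that smallest value to be at least 0 is the same as 4ac >= b².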

- When he had 4ac >= b^2, is it just a coincidence that b^2 - 4ac is what is under the radical in the quadratic formula? (14 votes)
- Of course not, it comes from completing the square.

ax^2 + bx + c = 0

4a^2x^2 + 4abx + 4ac = 0

(2ax)^2 + 4abx + b^2 + 4ac - b^2 = 0

(2ax + b)^2 + 4ac - b^2 = 0

(2ax + b)^2 = b^2 - 4ac

2ax + b = ±√(b^2 - 4ac)

2ax = -b ±√(b^2 - 4ac)

x = (-b ±√(b^2 - 4ac)) / (2a)(33 votes)
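To tie this back to the proof, here is a short sketch of why the same expression shows up. It uses the video's quadratic p(t) = at² - bt + c with a > 0; the argument is a standard one, not something stated in the video.

```latex
\begin{align*}
&p(t) = a t^2 - b t + c \ge 0 \ \text{for all real } t,\quad a > 0 \\
&\Longrightarrow\ p \text{ has at most one real root} \\
&\Longrightarrow\ \text{discriminant} = (-b)^2 - 4ac = b^2 - 4ac \le 0 \\
&\Longrightarrow\ 4ac \ge b^2 .
\end{align*}
```

If the discriminant were positive, the parabola would cross the t-axis at two distinct points and dip below zero between them, which p(t) >= 0 rules out.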

- Is there a difference between |x| and ||x||?

edit: thanks for your help (5 votes)
- |c| where c is a scalar is the absolute value of c.

|**x**| where **x** is a vector (note the bold letters) is equal to ||**x**||. There are multiple terms for this notation: absolute value, norm, length, and/or magnitude of the vector.

The difference is that ||c|| where c is a scalar doesn't really make sense; I have never seen that notation used for the absolute value of a scalar.

Notice that in Sal's video here, he has |**x** dot **y**|. Remember that a dot product produces a scalar, so he is taking the absolute value of the scalar that comes from that particular dot product. In other words, ||**x** dot **y**|| would not make sense, or maybe is just uncommon notation, as you don't usually see ||c|| (I will ask my professor if ||c|| exists to clarify later).

Edit: Just talked to my professor about ||c|| where c is a scalar. He said it is fine to write that, but uncommon. It still means the absolute value of the scalar. (28 votes)

- What irritates me a lot is the strategy for the proof. In former videos the strategy is clear. He wants to prove a certain relationship, for instance that n linearly independent vectors span R^n, so he represents this relationship in an equation and looks at whether it can be solved.

In this video it all appears so random. The proof starts with "Well, let's take some function..." Why a function? Why this function? "And now, let's substitute..." Yes, but why? I can follow the whole process, but it is not clear why he does each thing at the moment he does it. Can anyone explain the strategy of the proof a little more concretely? Thanks so much. (13 votes)

- Where is the equation p(t) = ||ty-x||^2 from? How did you choose it? (11 votes)
- Look at the earlier videos introducing vectors and recall the insight about what happens when two vectors are collinear: one is just a scalar multiple of the other. Then think about what it takes to come up with a simple formula that still works when you are not guaranteed that the two vectors are collinear.

Also, watch the early videos in the Geometry section regarding lengths of sides of a triangle and the conclusions you can draw about the relations between those lengths for a group of 3 line segments to form a triangle. Can you form a triangular shape if one side is the sum of the lengths of the other two sides? Or can you only form a line? Don't even worry about the right answer if you find it difficult, just try to think about what concepts you are using in your head to prove it to yourself. (3 votes)
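One way to make the strategy concrete: p(t) is the squared distance from x to the point ty on the line through y, so it can never be negative, and the proof simply looks at the t where that distance is smallest. The sketch below checks this numerically. The helper names `dot` and `p` and the sample vectors are illustrative assumptions.

```python
def dot(u, v):
    """Dot product of two real vectors given as lists."""
    return sum(ui * vi for ui, vi in zip(u, v))

def p(t, x, y):
    """Squared length of t*y - x, the function used in the proof."""
    d = [t * yi - xi for xi, yi in zip(x, y)]
    return dot(d, d)

x = [1.0, 2.0, 2.0]   # illustrative vectors, not collinear
y = [2.0, 0.0, 1.0]

a = dot(y, y)          # coefficient of t^2
b = 2 * dot(x, y)      # p(t) = a*t^2 - b*t + c
c = dot(x, x)
t_star = b / (2 * a)   # the value Sal plugs in

p_min = p(t_star, x, y)
print(t_star, p_min, c - b * b / (4 * a))

# Moving t away from t_star in either direction never lowers p.
for delta in (-1.0, -0.1, 0.1, 1.0):
    print(delta, p(t_star + delta, x, y) >= p_min)
```

Because `p_min` equals c - b²/(4a) and is a squared length, it is at least 0, which rearranges to 4ac >= b².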

- Oh my god this solution is genius! But what if x, y ∈ C^n rather than R^n? (9 votes)
- Great question,

What facts did we use about the real numbers in this proof? It seems like all we used was that |x| = sqrt(x_1^2+x_2^2+...+x_n^2), right? So we need to define a "norm" on C. If z=a+bi what should |z| be? The answer is |z|=sqrt(a^2+b^2). We can extend this to a norm on a vector by writing |z|=sqrt(|z_1|^2+|z_2|^2+...+|z_n|^2). Then the proof follows the video closely from here, as long as the dot product is replaced by the complex inner product, which conjugates the entries of one of the two vectors.

If you want a stranger way to think about it you might see that if z=a+bi we can think of C as actually being a copy of R^2 since they both have real dimension 2. In this sense we have the correspondence z = (a,b). This gives us |z| = sqrt(a^2+b^2), which agrees with the definition we chose above. Then C^n is a copy of R^(2n) and we are done. (7 votes)
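A small numerical sanity check of the complex case, as a sketch only. It assumes the usual complex inner product (sum of u_i times the conjugate of v_i); the helper names and sample vectors are illustrative, and a check on one example is of course not a proof.

```python
import math

def inner(u, v):
    """Complex inner product: sum of u_i * conj(v_i)."""
    return sum(ui * vi.conjugate() for ui, vi in zip(u, v))

def norm(v):
    """Length of a complex vector: sqrt(|v_1|^2 + ... + |v_n|^2)."""
    return math.sqrt(sum(abs(vi) ** 2 for vi in v))

x = [1 + 2j, 3 - 1j]   # illustrative vectors in C^2
y = [2 - 1j, 1j]

lhs = abs(inner(x, y))
rhs = norm(x) * norm(y)
print(lhs, rhs, lhs <= rhs)

# Equality case: x2 is a complex scalar multiple of y.
k = 2 - 3j
x2 = [k * yi for yi in y]
print(abs(inner(x2, y)), norm(x2) * norm(y))
```

The second pair of printed values agrees up to rounding, matching the claim that equality holds when one vector is a scalar multiple of the other.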

- Let a and b be any vectors and x be the angle between a and b. Since we know that a . b = ||a|| ||b|| cos x, can we prove the Cauchy-Schwarz inequality as follows:

a . b = ||a|| ||b|| cos x

|a . b| = | ||a|| ||b|| cos x |

|a . b| = ||a|| ||b|| |cos x|

|a . b| <= ||a|| ||b|| since |cos x| <= 1 (proved; this way is much easier to understand from my point of view). Please comment! (9 votes)
- Well, this would be fine, but the thing is that we usually *define* the angle between two nonzero vectors **a** and **b** in `n`-space to be the number `x` for which `cos x =` **a**·**b** / (||**a**|| ||**b**||), and the Cauchy-Schwarz inequality shows us that there is a unique such `x` in the interval `[0, π]`, because it guarantees that the right-hand side lies between -1 and 1. So using the angle to prove the inequality would be circular. (4 votes)
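The definition in the answer above can be sketched in code. The function name `angle_between` and its helpers are illustrative assumptions. The clamp is only there to absorb floating-point rounding, since in exact arithmetic Cauchy-Schwarz already keeps the quotient inside [-1, 1].

```python
import math

def dot(u, v):
    """Dot product of two real vectors given as lists."""
    return sum(ui * vi for ui, vi in zip(u, v))

def norm(v):
    """Euclidean length of a real vector."""
    return math.sqrt(dot(v, v))

def angle_between(a, b):
    """Angle in [0, pi] between two nonzero real vectors."""
    na, nb = norm(a), norm(b)
    if na == 0 or nb == 0:
        raise ValueError("the angle is undefined for the zero vector")
    cos_x = dot(a, b) / (na * nb)
    # Guard against rounding pushing the quotient slightly outside [-1, 1],
    # which would make math.acos raise a ValueError.
    cos_x = max(-1.0, min(1.0, cos_x))
    return math.acos(cos_x)

print(angle_between([1.0, 0.0], [0.0, 1.0]))    # perpendicular vectors
print(angle_between([1.0, 0.0], [-3.0, 0.0]))   # opposite directions
```

Without Cauchy-Schwarz there would be no guarantee that `math.acos` receives a valid input, which is why the inequality has to come before the definition of the angle.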

## Video transcript

Let's say that I have
two nonzero vectors. Let's say the first vector is
x, the second vector is y. They're both in the set Rn
and they're nonzero. It turns out that the absolute
value of their-- let me do it in a different color. This color's nice. The absolute value of their
dot product of the two vectors-- and remember, this is
just a scalar quantity-- is less than or equal to the
product of their lengths. And we've defined the dot
product and we've defined lengths already. It's less than or equal to the
product of their lengths. And just to push it even further,
the only time that the two sides are equal, so the absolute value of the dot product of the
two vectors is equal to the product of their lengths,
is in the
situation-- let me write that down-- where one of these
vectors is a scalar multiple of the other. Or they're collinear. You know, one's just kind of the
longer or shorter version of the other one. So only in the situation where,
let's just say, x is equal to some scalar multiple of y. This inequality, together with
the condition for equality, is called the
Cauchy-Schwarz Inequality. So let's prove it because you
can't take something like this just at face value. You shouldn't just
accept that. So let me just construct a
somewhat artificial function. Let me construct some function
of-- that's a function of some variables, some scalar t. Let me define p of t to be equal
to the length of the vector t times the vector-- some
scalar t times the vector y minus the vector x. It's the length of
this vector. This is going to be
a vector now. That squared. Now before I move forward
I want to make one little point here. If I take the length of any
vector, I'll do it here. Let's say I take the length
of some vector v. I want you to accept that this
is going to be a positive number, or it's at least greater
than or equal to 0. Because this is just going to be the square root of
the sum of each of its terms squared: v1 squared plus v2 squared, all the way
to vn squared. All of these are real numbers. When you square a real number,
you get something greater than or equal to 0. When you sum them up, you're
going to have something greater than or equal to 0. And you take the square root
of it, the principal square root, the positive square root,
you're going to have something greater than
or equal to 0. So the length of any real vector
is going to be greater than or equal to 0. So this is the length
of a real vector. So this is going to be greater
than or equal to 0. Now, in the previous video, I
think it was two videos ago, I also showed that the magnitude
or the length of a vector squared can also be rewritten
as the dot product of that vector with itself. So let's rewrite this
vector that way. The length of this vector
squared is equal to the dot product of that vector
with itself. So it's ty minus x
dot ty minus x. In the last video, I showed
you that you can treat a multiplication or you can treat
the dot product very similar to regular
multiplication when it comes to the associative, distributive
and commutative properties. So when you multiplied these,
you know, you could kind of view this as multiplying
these two binomials. You can do it the same way as
you would just multiply two regular algebraic binomials. You're essentially just using
the distributive property. But remember, this isn't just
regular multiplication. This is the dot product
we're doing. This is vector multiplication
or one version of vector multiplication. So if we distribute it out, this
will become ty dot ty. So let me write that out. That'll be ty dot ty. And then we'll get a minus--
let me do it this way. Then we get the minus
x times this ty. Instead of saying times,
I should be very careful to say dot. So minus x dot ty. And then you have this ty
times this minus x. So then you have
minus ty dot x. And then finally, you have the
x's dot with each other. And you can view them as
minus 1x dot minus 1x. You could say plus minus 1x. I could just view this as plus
minus 1 or plus minus 1. So this is minus 1x
dot minus 1x. So let's see. So this is what my whole
expression simplified to or expanded to. I can't really call this
a simplification. But we can use the fact that
this is commutative and associative to rewrite this
expression right here. This is equal to y dot
y times t squared. t is just a scalar. Minus-- and actually,
this is 2. These two things
are equivalent. They're just rearrangements of
the same thing and we saw that the dot product is
associative. So this is just equal to 2
times x dot y times t. And I should do that in maybe
a different color. So these two terms result in
that term right there. And then if you just rearrange
these you have a minus 1 times a minus 1. They cancel out, so those will
become plus and you're just left with plus x dot x. And I should do that in a
different color as well. I'll do that in an
orange color. So those terms end up
with that term. Then of course, that term
results in that term. And remember, all I did
is I rewrote this thing and said, look. This has got to be greater
than or equal to 0. So I could rewrite that here. This thing is still just
the same thing. I've just rewritten it. So this is all going to be
greater than or equal to 0. Now let's make a little bit of
a substitution just to clean up our expression
a little bit. And we'll later back substitute
into this. Let's define this, y dot y, as a. Let's define this piece
right here as b. So the whole thing, without the
minus sign: 2 times x dot y. I'll leave the t there. And let's define this or
let me just define this right here as c. x dot x as c. So then, what does our
expression become? It becomes a times t squared
minus-- I want to be careful with the colors-- b
times t plus c. And of course, we know that it's
going to be greater than or equal to 0. It's the same thing as
this up here, greater than or equal to 0. I could write p of t here. Now this is greater than or
equal to 0 for any t that I put in here. For any real t that
I put in there. Let me evaluate our function
at b over 2a. And I can definitely do this
because what was a? I just have to make sure I'm not
dividing by 0 any place. So a was this vector
dotted with itself. And we said this was
a nonzero vector. So this is the square
of its length. It's a nonzero vector, so some
of these terms up here would end up being positive
when you take its length. So this thing right
here is nonzero. This is a nonzero vector. Then 2 times the dot product
with itself is also going to be nonzero. So we can do this. We don't worry about dividing
by 0, whatever else. But what will this
be equal to? This'll be equal to-- and I'll
just stick to the green. It takes too long to keep
switching between colors. This is equal to a times this
expression squared. So it's b squared
over 4a squared. I just squared 2a to
get the 4a squared. Minus b times this. So b times-- this is just
regular multiplication. b times b over 2a. Just write regular
multiplication there. Plus c. And we know all of that is
greater than or equal to 0. Now if we simplify this a little
bit, what do we get? Well this a cancels out with
one of the a's in the 4a squared, and you end up with a b squared over 4a
right there. So we get b squared over 4a
minus b squared over 2a. That's that term over there. Plus c is greater than
or equal to 0. Let me rewrite this. If I multiply the numerator and
denominator of this by 2, what do I get? I get 2b squared over 4a. And the whole reason I did
that is to get a common denominator here. So what do you get? You get b squared over 4a minus
2b squared over 4a. So what do these two
terms simplify to? Well the numerator is b squared
minus 2b squared. So that just becomes minus b
squared over 4a plus c is greater than or equal to 0. These two terms add up to
this one right here. Now if we add b squared over 4a to both sides
of the inequality, we get c is greater than or equal
to b squared over 4a. It was a negative on
the left-hand side. If I add it to both sides it's
going to be a positive on the right-hand side. We're approaching something that
looks like an inequality, so let's back substitute our
original substitutions to see what we have now. So where were my original
substitutions that I made? It was right here. And actually, just to simplify
more, let me multiply both sides by 4a. I said a, not only
is it nonzero, it's going to be positive. This is the square
of its length. And I already showed you that
the length of any nonzero real vector's going to be positive. And the reason why I'm taking
great pains to show that a is positive is because if I
multiply both sides of it I don't want to change the
inequality sign. So let me multiply both sides
of this by 4a before I substitute. So we get 4ac is greater than
or equal to b squared. There you go. And remember, I took
great pains. I just said a is definitely a
positive number because it is essentially the square of the
length. y dot y is the square of the length of y, and that's
a positive value. It has to be positive. We're dealing with
real vectors. Now let's back substitute
this. So 4 times a, 4 times y dot y. y dot y is also-- I might as
well just write it there. y dot y is the same thing as
the magnitude of y squared. That's y dot y. This is a. y dot y, I showed you that
in the previous video. Times c. c is x dot x. Well x dot x is the
same thing as the length of vector x squared. So this was c. So 4 times a times c is going
to be greater than or equal to b squared. Now what was b? b was
this thing here. So b squared would be 2
times x dot y squared. So we've gotten to this
result so far. And so what can we
do with this? Oh sorry, and this whole
thing is squared. This whole thing right
here is b. So let's see if we can
simplify this. So we get-- let me switch
to a different color. 4 times the length of y squared
times the length of x squared is greater than or equal
to-- if we squared this quantity right here, we
get 4 times x dot y. 4 times x dot y times x dot y. Actually, even better, let me
just write it like this. Let me just write 4 times
x dot y squared. Now we can divide
both sides by 4. That won't change
our inequality. So that just cancels
out there. And now let's take the
square root of both sides of this inequality. So the square roots of both
sides of this inequality-- these are nonnegative values, so the
square root of this side is the square root of each
of its factors. That's just an exponent property. So if you take the square root
of both sides you get the length of y times the length of
x is greater than or equal to the square root of this. And we're going to take the
positive square root. We're going to take the positive
square root on both sides of this inequality. That keeps us from having to
mess with anything on the inequality or anything
like that. So the positive square root is
going to be the absolute value of x dot y. And I want to be very careful
to say this is the absolute value because it's possible that
this thing right here is a negative value. But when you square it, you want
to be careful that when you take the square root
of it that you stay a positive value. Because otherwise when we take
the principal square root, we might mess with the inequality. We're taking the positive square
root, which will be-- so if you take the absolute
value, you're ensuring that it's going to be positive. But this is our result. The absolute value of the dot
product of our vectors is less than or equal to the product of the
two vectors' lengths. So we got our Cauchy-Schwarz
inequality. Now the last thing I said is
look, what happens if x is equal to some scalar
multiple of y? Well in that case, what's
the absolute value? The absolute value of x dot y? Well that equals--
that equals what? If we make the substitution x equals c times y, that
equals the absolute value of c times y, dot y. c times y is just x. Which
is equal to, just from the associative property, the absolute value
of c times-- we want to make sure we keep our absolute value,
keep everything positive-- y dot y. Well this is just equal to the absolute value of c
times the magnitude of y-- the length of y-- squared. Well that just is equal to the
magnitude of c times-- or the absolute value of our scalar
c times our length of y, times our length of y again. Well this right here,
I can rewrite this. I mean you can prove this to
yourself if you don't believe it, but this-- we could put the
c inside of the magnitude and that could be a good
exercise for you to prove. But it's pretty straightforward. You just do the definition
of length. And you multiply it by c. This is equal to the magnitude
of cy times-- let me say the length of cy times
the length of y. I've lost my vector notation
someplace over here. There you go. Now, this is x. So this is equal to the length
of x times the length of y. So I showed you kind of
the second part of the Cauchy-Schwarz Inequality: that
the two sides are equal to each other when one of them is a scalar
multiple of the other. If you're a little uncomfortable
with some of these steps I took, it might
be a good exercise to actually prove it. For example, to prove that the
absolute value of c times the length of the vector y is
the same thing as the length of c times y. Anyway, hopefully you found
this pretty useful. The Cauchy-Schwarz Inequality
we'll use a lot when we prove other results in
linear algebra. And in a future video, I'll
give you a little more intuition about why this makes a
lot of sense relative to the dot product.
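For reference, the chain of steps in the transcript can be written out compactly as follows. This is a sketch that uses the transcript's own substitutions (a = y·y, b = 2 x·y, c = x·x) and assumes x and y are nonzero vectors in R^n.

```latex
\begin{align*}
p(t) &= \lVert t\mathbf{y} - \mathbf{x} \rVert^2
      = (t\mathbf{y} - \mathbf{x}) \cdot (t\mathbf{y} - \mathbf{x}) \\
     &= (\mathbf{y}\cdot\mathbf{y})\,t^2 - 2(\mathbf{x}\cdot\mathbf{y})\,t + \mathbf{x}\cdot\mathbf{x}
      = a t^2 - b t + c \ \ge\ 0, \\[4pt]
p\!\left(\frac{b}{2a}\right) &= \frac{b^2}{4a} - \frac{b^2}{2a} + c
      = c - \frac{b^2}{4a} \ \ge\ 0
      \quad\Longrightarrow\quad 4ac \ge b^2, \\[4pt]
4\,\lVert\mathbf{y}\rVert^2\,\lVert\mathbf{x}\rVert^2 &\ge 4\,(\mathbf{x}\cdot\mathbf{y})^2
      \quad\Longrightarrow\quad
      \lVert\mathbf{x}\rVert\,\lVert\mathbf{y}\rVert \ \ge\ \lvert \mathbf{x}\cdot\mathbf{y} \rvert .
\end{align*}
```

The transcript checks one direction of the equality claim (if x = cy, the two sides are equal). The other direction is not spelled out in the video, but it follows from the same function: if the two sides are equal, then 4ac = b², so p(b/(2a)) = 0, which means the vector (b/(2a))y - x has length 0 and therefore x = (b/(2a))y.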