So what we wrote in English in the previous video, we'll now try to convert into math. Again, this is my f1 axis and this is my f2 axis, and this is my f1' axis, the axis along which the variance of my projected points is very high: if I project each of these points onto f1', the variance will be maximal. I want to find f1'. For simplicity, let me call f1' as u1, because that is the notation you'll typically find in most lecture notes and textbooks, so I'll use the same terminology here. That's the first thing. The second thing is, I really don't care about finding the whole line; I only care about finding the direction, because once I know the direction, I can project any point onto it. And we know that we represent directions using unit vectors, so I'll make u1 a unit vector, which simply means its length equals one. So this is what my u1 actually is: the direction along which, if I project each of my points, the variance will be maximal. Let's try to build the math around it.

Let me draw a much more simplified picture here. Let's assume this is my f1, this is my f2, and this is my direction u1; of course, along this direction I can draw a line. Now let's assume I have a point here; call it xi. A point can be represented as a vector. If I project xi onto u1, let's call the projected point xi': so xi' is nothing but the projection of xi onto u1. Now, my given dataset is {xi}, i = 1 to n (let's just ignore the yi's). From it I'm creating a new dataset D' = {xi'}, i = 1 to n, such that each xi' is the projection of xi onto u1. So now let's go and see.
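Geometrically, "the projection of xi onto u1" can be sketched in a few lines of NumPy. The point and the direction below are made up purely for illustration:

```python
import numpy as np

x_i = np.array([3.0, 1.0])        # a made-up point
u1 = np.array([2.0, 1.0])
u1 = u1 / np.linalg.norm(u1)      # normalize: now ||u1|| = 1, a pure direction

# Projection of x_i onto the line through u1.
coeff = u1 @ x_i                  # scalar coordinate of x_i along u1
x_i_proj = coeff * u1             # the projected point x_i'

# The residual is perpendicular to u1; that is what makes this a projection.
residual = x_i - x_i_proj
print(x_i_proj, u1 @ residual)    # second value is ~0
```

The key fact the derivation below uses is exactly this `coeff = u1 @ x_i` scalar.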
So my xi' is the projection of xi onto u1. In the linear algebra series of videos we learned that this is nothing but (u1 . xi) / ||u1||^2; we learned it when we understood what a projection is, what a unit vector is, and ideas like that. Now, u1 is a unit vector, which means ||u1||^2 = 1, which means my xi' is nothing but u1^T xi. This is one way to represent the dot product; it's one of the interpretations of the dot product that we saw in the linear algebra chapter. So each xi' is simply u1^T xi: given any point xi, I can convert it into xi' using u1.

One more thing to remember: let x̄ be the mean vector of the xi's. If I take its dot product with u1, I get u1^T x̄ = x̄', the mean of the projections xi', i = 1 to n. So the mean of the projected dataset is just the projection of the mean. This is important; we'll see how to use it shortly.

So what is our whole task? Let's write it mathematically. Find u1 such that the variance of the points xi projected onto u1, with i going from 1 to n, is maximal. This is the problem we want to solve. So what is this variance? Let me write it carefully.
It is nothing but the variance of {u1^T xi}, i = 1 to n, by the simple definition: I am just replacing xi' with u1^T xi, nothing fancy. Now, what is the formula for variance? It is the average squared dispersion of each of the points from the mean.
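The two ingredients so far, the projection xi' = u1^T xi and the variance of those projections, can be sketched in NumPy. The data and the candidate direction here are made up for illustration:

```python
import numpy as np

# Toy 2-D data: 5 points (rows), 2 features (columns). Made-up values.
X = np.array([[2.0, 1.9],
              [1.0, 1.2],
              [3.0, 2.8],
              [4.0, 4.1],
              [0.0, 0.1]])

# A candidate direction u1, normalized to unit length.
u1 = np.array([1.0, 1.0])
u1 = u1 / np.linalg.norm(u1)           # now ||u1|| = 1

# Project every point: x_i' = u1^T x_i, one scalar per point.
x_proj = X @ u1

# Variance of the projections: average squared deviation from their mean.
var_proj = np.mean((x_proj - x_proj.mean()) ** 2)
print(var_proj)
```

Trying a different unit vector for `u1` gives a different `var_proj`; the whole point of the derivation is to find the `u1` that makes this number as large as possible.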
So let me write it out: the variance is the average over i of (u1^T xi - u1^T x̄)^2. Why did I write this? Because u1^T xi is xi', and u1^T x̄ is the mean of the projections; this is just the simple formula for variance. Now remember your dimensions when you compute u1^T xi: xi is a column vector with d rows and one column, where d is the number of features, and u1^T has one row and d columns. Multiplying them gives you one scalar. So even though u1 and xi are written as vectors, each product u1^T xi is a single number, not a vector. Everything works out.

Now, when we started off, we said our original dataset has been column standardized. What does column standardization mean? It means each column has mean zero, so the mean vector x̄ sits at the origin. And this is where things get interesting: if x̄ = 0, then u1^T x̄ = 0, and that term simply drops out. So the variance of the projections xi' is nothing but the average, over i = 1 to n, of (u1^T xi)^2.

Now let's write the actual mathematical objective that we want to solve. I'll write something called an optimization problem; we'll learn more about optimization, and how to solve optimization problems, a little later, but for now let me just explain what it is. We want to maximize the variance, which here is (1/n) * sum over i of (u1^T xi)^2. The xi's are already given to us as part of the data matrix; what we have to find is u1. So the problem is: find u1 that maximizes (1/n) * sum_i (u1^T xi)^2. Sorry, I'm not able to draw it properly; let me just put a bounding box around it. But there is one constraint: u1 must be a unit vector, which means u1^T u1 = 1, which is nothing but ||u1||^2 = 1.

Let me reread this and explain it step by step; it's always important to understand optimization problems. We want to maximize the variance of the xi'. The xi's are already given to us as part of the data matrix, and we want to find u1, the direction that maximizes that variance. And here we are putting a constraint. The "maximize" part is called the objective of the optimization problem, and the other part is called the constraint: we won't let u1 be just any vector; we want u1 to be a unit vector, because we want it to represent a direction, right?
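A quick numerical illustration of why the constraint is needed: scaling the direction by a factor c scales the objective by c^2, so without the unit-length constraint the objective can be made as large as we like. The data below is made up and already mean-centered for simplicity:

```python
import numpy as np

# Made-up 2-D data, already mean-centered.
X = np.array([[2.0, 1.9], [1.0, 1.2], [-1.5, -1.0], [-1.5, -2.1]])

def objective(u):
    """(1/n) * sum_i (u^T x_i)^2, the variance term we want to maximize."""
    return np.mean((X @ u) ** 2)

u = np.array([1.0, 1.0]) / np.sqrt(2)   # a unit direction

for c in [1.0, 10.0, 100.0]:
    # Scaling u by c scales the objective by c**2: unbounded unless ||u|| = 1.
    print(c, objective(c * u))
```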
See, if I just let u1 be anything, I could make its components huge, say (infinity, infinity) or just very large values, and then this quantity would always be maximal, because multiplying even a small value by a very large value makes it large. So I don't want just any u1; I want u1 to be a unit vector. This whole thing is called an optimization problem. We will learn about solving optimization problems later, after we cover a couple of machine learning techniques, and I promise you, I will teach you how to solve this one.
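Even before learning the general machinery, in 2-D we can attack this optimization problem by brute force, because every 2-D unit vector can be written as (cos θ, sin θ): scan the angle θ and keep the direction with the highest projected variance. A sketch on made-up, mean-centered data:

```python
import numpy as np

# Made-up, mean-centered 2-D data with most spread roughly along y = x.
X = np.array([[2.0, 1.9], [1.0, 1.2], [-1.5, -1.0], [-1.5, -2.1]])

best_var, best_u = -1.0, None
for theta in np.linspace(0.0, np.pi, 1000):       # sweep unit directions
    u = np.array([np.cos(theta), np.sin(theta)])  # ||u|| = 1 by construction
    var = np.mean((X @ u) ** 2)                   # (1/n) * sum_i (u^T x_i)^2
    if var > best_var:
        best_var, best_u = var, u

print(best_u, best_var)   # the direction of maximal projected variance
```

For this toy data the winning direction points roughly along y = x, matching the visual intuition; the general solution, for any number of dimensions, is what the optimization machinery will give us later.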
We will revisit this optimization problem and solve it when we learn optimization. For now, since I don't want to divert and teach you the whole of optimization, I'll just tell you the solution to this problem without telling you how I got there. But you can trust me: when we learn about optimization and some basic calculus (I'll teach you what differentiation really means), we will revisit this and solve it properly. For the first few techniques, what I'll do is explain the mathematical optimization problem we are solving; how to solve it, I'll teach you later. I'm sorry for repeating this so many times, but if I never showed you how to solve it, I wouldn't be doing justice to you, and I really don't want to water down the math; I've told you that multiple times. So this is where we will stop: we will go directly to the solution of this optimization problem, and how we arrive at that solution we will learn in the optimization section.