Introduction
In this post we introduce two important concepts in multivariate calculus: the gradient vector and the directional derivative. These both extend the idea of the derivative of a function of one variable, each in a different way. The aim of this post is to clarify what these concepts are, how they differ and show that the directional derivative is maximised in the direction of the gradient vector.
The gradient vector
The gradient vector is simply a vector of partial derivatives. To find it, we 1) find the partial derivatives and 2) put them into a vector. Let’s start on some familiar territory: a function of 2 variables.
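That is, let $f(x, y)$ be a function of 2 variables, $x$ and $y$. Then the gradient vector can be written as:
$$\nabla f(x, y) = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right)$$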
For a more tangible example, let , then:
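So far, so good. Now we can generalise this to a function $f$ taking in a vector $\mathbf{x} = (x_1, \dots, x_n)$. Our gradient vector is now a vector of length $n$ and is written in the expected way:
$$\nabla f(\mathbf{x}) = \left( \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \dots, \frac{\partial f}{\partial x_n} \right)$$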
When we evaluate this vector at a point, we get a vector whose components are the rates of change of the function with respect to each component of the input vector $\mathbf{x}$.
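As a rough illustration of "find the partial derivatives, then put them into a vector", here is a minimal sketch that approximates a gradient numerically with central differences. The function `g(x, y) = x^2 + y^2`, the evaluation point `(5, 2)`, the helper name `numerical_gradient` and the step size `h` are all assumptions made for this sketch; they are not necessarily the example used in this post.

```python
def numerical_gradient(func, point, h=1e-6):
    """Approximate the gradient of func at point using central differences."""
    grad = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += h   # nudge only the i-th input up
        backward[i] -= h  # nudge only the i-th input down
        # Each entry approximates one partial derivative.
        grad.append((func(forward) - func(backward)) / (2 * h))
    return grad


def g(v):
    # Hypothetical stand-in function g(x, y) = x^2 + y^2 (an assumption).
    x, y = v
    return x ** 2 + y ** 2


# The analytic gradient of g is (2x, 2y), so at (5, 2) we expect roughly (10, 4).
grad_at_point = numerical_gradient(g, [5.0, 2.0])
print(grad_at_point)
```

The same helper works for any number of input variables, since it simply loops over the components of `point`.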
The directional derivative
The most important thing to note about the directional derivative is that it is a scalar whereas the gradient is a vector. The directional derivative of a differentiable function $f$ is defined as the dot product between $\nabla f$ and a unit vector, $\hat{u}$:
$$D_{\hat{u}} f = \nabla f \cdot \hat{u}$$
and it tells us, at any given point, the rate of change of $f$ in the direction $\hat{u}$.
That is to say, if we start at some values of our input variables and change them in some way, how much does the value of $f$ change?
Let’s see this in action with our initial example. Let’s say we are at the point .
Now let’s see how fast the function changes in each direction. First we want the rate of change when we change only x and leave y constant, i.e. along $\hat{u} = (1, 0)$. We evaluate:
$$\nabla f \cdot (1, 0) = \frac{\partial f}{\partial x}$$
and get 10. This is clearly the same as the value of the partial derivative with respect to x. Similarly, if we took the dot product of the gradient with $(0, 1)$, we would get 4.
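What if we vary x and y by the same amount? Then we would be taking the dot product of our gradient with the unit vector $\hat{u} = \left( \frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}} \right)$. Taking the dot product we now have:
$$\nabla f \cdot \hat{u} = \frac{1}{\sqrt{2}} \frac{\partial f}{\partial x} + \frac{1}{\sqrt{2}} \frac{\partial f}{\partial y}$$
which evaluates to
$$\frac{10 + 4}{\sqrt{2}} = \frac{14}{\sqrt{2}} \approx 9.90$$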
So this is the rate of change at our point if we move in both the x and y directions at the same rate. Loosely speaking, the 10 in the numerator comes from moving in the x direction, the 4 comes from moving in the y direction, and the denominator is a normalising constant that scales the derivative down to compensate for the fact that we are now moving in 2 directions at once. For a sensible comparison of the rates of change in different directions we’d want to travel the same distance in each direction, and it is easily checked that the length of the proposed vector is 1.
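The arithmetic above can be checked with a short sketch. It takes the gradient values 10 and 4 quoted in the text as given, normalises whatever direction it is handed, and returns the dot product. The helper name `directional_derivative` is an assumption made for this sketch.

```python
import math


def directional_derivative(gradient, direction):
    """Dot product of the gradient with the unit vector along direction."""
    norm = math.sqrt(sum(c * c for c in direction))
    if norm == 0:
        raise ValueError("direction must be a non-zero vector")
    unit = [c / norm for c in direction]  # rescale so the length is 1
    return sum(g_i * u_i for g_i, u_i in zip(gradient, unit))


gradient = [10.0, 4.0]  # partial derivatives at the point, as quoted in the text

along_x = directional_derivative(gradient, [1, 0])   # change x only
along_y = directional_derivative(gradient, [0, 1])   # change y only
diagonal = directional_derivative(gradient, [1, 1])  # change x and y equally

print(along_x, along_y, diagonal)
```

Normalising inside the function means `[1, 1]` and `[2, 2]` give the same answer, which is the "same distance in each direction" requirement in code form.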
Direction of maximum change
Great, so now we have our gradient and we know how to find how fast the function is changing at any point in any direction. But what if we want to move in the direction that increases the value of the function as quickly as possible? That is, we want to find a vector $\hat{u}$ such that $\nabla f \cdot \hat{u}$ is maximised subject to $|\hat{u}| = 1$.
We could treat this as a constrained optimisation problem and solve it directly. We won’t, as it is an unnecessary amount of work, but anyone up for some Lagrangian optimisation should give it a try in the 2 variable case with the given example.
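Instead we will look at the dot product definition:
$$\nabla f \cdot \hat{u} = |\nabla f| \, |\hat{u}| \cos\theta = |\nabla f| \cos\theta$$
where $\theta$ is the angle between the two vectors and $|\hat{u}| = 1$. This is maximised when $\cos\theta = 1$ (as $\cos\theta$ is bounded between -1 and 1) and therefore $\theta = 0$. Since $\theta$ is the angle between the 2 vectors, when the angle is 0 the vectors point in the same direction.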
Therefore the direction of the maximum rate of change of the function is given by the direction of the gradient vector, i.e. $\hat{u} = \frac{\nabla f}{|\nabla f|}$.
This can be a very useful fact to know when optimising functions and is exploited extensively in numerical optimisation of non-convex functions.
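For a numerical sanity check of the claim that the gradient direction maximises the directional derivative, the sketch below sweeps unit vectors around the circle and records which one gives the largest dot product with the gradient. It reuses the gradient values 10 and 4 from the worked example; the grid of 3600 angles is an arbitrary choice made for this sketch.

```python
import math

gradient = (10.0, 4.0)  # gradient values from the worked example
grad_norm = math.hypot(gradient[0], gradient[1])

steps = 3600  # arbitrary grid: one candidate direction every 0.1 degrees
best_angle = 0.0
best_value = -math.inf

for k in range(steps):
    theta = 2 * math.pi * k / steps
    unit = (math.cos(theta), math.sin(theta))  # unit vector at angle theta
    value = gradient[0] * unit[0] + gradient[1] * unit[1]
    if value > best_value:
        best_value = value
        best_angle = theta

# Angle of the gradient itself, for comparison with the best angle found.
gradient_angle = math.atan2(gradient[1], gradient[0])

print(best_angle, gradient_angle)
print(best_value, grad_norm)
```

Up to the resolution of the grid, the best angle should coincide with the angle of the gradient, and the best value should match the length of the gradient, which is what the $\cos\theta = 1$ argument predicts.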
Summary
Hopefully it is now clear what the gradient vector is, how to find a directional derivative in any direction, and that the gradient vector gives us the direction of maximum change for a given function.