Let's define a generalized gradient function (the one built into *Mathematica* is strictly for functions of 3 variables):
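The original definition isn't shown here, but based on the description that follows, it was along these lines (a sketch; the text mentions a pattern test written as ?VariableQ, for which the built-in VectorQ is a reasonable stand-in):

```mathematica
(* differentiate f with respect to each variable in turn, producing a vector *)
grad[f_, variableList_?VectorQ] := Map[D[f, #] &, variableList]
```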

(This makes a vector by differentiating by each element of the variableList successively. The ?VariableQ part forces you to enter the list of variables.)

The way you use this function is as follows:
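For instance, with a made-up function of two variables (any expression works):

```mathematica
grad[x^2 y + y^3, {x, y}]
(* {2 x y, x^2 + 3 y^2} *)
```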

Or, for a function of 3 variables:

Or, for that matter, you can use it on a function of 6 variables:
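Any illustrative function of six variables will do, for example:

```mathematica
grad[x1 x2 + x3 x4 + x5 x6, {x1, x2, x3, x4, x5, x6}]
(* {x2, x1, x4, x3, x6, x5} *)
```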

This makes it easier to set vector equations equal to each other and solve them:
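A sketch of the idea (the function here is a stand-in; Solve threads an equation between two lists component by component):

```mathematica
Solve[grad[x^2 + y^2 - 2 x, {x, y}] == {0, 0}, {x, y}]
(* {{x -> 1, y -> 0}} *)
```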

(Hint: we just found the critical points of this function.)

To find the critical points of a function, you need to find all points where the gradient is equal to the zero vector or undefined. So, we can use the definition I made for grad earlier:

This is never undefined, so we just need to set each component equal to 0 and solve the system of equations:
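The original function isn't shown; the stand-in below reproduces the behavior described later (two critical points, a minimum at positive *x* and a saddle point at negative *x*):

```mathematica
f[x_, y_] = x^3 - 3 x + y^2;
critPts = Solve[grad[f[x, y], {x, y}] == {0, 0}, {x, y}]
(* {{x -> -1, y -> 0}, {x -> 1, y -> 0}} *)
```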

There are two critical points for this function. (Notice the compact way we do this. You *could* type in the equations individually, but this is much nicer.) Also, we give the set of critical points a name (critPts) so we can refer back to them later and calculate with them. You can get *Mathematica* to pick out a single one of these points using the following notation:
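Since the text names the set critPts, picking out the first one looks like this:

```mathematica
critPts[[1]]
(* the first critical point, as a list of replacement rules *)
```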

(Yes, you have to use the double square brackets like this.)

Of course, now you have to decide whether each of these is a local minimum, maximum, saddle point, or none of these. One way to do this would be to graph the function around the critical points and see what it looks like:
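With f defined as above, a 3-D plot over a domain containing both critical points might look like (the plot range is illustrative):

```mathematica
Plot3D[f[x, y], {x, -2, 2}, {y, -1, 1}]
```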

Perhaps you can tell what we have here. If you were to "zoom in" on each critical point more closely (i.e., change your domain displayed), you could probably tell better. However, it is often easier to look at the contour plot instead:
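A contour plot over the same region, with many contour lines so the minima and maxima get encircled (the domain is illustrative):

```mathematica
ContourPlot[f[x, y], {x, -2, 2}, {y, -1, 1}, Contours -> 50]
```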

Notice that this makes it much clearer what is going on. (It is often useful to use Contours->50 or some other large number so all minima/maxima get completely encircled.) So, the critical point with the positive x value is a minimum, while the one with the negative x value is a saddle point. (Be certain that you understand how to tell this from the graph before continuing.)

Let's use the two-variable second derivative test to check this for sure. In doing this, I will demonstrate a few tricks you can do with *Mathematica* to streamline this process (cutting and pasting gets to be a big pain after a while).

We could do this a couple of different ways. First, let's define D from the second derivative test:
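One way to write it (a sketch: the symbol D itself is protected in *Mathematica*, since it is the built-in derivative operator, so the definition needs some other name; dTest is a stand-in):

```mathematica
(* discriminant of the second derivative test: fxx*fyy - fxy^2 *)
dTest[x_, y_] = D[f[x, y], x, x] D[f[x, y], y, y] - D[f[x, y], x, y]^2
```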

Now, to plug the critical points in, we could cut and paste them (bad idea), or we could use the replacement operator /.
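A quick illustration of /. on an arbitrary expression (the values 1 and 2 are just for show):

```mathematica
x^2 + y^2 /. {x -> 1, y -> 2}
(* 5 *)
```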

This replaces x and y with the specified values. These are called replacement rules. However, if you notice, our list of critical points is already composed of lists of these replacement rules:

So, we could do the following:
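With the discriminant defined under some unprotected name (dTest here is a stand-in for the D of the test), applying /. with the whole list of critical points evaluates it at every one of them at once:

```mathematica
dTest[x, y] /. critPts
(* a list: the discriminant at each critical point in turn *)
```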

The first value corresponds to the value at the first critical point and the second one to the second point. You can do the same thing to check the second partial derivative with respect to *x*:

We can handle all of our critical points simultaneously by doing things this way. Thus, the first critical point has a negative value for D, so it is a saddle point. The second critical point has a positive D and a positive second partial with respect to *x*, so it is a minimum.

Unfortunately, Solve doesn't always work to find the critical points (if your function is exponential or trigonometric, for example). In this case, there are two options.

The first option is the FindRoot command. The problem here is that, since it is based on Newton's method, you need an initial guess. This means two things: First, you need to look at a graph of the function (contour lines are good here) to make a reasonable guess. Second, you can only realistically find critical points in a particular region (not true if Solve works; you can find *all* critical points then). After finding each critical point, you still have to classify it (min/max/saddle point/neither).

This command is kind of like the FindRoot command (i.e., you must give it an initial guess), but it then automatically finds a nearby minimum of the function. This has the advantage that you know your answer will be a minimum. Of course, it won't help you find saddle points. You can find a maximum by using FindMinimum on -*f*, since the minimum of -*f* will be the maximum of *f*.

Consider the function:

In other words, forget it. So, let's get a quick look at this near the origin:

It appears there is a minimum around (0.8,-1.5) and a maximum around (0,0.2) or so. Let's find them:
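Using FindRoot on the gradient with those guesses (a sketch; Thread splits the vector equation into scalar equations, which older versions of FindRoot require):

```mathematica
FindRoot[Thread[grad[f[x, y], {x, y}] == {0, 0}], {x, 0.8}, {y, -1.5}]
FindRoot[Thread[grad[f[x, y], {x, y}] == {0, 0}], {x, 0}, {y, 0.2}]
```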

You can use the graph (or the second derivative test) to see what each of these is. If there were a saddle point, you could find it this way as well.

Or, using the FindMinimum command:
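A sketch with the same initial guess as before (FindMinimum returns the minimum value of the function together with the point where it occurs):

```mathematica
FindMinimum[f[x, y], {x, 0.8}, {y, -1.5}]
```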

Notice that you use this command on *f* itself, not on ∇*f* (and it gives you the value of the function at the point as an added bonus). Also, since there is only one minimum nearby, you don't have to be very accurate with your initial guess.

In order to find the maximum:
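Minimizing -*f* near the suspected maximum (remember to negate the function value FindMinimum reports to get the maximum of *f*):

```mathematica
FindMinimum[-f[x, y], {x, 0}, {y, 0.2}]
```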

You would *not* be able to find saddle points reliably this way, however. (Sometimes it will find them, but not always.)

For each of the following, find the critical points, show them graphically on a contour plot (and 3-D if you wish), and classify them using the second derivative test as minima/maxima/saddle points/none of these. If possible, use Solve to find *all* critical points. If Solve doesn't work (or you don't get an answer back from it in a few minutes of waiting), then use FindRoot or FindMinimum to find all critical points on the region .

One of the important applications of optimizing a multi-variable function is in finding a function that fits a set of experimental data. This is useful in many fields, from applied science to economics. Most spreadsheets, like Excel, have a built-in function to fit data to certain functions (usually called "regression" or something similar). You can use *Mathematica* to fit your data to any function you wish, however.

So, first we need some data to work with (it is possible to read this in from an external file, if you save it as a text file):
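The original data set isn't shown; here is an illustrative, roughly linear one in the {x, y} pair format the rest of the section assumes:

```mathematica
dataSet = {{1, 2.3}, {2, 4.1}, {3, 5.8}, {4, 8.2}, {5, 9.9}, {6, 12.1}};
```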

We can use the ListPlot command to take a look at all this data:
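```mathematica
ListPlot[dataSet]
```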

This looks basically linear, so let's see if we can find the "best" linear fit for this data. (The deviations from a linear graph could be due to experimental error, chance, or other complicating factors.) There are different definitions of what line/curve would best fit this data, but the one we will use is called the "least squares fit". The idea is to define a function with appropriate unknown parameters:

*f(x) = ax + b*, where *a* and *b* are the parameters you are trying to find.

Now, at each point we want to examine the "residual", which is the difference between *f(x)* and the actual *y* value of the data. This tells you how far the actual data point is from the line. So, for example, at *x=1*, the residual would be *f(1)* minus the *y* value of the data point there, which is a function of the parameters *a* and *b*. You could then add together the residuals for *all* the points. This would give you the total amount all the data points differ from a given line. For the "best" line, you would like this total of the residuals to be as small as possible (so it is the "closest" line to as many points as possible).

Unfortunately, since some points are above the line and some are below it, some of their residuals will cancel out (some are positive and some negative). We can fix this by squaring each residual to make it positive. If we then minimize the sum of these squares of the residuals, we will find the line that is the "closest" to the most points. This is called the "method of least squares".

So, to actually set this up, we could do the following:
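A sketch matching the description in the next paragraph (the linear model and the names howManyPts and dataSet come from the surrounding text):

```mathematica
f[x_] = a x + b;
howManyPts = Length[dataSet];
(* sum of squared residuals over all data points, as a function of a and b *)
sumOfSquares = Sum[(f[dataSet[[j, 1]]] - dataSet[[j, 2]])^2, {j, 1, howManyPts}]
```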

In this function, dataSet[[j,1]] refers to the *x* coordinate of the point and dataSet[[j,2]] refers to the *y* coordinate. (howManyPts tells you exactly what it says...)

Now, sumOfSquares is a function of two variables *a* and *b*, which we want to minimize. So we can simply find the critical points and decide whether they are minima or not.
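Finding the critical point with the grad function from earlier (bestFit is a name I'm introducing here so the result can be reused):

```mathematica
bestFit = Solve[grad[sumOfSquares, {a, b}] == {0, 0}, {a, b}]
```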

Since there is only one critical point, you should be able to convince yourself that this is indeed a minimum by any of the methods mentioned above (or by simple logic; it can't be a maximum since you can always make the line as far from the data as you want). The best fit line is therefore:

Let's plot both the data and the line we just found together on the same graph:
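Assuming the least-squares solution was saved as bestFit (a hypothetical name; the plot range is illustrative):

```mathematica
Show[ListPlot[dataSet], Plot[f[x] /. bestFit[[1]], {x, 0, 7}]]
```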

If your data doesn't look like it is linear, then you need to decide what kind of function it *does* look like. Then, use the formula for that type of function with appropriate parameters and use the same process on it (using a different function besides linearFit).

Here are some typical functions you might use:

On this last one, you may assume that and a, b, c>0.

When finding the critical points, if Solve doesn't work, then you are forced to use FindMinimum (or FindRoot, though FindMinimum would probably be better here). For the functions with 3 (or more) parameters, it can be tricky to pick a starting guess. Sometimes it is useful to just pick anything reasonable for a starting guess (thinking about x or y-intercepts and any asymptotes, for example). If *Mathematica* gives you an error message saying something like "Machine precision is insufficient to achieve the requested accuracy or precision.", then take the answers it gives you for this and use *those* numbers for your initial guess instead.

Fortunately, the method of Lagrange multipliers can be relatively easy to implement using *Mathematica*. For example, if you need to optimize subject to the constraint that , you could set it up as follows:
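A sketch of the setup (f, g, and the constant 1 in the constraint are stand-ins for the originals; Thread turns the vector equation ∇f == λ∇g into component equations):

```mathematica
eqns = {Thread[grad[f[x, y], {x, y}] == λ grad[g[x, y], {x, y}]], g[x, y] == 1} // Flatten
```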

The //*Flatten* command at the end is necessary here for the Solve command to work properly with this. It strips out the extra {}'s you would get otherwise. Now you can use the Solve command to solve the system (even though you don't care what λ equals, you should still solve for it):
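Solving the whole system at once, λ included:

```mathematica
solutions = Solve[eqns, {x, y, λ}]
```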

Notice that, again, we can give a name to the entire *set* of solutions to this system. This way, we can work with all of them at one time, or pick a single one out to work with.

Now, we need to find which of these is the minimum and which is the maximum:
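Plugging every solution point into *f* at once with the replacement operator:

```mathematica
f[x, y] /. solutions
```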

(This plugs all of the solution points into *f* and displays the results.) So, the minimum value is 0 (which occurs at the first 4 solution points) and the maximum value is 12 (which occurs at the last 4 solution points).

So, what do you do when the Solve command doesn't work here? Not surprisingly, you use the FindRoot command. This is both easier and harder than it is in the case of finding critical points. To do this, you *must* display a contour plot and a graph of the constraint equation on the same axes. So, let's consider optimizing subject to .

Well, I left this running for over 2 hours and it still hadn't come up with a solution, so it's time to look at the FindRoot function. However, to use it, we need an initial guess. This is trickier than it was with finding critical points. Remember, what you are looking for is where the gradients of *f* and *g* are parallel. An easier thing to look for, however, is where the contour curves are "parallel" (which is equivalent, if you think about it).

So, first, let's graph the contour plot for *f* on the same graph with *g* (not contour curves):
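One way to combine the two pictures (a sketch; the plot range and the constraint constant 1 are illustrative):

```mathematica
Show[
  ContourPlot[f[x, y], {x, -1, 1}, {y, -1, 1}, Contours -> 30],
  ContourPlot[g[x, y] == 1, {x, -1, 1}, {y, -1, 1}]
]
```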

Notice that we don't have to see all of *f*; we only need to see the region around the constraint curve *g*. According to this graph, it looks like there are probably two points where the contour curves run parallel to *g*: near (0.6,-0.4) and near (-0.6,-0.2). Let's use those as our initial guesses for FindRoot. Since we have no idea what λ equals for those points (and we don't really care), we will just guess that λ = 0.
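With eqns set up as before, the first guess looks like (solution1 is the name the text uses below):

```mathematica
solution1 = FindRoot[eqns, {x, 0.6}, {y, -0.4}, {λ, 0}]
f[x, y] /. solution1
```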

Now let's check the other one:
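Same command, second guess:

```mathematica
solution2 = FindRoot[eqns, {x, -0.6}, {y, -0.2}, {λ, 0}]
f[x, y] /. solution2
```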

Therefore, solution1 is the minimum and solution2 is the maximum. (There are fancier ways to check both of these at the same time, but I am trying to keep things simple here.)

The problem with this method is that it isn't always easy to find *all* the places where the curves are parallel.

For each of the following, find the minimum and maximum values of the function *f* subject to the given restriction. If *f* is a function of two variables, plot its contours and the constraint on the same graph. If Solve doesn't work (or you don't get an answer back from it in a few minutes of waiting), then use FindRoot instead, using your graph to pick your initial guesses. Hint: If you have a function of more than two variables, you treat it exactly the same as a function of two variables (our grad function can handle this, as long as you remember to give it a list of *all* the variables), except you have to adjust all the variable lists appropriately. (Also, really hope that Solve works on these. It would be very hard to find them one at a time with FindRoot.)

subject to

subject to
