(c) copyright Foundation Coalition (S. A. Fulling) 1998

Class 17.T

Differentials and Linear Approximations

Reading assignment for Tuesday, January 27

Stewart 2.9
This Web page
Stewart 2.10 (Newton's Method)

Note: Since the Web does not yet speak Greek easily, we will use the capital D for "delta", and the grade-school division sign, ÷, for "approximately equal".

The language of differentials

Today's topic is very important for understanding how derivatives are thought about and talked about in applications of calculus. It is a warmup for the even more important subject of Taylor series, which will occupy our attention for several weeks toward the end of this semester.

When the derivative was defined as

f '(a) = lim_{x -> a} Dy/Dx,

we thought of the difference quotient on the right, Dy/Dx = [f(x) - f(a)]/(x - a) , as an approximation to the derivative on the left: taking x closer to a gives a better numerical estimate for f '(a). Today we are going to turn the equation around and think of the left side as an approximation to the right side! Then we can solve the approximate equation for Dy:

Dy ÷ f '(a) Dx.

Using the definition of the increments Dy and Dx, we can rewrite this equation further, as

f(x) ÷ f_approx(x) = f(a) + f '(a) (x - a).

From this point of view, the importance of the number f '(a) is that it tells us how to build a linear approximation to f(x) that is accurate for x near a. (f_approx is the function whose graph is the tangent line at a to the graph of f.)
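
As a minimal numerical sketch of this idea, the following Python compares a function with its tangent-line approximation near a. The choice f(x) = sqrt(x) with a = 4 and the helper name linear_approx are illustrative assumptions, not part of the original discussion.

```python
import math

def linear_approx(f_a, fprime_a, a, x):
    """Tangent-line approximation f(a) + f'(a) (x - a)."""
    return f_a + fprime_a * (x - a)

# Illustrative choice (an assumption, not from the text): f(x) = sqrt(x), a = 4.
a = 4.0
f_a = math.sqrt(a)                       # f(a) = 2
fprime_a = 1.0 / (2.0 * math.sqrt(a))    # f'(a) = 1/(2 sqrt(a)) = 0.25

for x in (4.1, 4.01, 4.001):
    exact = math.sqrt(x)
    approx = linear_approx(f_a, fprime_a, a, x)
    print(f"x = {x}: f(x) = {exact:.8f}, f_approx(x) = {approx:.8f}, "
          f"error = {exact - approx:.2e}")
```

Each time x - a shrinks by a factor of 10, the error shrinks by roughly a factor of 100, which is what "accurate for x near a" looks like in numbers.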

In this context it is traditional to write "dy" for the linear approximation to Dy, and also to write "dx" for Dx, so that

dy = f '(a) dx

holds as an exact equation. In other words, "dx" is what Dx is called when you are announcing your intention to use a linear approximation. In practice, this often means doing a calculation in which you will throw away any powers of Dx higher than the first (squares, cubes, and so on) that might arise. More precisely, you will ignore anything that vanishes faster than Dx as Dx goes to 0. (This means that it vanishes even after being divided by Dx. For instance, (Dx)³/Dx = (Dx)² -> 0 in that limit.) Then, in that approximation, Dy equals dy.
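
As a worked illustration of which terms get thrown away, take y = x³ (a choice made here only for concreteness). In the LaTeX below, \Delta stands for the "D" of the text.

```latex
\begin{align*}
\Delta y &= (a + \Delta x)^3 - a^3
          = 3a^2\,\Delta x + 3a\,(\Delta x)^2 + (\Delta x)^3, \\
dy       &= 3a^2\,dx, \\
\frac{\Delta y - dy}{\Delta x} &= 3a\,\Delta x + (\Delta x)^2
          \;\to\; 0 \quad \text{as } \Delta x \to 0 .
\end{align*}
```

The two discarded terms are exactly the ones that still vanish after being divided by Dx; the term kept is the one proportional to the first power of Dx.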

The quantities dx and dy are called differentials. Note that

dy/dx = f '(a).

Heretofore we have thought of the notation "dy/dx" as just another name for the derivative, f '. That is, it was to be broken up into d/dx, the operation of differentiation, acting on y, some function or physical quantity. Now we see that it can also be broken up literally as a fraction, dy divided by dx. Here dx is any (sufficiently small) change in x, and dy is the corresponding change in y when you adopt the tangent-line approximation.

In fact, Leibniz (the co-inventor of calculus) and other early mathematicians thought of dx and dy as "infinitesimal" numbers -- numbers so small that the approximate equation

Dy ÷ f '(a) Dx

somehow became exact (and the "D" shrank to "d" to indicate this smallness). Today we avoid this kind of talk in discussing mathematical fundamentals; we use the concept of limit instead. But thinking of f ' = dy/dx as a ratio of small changes is still very helpful in applying calculus.

Differentials in theoretical arguments in applications

The most important use of differentials is in discussions that are "applied" from the point of view of mathematics, but "theoretical" from the point of view of the discipline making the application. These are the sections of your science and engineering textbooks that derive the basic formulas and equations that are applied to concrete problems in the rest of the book. Here is a simple model of this type of argument:

Example: Find a formula for the rate of change of the area of a circle with respect to its radius.

Note: We are going to pretend, just for the sake of this example, that we don't know the formula for the derivative of x²; we will rederive that formula.

Derivation in the language of differentials (PDF version) <-- READ ME
Notice that the distinction between dA (the differential) and DA (the exact increment) has been deliberately blurred here. Scientists and engineers have been doing this successfully for several centuries, regardless of whether the mathematicians who taught them calculus told them not to do it.
Careful restatement in the language of difference quotients and limits (PDF version) <-- READ ME
The differential argument is just a shorthand for this one. Note, incidentally, that dA in this problem is the area of a straight strip whose length equals the circumference and whose width is dr.
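
The linked derivations are not reproduced on this page. As a rough sketch of the kind of calculation they describe, reconstructed here only from the formula A = [pi]r² (so it is an assumption about the argument, not a copy of it), the difference-quotient version might run as follows, with \Delta standing for the "D" of the text:

```latex
\begin{align*}
\Delta A &= \pi (r + \Delta r)^2 - \pi r^2
          = 2\pi r\,\Delta r + \pi (\Delta r)^2, \\
\frac{\Delta A}{\Delta r} &= 2\pi r + \pi\,\Delta r
          \;\to\; 2\pi r \quad \text{as } \Delta r \to 0, \\
dA &= 2\pi r\,dr .
\end{align*}
```

The term 2[pi]r Dr is the area of the straight strip of length 2[pi]r and width Dr; the leftover [pi](Dr)² is what the differential shorthand silently drops.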

Differentials in calculations in applications

In practice you would never differentiate the function x² as we did it in the example above, since you know the power rule already. The example was intended as a simple model of more serious problems, where one needs to differentiate unknown or arbitrary functions in the process of deriving a differential equation or other formula from physical principles. (See the chain-rule example below.)

In calculus courses a typical homework exercise on differentials requires one to estimate the change in some physical quantity, using differential notation. Here one is expected to evaluate the derivative in the ordinary way, using standard formulas.

Example: The radius of a circle changes from 5 cm to 5.2 cm. Estimate ("to first order") the change in the area of the circle.

Solution (PDF version) <-- READ ME
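
The linked solution is not reproduced on this page. A minimal Python sketch of the computation, assuming only the formula A = [pi]r² and the numbers given in the example, compares the first-order estimate dA with the exact increment DA:

```python
import math

r = 5.0    # initial radius in cm (from the example)
dr = 0.2   # change in radius in cm (5.2 - 5)

dA = 2 * math.pi * r * dr                         # first-order estimate, dA = 2 pi r dr
DA = math.pi * (r + dr) ** 2 - math.pi * r ** 2   # exact increment

print(f"dA = {dA:.4f} cm^2")
print(f"DA = {DA:.4f} cm^2")
print(f"DA - dA = {DA - dA:.4f} cm^2")            # equals pi * dr**2
```

Here dA = 2[pi], about 6.28 cm², while DA = 2.04[pi], about 6.41 cm²; the gap is the neglected second-order term [pi](dr)².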

Differentials and the chain rule

The chain rule is very intuitive when expressed and "derived" in the language of differentials. Let's consider and generalize a classic related-rates problem:

Problem: Water flows into a conical tank at a rate of 3 cubic meters per minute. How fast is the water level rising when the height of the water in the tank is 2 meters? The tank is an inverted circular cone with base radius 2 meters and height 5 meters. (See Example 3, Sec. 2.8, of Stewart (ed. 3) for a sketch.)

Discussion: Today we are more interested in the generalities of this type of problem than in the details of this particular problem. Introduce notation for the variable quantities:

h = height of water
V = volume of water
t = time

Note that dV/dt is the rate at which water flows in, which is one of the givens of the problem. Also, after some geometry and algebra you can write a formula for V as a function of h, or vice versa (see p. 158 of Stewart (ed. 3) for the details). But here is the main point:

dh = (dh/dV) dV = (approximate) change in height when volume changes by dV.
dV = (dV/dt) dt = (approximate) change in volume in a small time interval dt.

Therefore, the (approximate) change in height in time dt is

dh = (dh/dV) (dV/dt) dt.

That is, the rate of change of height with respect to time is

dh/dt = (dh/dV) (dV/dt).

In the example problem, we are given enough information to calculate the two factors on the right side of this equation. More important is that the equation applies regardless of what the functions h = f(V) and V = g(t) are (so long as they are differentiable). It is an instance of the chain rule. In fact, our discussion is almost a proof of the chain rule; at least, it contains the intuitive idea of why the formula is true. (For a real proof we would need to keep track of the terms neglected in the approximate equations, and check that their contribution to the difference quotient approaches 0 as Dt -> 0. See Sec. 2.5 of Stewart (ed. 3).)
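
For the conical tank, the two factors can be sketched numerically in Python. This is an illustrative calculation, not the textbook's worked solution: it assumes the water surface has radius r = (2/5)h by similar triangles, so that V = (1/3)[pi]r²h, and it uses dh/dV = 1/(dV/dh). The variable and function names are chosen here for illustration.

```python
import math

R, H = 2.0, 5.0   # base radius and height of the tank in meters (from the problem)
dV_dt = 3.0       # inflow rate in cubic meters per minute (from the problem)
h = 2.0           # water height in meters at the instant of interest

def volume(h):
    """Water volume at height h: V = (1/3) pi r^2 h with r = (R/H) h."""
    return math.pi * (R / H) ** 2 * h ** 3 / 3.0

# dV/dh = pi (R/H)^2 h^2, obtained by differentiating volume(h).
dV_dh = math.pi * (R / H) ** 2 * h ** 2

# Chain rule: dh/dt = (dh/dV)(dV/dt), with dh/dV = 1 / (dV/dh).
dh_dt = dV_dt / dV_dh
print(f"dh/dt = {dh_dt:.4f} m/min")

# Check with small increments: in time dt the volume grows by dV = dV_dt * dt.
# Invert V(h) exactly to get the new height and form the difference quotient.
dt = 1e-6
dV = dV_dt * dt
h_new = (3.0 * (volume(h) + dV) / (math.pi * (R / H) ** 2)) ** (1.0 / 3.0)
print(f"Dh/Dt = {(h_new - h) / dt:.4f} m/min")
```

Under these assumptions dh/dt = 75/(16[pi]), about 1.49 meters per minute, and the difference quotient computed from small increments agrees with it to the digits printed.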

Experimental error

Let's go back once more to the circle problem. In practice the radius of a circle can be measured only to within a certain accuracy. A better measuring instrument may reduce the uncertainty in the radius, but it can never completely eliminate it. (In fact, any real circle is never a "perfect" circle, so the radius is not even defined to a precision beyond the scale of the irregularities in the curve, or the width of the pencil mark used to draw it.)

Suppose that a competent experimenter or field engineer has measured the radius to be 10.00 plus-or-minus 0.05 cm. This means that r may be as large as 10.05, or as small as 9.95, or anything in between. We say that r = 10.00 cm and dr, the error, is 0.05 cm. Then dA = 2[pi]r dr (which is approximately 3 cm²) is the estimated error or uncertainty in A. Since dr is rather small (compared to r), and all the quantities are uncertain anyway, we expect the difference between dA and DA to be unimportant in this context.

(How do we know that dr is "rather small"? What we really need for a linear approximation f_approx to be accurate is that the graph of the function f(x) be almost a straight line when viewed on the small scale set by the increment dx. When f is a power function, this will be true when dx << x. We will study more careful ways to judge the accuracy of such approximations later, in connection with Taylor series.)

The importance of a given error in A depends on the scale of the problem: An uncertainty dA = 3 cm² is not much when A ÷ 300 cm², but it would be a lot if A were 4 cm², and it would be utterly negligible if A were 10¹⁰ cm². Therefore, in many circumstances a more useful measure of the seriousness of the uncertainty is the so-called relative error, dA/A. In our example, the relative error in A is approximately 0.01, or 1%, while the relative error in r is half that. (Note that a relative error does not have physical dimensions, such as centimeters, but is stated as a pure number or a percentage.)
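
The numbers in this section can be checked with a short Python sketch. It assumes nothing beyond A = [pi]r² and the measured values quoted above; the variable names are chosen here for illustration.

```python
import math

r = 10.00   # measured radius in cm
dr = 0.05   # uncertainty in the radius in cm

A = math.pi * r ** 2
dA = 2 * math.pi * r * dr   # estimated uncertainty in the area, dA = 2 pi r dr

rel_r = dr / r              # relative error in r
rel_A = dA / A              # relative error in A; equals 2 * dr / r

# Exact increments at the two extremes of r, for comparison with dA.
DA_plus = math.pi * (r + dr) ** 2 - A
DA_minus = A - math.pi * (r - dr) ** 2

print(f"A = {A:.2f} cm^2, dA = {dA:.4f} cm^2")
print(f"relative error in r = {rel_r:.4f}, in A = {rel_A:.4f}")
print(f"exact increments: +{DA_plus:.4f} cm^2, -{DA_minus:.4f} cm^2")
```

The exact increments bracket dA closely, which is why the difference between dA and DA is unimportant here, and the relative error in A comes out as twice the relative error in r.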