The proof of the single-variable chain rule below is based on one given by Prof. Frederick Fong in his MATH1023 lecture notes, taught at HKUST. Before reading this post, it is strongly recommended to have a read at a proof of the chain rule yourself.

The chain rule is one of the theorems in calculus that was not treated properly in a standard university course. It is one of the more subtle proofs in a rigorous calculus course.

The statement: For $g(x)$ differentiable at $x=a$, and $f(y)$ differentiable at $y=g(a)$, we have $(f\circ g)(x)$ differentiable at $x=a$ where $(f\circ g)’(a) = f’(g(a))g’(a)$.

Consider the linear approximations \(g(x) = g(a) + g'(a)(x-a) + h(x),\) \(f(y) = f(g(a)) + f'(g(a))(y-g(a))+k(x),\) where $h(x) \in o(x-a)$ as $x\to a$ and $k(y) \in o(y-g(a))$ as $y\to g(a)$.

In the proof, we obtain that for any $\epsilon > 0$, there exists $\delta > 0$ such that $0 < |y-g(a)| < \delta$ implies that $|k(y)| < \epsilon |y-g(a)|$. Notice that we do not have any information on what happens when $y=g(a)$. But there is no guarantee that $y \neq g(a)$ for $x$ close enough to $a$!. In fact if your $g(x)=x$, it is always equal. Therefore it is not possible for us to use this directly.

To find out what happens, we directly substitute into the linear approximation. This is a recurring theme in mathematics. When things are not nice, you have to be a bit fiddly. Such a phenomenon is similar to the first derivative test being inconclusive in detecting extrema on a closed interval: you need to manually check the value at the endpoints because the derivative does not exist.

We obtain $k(g(a)) = 0$. Then the proof is routine and less technical.