My work deals with approximating complicated functions f(x) by simpler ones. You might think that the goal is just to match the data accurately. But no, the deeper aim is to obtain a model that can predict how f will behave at other values of x, not yet seen. For this purpose, simple approximations do better than complicated ones. And since optimality means maximal simplicity by some measure, it follows that optimal approximants are better than suboptimal ones not just because they capture the data more efficiently, but because they predict more reliably.
The classical example goes by the name of the Runge phenomenon: a high-degree polynomial that interpolates the data exactly at equally spaced points may be useless between the sample points. It has merely fit the data, not understood it. In the current era, we know that machine learning algorithms must avoid “overfitting”.
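To see the phenomenon concretely, here is a minimal sketch in Python with NumPy. The function 1/(1 + 25x^2) is Runge's classical example; the degree, grid size, and use of np.polyfit are illustrative choices of mine, not prescribed by the discussion above. The sketch interpolates at equally spaced points and then checks the error at points not in the sample.

```python
# A minimal sketch of the Runge phenomenon.
# Illustrative choices: degree 15, equally spaced points on [-1, 1],
# interpolation via np.polyfit (least squares on n+1 points of degree n
# gives the interpolating polynomial, up to rounding).
import numpy as np

def f(x):
    return 1.0 / (1.0 + 25.0 * x**2)   # Runge's classical example

n = 15                                  # polynomial degree (illustrative)
x_sample = np.linspace(-1.0, 1.0, n + 1)        # equally spaced sample points
coeffs = np.polyfit(x_sample, f(x_sample), n)   # interpolating polynomial

x_fine = np.linspace(-1.0, 1.0, 1001)   # values of x "not yet seen"
err_sample = np.max(np.abs(f(x_sample) - np.polyval(coeffs, x_sample)))
err_fine = np.max(np.abs(f(x_fine) - np.polyval(coeffs, x_fine)))

print("max error at the sample points: ", err_sample)
print("max error between the samples:  ", err_fine)
```

With these illustrative values one typically finds that the error at the sample points is many orders of magnitude smaller than the error between them, which grows large near the endpoints of the interval: the interpolant has matched the data without capturing the function.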
Henceforth this will be my take on Occam’s razor. Complexity does not just make a model inelegant. It lessens its power.
[22 November 2025]