Or perhaps this angle:
Suppose you fit one of your methods to the ozone data. Before going into model choice: what possible purpose would be served by fitting a function through the data?
Picking a curve-fit method seems to involve a combination of: 1. selecting the category of the approximating curve (e.g., a global polynomial, a chain of splines, sinusoids, wavelets, a multi-population model with independent curves, etc.);
So, what's the answer? What are the criteria that lead to our choosing method 1 over method 2?
The probable reasons for using global polynomials are an appeal to Taylor series and their ability to approximate a great many types of functions (but not all!); computational simplicity; the fact that they internally encode the derivatives of the curve; and their familiarity to anyone who knows basic maths (algebra & calculus).

Let me put it as a question: what is the rationale for using global polynomials in AI? I looked in a few books but no reason was given. Perhaps someone can come forward with an answer.
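For what it's worth, the Taylor-series appeal and the "derivatives are encoded in the coefficients" property are easy to see in a few lines of NumPy. This is only a sketch with arbitrary example numbers (sin(2x), degree 5), not an endorsement of global polynomials:

```python
import numpy as np

# Sample a smooth function on a short interval.
x = np.linspace(-1.0, 1.0, 50)
y = np.sin(2.0 * x)

# Least-squares global polynomial of modest degree
# (compare the degree-5 Taylor expansion of sin(2x)).
p = np.poly1d(np.polyfit(x, y, deg=5))

# On a smooth function over a short interval the fit is excellent...
fit_err = np.max(np.abs(p(x) - y))

# ...and the derivative comes for free: differentiate the polynomial
# algebraically, no finite differences needed.
dp = np.polyder(p)
deriv_err = np.max(np.abs(dp(x) - 2.0 * np.cos(2.0 * x)))

print(fit_err, deriv_err)
```

The catch, of course, is that these virtues hold for smooth functions on short intervals; they say nothing about noisy data.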
Global polynomials are not used in most applications of numerical analysis these days, AFAIK; it is a pre-1960s technique. In fairness, maybe AI is dealing with other issues.
That remark about using a 300-degree polynomial was hilarious. I hope you were joking. Was the example taken from Geron's book?
How could out-of-sample testing reduce overfitting? TBH, I cannot see any practical value in it for testing data-analysis models: if my samples are representative, out-of-sample testing will come out positive; if they are atypical, it will come out negative. It says more about the data than about my model.

Excellent points!
Yet the human eye (and brain) is among the worst overfitters in history. The penchant for overfitting seems to be one of the best and worst qualities of human cognition. People assume there is no noise -- there's only structure as created by physical phenomena or gods (and even gods don't play dice). It's both the driver of science and superstition. There's even math (Ramsey theory) to prove that patterns are guaranteed to occur where none exist.
You are right that naive applications of ML (or any sufficiently combinatoric statistical method) will overfit badly. The challenge lies in adding methods that characterize the chance of an overfit, restrict the original process to a modest M tries against N data points, or construct a prudent amount of out-of-sample testing.
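To make the out-of-sample point concrete, here is a toy sketch in NumPy; the linear truth, the noise level, and the degrees 1 and 9 are all arbitrary illustration choices. Hold out part of the data, fit an honest model and a flexible one, and compare errors on the held-out points:

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of a simple linear truth.
x = np.linspace(0.0, 1.0, 40)
y = 3.0 * x + rng.normal(scale=0.3, size=x.size)

# Hold out every third point for out-of-sample testing.
test = np.arange(x.size) % 3 == 0
train = ~test

def errors(degree):
    """In-sample and out-of-sample mean squared error of a global polynomial fit."""
    p = np.poly1d(np.polyfit(x[train], y[train], degree))
    in_mse = np.mean((p(x[train]) - y[train]) ** 2)
    out_mse = np.mean((p(x[test]) - y[test]) ** 2)
    return in_mse, out_mse

in_lo, out_lo = errors(1)   # matches the truth
in_hi, out_hi = errors(9)   # flexible enough to chase the noise

# The flexible model always looks at least as good in-sample,
# because the low-degree model is nested inside it...
assert in_hi <= in_lo
# ...so only the held-out error can reveal the overfit.
print(in_lo, out_lo, in_hi, out_hi)
```

The in-sample comparison is guaranteed to flatter the flexible model; the held-out comparison is the one that typically punishes it, which is the practical value of out-of-sample testing.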
Personally, I'd expect quantum computing to overfit even more than traditional computing, in that it potentially calculates something on every possible superposed state. How would a D-Wave machine handle accidental coincidence?