October 16th, 2017, 1:32 pm
It seems like there are three problems here:
1. Selecting the category of the approximating curve (e.g., a global polynomial, chain of splines, sinusoids, wavelets, multi-population model with independent curves, etc., etc.)
2. Estimating N: the appropriate number of terms, modes, segments, or model complexity within the selected category of model.
3. Estimating the "best fit" values of the O(N) coefficients, given the selected number of terms and the selected category of model.
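Here's a minimal sketch of how those three problems separate in code (assuming a global-polynomial family, a plain holdout split, and toy sinusoidal data; every name and value below is invented purely for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(0, 1, 40)
    y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)  # toy data

    # Problem #1: category of model -- here, global polynomials.
    # Problem #2: choose N by holdout error, not training error.
    train, test = np.arange(0, 40, 2), np.arange(1, 40, 2)
    best_n, best_err = None, np.inf
    for n in range(1, 15):
        coeffs = np.polyfit(x[train], y[train], n)   # Problem #3, for this candidate n
        err = np.mean((np.polyval(coeffs, x[test]) - y[test]) ** 2)
        if err < best_err:
            best_n, best_err = n, err

    # Problem #3: final coefficient fit at the selected complexity.
    coeffs = np.polyfit(x, y, best_n)
    print(best_n, best_err)

Training error alone always decreases with N, which is exactly the #2 trap; the holdout error is what turns back up once N starts fitting noise.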
The "overfitting" issue seems to be primarily caused by #2 in which insufficient N fails to capture valid structure in the data but excessive N is fitting to the noise. Yet it's coupled to the choices in problems #1 and #3. First, for example, it may take a large number of polynomial terms to get a good fit for a simple sinusoidal data set. Second, the coefficient estimator's treatment of noise or outliers can modulate the degree of overfitting if an excessive N is chosen (e.g., a robust outlier-ignoring estimator might avoid overfitting even if too high an N is chosen).
BTW, that "overfit" example graph may actually be the best-fit curve in some scenarios. For example, many astronomical data-fitting problems have strong evidence for (or expectation of) periodic phenomena that the data sample extremely sparsely (e.g., an object might orbit every few months but be observed by a telescope only every few years). An astronomical dataset of only a half dozen samples might correctly resolve to an orbit estimate that has 20 "wiggles" during the sampling period. Solving problem #1 depends on metadata about the physical process, not on the samples alone.
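A toy illustration of why the samples alone can't settle it (the cadence and frequencies below are invented; real unevenly sampled astronomical series are typically analyzed with tools like astropy's LombScargle): with six yearly samples, a slow sinusoid and a much faster one whose frequencies differ by exactly one cycle per sample interval fit the data identically, and only outside metadata picks between them.

    import numpy as np

    t = np.arange(6.0)            # six yearly samples (made-up cadence)
    f_slow, f_fast = 0.3, 1.3     # cycles/year; differ by exactly 1/cadence
    y = np.sin(2 * np.pi * f_fast * t)   # truth: the fast, "wiggly" orbit

    def fit_residual(f):
        # Linear least squares for amplitude and phase at a fixed frequency.
        A = np.column_stack([np.sin(2 * np.pi * f * t),
                             np.cos(2 * np.pi * f * t)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        return np.linalg.norm(A @ coef - y)

    # Both candidate frequencies fit the six points essentially perfectly:
    print(fit_residual(f_slow), fit_residual(f_fast))   # both ~0

This is just aliasing: at integer t, sin(2*pi*1.3*t) and sin(2*pi*0.3*t) take the same values, so the sparse data genuinely cannot distinguish the smooth curve from the one with many wiggles.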