Suppose that I have a model for an implied volatility surface and want to determine the required recalibration frequency from historical quotes. Since I have a large range of strikes and tenors over a long period of time, I need to automate this process, i.e. I need a computable metric rather than "ahh it seems pretty close to market".
What kind of statistical metric can I use for that purpose? I'm thinking about the mean of percentage differences between market and model quotes, i.e. the mean value of
$$100\cdot\frac{\sigma^{market}-\sigma^{model}}{\sigma^{market}}$$
over the entire volatility surface. However, the mean over the entire surface can be quite misleading: on a large enough surface it will not capture large single outliers, and differences of similar magnitude but opposite sign will cancel out. Nevertheless, I can't see a better single metric to assess the overall surface fit.
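To make the two concerns concrete, here is a minimal Python sketch with made-up numbers. Everything in it is an assumption for illustration: the flat 20% market surface of 100 points, the two model surfaces, and the helper names `pct_diffs` and `mean_pct_diff` do not come from any real calibration.

```python
def pct_diffs(market, model):
    """Signed percentage differences 100 * (market - model) / market, point by point."""
    return [100.0 * (mkt - mdl) / mkt for mkt, mdl in zip(market, model)]


def mean_pct_diff(market, model):
    """Mean of the signed percentage differences over the whole surface."""
    diffs = pct_diffs(market, model)
    return sum(diffs) / len(diffs)


# Hypothetical flattened surface: 100 (strike, tenor) points, all quoted at 20% vol.
market = [0.20] * 100

# Case 1: model is 5% too high on half the points and 5% too low on the other half.
model_cancel = [0.21] * 50 + [0.19] * 50

# Case 2: model is exact everywhere except one point that is badly off (30% vs 20%).
model_outlier = [0.20] * 99 + [0.30]

for name, model in [("cancellation", model_cancel), ("single outlier", model_outlier)]:
    diffs = pct_diffs(market, model)
    worst = max(abs(d) for d in diffs)  # largest single miss, for comparison only
    print(f"{name}: mean = {mean_pct_diff(market, model):.4f}%, worst point = {worst:.4f}%")
```

In the first case the mean is essentially zero even though every point is off by about 5%; in the second the mean is only about -0.5% although one point is off by about 50%.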
How much sense does an average percentage difference over the entire surface make as a measure of fit quality? Is there a better metric?