Serving the Quantitative Finance Community

 
User avatar
Collector
Topic Author
Posts: 2604
Joined: August 21st, 2001, 12:37 pm
Location: Bahamas
Contact:

Tests for normality?

February 7th, 2002, 3:50 am

I am sure a lot of different tests have been developed for checking data for normality.

What methods are out there? The Kolmogorov-Smirnov test, chi-squared
goodness-of-fit tests, the Shapiro-Wilk statistic, Bowman-Shelton, Jarque-Bera...?

What are the pros and cons for the different methods? How good are quick and simple tests like the Bowman-Shelton or Jarque-Bera?
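[Editorial note: most of the tests in that list are available in Python's scipy.stats, which is assumed here. A minimal sketch; note that feeding the KS test parameters estimated from the same sample makes its p-value optimistic (the Lilliefors issue), so treat that p-value as illustrative only.]

```python
# Sketch: several common normality tests via scipy.stats (assumed available).
# All test H0: "the data are drawn from a normal distribution".
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=500)   # synthetic normal sample

w_stat, w_p = stats.shapiro(x)                 # Shapiro-Wilk
# KS with parameters fitted from the same data: p-value is optimistic
ks_stat, ks_p = stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1)))
jb_stat, jb_p = stats.jarque_bera(x)           # Jarque-Bera (skewness + kurtosis)
ad_result = stats.anderson(x, dist="norm")     # Anderson-Darling

print("Shapiro-Wilk p:", w_p)
print("Kolmogorov-Smirnov p:", ks_p)
print("Jarque-Bera p:", jb_p)
print("Anderson-Darling A^2:", ad_result.statistic)
```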
 
User avatar
Vincent

Tests for normality?

February 7th, 2002, 7:09 am

Some statistical software packages have built-in methods to test normality, e.g. Minitab.
 
User avatar
spartak
Posts: 3
Joined: December 26th, 2001, 4:49 pm

Tests for normality?

February 7th, 2002, 9:32 am


I often use SAS, which simultaneously calculates the value and p-value of four test statistics: Shapiro-Wilk, Kolmogorov-Smirnov, Cramér-von Mises, and Anderson-Darling. As far as I remember, I have never come across a situation where, at a given confidence level (95%), one or more of the tests indicated the data were not normally distributed while the others indicated the opposite.

As far as I know, Kolmogorov-Smirnov and other chi-squared-based statistics are asymptotic tests, meaning they work well only when the sample is sufficiently large, say n >= 100.
 
User avatar
spartak
Posts: 3
Joined: December 26th, 2001, 4:49 pm

Tests for normality?

February 7th, 2002, 11:08 am


I've found something about these tests in the SAS system help.

The Shapiro-Wilk statistic is the ratio of the best estimator of the variance (based on the square of a linear combination of the order statistics) to the usual corrected sum of squares estimator of the variance. W must be greater than zero and less than or equal to one, with small values of W leading to rejection of the null hypothesis of normality. Note that the distribution of W is highly skewed. Seemingly large values of W (such as 0.90) may be considered small and lead to the rejection of the null hypothesis. The W statistic is computed when the sample size is less than or equal to 2000.
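[Editorial note: the claim that "seemingly large" values like W = 0.90 can already lead to rejection is easy to check by Monte Carlo under the null. A sketch, assuming scipy.stats; the sample size and replication count are illustrative choices.]

```python
# Under H0, the Shapiro-Wilk W statistic is packed close to 1, so even
# W = 0.90 can sit deep in the rejection region for moderate n.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, reps = 50, 2000

# Null distribution of W: W computed on repeated normal samples of size n
w_null = np.array([stats.shapiro(rng.normal(size=n)).statistic
                   for _ in range(reps)])

lower_5pct = np.quantile(w_null, 0.05)   # approximate 5% critical value of W
print("approx. 5% critical value of W for n=50:", round(lower_5pct, 3))
```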

The Kolmogorov statistic assesses the discrepancy between the empirical distribution and the estimated hypothesized distribution. For a test of normality, the hypothesized distribution is a normal distribution function with parameters μ and σ estimated by the sample mean and standard deviation.

The Cramér-von Mises statistic (W^2) is defined as W^2 = n ∫ [F_n(x) - F(x)]^2 dF(x), where F_n is the empirical distribution function and F the hypothesized normal distribution function.

The Anderson-Darling statistic (A^2) is defined as A^2 = n ∫ [F_n(x) - F(x)]^2 / {F(x) [1 - F(x)]} dF(x), which weights the squared discrepancy more heavily in the tails.

It seems that SAS relies most on the Anderson-Darling statistic, since, by default, all of the automatic distribution-selection options use this test when deciding whether to accept or reject a candidate distribution.
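[Editorial note: scipy's Anderson-Darling test (assumed here, not SAS) works the same way in spirit: instead of a p-value it returns the A^2 statistic together with critical values at fixed significance levels, and you reject when the statistic exceeds the critical value. A sketch; the t(3) alternative is an illustrative choice.]

```python
# Anderson-Darling via scipy: compare A^2 to tabulated critical values.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
samples = {"normal": rng.normal(size=300),
           "t(3)":   rng.standard_t(df=3, size=300)}   # fat-tailed alternative

results = {}
for name, sample in samples.items():
    res = stats.anderson(sample, dist="norm")
    # res.significance_level is [15, 10, 5, 2.5, 1] percent; index 2 is 5%
    crit_5pct = res.critical_values[2]
    results[name] = (res.statistic, crit_5pct)
    print(name, "A^2 =", round(res.statistic, 3),
          "reject at 5%:", res.statistic > crit_5pct)
```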


 
User avatar
matthewcroberts
Posts: 1
Joined: October 18th, 2001, 7:52 pm

Tests for normality?

February 8th, 2002, 2:54 pm

Collector,

Just a few thoughts. First, if you want a thorough answer, post this question on Usenet at sci.stat.math. All of the tests mentioned by you & spartak are asymptotic tests; there is no such thing as a finite-sample distribution test. The difference is that the tests have different levels of power, i.e. different probabilities of rejecting normality when it is in fact false. I can't remember all of the pros & cons of the tests (there have been entire dissertations written on the subject; sci.stat.math will be excellent for this), but I rarely see Bera-Jarque tests used; they have very low power. Much better are the Kolmogorov and Anderson-Darling statistics.
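[Editorial note: power comparisons like this are easy to estimate by simulation. A sketch assuming scipy.stats; the t(5) alternative, sample size, and replication count are illustrative choices, and the result will not settle which test is "best" in general.]

```python
# Estimate the power of two normality tests against a heavy-tailed
# alternative: simulate many samples, count rejections at the 5% level.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, reps, alpha = 100, 400, 0.05

def estimated_power(test, sampler):
    """Fraction of simulated samples on which the test rejects at level alpha."""
    return sum(test(sampler()).pvalue < alpha for _ in range(reps)) / reps

sampler = lambda: rng.standard_t(df=5, size=n)   # illustrative alternative

jb_power = estimated_power(stats.jarque_bera, sampler)
sw_power = estimated_power(stats.shapiro, sampler)
print("Jarque-Bera power :", jb_power)
print("Shapiro-Wilk power:", sw_power)
```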

My own $0.02: testing for normality is fraught with peril, proceed with great caution.

What are you trying to do?

matt.
 
User avatar
Aaron
Posts: 4
Joined: July 23rd, 2001, 3:46 pm

Tests for normality?

February 8th, 2002, 3:23 pm

I echo matthewcroberts' points. I know of no reason ever to test for normality as such. What you want to test for are the specific types of deviation that would invalidate your methods. In some cases skewness is most important, in others kurtosis, and in the multivariate case it is often true that non-normality in the marginals matters less than multivariate deviations.

The first question I always ask is whether you are testing for normality because you hope to find it or because you hope not to. The first case generally means you have a technique that is optimal for normal data, and you want to make sure it's safe to use. The second case means you want to learn something from your data, and deviations from normality tell you something. Typically this is applied to residuals from prior modeling: you want to know whether you have extracted all the information and reduced your data to white noise (the effect of a large number of independent factors, none of which is individually important). Although this is logically suspect (there can be important information in normal variates, and there is certainly plenty of worthless non-normal data), it works surprisingly often.

In the first case, the common statistical tools come with normality tests tailored to their needs. If your tool does not have one, and it is too complicated for direct analysis, a bootstrap is a good general approach. Resampled normal data are also normal, so if you put them through your tool, the distribution of answers should have the theoretical standard deviation computed for normal data. If it is significantly different, your data are not normal enough to use. This is simple, requires no math, and works for any distribution or tool.
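[Editorial note: a minimal numpy version of that bootstrap check, with the sample mean standing in for a more complicated "tool". Under normality, the bootstrap standard deviation of the estimate should sit near the theoretical sigma / sqrt(n). All parameter choices below are illustrative.]

```python
# Bootstrap check: does the resampled spread of the estimator match
# what normal theory predicts?
import numpy as np

rng = np.random.default_rng(4)
data = rng.normal(loc=0.0, scale=2.0, size=400)
n = data.size

# Resample with replacement, push each resample through the "tool" (the mean)
boot_means = np.array([rng.choice(data, size=n, replace=True).mean()
                       for _ in range(2000)])

bootstrap_sd = boot_means.std(ddof=1)
theoretical_sd = data.std(ddof=1) / np.sqrt(n)   # normal-theory prediction
print("bootstrap sd  :", bootstrap_sd)
print("theoretical sd:", theoretical_sd)
```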

In the second case, you are looking for exploitable deviations. In finance, for example, it is often interesting to compute the dollar value of knowing the exact deviations from normality in your data. For stock returns, say, suppose you create a zero-premium portfolio by buying and selling call options at different exercise prices, priced by Black-Scholes using the sample standard deviation, but collecting payoffs based on random draws from your data. If the best such portfolio has an expected return of more than a few percent of the standard deviation of your position, there are economically important non-normalities in your data. Otherwise, no.
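[Editorial note: a rough, simplified sketch of that dollar-value idea, not the post's exact construction. It skips the zero-premium portfolio and simply measures, per strike, the gap between the call payoff averaged over the empirical return draws and the same payoff under a normal distribution matched to the sample mean and standard deviation. The fat-tailed "data", the strikes, and the benchmark are all illustrative assumptions.]

```python
# Per-strike "mispricing": empirical expected call payoff minus the
# matched-normal expected call payoff.  Large gaps indicate economically
# relevant non-normality.
import numpy as np

rng = np.random.default_rng(5)
returns = rng.standard_t(df=4, size=5000) * 0.02   # fat-tailed stand-in data

mu, sigma = returns.mean(), returns.std(ddof=1)
normal_draws = rng.normal(mu, sigma, size=200_000)  # matched-normal benchmark

mispricing = {}
for strike in [0.98, 1.00, 1.02, 1.05]:
    empirical = np.maximum(np.exp(returns) - strike, 0.0).mean()
    gaussian  = np.maximum(np.exp(normal_draws) - strike, 0.0).mean()
    mispricing[strike] = empirical - gaussian
    print(strike, round(mispricing[strike], 5))
```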