Serving the Quantitative Finance Community

 
wuf
Topic Author
Posts: 0
Joined: May 21st, 2004, 2:06 pm

Transforming data in order to forecast

May 21st, 2004, 5:59 pm

I would like to do some forecasting of data using ARIMA models; however, the raw data set is not normally distributed. Simply plotting the data shows a slight trend, which can easily be removed, but the data also shows a nonstationary variance. From various sources I have found transformations using logarithms and exponentials to make the data more normally distributed; however, the data set I am working with has negative values, which precludes the use of logarithm and power transformations, as complex numbers are returned.

Does anybody have any suggestions on how to transform the raw data (with positive and negative values) into a more normal distribution so that ARIMA models can be applied? If there are any other suggestions about forecasting models not constrained to a normal distribution, I am open to those as well.

Thanks,
Frank
 
tigerbill
Posts: 1
Joined: April 22nd, 2004, 7:14 pm

Transforming data in order to forecast

May 22nd, 2004, 9:24 am

how about taking the first difference?
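In case it helps, tigerbill's suggestion can be sketched in a few lines of Python/NumPy. The series below is made up for illustration (a linear trend plus noise), not the poster's data:

```python
import numpy as np

# Illustrative series with a linear trend (not the poster's actual data)
rng = np.random.default_rng(0)
t = np.arange(200)
x = 0.05 * t + rng.normal(size=200)

# First difference: dx[i] = x[i+1] - x[i]. Differencing a linear trend
# a*t leaves just the constant slope a, so the trend drops out.
dx = np.diff(x)

print(len(dx))       # one shorter than the original series
print(dx.mean())     # close to the slope, 0.05
```

Note that differencing removes a trend in the mean; it does nothing by itself about a nonstationary variance.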
 
Boundary
Posts: 0
Joined: May 20th, 2004, 10:47 pm

Transforming data in order to forecast

May 23rd, 2004, 8:41 am

Wuf,

Why don't you try the first difference, or try taking the log (LN) first and then the first difference? Then run the distribution tests...
 
chiral3
Posts: 11
Joined: November 11th, 2002, 7:30 pm

Transforming data in order to forecast

May 23rd, 2004, 7:08 pm

Recenter by subtracting off the mean. If you have your heart set on some Box-Cox type transform, you could probably add 1 + |largest negative value| to everything. That would make your whole set >= 1 and you could take logs.
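chiral3's shift-then-log idea in a minimal NumPy sketch (the numbers are made up; the point is that the shift makes the smallest value exactly 1, so the log stays real):

```python
import numpy as np

x = np.array([-3.2, -0.5, 1.1, 4.0, 10.7])   # made-up data with negatives

shift = 1.0 + abs(x.min())     # 1 + |largest negative value|
y = np.log(x + shift)          # every argument is now >= 1, so log is safe

print(y.min())                 # log(1) = 0 at the most negative point
```

One caveat: the transformed values, and anything forecast from them, depend on the arbitrary shift, and the inverse transform exp(y) - shift has to be applied after forecasting.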
 
wuf
Topic Author
Posts: 0
Joined: May 21st, 2004, 2:06 pm

Transforming data in order to forecast

May 24th, 2004, 1:21 pm

Thanks everyone for all the replies. I have actually tried a number of the techniques you suggested.

First-order difference: I've read this is really effective at detrending a nonstationary mean. The data set I am working with has a slight (visually) decrease in the mean, but nothing a linear fit could not remove. Taking the first-order difference also brings the mean closer to 0, but the nonstationary variance still exists and is the more problematic issue.

Log-type transforms: The data set has negative numbers, so we first had to make all the numbers positive by adding the absolute value of the smallest number plus some fraction (not exactly 1). This helps, but our values are not spread out enough to make the transformation very effective.

Box-Cox transformations: This is the latest type of transform I'm trying. The general form is y[t] = (x[t]^lambda - 1) / lambda for lambda != 0 (with y[t] = log(x[t]) as the lambda = 0 limit). I still have to play around with lambda to get a more normally distributed transformed data set. Of course, I still have to add some offset to avoid imaginary numbers after the transformation. I'm working on this at present.

Ultimately, we want to do forecasting of the data set. All the time series books I've looked at use ARIMA-type models on normally distributed data. Any suggestions about other types of forecasting methods? More suggestions for data transformations would be helpful too.

Thanks a log, guys.
Frank
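For what it's worth, the lambda search Frank describes can be automated: shift the data positive, apply y = (x^lambda - 1)/lambda over a grid of lambdas, and keep the lambda whose output looks most normal. The sketch below uses skewness as a crude normality score; the skewed data set and the SciPy dependency are my own assumptions, not Frank's:

```python
import numpy as np
from scipy import stats

# Illustrative right-skewed data with negative values (not Frank's series)
rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=500) - 1.0

shift = 1.0 + abs(x.min())        # offset so the power transform stays real

def boxcox(data, lam):
    z = data + shift
    # lambda = 0 is the log limit of (z**lam - 1) / lam
    return np.log(z) if abs(lam) < 1e-12 else (z**lam - 1.0) / lam

# Scan a grid of lambdas and keep the one with skewness closest to zero
lams = np.linspace(-2.0, 2.0, 41)
best = min(lams, key=lambda lam: abs(stats.skew(boxcox(x, lam))))

print(best, stats.skew(boxcox(x, best)))
```

scipy.stats.boxcox performs the same kind of search by maximum likelihood, but it requires the data to be shifted positive first.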
 
Zed
Posts: 0
Joined: February 7th, 2003, 7:24 am

Transforming data in order to forecast

May 24th, 2004, 1:45 pm

If your problem is just that the data is heteroscedastic, then why don't you try GARCH (which can be extended to non-normal distributions as well)?

Anyway, how non-normal is your data? Do you think it will pose a serious problem? If it is from some 'nasty' distribution you might need to rethink your approach...
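To make Zed's suggestion concrete: a GARCH(1,1) models the conditional variance directly, sigma2[t] = omega + alpha * r[t-1]^2 + beta * sigma2[t-1]. Below is a bare-bones sketch of just the variance recursion; the parameter values and the return series are illustrative placeholders (in practice omega, alpha, and beta are fitted by maximum likelihood):

```python
import numpy as np

def garch11_variance(r, omega=0.1, alpha=0.1, beta=0.8):
    """Conditional variance path of a GARCH(1,1):
    sigma2[t] = omega + alpha * r[t-1]**2 + beta * sigma2[t-1].
    The parameter values here are placeholders, not estimates."""
    sigma2 = np.empty(len(r))
    sigma2[0] = r.var()                  # a common choice of starting value
    for t in range(1, len(r)):
        sigma2[t] = omega + alpha * r[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2

rng = np.random.default_rng(2)
r = rng.normal(size=300)                 # made-up return series
sigma2 = garch11_variance(r)
```

With alpha + beta < 1 the process is covariance-stationary, and the long-run variance is omega / (1 - alpha - beta), which equals 1 for the placeholder values above.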
 
jeanuff
Posts: 0
Joined: April 29th, 2004, 1:17 pm

Transforming data in order to forecast

May 25th, 2004, 1:53 pm

Zed,

That sounds like overkill to me. Someone is trying to model and forecast y, and you suggest starting by modeling var(y) (or y^2). Okay, then how do they get back to modeling y after the GARCH is done? I'm not sure, but here's a guess: run through the time series, forecast var(y) for each time t, and divide y(t) by that forecast to make the variance constant, namely all roughly equal to 1. Once that variance has been divided out, they can finally get back to modeling y, except that they've already fitted at least one, and more likely several, parameters to the data, reducing the statistical significance of whatever they end up finding for y.

Here's another thought: why not just divide by the actual var(y) at each t rather than the forecast? This "actual variance" could be calculated with a moving window; for example, use the previous 3 days and the next 3 days for a window width of 7. On the down side, the moving window results in even MORE estimated parameters! If there are 700 data points, then a window width of 7 results in estimating 100 variances!

Zed, can you cite a paper that first estimates a GARCH in order to forecast the underlying variable itself?

Jean
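Jean's moving-window idea is easy to try. Here is a sketch in NumPy with made-up heteroscedastic data; shrinking the window at the series' edges is my own choice, not something Jean specified:

```python
import numpy as np

def rolling_std(y, half=3):
    """Centered moving-window standard deviation: the previous `half`
    points, the point itself, and the next `half` points (width 7 by
    default), with the window shrinking at the series' edges."""
    out = np.empty(len(y))
    for t in range(len(y)):
        lo, hi = max(0, t - half), min(len(y), t + half + 1)
        out[t] = y[lo:hi].std()
    return out

rng = np.random.default_rng(3)
# Toy heteroscedastic series: noise whose scale grows over time
y = rng.normal(size=700) * np.linspace(0.5, 3.0, 700)

z = y / rolling_std(y)   # variance is now roughly constant across the series
```

As Jean points out, this burns a lot of degrees of freedom; a further wrinkle is that each y(t) sits inside the window used to estimate its own scale, so it influences its own normalization.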