In order to apply the bootstrap (statistically speaking), do we need to assume that the observations are i.i.d.? Can we perform the bootstrap even if the observations are correlated? How can we correct the bias? Thanks in advance!

- ClosetChartist

No, you don't necessarily require i.i.d. observations, but you are likely to run into consistency problems. A better approach for these problems is subsampling. I recommend *Subsampling* by Politis, Romano, and Wolf as a rigorous primer.

By the way... find any takers on your proposition?

Quote, originally posted by SPAAGG:
> In order to apply the bootstrap (statistically speaking), do we need to assume that the observations are i.i.d.? Can we perform the bootstrap even if the observations are correlated? How can we correct the bias? Thanks in advance!

I am sure that to make inferences on the bootstrap estimator there would be some implicit assumption of i.i.d., but I don't have my Efron here to confirm this...

On the second remark: you don't want to arbitrarily truncate a time series when estimating things like volatility. As we know, it is a "stylized fact" that volatility exhibits clustering (positive autocorrelation). You will lose some of that structure while bootstrapping, or when using most other resampling procedures; a jackknife is to be abhorred even more. I don't know of any way of avoiding this other than taking a longer subsample in the bootstrap estimation. But that defeats the whole idea, no?
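One standard way to keep some of that clustering structure is to resample contiguous blocks rather than individual points, the so-called moving-block bootstrap. A minimal numpy sketch (the AR(1) toy series and the block length of 25 are my own illustrative assumptions, not anything from the thread):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy AR(1) series with positive autocorrelation, standing in for returns.
n, phi = 500, 0.6
x = np.empty(n)
x[0] = rng.normal()
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.normal()

def moving_block_resample(series, block_len, rng):
    """One moving-block bootstrap resample: concatenate randomly chosen
    contiguous blocks until the resample is as long as the series."""
    n = len(series)
    n_blocks = int(np.ceil(n / block_len))
    starts = rng.integers(0, n - block_len + 1, size=n_blocks)
    blocks = [series[s:s + block_len] for s in starts]
    return np.concatenate(blocks)[:n]

resample = moving_block_resample(x, block_len=25, rng=rng)
```

Within each block the short-range dependence is preserved; only the joins between blocks lose it, which is exactly the trade-off quantie is pointing at.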

Last edited by quantie on March 21st, 2004, 11:00 pm, edited 1 time in total.

Let's say you are using bootstrap techniques to get quantiles: you have an estimation procedure for some statistic, and you want to use the bootstrap to get a confidence interval for your estimator.

You not only need the i.i.d. assumption, you also need to assume that the pivotal statistic you get from your estimation procedure converges to the usual normal distribution. By "pivotal statistic" I mean the usual sqrt(n)(estimated(X) - real(X)) or whatever...

Why would you want to use the bootstrap here? You have an estimation procedure, but you do not have the asymptotic variance of your estimator, so you need the bootstrap to get a confidence interval. You also need one or two technical assumptions that I cannot be bothered to remember now.

Hope this helps.
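The confidence-interval use case described above can be sketched with a percentile bootstrap (the normal toy data and the choice of the mean as the statistic are illustrative assumptions only):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=1.0, scale=2.0, size=200)  # stand-in for an i.i.d. sample

def bootstrap_ci(x, stat, n_boot=2000, alpha=0.05, rng=rng):
    """Percentile bootstrap confidence interval for stat(x):
    resample with replacement, recompute the statistic, take quantiles."""
    n = len(x)
    boot = np.array([stat(rng.choice(x, size=n, replace=True))
                     for _ in range(n_boot)])
    return np.quantile(boot, [alpha / 2, 1 - alpha / 2])

lo, hi = bootstrap_ci(data, np.mean)
```

The point made above is that this interval is only trustworthy when the resamples mimic the sampling distribution of the estimator, which is where the i.i.d. and asymptotic-normality assumptions enter.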

- ScilabGuru

One has to specify what one means by "bootstrap". Generally it means filling some sample holes by simulation, based on your model. Simply speaking:

1. You have real data y and a model y = f(x, theta, eps) depending on some parameters and noise assumptions.
2. You fit your parameters via your criterion, e.g. ML.

Now you want to understand how well your model really describes the data y. You simulate data via the model f(x, theta) and try to fit it again. In this way you can get an estimate of your parameter errors. There is nothing here about i.i.d.: everything is specified by your model. Of course the standard assumption is that eps is i.i.d., but your returns can be correlated, etc.
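This recipe is usually called the parametric bootstrap. A minimal sketch, assuming the trivial model y ~ N(mu, sigma^2) fitted by ML (the toy data and sample sizes are my own):

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.normal(loc=3.0, scale=1.5, size=100)  # the "real data"

# Steps 1-2: for the model y ~ N(mu, sigma^2), the ML fit is just mean and std.
mu_hat, sigma_hat = y.mean(), y.std()

# Step 3: simulate new data from the fitted model and refit, many times.
n_sim = 1000
mu_stars = np.array([rng.normal(mu_hat, sigma_hat, size=len(y)).mean()
                     for _ in range(n_sim)])

# The spread of the refitted estimates is the parametric-bootstrap
# estimate of the estimator's standard error.
se_boot = mu_stars.std()
```

For this toy model the answer is known in closed form (sigma/sqrt(n)), so the simulation is pure illustration; the value of the recipe is that it works the same way for models with no closed-form standard errors.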

The standard bootstrap requires the assumption that the observations are i.i.d. There have been many attempts to extend the bootstrap to correlated data; none of them is satisfactory in general, although some work well in specific cases. What sort of correlation do you have? Autocorrelation? Multivariate data with correlations among the individual elements?

Last edited by Aaron on March 24th, 2004, 11:00 pm, edited 1 time in total.

Quote, originally posted by Aaron:
> The standard bootstrap requires only the assumption that the observations are i.i.d. There have been many attempts to extend the bootstrap to correlated data; none of them is satisfactory in general, although some work well in specific cases. What sort of correlation do you have? Autocorrelation? Multivariate data with correlations among the individual elements?

Probably both...

Okay, let's take them one at a time, as they present different types of problems.

Loosely speaking, the bootstrap works because each resample is as likely as the original sample. To take the discrete case, suppose I offer you a bet. We will take two observations from an unknown i.i.d. distribution, x and y. We will then continue sampling from that distribution until one of the four pairs (x,x), (x,y), (y,x), or (y,y) appears. There is one chance in four of any of those winning. Clearly this is not true for autocorrelated data: (x,y) will be more likely than the other three.

One solution is to transform the data to remove the autocorrelation. Differencing will sometimes do that. More generally, you could fit a time series model to the data, ARIMA or GARCH for example, then bootstrap the residuals. That goes against the non-parametric spirit of the bootstrap and introduces some theoretical difficulties (even if your model is perfect and the true residuals are i.i.d., the fitted residuals will not be), but it can give reasonable results.

Another approach is to use subsamples, as ClosetChartist suggested. Suppose you have N points. The simplest subsample technique is to compute your statistic on the k subsamples of length N - k + 1 for some k (k = 30 might be a good choice for a well-behaved statistic): the first starting with the first point, the second starting with the second point, and so on up to the kth starting at the kth point, each subsample just running forward from its starting point. You then have to adjust your distribution for the larger sample size (N, instead of N - k + 1). You can do that either by assumption (multiply by [(N - k + 1)/N]^0.5 for a standard deviation, for example) or by using different k's and extrapolating.

Or you can ignore the autocorrelation. If your statistic does not depend on the short-term order of the points, and the series is stationary, that can work pretty well.
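The overlapping-subsample scheme described above, with the [(N - k + 1)/N]^0.5 rescaling, can be sketched as follows. This is only the mechanics on i.i.d. toy data with the mean as the statistic; with k heavily overlapping windows the subsample statistics are strongly correlated, so treat the numbers as illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
N, k = 300, 30
x = rng.normal(size=N)

m = N - k + 1  # subsample length
# The k overlapping subsamples: the i-th runs forward from point i.
sub_stats = np.array([x[i:i + m].mean() for i in range(k)])

# Rescale the spread from sample size m up to N: for a mean-like statistic
# the sampling standard deviation shrinks like 1/sqrt(n), so multiply
# by sqrt(m / N), i.e. the [(N - k + 1)/N]^0.5 factor in the text.
se_full = sub_stats.std() * np.sqrt(m / N)
```

The alternative mentioned above, repeating this for several k's and extrapolating, trades this single assumed scaling rule for an empirical one.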
Or it may not work.

The problem with multivariate samples with internal correlation is different. For example, suppose we have N draws of x and e, each from its own i.i.d. distribution, but the data are reported to us only as (x, y) with y = b*x + e, where we don't know b. Then x and y will be correlated. The bootstrap resamples will still be equally likely in the sense above, but many of the statistics we want to compute will not bootstrap well. This is increasingly true as the dimensionality increases.

An example of a statistic that bootstraps badly is the number of repeated data points in the sample. For a continuous distribution this statistic is zero with probability one, but all the resamples except the original sample have non-zero values for it. Many statistics on high-dimensional data do not bootstrap well, although the reason is less obvious.

One solution is to reduce the dimensionality, say by principal components or by fitting a model. This has the disadvantages above, although it can work. Subsampling will not help. If you can assume that your variables are all the same in some sense, say a vector of returns on different stocks for the same time period, you can compute a cross-sectional statistic first, then bootstrap in one dimension.
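The last suggestion, collapsing the cross-section first and then bootstrapping in one dimension, might look like this. The toy return matrix and the choice of the equal-weighted mean as the cross-sectional statistic are my own assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
T, S = 250, 20
returns = rng.normal(0.001, 0.02, size=(T, S))  # T periods x S stocks (toy data)

# Collapse the cross-section: one statistic per period, here the
# equal-weighted portfolio return.
cross = returns.mean(axis=1)

# Now an ordinary one-dimensional bootstrap on the per-period series.
n_boot = 2000
boot = np.array([rng.choice(cross, size=T, replace=True).mean()
                 for _ in range(n_boot)])
lo, hi = np.quantile(boot, [0.025, 0.975])
```

The resampling acts only on the T per-period values, so the cross-sectional correlation among the S stocks never has to be modeled; it is absorbed into the statistic before the bootstrap begins.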
