fixed-length rolling window median calculation

ebenezer · April 10th, 2009, 9:10 pm

Given a data stream, with each incoming data, we want to calculate q% quantile for the previous N data points. Both q and N are fixed. The naive calculation will require o(log(N)). Is it possible to calculate the quantile in o(1) time? I suspect that it is impossible. But I cannot come with a proof at this moment. Thanks.

ebenezer · April 11th, 2009, 2:56 pm

Nobody has any suggestion? How about just finding median with fixed rolling window? Can you do so in o(n) time? You can use whatever data structure.

Traden4Alpha · April 11th, 2009, 3:31 pm

The most basic algorithm would use a bisection search (O(log(N)) to find/insert the next incoming data value into the rolling sorted list of N data points. Processes to: 1) cull the out-going out-of-window value (via a rolling window of time-ordered pointers) and 2) decide the median value (using a bit of if-then logic with the prior median value, its neighbors, and the incoming value) would be O(1).You might find some very clever ways to progressively cull data points from the rolling window of values that are still in-window but that can't possibly ever become the sought value. For example, if a value is 2-steps away from being out-of-window and 3 steps away from the current median value, then that data point can never become the median value and can be removed. Thus, one could cull the dataset to M < N values for a O(log(M)) process. On I.I.D random data, this trick will cull an average of half the data-points. But this trick does not guarantee better than O(log(N)) time because one can construct pathological cases that require retaining all N data point in the window. Whether the overhead is worth it is another issue. (NOTE: this culling logic can be adapted to the q% quantile case.)