November 21st, 2006, 4:12 am
You are given some multinomial distribution over N events. That is, you are given p(1), p(2), ..., p(N), where the sum over "i" of the p(i) is 1and all of the p(i) are real and non-negative.The goal is to write a fast algorithm to draw an event from the distribution. For example, you may want to create long sequences of IID events drawn from this distribution, and you need each draw to be very fast, even though N is large. Note that the values p(i) are REAL [yes this matters]. Also, although N is large, N items fit in memory. N^2 might not... An O(1) algorithm would be cool...I don't know one. An O(log N) would be good too. I think an O(N) algorithm might be a good start to help understand the problem, but maybe isn't a final solution.Additional: Does the solution change if the all of the p(i) are "discretized" as p(1) = x1 / a , p(2) = x2 / a, p(3) = x3 / a, ...Notes:This problem came up for me in modelling disk traces. Some disk blocks are popular, some aren't. I wanted to be able to create synthetic disk traces from the observed distribution over blocks, and thus I needed to draw many many times from a distribution telling how popular each block is. Do a search for "disk traffic modelling" in google, click on first result (whoo hoo my paper), if you care.Also, I guess I'm looking for good intuitive answers. Referring to complex algorithms in published papers is cool and informative, but it might not slide in an interview.