Algorithms - Drawing from a multi-nomial distribution

cdmurray80 · November 21st, 2006, 4:12 am

You are given some multinomial distribution over N events. That is, you are given p(1), p(2), ..., p(N), where the sum over "i" of the p(i) is 1and all of the p(i) are real and non-negative.The goal is to write a fast algorithm to draw an event from the distribution. For example, you may want to create long sequences of IID events drawn from this distribution, and you need each draw to be very fast, even though N is large. Note that the values p(i) are REAL [yes this matters]. Also, although N is large, N items fit in memory. N^2 might not... An O(1) algorithm would be cool...I don't know one. An O(log N) would be good too. I think an O(N) algorithm might be a good start to help understand the problem, but maybe isn't a final solution.Additional: Does the solution change if the all of the p(i) are "discretized" as p(1) = x1 / a , p(2) = x2 / a, p(3) = x3 / a, ...Notes:This problem came up for me in modelling disk traces. Some disk blocks are popular, some aren't. I wanted to be able to create synthetic disk traces from the observed distribution over blocks, and thus I needed to draw many many times from a distribution telling how popular each block is. Do a search for "disk traffic modelling" in google, click on first result (whoo hoo my paper), if you care.Also, I guess I'm looking for good intuitive answers. Referring to complex algorithms in published papers is cool and informative, but it might not slide in an interview.

mhughes · November 21st, 2006, 2:04 pm

Well, it seems to me that O(log N) shouldn't be too difficult. Just store the cumulative distribution function in a binary tree (i.e. cdf(k) is the sum of p(i) for all i <= k). Then get a random number p between 0 and 1, and do a binary search for it in the cdf binary tree to find the k such that cdf(k-1) <= p < cdf(k), and k is your randomly chosen event (note that cdf(0) = 0). The cdf is O(N) space and the binary search is O(log N) time.

cdmurray80 · November 21st, 2006, 3:12 pm

Beautiful, that's what I was looking for! Presumably, after calculating the CDF, you can build the tree so its balanced: i.e. event N/2 is the root, events N/4 and 3N/4 are its children, etcI will post a second version of this problem that is a little harder in a new threadany solution to the additional problem??? looking for O(1)

cdmurray80 · November 29th, 2006, 1:31 am

Anyone wanna take a shot at the "discretized" version: p(1) = x1 / a , p(2) = x2 / a, p(3) = x3 / a, ... and the parameter "a" fits in memoryO(1) is possible

peterw · December 3rd, 2006, 11:59 am

if you want a large number of IID samples, Walker alias 'rejection sampling' is hard to beat. Sure, you have to set up a setup a table (IIRC O(N)) but it's O(1) then on

cdmurray80 · December 3rd, 2006, 11:47 pm

That's cool, I've never heard of that. Can you give a short description, or do I need to refer to a paper??

Aaron · December 3rd, 2006, 11:54 pm

Don't consider only the order of time increase. If a few of the p(i) make up most of the probability, you might be better off with a two-tier approach that quickly gets you the answer most of the time, at the small cost of a small add-on in rare cases.It's also worth considering if you need to simulate each event in order. If instead you only cared about, say, the number of each events over a million samples, there are faster ways of getting it.

cdmurray80 · December 4th, 2006, 12:33 am

Aaron, that's pretty interesting. That's one way to take advantage of special problem structure. Here's anotherIF you can approximate the probabilities with rational numbers, and there is some constant integer a, and "a" items fit in memory, then there are integers x1 ... xN so thatp(1) ~= x1 / a , p(2) ~= x2 / a, p(3) != x3 / a, ...Now (1) allocate an array with N items (2) fill in the first x1 entries with ones, the next x2 entries with twos, ... (3) draw random uniform from the arrayThis is O(1)...you get a lot of cache misses but still should be fastI'm interested to hear about Walker alias 'rejection sampling' ...

Aaron · December 4th, 2006, 4:56 pm

One way to exploit the rejection idea is to use your method with a smaller a, but reject some samples.For example, suppose you had three values: 1 if x < 1/pi, 2 if 1/pi <=x < 1/e and 3 if x > 1/e. You implement your method using 318/999 and 368/999. But if you draw 318, 369 or 999; you have to throw away some of the samples. For example, since 1/pi = 317.9918/999, if you draw 318, you must throw away with probability 0.0192. This algorithm is O(1) and can use far less memory than the exact version of your method. A small fraction of the time you need to do an extra selection step, and a fraction of those times, you need to throw away the sample.

cdmurray80 · December 4th, 2006, 10:46 pm

That's cool. That was suggested to me by my graduate algorithms teaching assistant, and he knows his stuff!For a slightly harder problem, try my other post "Drawing from a multi-nomial distribution...part II"I think Aaron and Peterw basically gave optimal answers to this question...thank you!!