random number sampling from discrete distributions

PointerLover · March 20th, 2006, 3:47 pm

Hi,

for my thesis, I'm currently working on a C++ program for valuing options that allows for non-normal yield distributions of the underlying.

My problem is as follows: among other methods, I want to be able to use MC simulations. So I need the inverse cumulative distribution for converting my uniform (0,1) samples to the desired density. For discrete distributions, I tried summing all the probabilities at or below the current point and then interpolating to get the values in between. An easy example for the binomial distribution b(0.5, 4):

[x-value -> pdf -> cdf]
-2.00 -> 0.0625 -> 0.0625
-1.00 -> 0.2500 -> 0.3125
0.00 -> 0.3750 -> 0.6875
1.00 -> 0.2500 -> 0.9375
2.00 -> 0.0625 -> 1.0000

And here comes the problem: of course, the cumulative probability assigned to the x-value of 0.00 should be 0.5000 instead of 0.6875. I didn't consider this at first, but noticed that all my results seemed to be biased (even though I'm working with really big values for n, around 512). As a quick and dirty fix, I added only half of the pdf at each x-value and the other half at the next x-value. Although this helped, it's still far from optimal (the mean of my random samples is still away from zero).

What is the standard way, or a better way, of doing this? I thought about applying different weightings to the 'left' and 'right' portions of the probability at a point, depending on the 'slope' of the pdf through that point, but I haven't tried this yet. Any recommended texts?

Thanks for your help!

Stylz · March 23rd, 2006, 12:33 am

Somehow I think I may be misunderstanding your question. If the attached file answers your question, great. Otherwise I don't know what you are talking about.

Aaron · March 23rd, 2006, 2:08 pm

If I understand you, you're doing it right, you just don't believe it. You've fooled yourself with:

Quote: "the cumulative probability assigned to the x-value of 0.00 should be 0.5000 instead of 0.6875"

What you should say is "the cumulative probability assigned to the x-value of 0 should be centered on 0.5," which is true. You assign the interval 0.3125 to 0.6875 to the value 0; that interval is centered on 0.5.

A couple of other things. Don't use trailing zeros for integers (Fischer Black would have fired you for that). And there's no reason to make a monotonic assignment. It's one natural way to do it, and there's nothing wrong with it, but you get the same result with:

(0, 0.0625) => 2
(0.0625, 0.1250) => -2
(0.1250, 0.3750) => 1
(0.3750, 0.6250) => -1
(0.6250, 1) => 0

or any other assignment. In mine, the points assigned to 0 neither include 0.5 nor are centered on it.

All that matters is the probabilities of the various assignments, not the order.

PointerLover · March 24th, 2006, 6:01 am

Hi Aaron and Stylz,

thanks for your answers.

Aaron: You're right, I assign the interval 0.3125 to 0.6875 to the value 0. But my real problem is: what is the x-value belonging to a randomly sampled cumulative probability of, let's say, 0.4000? (I want to use uniform (0,1) numbers to generate random samples from the discrete density.) At the moment I'm doing this the following way: the probability assigned to the x-value of -1 is centered on 0.1875 (wrong!?), the one assigned to 0 is centered on 0.5000. Then I use linear interpolation to get the x-value belonging to 0.4000. Even though my distribution has 512 points, the resulting random numbers still don't have the desired variance of one.

If it helps, here is the source code that I use to compute the distribution and to generate random numbers (see arrows):

    void Binomial_Distribution::Compute_Distribution() {
        if (changed_parameters_) {
            Matrix<double> interpolation_data_cdf_inverse;
            distribution_data_.Resize(tree_steps_ + 1, 3);
            for (unsigned int k = 0; k <= tree_steps_; k++) {
                distribution_data_(k, Defaults::MATRIX_X_VALUES) = (2 * k - double(tree_steps_)) / sqrt(tree_steps_);
                distribution_data_(k, Defaults::MATRIX_PDF) = Get_Value(tree_steps_, k, p_, Defaults::RETURN_PDF);
                // midpoint CDF: all mass below x_k plus half of the mass at x_k
    ==>         distribution_data_(k, Defaults::MATRIX_CDF) = distribution_data_(k, Defaults::MATRIX_PDF) / 2;
    ==>         if (k != 0)
    ==>             distribution_data_(k, Defaults::MATRIX_CDF) += distribution_data_(k - 1, Defaults::MATRIX_CDF) + distribution_data_(k - 1, Defaults::MATRIX_PDF) / 2;
            }
            // column 0 = cdf, column 1 = x-value, used as (input, output) pairs for the inverse cdf
            interpolation_data_cdf_inverse.Resize(tree_steps_ + 1, 2);
            Matrix<double>::Copy_Column(&distribution_data_, &interpolation_data_cdf_inverse, Defaults::MATRIX_X_VALUES, 1);
            Matrix<double>::Copy_Column(&distribution_data_, &interpolation_data_cdf_inverse, Defaults::MATRIX_CDF, 0);
            interpolation_cdf_inverse_.Set_Data_Points(&interpolation_data_cdf_inverse, true);
            changed_parameters_ = false;
        }
    }

    double Binomial_Distribution::Get_Random() {
        Compute_Distribution();
        double random = mt_random_.rand();
        return interpolation_cdf_inverse_.Get_Data_Point(random);
    }

with the member variables:

    bool changed_parameters_;
    MTRand mt_random_;
    Interpolation_Linear interpolation_cdf_inverse_;

Here are the results from 10 runs of 10 million samples each:

[mean] / [variance]
-0.000098 / 1.002778
-0.000487 / 1.001685
0.000173 / 1.002256
0.000079 / 1.002803
-0.000228 / 1.002739
-0.000146 / 1.002376
0.000316 / 1.002328
0.000003 / 1.002504
-0.000280 / 1.002658
0.000401 / 1.002630

While the mean looks good, the variance is always too high. That's why all the option prices I'm calculating with these numbers are a little bit too high as well. I think the reason is that I just divide the probability assigned to each point by two (see arrows in the source code) and don't consider that the distribution is upward sloping for negative x-values and downward sloping for positive ones.

Sorry for the confusing notation; I'm not a mathematician.
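
A rough check on the variance figures, offered as a sketch rather than a diagnosis. It assumes that Interpolation_Linear (whose source is not shown) does ordinary piecewise-linear interpolation between the (midpoint cdf, x-value) pairs built in Compute_Distribution(), and that p = 0.5 so the discrete distribution X has variance one. Under those assumptions each uniform draw lands uniformly between two neighbouring x-values, which is the same as adding an independent uniform offset U on (-h, h) to a true discrete sample, where h = 2/sqrt(n) is the grid spacing and n stands for tree_steps_. Ignoring the negligible mass in the two extreme tails:

```latex
\begin{aligned}
\Pr\bigl(x_k < X + U < x_{k+1}\bigr) &= \tfrac{1}{2}\,(p_k + p_{k+1}), \\
\operatorname{Var}(X + U) &= \operatorname{Var}(X) + \operatorname{Var}(U)
  = 1 + \frac{(2h)^2}{12} = 1 + \frac{h^2}{3} = 1 + \frac{4}{3n}, \\
n = 512:\qquad \frac{4}{3 \cdot 512} &\approx 0.0026 .
\end{aligned}
```

That is close to the excess of roughly 0.0025 seen in the ten runs, which would point at the interpolation step itself rather than at how the probability at each point is split.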

Aaron · March 26th, 2006, 10:49 pm

I don't see the Matrix values for your code, but if these are correct, the code should work. I suspect the problem is with the interpolation algorithm, which is assigning slightly higher probabilities to the extreme events.

I suggest printing out the counts for your individual values. This will help you zero in on which values are under-represented and which are over-represented.