Efficient choosing algorithm

Cuchulainn · October 5th, 2013, 4:37 pm

QuoteOriginally posted by: Traden4AlphaQuoteOriginally posted by: CuchulainnQuoteOriginally posted by: outrunYes on a functional level, but I can't look inside those calls. If these are precisely documented, what else do you need to know? O(n) Analogy: in the olde mainframe days you had to pre-estimate:a. the number of operationsb. the number of words in memory.If you know a. and b. for shuffle() then finito, yes?Assuming one trusts the documentation! That's not a problem; most developers don't document their work. Plan B: adopt NIH and write your own algorithm.

Cuchulainn · October 6th, 2013, 9:07 am

QuoteOriginally posted by: outrunI expect the C++ random shuffle to have zero probability of generating an unshuffled sequence? If so then it's not usable because that will cause statistical biases in the generated sample sequences!¿ Knuth 3.4.2 Algorithm P ¿

Cuchulainn · October 7th, 2013, 9:21 am

QuoteOriginally posted by: outrunQuoteOriginally posted by: CuchulainnQuoteOriginally posted by: outrunI expect the C++ random shuffle to have zero probability of generating an unshuffled sequence? If so then it's not usable because that will cause statistical biases in the generated sample sequences!¿ Knuth 3.4.2 Algorithm P ¿I still need to check this, what would you expect the outcome to be? How many times will it print "Yes", 0 or +/-1000?Not sure, new way of thinking for me Has it got to do with riffling? And this looks relevant.

acastaldo · October 7th, 2013, 5:05 pm

The following is the Fisher-Yates shuffle, also known as Knuth 3.4.2 Algorithm P: /* To shuffle an array a() of n elements (indices 0..n-1): */for i from n - 1 downto 1 do j <- random integer with 0 <= j <= i exchange a(j) and a(i)end forthis is known to give each permutation with equal probabilitynotice that i may be equal to j, which causes an "empty shuffle" sometimes; this is the secret of why (1,2) is not always shuffled to (2,1), sometimes it is left the same (hopefully half the time!)

Cuchulainn · October 8th, 2013, 8:12 am

QuoteOriginally posted by: outrunExcellent acastaldo, .. that's smart allowing for i to be j, and that will indeed solve it.I have opened a post on Programming to inquire if this algo is in C++ 11. BTW is it possible to devise a test to see if the algo is 'good'?

Traden4Alpha · October 8th, 2013, 1:16 pm

QuoteOriginally posted by: outrunQuoteOriginally posted by: CuchulainnQuoteOriginally posted by: outrunExcellent acastaldo, .. that's smart allowing for i to be j, and that will indeed solve it.I have opened a post on Programming to inquire if this algo is in C++ 11. BTW is it possible to devise a test to see if the algo is 'good'?Yes, a good algorithm will generate all possible permutations of the elements with equal probability. A bad algorithm will have a bias in the probabilities. Eg when supplied with a set of elements in some initial order (like a vector) a bad algorithm might *never* return the elements in their original order. That's what the snippet with the loop to 6000 tests. In one of 6 shuffles is should leave the set of 3 elements in their original order (3 elements -> 3!=6 possible orderings of the elements)There would be other tests, too. Given an input of 1 to N, the position-weighted moments of a infinite run of permutations should have easily-calculated values. Any non-uniformity of the shuffler will resolve as statistical discrepancy between the theoretical value and the computational/empirical value of those moments.

Cuchulainn · October 10th, 2013, 11:02 am

QuoteThis simplifies the statistical test we have to do, right? We don't need to test the random engine, only the shuffle. We might even feed it with a fake generator and prove it's validity by examining all possible internal code paths?That's an interesting remark indeed. Then you test the shuffle w/o getting ruffled by rand(). Many applications won't need 2 * 10^7 Rns anyways.

Traden4Alpha · October 10th, 2013, 12:02 pm

QuoteOriginally posted by: CuchulainnQuoteThis simplifies the statistical test we have to do, right? We don't need to test the random engine, only the shuffle. We might even feed it with a fake generator and prove it's validity by examining all possible internal code paths?That's an interesting remark indeed. Then you test the shuffle w/o getting ruffled by rand(). Many applications won't need 2 * 10^7 Rns anyways.Only the smallest toy problems are safe with such an RNG. If N≥11, there's more than 2 * 10^7 possible shuffle orders. If rand() has only 2 * 10^7 unique values, and you have a mere N=15 items, then shuffle() will miss 99.9985% of the possible shuffle orders.