Missing numbers

mywil05 · May 5th, 2005, 6:43 pm

Suppose we have 98 distinct integers from 1 to 100. What is the best way to find out the other two missing integers (within [1, 100])? Thanks.

leherbert · May 5th, 2005, 6:54 pm

Without any other information, you can't do better than trying all of them one after the other with your favorite ordering...

rblaser · May 5th, 2005, 7:18 pm

You could sort in O(n) in this case but there isn't really a need because you could also keep running sums of the values and their squares and solve the two equations with two unknowns at the end. The second approach is also O(n).

mywil05 · May 5th, 2005, 7:19 pm

Actually this one comes along with another problem: Suppose we have 99 distinct integers from 1 to 100. What is the best way to find out the other two missing integers (within [1, 100])? Solution: just sum these 99 numbers and subtract it from sum(1,2,...,100).So is it possible to get 2 missing numbers similarly? Of course not the same as 1 missing number problem since it is possible to have combinations to equal to the same now, such as 1+14 = 2+13 = ... = 7+8

mywil05 · May 5th, 2005, 7:49 pm

rblaser, your answer is pretty good, one question is, is it really better off to solve two equations with two variables than just use an extra table to mark if each number appears in the array or not? i guess solving equations costs lots of time.

karakfa · May 5th, 2005, 9:19 pm

A better way to do is keep track of the numbers in a bitset with setting the corresponding bit and finally check which bits are not set to find out the missing numbers.

mywil05 · May 6th, 2005, 12:40 am

Well, the method you mentioned should be the best way to search multiple missing numbers, however, it takes extra storage too. What if we don't have extra storage for bitset?

hanzotutu · May 6th, 2005, 5:42 am

Edit: the same method as rblaser's

karakfa · May 6th, 2005, 2:18 pm

the "storage" you need is one bit per number. Generally an integer is represented as 4 bytes (= 32 bits). For a list of hundred numbers at most you'll need 4 integer worth of "storage".The sum and square sum also need "storage", well two Anyway, this is how it's done in most applications.

Aaron · May 10th, 2005, 8:52 pm

I think the situation is you read the numbers one at a time (say they are coming in as input) and do not have the storage to store them all. So you need an algorithm that works number by number.You could store the product of all the numbers as well as the sum. No two pairs of numbers have the same sum and product. Unfortunately, a product requires the number of bits as all of the numbers. You could store the sum of the digits as well as the sum of the numbers, but you might not get a unique combination. For example, if you're missing 9 and 91 or 11 and 89 you get a sum of 100 and a sum of digits of 19. Similarly, you could take a count of even numbers and odd numbers separately, which works great if one missing number is even and one is odd, but not otherwise.I'm sure there's a clever rule, but I don't see it right away.

JWD · May 11th, 2005, 11:39 am

Well, this may not work because I haven’t thought it totally through, but you could try this:1. For every number r coming in, write down its prime number decomposition as r = 2^a2 * 3^a3 * 5^a5 * 7^a7 * ….2. Keep track of the number of times ak = n occurs for each ak and each possible n for the whole list; call this N(ak,n).3. You know the right answer Nright(ak,n) for N(ak,n) given a range of r in [1,100]; for example Nright(a7,2) = 2 because there are two numbers r = 49, 98 that have the factor 7^2. Also Nright(a7,1) = 12 for the 12 numbers r = 7,14,21,28,35,42,56,63,70,77,84,91 (but not 49 and 98).4. Get D(ak,n) = Nright(ak,n) - N(ak,n) for all k,n. This gives you the prime number decomposition of the missing number. For example, if r = 40 = 2^3 * 5^1 is missing, you will get D(a2,3) = 1 and D(a5,1) = 1 corresponding to factors 8*5 which is 40.5. For two missing numbers, e.g. r = 40 and r = 72 missing, you will get D(a2,3) = 2, D(a3,2) = 1, D(a5,1) = 1. Now you have to see the combinations of products of 8, 8, 9, and 5. You cannot use 64 and 45 because 64 = 2^6 and you know D(a2,6) is correct already. However, I don’t know if this is general.--------------

mensa0 · August 3rd, 2005, 5:52 am

I think there is an easy way on this. Say N=100. The sum of all the integers is [N*(N+1)]/2. Call this sum1. Sum up the 98 integers that you have (call this sum2). The sum of x and y (the two unknown integers) must equal sum1-sum2. Now, take the product of the integers 1...N and divide this by the product of the 98 integers that you know. This ratio will equal the product of x and y. So you now have two equations and two unknowns.Check this out and let me know if it works for all N. (I believe it does!)Mike

Aaron · August 3rd, 2005, 6:54 pm

It works, but storing the product of all the numbers requires the same number of bits as storing all the numbers.

bhutes · August 4th, 2005, 3:28 am

We could store the log of the product of N numbers ... and keep deducting the log of the numbers in the stream as they appear.If N were 1 million, we could still differentiate between 1,000,000 and 1,000,001 by storing the log upto 13 digits after decimal.LN(1,000,000)=13.8155105579643LN(1,000,001)=13.8155115579638I think with a million subtractions of uncertain last digits, a log with (13+6) digits after decimals should be used to store the product.The characteristic being of the order of 10 million (another 8 digits) ... so with 27 digits, the product of 1 million numbers could be stored ... with not much loss of gainful information.Not sure what problem I'll run across, when writing an actual program using this approach .. but I hope the library functions do generate log upto 19 digits after decimal (excel stops after 13 digits) !!Does this work?

karakfa · August 4th, 2005, 12:41 pm

I can't see how these methods are better than using an indicator for the numbers encountered. It can be easily done through the binary representations where each bit can be used as a flag for a corresponding number equal to the position of the bit. You need "storage" of N bits.As a side benefit this can be extended to any number of missing numbers.For example: assume N = 8numbers come as: 2,3,5,6,7,8 (1 and 4 are missing).Your bitset will be: (bit positions for 87654321)00000000 - initial00000010 - set 200000110 - set 300010110 - set 500110110 - set 601110110 - set 711110110 - set 8DONEthe missing numbers are 1 and 4.