another three doors problem

ebenezer · March 9th, 2012, 4:11 am

The game host will put prize behind one of the three doors, door 1, 2 or 3. You need to pay 1 dollar to pick a door. If you pick up correct door, you get the prize. Otherwise, the host will tell you the door number you pick up is too high or too low. The host is always honest. With his information, you pay another 1 dollar to pick up another door. And so on till you eventually pick up the correct door. But you need to pay $1 each time you pick up a door. For example, if host put prize behind door 1. You pick up door 2, then the host tells you that your number is too high. With this information, you know the prize is behind door 1. So you pay another $1 to pick up door 1 and win the prize. This game keeps on playing. The host learn from you behavior. Based on the host's observation on your picks, he makes decision to put the prize behind different door in every new round of the game. He always try his best with full information of the playing history so far to make your pay more to win the prize. For example, if you keep on picking door 2, the host learns and always put the prize behind door 1 or 3. In the long run, what is your strategy to win the prize with minimal cost to pay on average.

goekhan · March 9th, 2012, 10:02 am

What decisions does the host exactly mine? The initial door-guess of repeated games or also the decisions within a sub-game after the initial door-guess? For instance, if I choose door 2 but the prize is behind one of the other doors, I would have to make an additional (conditional) guess, 3 or 1. Does the host also mine these?If yes then (Recursive approach):start with 2. If 2 is correct: In the next game either choose 1 or 3 as the host will choose 1 or 3 with equal probability but not 2. If your choice, say 1, is correct, then start in the next game with 3, as the host will put the prize there for sure. Now you are at your initial state again -- you have choosen all doors the same amount. And you repeat your strategy.If your choise, 1 is not correct you pay an extra dollar to pick 3. Now you are again at your initial stateif 2 is not correct: pick 1 or 3 depending on what the host says. In the next game choose the remaining door as the host will have put the prize there for sure. Again, you are at your initial state (you have choosen all 3 doors equally often)

ebenezer · March 9th, 2012, 11:59 am

I guess I need to make some clarification.This is a repeated game and each repeated game consists of the following sub-game.sub-game:-----------------------------------------1) host puts prize behind door 1, 2 or 3.2) player pays $1 to guess a door number3) if the number is correct, player wins the prize4) if the number is not correct, host always tells the player whether the door number he guessed is too high or too low. Host always provides true information.5) player pays $1 to guess another door numberthis sub-game keeps on playing till eventually player win.We assume that both player and host are intelligent and rational. The player minimizes the payment and the host maximizes the payment. The host only makes the decision on where to hide the prize at the beginning of this sub-game. Then, he follow mechanical process. But in each new sub-game, he might change the door number for prize from what he has learned from the full history of previous plays.Here are few example of the sub-games:example 1:host hides prize behind door 1, player pays $1 and guesses door 2, host tells player that the number he guessed is too high, player pays $1 and guess door 1, player wins the prize.Note that the host know player are ration and intelligent. So after he knows that the player will definitely guess door 1 once he tells him that his first guess is too high.example 2:host hides prize behind door 2, player pays $1 and guesses door 1, host tells player that the number he guessed is too low, player pays $1 to guess door 3 or 1. It is part of player's strategy to pick up 1 or 3. If player picks up 2, he wins. If player picks up 3, host tells him that his number is too high. He will pay $1 to pick up 2 and win.-------------------------------------end of description of sub-gameThe sub-game is repeated again and again. Originally, the player will pick up 2 in the first sub-game, as this will make sure he win the sub-game with $2 cost. But if the host learned that player always guess door 2, he might change the prize location and hide the prize more often behind door 1 or 3. After the player found that the host hide the prize more often behind 1 or 3. Maybe it is better to guess 1 or 3 in the first guess. After a while, the host notices that the player keeps on guessing door 1, for example, the host will change his behavior to hide prize at some other doors. If we keep on playing this sub-games for ever, what is the expected cost per sub-game?

Vanubis1 · March 9th, 2012, 12:12 pm

A way to see this problem with game theoryYou have 5 possible strategies:* Open 1 then 2 * Open 1 then 3 * Open 2 then the right one* Open 3 then 1* Open 3 then 2On the long run, you have probabilities on these strategies.As 1 or 3 are similar, you have to choose A=p(2); B=p(1,3 then 2); C=p(1,3 then 3,1) with A+2B+2C=1So if the host choose to put the prize behind door 1, you will pay W1=(B+C)+2A+2C+3B=2A+4B+3CSo if the host choose to put the prize behind door 2, you will pay W2=A+4B+6CYou want to minimize Max(W1;W2)--> A=60% B=0% C=20%and you will pay 1.8On short term, you can optimize locally but i think it will converge soon to the solution above

ebenezer · March 9th, 2012, 1:28 pm

You cannot assume A+2B+2C=1, as there are some other cases missing from your list. It is possible that the player picks 1 or 2 or 3 and win directly. He does not have to do the second choice if his first choice is correct.

Vanubis1 · March 9th, 2012, 1:37 pm

I don't understand your remark, you can define your strategy if you don't find the right door and of course stop if you find it.That's why when the price is behind the first door, your probability is B+C because your second move doesn't matter.

ebenezer · March 9th, 2012, 1:52 pm

You are right. I reply too fast. B = p(1,3 then potentially 2 when necessary).