Sleeping Beauty, the Dungeon, and the Envelopes
This post is about various problems in philosophy that I find interesting to think about. If you’re a fan of the subject, you’ve probably heard of most or all of them (and may or may not like my commentary).1 But chances are you haven’t, so read on! Before getting to them, we’ll look at something related that gets you thinking about the same ideas most of the problems involve.
Back in 2018, Elon Musk went on the Joe Rogan podcast and popularized the simulation hypothesis. This is the idea that our universe is simulated inside a computer, which should be fairly obvious from the name. Musk said that the argument for it is pretty strong, beginning by saying that if you assume any rate of improvement at all, video games will eventually be indistinguishable from reality. I’d like to “um, akshually” him here and say that this isn’t exactly true. It’s only guaranteed if you assume an unchanging percentage improvement each year. A rate of improvement that shrinks toward zero can instead converge to a ceiling that falls short of the realism we observe in our (presumably “real”) reality. (For example, if each year’s improvement is half the previous year’s, the total improvement never exceeds a fixed, finite amount.)
But anyway, yes, we’ll probably be able to simulate reality at some point. In the time since that episode aired, we’ve gone from hallucinogenic AI images to ones people often can’t distinguish from real photos. Video generation tech is progressing similarly, so this doesn’t look like the worst prediction in the world, even if there are some meaningful limitations we might find.
But let’s look closer at his argument, because what he really said is that we’re more likely to be in a simulation than in base reality. Here is a stronger form of this argument:
Suppose that, from the beginning of the universe, it was already set that we would come into existence and eventually invent the technology to simulate a universe identical to our own, leaving its inhabitants ignorant of being in a simulation, perhaps by accident (maybe we didn’t know they would arise naturally). There are 8 billion original observers and, let’s say, 92 billion simulated ones. If each observer is asked whether they’re in a simulated universe, what probability should they assign to being simulated?
In this case, the answer appears to clearly be 92%, since 92 billion of the 100 billion observers are simulated and all of them are assumed to be ignorant of whether they’re simulated. There are two key questions in determining whether this carries over to real life:
Can we really simulate a universe indiscernible from our own to observers?2
How many ignorant observers should we expect to be created by the original universe?
I suspect that the answer to #1 is “Yes”, but the answer to #2 is “Not that many”. Early versions of such a simulated universe would probably be rudimentary and slow. As the technology improves, developers would likely be able to identify any observers within their simulation and quickly do something about the problem (probably either enlighten them and take responsibility, or shut the simulation down).
After writing this, I did a little more searching and found that Nick Bostrom is the original author of the modern simulation hypothesis and gave a similar “trilemma”, arguing that at least one of the following must be true:
“The fraction of human-level civilizations that reach a posthuman stage (that is, one capable of running high-fidelity ancestor simulations) is very close to zero”, or
“The fraction of posthuman civilizations that are interested in running simulations of their evolutionary history, or variations thereof, is very close to zero”, or
“The fraction of all people with our kind of experiences that are living in a simulation is very close to one”.
If #1 is true, we probably aren’t in a simulation, and the same goes for #2. But neither is easy to argue for, which is what pushes people toward #3. Like I said, my position is that something like #2 is probably true anyway: they might be interested, but they don’t actually run such simulations (or don’t keep them running). I’m not confident in this, but if it turned out the simulation hypothesis were wrong, my guess is that this would be why.
For our first problem, let’s take a detour from universe shenanigans.
The Sleeping Beauty problem
Here’s the Wikipedia description:
Sleeping Beauty volunteers to undergo the following experiment and is told all of the following details: On Sunday she will be put to sleep. Once or twice, during the experiment, Sleeping Beauty will be awakened, interviewed, and put back to sleep with an amnesia-inducing drug that makes her forget that awakening. A fair coin will be tossed to determine which experimental procedure to undertake:
If the coin comes up heads, Sleeping Beauty will be awakened and interviewed on Monday only.
If the coin comes up tails, she will be awakened and interviewed on Monday and Tuesday.
In either case, she will be awakened on Wednesday without interview and the experiment ends.
Any time Sleeping Beauty is awakened and interviewed she will not be able to tell which day it is or whether she has been awakened before. During the interview Sleeping Beauty is asked: “What is your credence now for the proposition that the coin landed heads?”
Take your time and think about this as much as you want before proceeding.
My preferred “solution” is the ambiguous-question position. Probabilities are meaningful to me because they tell you what the average of the sequence describing whether an event occurred will converge to. For example, the probability of heads for a fair coin is 0.5 because the sequence describing whether the coin landed on heads (something like {0, 1, 1, 1, 0, 1, 0, 0,…}) converges to an average value of 0.5. (If you don’t have an intuitive sense for this, notice that 0 and 1 occur equally often, so their average settles halfway between them, at 0.5.)
The ambiguity in the problem arises in how we construct the sequence describing whether the coin landed on heads or not. This experiment appears to only occur once, but for the sake of determining the probability, let’s imagine it occurring repeatedly.
The heads case seems to have no ambiguity; if the coin landed on heads and we wake up on Monday only, we count heads once and the sequence begins with {1}. But suppose in the next experiment it lands on tails. Do we count every awakening as a case where the coin landed on tails, or do we count the experiment just once? This seems to be the key question. If we count it twice, the sequence proceeds to {1, 0, 0}; then we keep repeating the experiment and get something like {1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0,…}, whose average converges to 1/3. Clearly, if she wants to give a well-calibrated prediction of what the coin landed on, she should say she’s 2/3 (about 67%) confident it landed on tails. We can refer to this as the guessing interpretation, since we count each time Sleeping Beauty gets to provide a probabilistic guess of what happened.
If, instead, you only count an occurrence of tails once, the sequence becomes something like {0, 0, 0, 1, 0, 1, 1, 1, 0,…}, which has an average that converges to 0.5. Hence, the probability is 1/2. We can refer to this as the past interpretation, since we are solely counting what’s happened in the past.
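If you’d like to see the two counting conventions side by side, here’s a minimal simulation sketch in Python (the function and variable names are my own, not from any source):

```python
import random

def sleeping_beauty(trials=100_000):
    """Estimate Pr(heads) under the two counting conventions described above."""
    guess_heads = guess_total = 0   # "guessing" interpretation: one entry per awakening
    past_heads = past_total = 0     # "past" interpretation: one entry per experiment
    for _ in range(trials):
        heads = random.random() < 0.5
        awakenings = 1 if heads else 2
        # Guessing interpretation: every awakening adds an entry to the sequence.
        guess_total += awakenings
        guess_heads += awakenings if heads else 0
        # Past interpretation: each run of the experiment adds exactly one entry.
        past_total += 1
        past_heads += 1 if heads else 0
    return guess_heads / guess_total, past_heads / past_total

print(sleeping_beauty())  # roughly (0.333..., 0.5)
```

Both conventions come out the way the sequences above suggest: about 1/3 if every awakening counts, and 1/2 if every experiment counts. That difference is the whole ambiguity.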
This feels like it cleanly “solves” the problem, but I took one look at a paper on the subject and quickly saw a bunch of confusing math I can’t comment on. In a survey of philosophers, “The question is too unclear to answer” came in at 8.39% of the vote. The most popular answer was that her credence about the coin landing on heads should be 1/3, with 1/2 coming in second, though a surprising 40.33% were “Agnostic/Undecided”. I think it’s best to have humility here.
This problem doesn’t seem to have much direct bearing on our ideas about the simulation hypothesis. But it did allow me to provide you with some of the reasoning I’ll apply to a more relevant problem, and besides, it’s neat.
The incubator
Imagine a dungeon of 100 cells with no windows or openings. Each cell is empty until we turn on a machine called the incubator. It’s turned on, creating one observer in cell 1.
The incubator then flips a fair coin, and these are the possible outcomes:
Heads: 99 people are added to the remaining cells.
Tails: Nothing.
Much time has now passed since the coin was flipped, and you’re in one of the cells. (You don’t know which one.) You’re given a pamphlet explaining what the incubator has done, and if anyone else is there, they also get a pamphlet. This is the end of stage (a). At this point, how likely is it that the coin landed on tails?
After this stage comes stage (b). During this stage, you’re informed that you’re in cell 1. Knowing this, how likely is it that the coin landed on tails? Maybe take your time and devise answers yourself.
Let’s follow the same reasoning as before to get Pr(tails) at stage (a). If we want each observer to give a well-calibrated probability, then there are 101 observer cases to deal with: it landed on tails and you’re the lone observer in cell 1, it landed on heads and you’re in cell 1, or it landed on heads and you’re in one of the other 99 cells. We need to weight these observations by their probability. That gives us 0.5*1 = 0.5 probability-weighted tails cases and 0.5*100 = 50 probability-weighted heads cases.3 Thus, the frequency of a tails case across observations is 0.5/50.5, or 1/101. That’s the probability it landed on tails at stage (a).
At stage (b), we’re told that we’re in cell 1. There’s never going to be a case where somebody who isn’t in cell 1 is given this information, so our cases have changed: either the coin landed on heads and we’re told we’re in cell 1, or it landed on tails and we’re told we’re in cell 1. Using the method from before, we have 0.5*1 = 0.5 probability-weighted tails cases and 0.5*1 = 0.5 probability-weighted heads cases. Thus, the probability that the coin landed on tails is 0.5, or 50%.
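Here’s a minimal simulation sketch of both stages in Python, implementing the counting rule I just used (the names are mine): count every observer at stage (a), but only the cell-1 observer at stage (b).

```python
import random

def incubator(trials=200_000):
    """Estimate Pr(tails) per observer at stage (a) and per cell-1 observer at stage (b)."""
    a_tails = a_observers = 0   # stage (a): every observer in the dungeon counts
    b_tails = b_observers = 0   # stage (b): only the observer told "you're in cell 1" counts
    for _ in range(trials):
        tails = random.random() < 0.5
        observers = 1 if tails else 100
        # Stage (a): each observer reading the pamphlet is one observation.
        a_observers += observers
        a_tails += observers if tails else 0
        # Stage (b): exactly one observer (the one in cell 1) hears this, whichever way the coin landed.
        b_observers += 1
        b_tails += 1 if tails else 0
    return a_tails / a_observers, b_tails / b_observers

print(incubator())  # roughly (0.0099, 0.5), i.e. about 1/101 and 1/2
```

That matches the 1/101 and 1/2 answers above.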
But this is just the work of Some Guy, that guy being me. What did Nick Bostrom, the author of the philosophy paper this originated from, have to say about it? I thought through the problem before reading his work, so he may show that I’m very wrong.
Bostrom went through multiple models used for arriving at probabilities. In the first model, at stage (a), you decide the probability of tails is 50% simply because you know the coin toss was fair.4 Then Bostrom applies Bayes’s theorem to get the probability of tails at stage (b). (Here’s a handy guide for beginners if you haven’t read my blog enough to know what this is.)
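Written out in the same notation used in the rest of this post, that application of Bayes’s theorem (with Pr(cell1|tails) = 1, since tails leaves only the observer in cell 1, and Pr(cell1|heads) = 1/100) is:

Pr(tails|cell1) = Pr(cell1|tails)*Pr(tails) / [Pr(cell1|tails)*Pr(tails) + Pr(cell1|heads)*Pr(heads)]
= (1*0.5) / (1*0.5 + 0.01*0.5)
= 0.5/0.505 = 100/101 ≈ 0.99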
This is very different from our previous answer. Instead of thinking Pr(tails) = 0.5, we are now almost certain the coin landed on tails, because we’re much more likely to observe that we’re in cell 1 if the coin landed on tails than if the coin landed on heads.
If we follow the reasoning from before, this is a terrible idea. Suppose you keep asserting this probability across multiple experiments. Tails is still only going to occur half the time, so you’ll keep being way overconfident that it landed on tails. It’s like if Nate Silver kept saying the Democrat was sure to win the presidency.
What gives? Our magic formula doesn’t seem so magical right now. I think the trouble arises in thinking Pr(cell1|heads) = 0.01. At stage (b), we know that we’re in cell 1, so the probability is always 1, even if we condition on the coin landing on heads. Modify this part of the formula, and you get the same probability as before.
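Concretely, plugging Pr(cell1|heads) = 1 into Bayes’s theorem, with everything else unchanged, gives:

Pr(tails|cell1) = (1*0.5) / (1*0.5 + 1*0.5) = 0.5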
Bostrom’s second model arrives at the same conclusion I did. The reasoning appears to be pretty much the same:
Model 2. Since you know the coin toss to have been fair, and you haven’t got any other relevant information, your credence of tails at stage (b) should be 1/2. Since we know the conditional credences (same as in model 1) we can infer, via Bayes’s theorem, what your credence of tails should be at stage (a), and the result is that your prior credence of tails must equal 1/101. Answer: At stage (a) your credence of tails should be 1/101 and at stage (b) it should be 1/2.
Bostrom leaves the proof to the reader, as every good mathematician knows to do. I’ll help you understand what he did here if you want, but otherwise, you can just skip past the formulas below.
When Bostrom talks about inferring the prior credence of tails, he means we should figure out what prior probability of tails results in a probability of 0.5 at stage (b). That means we need to set the formula equal to 0.5 and isolate Pr(tails). In this first formula, notice that I have replaced Pr(heads) with 1 - Pr(tails), which means the same thing and allows us to isolate Pr(tails).

Pr(cell1|tails)*Pr(tails) / [Pr(cell1|tails)*Pr(tails) + Pr(cell1|heads)*(1 - Pr(tails))] = 0.5

Now replace the conditional probabilities with their values and simplify.

1*Pr(tails) / [1*Pr(tails) + 0.01*(1 - Pr(tails))] = 0.5
Pr(tails) / [0.99*Pr(tails) + 0.01] = 0.5

Multiply both sides by the denominator.

Pr(tails) = 0.5*(0.99*Pr(tails) + 0.01) = 0.495*Pr(tails) + 0.005

Solve for Pr(tails), starting by moving every Pr(tails) term to the left side of the equation.

Pr(tails) - 0.495*Pr(tails) = 0.005
0.505*Pr(tails) = 0.005

Pr(tails) = 0.005/0.505 = 1/101, like before.
The third model Bostrom introduces says that we have no relevant information at either stage, so in both cases, Pr(tails) = 0.5. Bostrom doesn’t say much about this third model, only that it’s wrong because, as we’ve argued, “I’m in cell 1” really is relevant information. To be “egalitarian”, he provides a problem for each model. But I’m only an egalitarian when it comes to people, so I’m inclined to skip the critique of the first model (which I think would bore you).
Bostrom criticizes model 2 by pointing to a result it seems to imply in a thought experiment he calls “The Presumptuous Philosopher”. In the future, scientists have narrowed down the search for a theory of everything to T1 and T2. The two theories describe the universe in almost exactly the same way, and both say there are lots of observers in the cosmos, except that T2 says there are a trillion times as many observers as T1 does. Physicists are preparing an experiment to falsify one of the theories, but then,
Enter the presumptuous philosopher: “Hey guys, it is completely unnecessary for you to do the experiment, because I can already show you that T2 is about a trillion times more likely to be true than T1!”
As Bostrom explains, model 2 makes the “Self-Indication Assumption (SIA)”, which says that, other things being equal, a hypothesis on which there are 2N observers is twice as likely as one on which there are N observers. If you followed my reasoning, this is clearly true. But this assumption is exactly what allows the presumptuous philosopher to “know” that T2 is much more likely to be true.
I have to say that I think this is very silly. Before, it made sense to assign probabilities that track the ways we could be wrong. We wanted to say there was a 1/101 chance the coin landed on tails because there were 100 other cases we could have found ourselves in, and we knew exactly how likely those cases were. Physical reality is not set up the same way. There is only one universe, and we have not been guaranteed any chance of another; we are only trying to find the correct description of the one we have. To make the dungeon problem analogous, we’d have to make the very existence of the other cells, at any probability, completely unknown to the residents. But dungeon residents don’t speculate about the existence of other observers based on evidence. They get a pamphlet telling them exactly how likely those observers are to exist.
I could proceed through the rest of the paper, which I’m sure is brilliant, but I don’t want to spend too much time on this one thought experiment. Go read the whole paper if you want to.
The envelopes
You are given two identical envelopes, each containing money, and one contains twice as much as the other. You choose one. Should you switch to the other envelope?
Our intuition tells us that switching envelopes does no good, but a standard calculation using expected value tells us that we should switch. Call the amount of money in your envelope A. Clearly, we should switch if the expected value of the money in the other envelope is greater than A. Here it is:
E(money in the other envelope) = 0.5A*0.5 + 2A*0.5 = 1.25A
Notice here that we have two cases in our sum. In the first case, the other envelope has half the value of the one we currently hold, 0.5A, and in the second case, the other envelope has twice the value of the one we currently hold, 2A. They’re treated as equally likely, so both are multiplied by 0.5. The final result is 1.25A, which is greater than A, so we should always switch. What gives?
The problem, to me, seems to arise from treating A not as a random variable but as a fixed number we can work with. The act of picking up the envelope doesn’t change anything about the situation. Even before doing so, we could try to write the expected value of one envelope in terms of the other and get the same strange answer, because we simply can’t know the value of one envelope independently of the other.
Let’s try this again. Rather than talking about “A”, we’ll talk about the expected value of an envelope. One has twice the value of the other, so we’ll say one has $1 and the other has $2, making the expected value of an envelope $1.50. Arbitrarily describe one envelope as E1 and the other as E2. We know that E(E1) = E(E2) = $1.50.
What’s E(E1) in terms of E(E2)? Well, we could go through the same reasoning as before, saying that half the time E1 will be twice E2, and half the time E1 will be half of E2. But we’ve already described E2 by supposing it will be $1 half the time and $2 half the time. If we went down the same line of reasoning as before, we would be accounting for the chance twice! This seems intuitively incorrect to me, though I can’t quite put my finger on why.
But like I said, I think the mistake is in treating A as a fixed quantity. A is a random variable, and the value of the other envelope is determined by the very same random draw. Once we started talking about the probability that the other envelope holds this or that amount relative to ours, we were taking an entirely nonsensical step. To solve the paradox in the way asked by Wikipedia, I think the problem is in step 3. Saying “The other envelope may contain either 2A or A/2” is like saying “The envelope may contain either twice itself or half itself with some probability”.
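As a sanity check on that intuition, here’s a minimal simulation sketch in Python (the amounts and names are arbitrary choices of mine): the pair of amounts is fixed up front, you pick one envelope at random, and always switching earns exactly what never switching does.

```python
import random

def envelopes(trials=100_000, small=1.0):
    """Fixed pair (small, 2*small); compare never switching with always switching."""
    keep_total = switch_total = 0.0
    for _ in range(trials):
        pair = [small, 2 * small]
        random.shuffle(pair)          # you pick one of the two envelopes at random
        chosen, other = pair
        keep_total += chosen          # strategy 1: never switch
        switch_total += other         # strategy 2: always switch
    return keep_total / trials, switch_total / trials

print(envelopes())  # both averages come out around 1.5, so switching gains nothing
```

There’s no 1.25A anywhere, because once the pair of amounts is fixed, “the other envelope” isn’t an extra coin flip on top of the one that already determined what you’re holding.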
Once I started reading further, I was pleased to see that the first example solution given appears to be the same as mine:
The famous mystification is evoked by confusing the situation where the total amount in the two envelopes is fixed with the situation where the amount in one envelope is fixed and the other can be either double or half that amount. The so-called paradox presents two already appointed and already locked envelopes, where one envelope is already locked with twice the amount of the other already locked envelope. Whereas step 6 boldly claims “Thus the other envelope contains 2A with probability 1/2 and A/2 with probability 1/2”, in the given situation, that claim can never apply to any A nor to any average A.
This is phrased a little differently, but it makes the same point: you can’t treat the amount in one envelope as fixed while letting the other be double or half of it.
I think the bad math in this problem is caused by our intuitions about physical reality. We think of the two envelopes as physically distinct, and they are, but for the sake of computing the expected value of “the other envelope”, it’s nonsensical to treat their contents as independent. Anyway, the best part of the article is this pair of sentences:
No proposed solution is widely accepted as definitive. Despite this, it is common for authors to claim that the solution to the problem is easy, even elementary.
hehehe
Newcomb’s paradox
There are two agents: a reliable predictor and a player. Two boxes are designated A and B. The player is given a choice between taking only box B or taking both boxes A and B. The player knows the following:
Box A is transparent and always contains a visible $1,000.
Box B is opaque, and its content has already been set by the predictor:
If the predictor has predicted that the player will take both boxes A and B, then box B contains nothing.
If the predictor has predicted that the player will take only box B, then box B contains $1,000,000.
The player does not know what the predictor predicted or what box B contains while making the choice.
Take your time with this one. Here’s the Wikipedia page if you want it.
Upon reflection, I’ve decided that the best solution to this problem is to drink as much alcohol as you can before being given the problem description. A stupid person would be easily predictable and would grab the $1,000,000 box without thinking about it at all. They might not even realize that somebody decided whether the box contains $1,000,000; they’d just hear the amount of money and pick box B. And notice that the only plausible case where the predictor doesn’t put $1,000,000 in box B is one in which the player is smart enough to seriously consider grabbing both boxes. The value of box B is, in some way, positively related to the stupidity of the player. I don’t know if this solution has a name or is even worth taking seriously, but I would call it the alcohol solution.
Unfortunately, we don’t have the luxury of using the alcohol solution; we’ve already started thinking seriously about this. Here are the two main solutions (with a rough expected-value sketch after them):
Grab only Box B. Once you’ve done this, you gain information about what the predictor believed you would do that retroactively justifies your decision: they thought you would do this, so it’s full of money. If you had grabbed both boxes, the opposite would have happened; they’d have known you would do it, so nothing would be in Box B.
Grab both boxes. The guy who wrote the last solution is insane. The decision you make has no bearing on the physical reality in front of you. Regardless of the choice you make, the actual prediction that was made and the contents of the boxes are the same.
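To see why both pulls exist, here’s a rough expected-payoff sketch in Python. The 99% predictor accuracy is an illustrative assumption of mine, not part of the problem statement:

```python
# Illustrative only: the problem just says the predictor is "reliable";
# the 0.99 figure is an assumption for the sake of the arithmetic.
accuracy = 0.99

# Reading 1: your choice is evidence about what was predicted.
ev_one_box = accuracy * 1_000_000                # correct prediction means box B is full
ev_two_box = 1_000 + (1 - accuracy) * 1_000_000  # only a misprediction fills box B

print(ev_one_box, ev_two_box)  # 990000.0 vs 11000.0: one-boxing looks far better

# Reading 2: the contents are already fixed, so compare case by case instead.
# If B holds $1,000,000: two-boxing pays 1,001,000 vs 1,000,000 for one-boxing.
# If B is empty:         two-boxing pays     1,000 vs         0.
# Either way, taking both boxes pays exactly $1,000 more, which is the other pull.
```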
That these are both fairly convincing arguments is why this is called a paradox. I’d only grab Box B—ideally after taking a swig and then beginning the experiment—but 39.03% of philosophers disagree, and maybe they know something I don’t.
As fun as these problems are, it’s not clear where we go from here. There are various lessons:
Calculating probabilities is hard when it’s not clear what counts as an observation.
If we know that a chance event determines the existence of additional observers, knowing that you are one of the original observer(s) makes it less likely that the chance event occurred. (Or so I believe. Bostrom disagrees, and I have yet to parse his argument.)
Treating a random variable as a fixed quantity seems to cause problems.
Drinking is good for you.
I could spend a lot more time on this and take these ideas much more seriously—other writers like Bentham’s Bulldog have—but I mostly wanted to give you something fun to chew on, and I should be studying for the GRE anyway.
I think a lot about the way I write, and “may or may not” often catches my eye because it feels redundant. Taken literally, “may” already implies “may not”, but “may or may not” still feels like it has a distinct meaning. I think of this phrase as being in the same vein as “ran quickly” and other phrases that provide emphasis. “May or may not” seems to indicate that “may not” is unusually likely, so it’s worthwhile to use.
Implicit here is that such a simulation would actually have observers, i.e., that the people getting simulated would experience the world in the way we do rather than just existing as digital entities. This whole thing gets much less interesting if it turns out that simulated beings don’t experience anything, but I don’t think we can know whether that’s true anyway. See here.
This is just a formality that would be more relevant if it weren’t 50/50.
I really don’t like this part, but oh well.