* * * * *

                 1/2 a girl vs. 2/3 a boy; or—I suck at stats

> Listen, here's the thing. If you can't spot the sucker in the first half
> hour at the table, then you are the sucker.
>
> “Matt Damon [1]”, “Rounders [2]”
>

Back in my college days, I was invited to a poker game, and I'm sure by sheer
coincidence, that day just happened to be payday. Now, while I knew (and
still know) what the various hands are (“flush”—five cards of the same suit,
“full house”—three of a kind with a pair, “royal flush”—the ace, king, queen,
jack and 10 of a single suit, etc.), I didn't know (and still don't) the
ranking of the hands—which hands won over which hands. I was assured that
wouldn't matter and that I could have a “cheat sheet.” So I arrived at the
game with a huge pocket full of money and an attitude of “how hard can this
be?” Said attitude was reinforced as I won a few early rounds.

The end of the night came with the end of my money.

I learned two lessons that night:

 1. Never, ever play poker again, and
 2. I am bad at statistics.

While the first lesson sank in (to this day, I haven't played another game of
poker, so my record stands at a rather dismal 0–1), I forgot the second
lesson: that I suck at statistics.

Monday, I wrote about pairs of kids [3] and the odds of a particular pairing,
given some information.

> Let's say, hypothetically speaking, you met someone who told you they had
> two children, and one of them is a girl. **What are the odds that person
> has a boy and a girl?**
>

“Coding Horror: The Problem of the Unfinished Game [4]”

I read the explanation for the 2/3 result [5], said “Okay, I can see that,”
accepted it as gospel and went about my business, which involved me going
back and forth with someone over this issue [6], with both of us firm on our
respective viewpoints (me: 2/3; Vorlath [7]: 1/2).

Wanting to settle this once and for all, I wrote a program [8] that picks a
bazillion pairs of kids and brute-forces the results so that I can figure out
who's right and who's wrong. (Yes, the program is very verbose. This is a
very tricky problem, so it's written for clarity, not speed.)

Table: Number of kids
                                         Value  Percentage
----------------------------------------------------------
Total # of kids                       20000000       100.0
Boys                                  10002254        50.0
Girls                                  9997746        50.0

I ran this program for 10,000,000 pairs. 20,000,000 virtual kids were created
for this. 50% boys, 50% girls. No controversy here.

Table: Pair Stats
                                         Value  Percentage
----------------------------------------------------------
Total # of pairs                      10000000       100.0
Boy/Boy                                2501203        25.0
Boy/Girl                               2499753        25.0
Girl/Boy                               2500095        25.0
Girl/Girl                              2498949        25.0
At least one Boy                       7501051        75.0
At least one Girl                      7498797        75.0

Nothing unexpected here either. Four possible pairings, 25% each. 75% of the
pairings have at least one girl, and 75% have at least one boy, straight from
the numbers. So far, so good.
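For anyone who wants to reproduce the pair counts, here is a minimal sketch in Python. It is not the actual program [8]; the function name `tally_pairs` and the fixed seed are made up for illustration, and it assumes each kid is independently a boy or a girl with equal probability.

```python
import random
from collections import Counter

def tally_pairs(n_pairs, seed=0):
    """Generate n_pairs two-kid families and count each ordered pairing."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable (an assumption)
    counts = Counter()
    for _ in range(n_pairs):
        first = rng.choice("BG")   # first kid: boy or girl, 50/50
        second = rng.choice("BG")  # second kid, independent of the first
        counts[first + second] += 1
    return counts

if __name__ == "__main__":
    n = 200_000
    counts = tally_pairs(n)
    for pairing in ("BB", "BG", "GB", "GG"):
        print(f"{pairing}: {100 * counts[pairing] / n:.1f}%")
    # Every pairing except boy/boy has at least one girl.
    print(f"At least one girl: {100 * (n - counts['BB']) / n:.1f}%")
```

Each pairing should come out near 25%, and "at least one girl" near 75%, give or take sampling noise.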

Table: Disclosure table #1—Overview
                                         Value  Percentage
----------------------------------------------------------
Total # of pairs                      10000000       100.0
Disclosed First Kid                    5000671        50.0
Disclosed Second Kid                   4999329        50.0
Disclosed Girl                         4999440        50.0
Disclosed Boy                          5000560        50.0

Nothing seems wrong here; half the kids being disclosed are the first ones
and, independently, half of the kids being disclosed are boys. There is a
problem here, though. For now, I'll leave it to the reader to spot the issue
(and it is an issue with this problem). I didn't spot it until later myself.

Table: Disclosure table #2—disclosed a Girl
                                         Value  Percentage
----------------------------------------------------------
Disclosed Girl                         4999440       100.0
  First kid                            2499547        50.0
  Second kid                           2499893        50.0
Disclosed Girl, other girl             2498949        50.0
  First kid                            1249211        25.0
  Second kid                           1249738        25.0
Disclosed Girl, other boy              2500491        50.0
  First kid                            1250336        25.0
  Second kid                           1249738        25.0
Disclosed Girl, pick girl, correct     2498949        50.0
  First kid                            1249211        25.0
  Second kid                           1249738        25.0
Disclosed Girl, pick girl, wrong       2500491        50.0
  First kid                            1250336        25.0
  Second kid                           1250155        25.0
Disclosed Girl, pick boy, correct      2500491        50.0
  First kid                            1250336        25.0
  Second kid                           1250155        25.0
Disclosed Girl, pick boy, wrong        2498949        50.0
  First kid                            1249211        25.0
  Second kid                           1249738        25.0

[The first three lines of this particular table can be read as:

 1. a girl was disclosed
 2. the disclosed girl was the first kid in the pair
 3. the disclosed girl was the second kid in the pair

The line labeled “Disclosed Girl, pick girl, correct” can be read as: “a girl
was disclosed, we picked the other kid as being a girl, and we were correct.”
—Editor]

Well … XXXX! I was wrong! The odds are 50/50. I was all set to start posting
this when I noticed Vorlath conceding the 2/3 position in this follow-up
post [9].
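The 50/50 result comes from having the parent mention one of the two kids at random. A minimal Python sketch of that setup follows. It is an illustration only, not the actual program [8], and the name `random_disclosure` is hypothetical.

```python
import random

def random_disclosure(n_pairs, seed=0):
    """Among the cases where a girl is disclosed, return the fraction
    in which the other kid is a boy. The parent mentions the first or
    second kid with equal probability, whatever that kid's gender."""
    rng = random.Random(seed)
    disclosed_girl = 0
    other_boy = 0
    for _ in range(n_pairs):
        kids = [rng.choice("BG"), rng.choice("BG")]
        shown = rng.randrange(2)  # index of the kid the parent mentions
        if kids[shown] == "G":
            disclosed_girl += 1
            if kids[1 - shown] == "B":
                other_boy += 1
    return other_boy / disclosed_girl

if __name__ == "__main__":
    print(f"other kid is a boy: {random_disclosure(200_000):.3f}")
```

Under this disclosure rule the fraction hovers around one half, which matches the table above.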

I must have missed something in the program.

Okay, what if I exclude from consideration the boy/boy pairs entirely? How do
the odds change then? One two-line patch later and …

Table: Number of kids
                                         Value  Percentage
----------------------------------------------------------
Total # of kids                       15000398       100.0
Boys                                   4998619        33.3
Girls                                 10001779        66.7

Okay, numbers are 75% of what we had … so far so good.

Table: Pair Stats
                                         Value  Percentage
----------------------------------------------------------
Total # of pairs                       7500199       100.0
Boy/Boy                                      0         0.0
Boy/Girl                               2500052        33.3
Girl/Boy                               2498567        33.3
Girl/Girl                              2501580        33.4
At least one Boy                       4998619        66.6
At least one Girl                      7500199       100.0

Yes, that's what would be expected by dropping a quarter of all pairings.

Table: Disclosure table #1—Overview
                                         Value  Percentage
----------------------------------------------------------
Total # of pairs                       7500199       100.0
Disclosed First Kid                    3750492        50.0
Disclosed Second Kid                   3749707        50.0
Disclosed Girl                         5002113        66.7
Disclosed Boy                          2498086        33.3

Table: Disclosure table #2—disclosed a Girl
                                         Value  Percentage
----------------------------------------------------------
Disclosed Girl                         5002113       100.0
  First kid                            2500888        50.0
  Second kid                           2501225        50.0
Disclosed Girl, other girl             2501580        50.0
  First kid                            1250803        25.0
  Second kid                           1250777        25.0
Disclosed Girl, other boy              2500533        50.0
  First kid                            1250085        25.0
  Second kid                           1250777        25.0
Disclosed Girl, pick girl, correct     2501580        50.0
  First kid                            1250803        25.0
  Second kid                           1250777        25.0
Disclosed Girl, pick girl, wrong       2500533        50.0
  First kid                            1250085        25.0
  Second kid                           1250448        25.0
Disclosed Girl, pick boy, correct      2500533        50.0
  First kid                            1250085        25.0
  Second kid                           1250448        25.0
Disclosed Girl, pick boy, wrong        2501580        50.0
  First kid                            1250803        25.0
  Second kid                           1250777        25.0

And it's still 50/50! Am I missing anything else?
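The arithmetic behind "still 50/50" can be checked exactly, without any random numbers. The sketch below enumerates every (pairing, disclosed kid) combination with exact fractions. It assumes the remaining pairings are equally likely and that the parent still mentions a random kid; the function name and the `drop_boy_boy` flag are made up for illustration.

```python
from fractions import Fraction
from itertools import product

def exact_random_disclosure(drop_boy_boy):
    """Return (P(girl disclosed), P(other kid is a boy | girl disclosed))
    when the parent mentions one of the two kids at random."""
    pairs = [p for p in product("BG", repeat=2)
             if not (drop_boy_boy and p == ("B", "B"))]
    pair_prob = Fraction(1, len(pairs))  # remaining pairings equally likely
    girl = Fraction(0)
    girl_and_other_boy = Fraction(0)
    for pair in pairs:
        for shown in (0, 1):             # each kid is mentioned half the time
            weight = pair_prob * Fraction(1, 2)
            if pair[shown] == "G":
                girl += weight
                if pair[1 - shown] == "B":
                    girl_and_other_boy += weight
    return girl, girl_and_other_boy / girl

if __name__ == "__main__":
    print(exact_random_disclosure(drop_boy_boy=False))
    print(exact_random_disclosure(drop_boy_boy=True))
```

Dropping the boy/boy pairs raises the chance that a girl is disclosed from 1/2 to 2/3, but the chance that the other kid is a boy stays at 1/2: a girl/girl family mentions a girl every time, while a mixed family does so only half the time.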

Okay, I re-read even more comments [10] and looked closer at the original
problem statement:

> Let's say, hypothetically speaking, you met someone who told you they had
> two children, and one of them is a girl. **What are the odds that person
> has a boy and a girl?**
>

“Coding Horror: The Problem of the Unfinished Game [11]”

Oh, there's an unstated assumption going on—namely, what gender the
hypothetically speaking parent will reveal! So far, I've had the
hypothetically speaking parent disclosing a randomly picked child (first or
second), which could be either a girl or a boy. Add some more lines to force
the child to be disclosed as a girl (if there is a girl) and …

Table: Disclosure table #2—disclosed a Girl
                                         Value  Percentage
----------------------------------------------------------
Disclosed Girl                         7500174       100.0
  First kid                            4999692        66.7
  Second kid                           2500482        33.3
Disclosed Girl, other girl             2501019        33.3
  First kid                            2501019        33.3
  Second kid                                 0         0.0
Disclosed Girl, other boy              4999155        66.7
  First kid                            2498673        33.3
  Second kid                                 0         0.0
Disclosed Girl, pick girl, correct     2501019        33.3
  First kid                            2501019        33.3
  Second kid                                 0         0.0
Disclosed Girl, pick girl, wrong       4999155        66.7
  First kid                            2498673        33.3
  Second kid                           2500482        33.3
Disclosed Girl, pick boy, correct      4999155        66.7
  First kid                            2498673        33.3
  Second kid                           2500482        33.3
Disclosed Girl, pick boy, wrong        2501019        33.3
  First kid                            2501019        33.3
  Second kid                                 0         0.0

**That's** what I'm looking for! That's the unstated assumption being made by
the 2/3 camp! And my original summation of the whole problem: “The odds are
1/2, except, of course, when it's 2/3,” is correct (so to speak).
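The 2/3 answer depends on the parent always mentioning a girl whenever there is one. A minimal Python sketch of that rule follows. It is an illustration under that assumption, not the actual patch to the program [8], and the name `forced_girl_disclosure` is hypothetical.

```python
import random

def forced_girl_disclosure(n_pairs, seed=0):
    """Among families with at least one girl, return the fraction in
    which the kid who was not mentioned is a boy. The parent always
    mentions a girl (the first one, if there are two)."""
    rng = random.Random(seed)
    disclosed_girl = 0
    other_boy = 0
    for _ in range(n_pairs):
        kids = [rng.choice("BG"), rng.choice("BG")]
        if "G" not in kids:
            continue  # boy/boy: this parent has no girl to mention
        shown = kids.index("G")  # forced to disclose a girl
        disclosed_girl += 1
        if kids[1 - shown] == "B":
            other_boy += 1
    return other_boy / disclosed_girl

if __name__ == "__main__":
    print(f"other kid is a boy: {forced_girl_disclosure(200_000):.3f}")
```

With the disclosure forced, all three remaining pairings (boy/girl, girl/boy, girl/girl) count equally, and two of the three include a boy, so the fraction settles near 2/3.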

Sheesh!

So, I suck at statistics, and statistical word problems are hard to write
properly.

And now I can put this problem to rest.

[1] http://www.imdb.com/name/nm0000354/
[2] http://www.imdb.com/title/tt0128442/
[3] gopher://gopher.conman.org/0Phlog:2009/01/05.1
[4] http://www.codinghorror.com/blog/archives/001203.html
[5] http://www.codinghorror.com/blog/archives/001204.html
[6] http://my.opera.com/Vorlath/blog/2009/01/04/sample-space
[7] http://my.opera.com/Vorlath/blog/
[8] gopher://gopher.conman.org/0Phlog:2009/01/09/kids.c
[9] http://www.codinghorror.com/blog/archives/001204.html
[10] http://www.codinghorror.com/blog/archives/001204.html
[11] http://www.codinghorror.com/blog/archives/001203.html
