======================================================================
=                         Probability axioms                         =
======================================================================

                            Introduction
======================================================================
The Kolmogorov axioms are the foundations of probability theory,
introduced by Andrey Kolmogorov in 1933. These axioms remain central
and underpin applications of probability throughout mathematics, the
physical sciences, and real-world reasoning about uncertainty. An
alternative approach to formalising probability, favoured by some
Bayesians, is given by Cox's theorem.


                               Axioms
======================================================================
The assumptions behind the axioms can be summarised as follows: let
(Ω, 'F', 'P') be a measure space such that 'P'(E) is the probability
of some event E, and P(\Omega) = 1. Then (Ω, 'F', 'P') is a
probability space, with sample space Ω, event space 'F' and
probability measure 'P'.


First axiom
=============
The probability of an event is a non-negative real number:
:P(E)\in\mathbb{R}, P(E)\geq 0 \qquad \forall E \in F

where F is the event space. It follows that P(E) is always finite, in
contrast with more general measure theory.  Theories which assign
negative probability relax the first axiom.


Second axiom
==============
This is the assumption of unit measure: that the probability that at
least one of the elementary events in the entire sample space will
occur is 1

: P(\Omega) = 1.


Third axiom
=============
This is the assumption of σ-additivity:
: Any countable sequence of disjoint sets (synonymous with 'mutually
exclusive' events) E_1, E_2, \ldots satisfies
::P\left(\bigcup_{i = 1}^\infty E_i\right) = \sum_{i=1}^\infty P(E_i).
Some authors consider merely finitely additive probability spaces, in
which case one just needs an algebra of sets, rather than a σ-algebra.
Quasiprobability distributions in general relax the third axiom.
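The three axioms can be checked mechanically on a finite sample space, where countable additivity reduces to finite additivity. The Python sketch below uses a fair six-sided die as an assumed example (the die and its weights are illustrative, not from the text):

```python
from fractions import Fraction

# Assumed example: a fair six-sided die as a finite probability space.
omega = frozenset(range(1, 7))
weights = {outcome: Fraction(1, 6) for outcome in omega}

def P(event):
    """Probability measure: sum the weights of the outcomes in the event."""
    return sum(weights[x] for x in event)

# First axiom: every event has a non-negative probability.
assert all(P({x}) >= 0 for x in omega)

# Second axiom: the whole sample space has probability 1.
assert P(omega) == 1

# Third axiom (finite form): additivity over disjoint events.
evens, five = {2, 4, 6}, {5}
assert evens.isdisjoint(five)
assert P(evens | five) == P(evens) + P(five)
```

Exact rational arithmetic (`Fraction`) is used so the equalities hold exactly rather than up to floating-point error.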


                            Consequences
======================================================================
From the Kolmogorov axioms, one can deduce other useful rules for
studying probabilities. The proofs of these rules illustrate the power
of the third axiom, and its interaction with the remaining two axioms.
Four of the immediate corollaries and their proofs are shown below:


Monotonicity
==============
:\quad\text{if}\quad A\subseteq B\quad\text{then}\quad P(A)\leq P(B).

If A is a subset of, or equal to B, then the probability of A is less
than, or equal to the probability of B.

''Proof of monotonicity''
===========================
In order to verify the monotonicity property, we set E_1=A and
E_2=B\setminus A, where A\subseteq B and E_i=\varnothing for i\geq 3.
It is easy to see that the sets E_i are pairwise disjoint and E_1\cup
E_2\cup\cdots=B. Hence, we obtain from the third axiom that

:P(A)+P(B\setminus A)+\sum_{i=3}^\infty P(E_i)=P(B).

Since, by the first axiom, the left-hand side of this equation is a
series of non-negative numbers, and since it converges to P(B) which
is finite, we obtain both P(A)\leq P(B) and P(\varnothing)=0.
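The decomposition used in this proof can be replayed numerically. The fair die below is an assumed example, not from the text:

```python
from fractions import Fraction

# Assumed example: fair die; check monotonicity via the proof's decomposition.
weights = {x: Fraction(1, 6) for x in range(1, 7)}

def P(event):
    return sum(weights[x] for x in event)

A = {2, 4}
B = {2, 4, 6}
assert A <= B                       # A is a subset of B

# B splits into the disjoint pieces A and B \ A, exactly as in the proof.
assert P(A) + P(B - A) == P(B)
# Since P(B \ A) >= 0 by the first axiom, monotonicity follows.
assert P(A) <= P(B)
```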


The probability of the empty set
==================================
: P(\varnothing)=0.

In some cases, \varnothing is not the only event with probability 0.


''Proof of probability of the empty set''
===========================================
The statement follows by contradiction from the equation in the proof
of monotonicity. Suppose P(\varnothing)=a. Since E_i=\varnothing for
i\geq 3, the tail of the left-hand side of that equation is

:\sum_{i=3}^\infty P(E_i)=\sum_{i=3}^\infty
P(\varnothing)=\sum_{i=3}^\infty a = \begin{cases} 0 & \text{if }
a=0, \\ \infty & \text{if } a>0. \end{cases}

If a>0 then we obtain a contradiction, because the sum does not
exceed P(B), which is finite. Thus, a=0. We have shown as a byproduct
of the proof of monotonicity that P(\varnothing)=0.


The complement rule
=====================
P\left(A^{c}\right) = P(\Omega\setminus A) = 1 - P(A)


''Proof of the complement rule''
==================================
Given that A and A^{c} are mutually exclusive and that
A \cup A^c = \Omega:

:P(A \cup A^c)=P(A)+P(A^c)          ... '(by axiom 3)'

and

:P(A \cup A^c)=P(\Omega)=1          ... '(by axiom 2)'

Hence

:P(A)+P(A^c)=1

:\therefore P(A^c)=1-P(A)
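The complement rule can be verified on a concrete space. The fair die here is an assumed example:

```python
from fractions import Fraction

# Assumed example: fair die; verify the complement rule numerically.
omega = frozenset(range(1, 7))
weights = {x: Fraction(1, 6) for x in omega}

def P(event):
    return sum(weights[x] for x in event)

A = {1, 2, 3}
A_c = omega - A                  # complement of A in the sample space

assert P(A) + P(A_c) == 1        # axioms 2 and 3, as in the proof
assert P(A_c) == 1 - P(A)        # the complement rule
```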


The numeric bound
===================
It immediately follows from the monotonicity property that
: 0\leq P(E)\leq 1\qquad \forall E\in F.


''Proof of the numeric bound''
================================
Given the complement rule P(E^c)=1-P(E) and 'axiom 1' P(E^c)\geq 0:

:1-P(E) \geq 0

:\Rightarrow 1 \geq P(E)

:\therefore 0\leq P(E)\leq 1


                        Further consequences
======================================================================
Another important property is:

: P(A \cup B) = P(A) + P(B) - P(A \cap B).

This is called the addition law of probability, or the sum rule.
That is, the probability that 'A' 'or' 'B' will happen is the sum of
the probabilities that 'A' will happen and that 'B' will happen, minus
the probability that both 'A' 'and' 'B' will happen. The proof of this
is as follows:

Firstly,

:P(A\cup B) = P(A) + P(B\setminus A)          ... '(by Axiom 3)'

So,

:P(A \cup B) = P(A) + P(B\setminus (A \cap B)) (since B \setminus A =
B\setminus (A \cap B)).

Also,

:P(B) = P(B\setminus (A \cap B)) + P(A \cap B)

and eliminating P(B\setminus (A \cap B)) from both equations gives us
the desired result.
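The addition law can be checked directly for overlapping events; the fair die and the two events below are assumed for illustration:

```python
from fractions import Fraction

# Assumed example: fair die; verify the addition law for overlapping events.
weights = {x: Fraction(1, 6) for x in range(1, 7)}

def P(event):
    return sum(weights[x] for x in event)

A = {1, 2, 3}   # "at most three"
B = {2, 4, 6}   # "even"

# Subtracting P(A ∩ B) corrects for counting the overlap twice.
assert P(A | B) == P(A) + P(B) - P(A & B)
```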

An extension of the addition law to any number of sets is the
inclusion-exclusion principle.

Setting 'B' to the complement 'A^c' of 'A' in the addition law gives

: P\left(A^{c}\right) = P(\Omega\setminus A) = 1 - P(A)

That is, the probability that any event will 'not' happen (or the
event's complement) is 1 minus the probability that it will.


                     Simple example: coin toss
======================================================================
Consider a single coin-toss, and assume that the coin will either land
heads (H) or tails (T) (but not both).  No assumption is made as to
whether the coin is fair.

We may define:

: \Omega = \{H,T\}
: F = \{\varnothing, \{H\}, \{T\}, \{H,T\}\}

Kolmogorov's axioms imply that:

: P(\varnothing) = 0
The probability of 'neither' heads 'nor' tails is 0.

: P(\{H,T\}) = 1
The probability of 'either' heads 'or' tails is 1.

: P(\{H\}) + P(\{T\}) = 1
The sum of the probability of heads and the probability of tails is
1.
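These values are forced by the axioms for any coin, fair or not. A quick check with an assumed bias p = 2/5 (an arbitrary choice for illustration, not from the text):

```python
from fractions import Fraction

# A coin that need not be fair: the bias p = 2/5 is an arbitrary assumption.
p = Fraction(2, 5)
omega = frozenset({"H", "T"})
weights = {"H": p, "T": 1 - p}

def P(event):
    return sum(weights[x] for x in event)

assert P(frozenset()) == 0          # neither heads nor tails
assert P(omega) == 1                # either heads or tails
assert P({"H"}) + P({"T"}) == 1     # additivity over the two outcomes
```

Any p in [0, 1] would pass the same checks, which is exactly the point: the axioms constrain the measure without assuming fairness.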


                              See also
======================================================================
* Borel algebra
* σ-algebra
* Set theory
* Conditional probability
* Quasiprobability
* Fully probabilistic design


                          Further reading
======================================================================
*[https://web.archive.org/web/20130923121802/http://mws.cs.ru.nl/mwiki/prob_1.html#M2 Formal definition] of probability in the Mizar system, and the [http://mmlquery.mizar.org/cgi-bin/mmlquery/emacs_search?input=(symbol+Probability+%7C+notation+%7C+constructor+%7C+occur+%7C+th)+ordered+by+number+of+ref list of theorems] formally proved about it.


License
=========
All content on Gopherpedia comes from Wikipedia, and is licensed under CC-BY-SA
License URL: http://creativecommons.org/licenses/by-sa/3.0/
Original Article: http://en.wikipedia.org/wiki/Probability_axioms