NAME
Session::Token - Secure, efficient, simple random session token
generation

SYNOPSIS
Simple 128-bit session token
my $token = Session::Token->new->get;
## 74da9DABOqgoipxqQDdygw

Keep generator around
my $generator = Session::Token->new;

my $token = $generator->get;
## bu4EXqWt5nEeDjTAZcbTKY

my $token2 = $generator->get;
## 4Vez56Zc7el5Ggx4PoXCNL

Custom minimum entropy in bits
my $token = Session::Token->new(entropy => 256)->get;
## WdLiluxxZVkPUHsoqnfcQ1YpARuj9Z7or3COA4HNNAv

Custom alphabet and length
my $token = Session::Token->new(alphabet => 'ACGT', length => 100_000_000)->get;
## AGTACTTAGCAATCAGCTGGTTCATGGTTGCCCCCATAG...

DESCRIPTION
This module provides a secure, efficient, and simple interface for
creating session tokens, password reset codes, temporary passwords,
random identifiers, and anything else you can think of.

When a Session::Token object is created, 1024 bytes will be read from
"/dev/urandom" (Linux, Solaris, most BSDs), "/dev/arandom" (some older
BSDs), or with Crypt::Random::Source::Strong::Win32 (Windows). These
bytes will be used to seed the <ISAAC-32> pseudo random number
generator.

Once a generator is created, you can repeatedly call the "get" method on
the generator object and it will return new tokens.

IMPORTANT: If your application calls "fork", make sure that any
generators are re-created in one of the processes after the fork.
Forking duplicates the generator state, so otherwise the parent and
child processes will go on to produce identical tokens (as with perl's
rand after seeding).
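
The hazard can be reproduced with core Perl alone. The following is a minimal sketch, not part of Session::Token: it seeds perl's rand with a fixed value (42 is an arbitrary choice for illustration), forks, and compares the first number drawn in each process.

```perl
use strict;
use warnings;

# Seed once, before forking, so parent and child share the PRNG state.
srand(42);

pipe(my $reader, my $writer) or die "pipe failed: $!";

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: report its first random number to the parent, then exit.
    close $reader;
    print {$writer} rand(), "\n";
    close $writer;
    exit 0;
}

# Parent: collect the child's value and draw one of its own.
close $writer;
my $child_value = <$reader>;
close $reader;
waitpid($pid, 0);
chomp $child_value;

my $parent_value = rand() . "";
print "parent: $parent_value\n";
print "child:  $child_value\n";
```

Both lines show the same number. A Session::Token generator copied across a fork behaves the same way, which is why one of the processes should call Session::Token->new again after forking.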

After the generator context is created, no system calls are used to
generate tokens. This is one way that Session::Token helps with
efficiency. However, this is only important for certain use cases
(generally not web sessions).

ISAAC is a cryptographically secure PRNG that improves on the well known
RC4 algorithm in some important areas. For instance, it doesn't have
short cycles like RC4 does. A theoretical shortest possible cycle in
ISAAC is "2**40", although no cycles this short have ever been found
(and probably don't exist at all). On average, ISAAC cycles are a
ridiculous "2**8295".

In a server application the most important reason you should use the
"keep generator around" mode instead of creating Session::Token objects
every time you need a token is that in this mode generating a new token
cannot fail due to a full descriptor table. Creating new generators for
every token can fail for this reason. Programs that re-use the generator
are also more efficient and are less likely to cause problems in
"chroot" environments.

Aside: Some crappy (usually C) programs that assume opening
"/dev/urandom" will always succeed can return session tokens based only
on the contents of nulled or uninitialised memory! Unix really ought to
provide a system call for random data.

CUSTOM ALPHABETS
Being able to choose exactly which characters appear in your token is
sometimes useful. This set of characters is called the *alphabet*. The
default alphabet size is 62 characters: uppercase latin letters,
lowercase latin letters, and digits ("a-zA-Z0-9").

For some purposes, base-62 is a sweet spot. It is much more compact than
hexadecimal encoding, which helps with efficiency because session tokens
are usually transferred over the network many times during a session
(often uncompressed in HTTP headers).

Also, base-62 tokens don't use "wacky" characters like base-64 encodings
do. These characters sometimes cause encoding/escaping problems (e.g. when
embedded in URLs) and are annoying because often you can't select tokens
by double-clicking on them.

Although the default is base-62, there are all kinds of reasons you
might like to use another alphabet. One example is if your users are
reading tokens from a print-out or SMS or whatever, you may choose to
omit characters like "o", "O", and 0 that can easily be confused.

To set a custom alphabet, just pass in either a string or an array of
characters to the "alphabet" parameter of the constructor:

Session::Token->new(alphabet => '01')->get;
Session::Token->new(alphabet => ['0', '1'])->get; # same thing
Session::Token->new(alphabet => ['a'..'z'])->get; # character range
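
As a rough sketch of the print-out/SMS case mentioned above, an alphabet without the easily confused characters can be built with core Perl. The variable names are only illustrative, and the final constructor call is left as a comment so the snippet runs without the module installed.

```perl
use strict;
use warnings;

# Start from the default base-62 character set described above.
my @base62 = ('a' .. 'z', 'A' .. 'Z', '0' .. '9');

# Drop the three easily confused characters named in the text.
my @readable = grep { !/[oO0]/ } @base62;

print scalar(@readable), " characters: ", join('', @readable), "\n";

# With the module installed, the array can be passed straight in:
#   my $token = Session::Token->new(alphabet => \@readable)->get;
```

The resulting alphabet has 59 characters, so each character carries slightly less entropy than in base-62.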

ENTROPY
There are two ways to specify the length of tokens. The first is
directly in terms of characters:

print Session::Token->new(length => 5)->get;
## -> wpLH4

The second way is to specify their minimum entropy in terms of bits:

print Session::Token->new(entropy => 24)->get;
## -> Fo5SX

In the above example, the resulting token is guaranteed to have at least
24 bits of entropy. Given the default base-62 alphabet, we can compute
the exact entropy of a 5 character token as follows:

$ perl -E 'say 5 * log(62)/log(2)'
29.7709815519344

So these tokens have about 29.8 bits of entropy. Note that if we removed
one character from this token, it would bring it below our desired 24
bits of entropy:

$ perl -E 'say 4 * log(62)/log(2)'
23.8167852415475

The default minimum entropy is 128 bits. Default tokens are 22
characters long and therefore have about 131 bits of entropy:

$ perl -E 'say 22 * log(62)/log(2)'
130.992318828511

An interesting observation is that in base-64 representation, 128-bit
minimum tokens also require 22 characters and that these tokens contain
only 1 more bit of entropy.
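
The arithmetic above generalises to any alphabet size. Here is a small sketch of that calculation; min_length is a hypothetical helper written for this example, not a function provided by the module, and it relies on floating point logarithms, so treat results with care when the requested entropy is an exact multiple of the bits per character.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Smallest token length whose entropy is at least $bits, for an
# alphabet of $size equally likely characters.
sub min_length {
    my ($bits, $size) = @_;
    my $bits_per_char = log($size) / log(2);
    return ceil($bits / $bits_per_char);
}

printf "base-62, 128 bits: %d characters\n", min_length(128, 62);
printf "base-62,  24 bits: %d characters\n", min_length(24, 62);
printf "base-64, 128 bits: %d characters\n", min_length(128, 64);
```

This reproduces the figures in this section: 22 characters for the default, 5 characters for a 24-bit minimum, and 22 characters for base-64 at 128 bits.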

Another Session::Token design criterion is that all tokens should be the
same length. The default token length is 22 characters and the tokens
are always exactly 22 characters (no more, no less). This is nice
because it makes writing matching regular expressions easier, simplifies
storage (you never have to store length), and causes various log files
and things to line up neatly on your screen. Instead of tokens that are
exactly "N" characters, some libraries that use arbitrary precision
arithmetic end up creating tokens of *at most* "N" characters.
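
For instance, a fixed length means a default token can be validated with a single anchored pattern. This is only a sketch for the default alphabet and length; the first candidate is the sample token from the SYNOPSIS and the other two are made-up inputs.

```perl
use strict;
use warnings;

# Matches exactly 22 characters from the default base-62 alphabet.
my $token_re = qr/\A[A-Za-z0-9]{22}\z/;

for my $candidate ('74da9DABOqgoipxqQDdygw', 'too-short', '74da9DABOqgoipxqQDdyg') {
    printf "%-24s %s\n", $candidate, ($candidate =~ $token_re ? 'ok' : 'rejected');
}
```

Only the first candidate is accepted; the last one is a single character short.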

In summary, the default token length of exactly 22 characters is a
consequence of other decisions: base-62 representation, 128 bit minimum
token entropy, and consistent token length.

MOD BIAS
Some token generation libraries that implement custom alphabets generate
a random value, compute its modulus over the size of an alphabet, and
then use this modulus to index into the alphabet to determine an output
character.

Why is this bad? Consider the alphabet "abc". The ideal output
probability distribution for each character in the token is:

P(a) = 1/3
P(b) = 1/3
P(c) = 1/3

Assume we have a uniform random number source that generates values in
the set "[0,1,2,3]" (most PRNGs provide sequences of bits, in other
words power-of-2 size sets). If we use the naïve modulus algorithm
described above, 0 maps to "a", 1 maps to "b", 2 maps to "c", and 3
*also* maps to "a". Instead of the even distribution above, we have the
following biased distribution:

P(a) = 2/4 = 1/2
P(b) = 1/4
P(c) = 1/4

Bias like this is bad because some tokens are more likely than others,
which gives an attacker a starting point when guessing tokens. Unbiased
tokens are all equally likely, so there is no such starting point.
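
The biased distribution can be checked by enumerating every value the source can produce. This sketch uses the same "abc" alphabet and 2-bit source as the example above.

```perl
use strict;
use warnings;

my @alphabet = ('a', 'b', 'c');
my %count = map { $_ => 0 } @alphabet;

# Enumerate every value the 2-bit source can produce and apply the
# naive modulus mapping.
for my $value (0 .. 3) {
    $count{ $alphabet[ $value % @alphabet ] }++;
}

printf "P(%s) = %d/4\n", $_, $count{$_} for @alphabet;
```

The counts come out as 2, 1, and 1, matching the probabilities listed above.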

Session::Token provides unbiased tokens regardless of the size of your
alphabet (though see the next section for a mis-use warning). It does
this in the same way that you might simulate producing unbiased random
numbers from 1 to 5 given an unbiased 6-sided die: Re-roll every time a
6 comes up.

In the above example, Session::Token eliminates bias by only using
values of 0, 1, and 2 (the "t/no-mod-bias.t" test contains some more
notes on this topic).
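
The re-roll idea can be sketched in a few lines of Perl. This is an illustration of rejection sampling under stated assumptions, not the module's actual code: unbiased_index is a hypothetical helper, masking each byte down to the nearest power-of-2 range is an assumption of this sketch, and perl's rand stands in for the byte source only to keep the example self-contained (it is not secure).

```perl
use strict;
use warnings;

# Draw an unbiased index in 0 .. $size-1 from a source of bytes.
# $next_byte is a code ref returning integers in 0 .. 255.
sub unbiased_index {
    my ($next_byte, $size) = @_;
    die "alphabet size must be 1..256" if $size < 1 || $size > 256;

    # Smallest all-ones bit mask that covers $size - 1.
    my $mask = 1;
    $mask = ($mask << 1) | 1 while $mask < $size - 1;

    while (1) {
        my $value = $next_byte->() & $mask;
        return $value if $value < $size;    # otherwise "re-roll"
    }
}

# Stand-in byte source for illustration only: perl's rand is NOT secure.
my $insecure_bytes = sub { int(rand(256)) };

my @alphabet = ('a', 'b', 'c');
my $token = join '',
    map { $alphabet[ unbiased_index($insecure_bytes, scalar @alphabet) ] } 1 .. 20;
print "$token\n";
```

For the "abc" alphabet the mask is 3, so the candidate values are 0 to 3 and every 3 is thrown away, leaving 0, 1, and 2 equally likely.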

Of course throwing away a portion of random data is slightly
inefficient. In the worst case scenario of an alphabet with 129
characters, for each output byte this module consumes on average 1.9845
bytes from the random number generator. This inefficiency isn't a
problem because ISAAC is quite fast.
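
The 1.9845 figure can be reproduced with a short calculation. The sketch below assumes each candidate byte is reduced to the next power-of-2 range before the range check, which is consistent with 129 being the worst case; expected_bytes is a hypothetical helper for this example.

```perl
use strict;
use warnings;

# Expected bytes consumed per output character, assuming candidates are
# reduced to the next power-of-2 range and rejected when out of range.
sub expected_bytes {
    my ($size) = @_;
    my $range = 1;
    $range *= 2 while $range < $size;
    return $range / $size;
}

printf "%3d characters: %.4f bytes per character\n", $_, expected_bytes($_)
    for (62, 128, 129, 256);
```

An alphabet of 129 characters accepts only 129 of 256 candidate values, giving 256/129, or about 1.9845, bytes per character. Power-of-2 alphabet sizes never re-roll.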

Note that mod bias can be made arbitrarily small by increasing the
amount of data consumed from the random number generator for each
character (providing that arbitrary precision modulus is available).
Because this module fundamentally avoids mod bias, it can use each of
the 4 bytes from an ISAAC-32 word for a separate character (excepting
"re-rolls").

INTRODUCING BIAS
If your alphabet contains the same character two or more times, that
character will appear more often than a character that occurs only once.
You should make sure your alphabet contains no repeated characters (for
example from overlapping ranges) if you are trying to create random
session tokens.

However, if you wish to introduce bias this library doesn't try to stop
you. (Maybe it should issue a warning?)

Session::Token->new(alphabet => '0000001', length => 5000)->get; # don't do this
## -> 0000000000010000000110000000000000000000000100...

Due to a limitation discussed below, alphabets larger than 256 characters
aren't currently supported so your bias can't get very granular.

Aside: If you have a constant-biased output stream like the above
example produces then you can re-construct an un-biased bit sequence
with the von Neumann algorithm. This works by comparing pairs of bits.
If the pair consists of identical bits, it is discarded. Otherwise the
order of the different bits is used to determine an output bit, i.e. 00
and 11 are discarded but 01 and 10 are mapped to output bits of 0 and 1
respectively. This only works if the bias in each bit is constant (like
all characters in a Session::Token are).
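
A minimal sketch of the pairing rule, operating on a string of "0" and "1" characters (von_neumann is a name chosen for this example):

```perl
use strict;
use warnings;

# Von Neumann extractor over a string of '0' and '1' characters.
sub von_neumann {
    my ($bits) = @_;
    my $out = '';
    # Walk the input in non-overlapping pairs; a trailing odd bit is ignored.
    while ($bits =~ /\G([01])([01])/g) {
        next if $1 eq $2;    # 00 and 11 are discarded
        $out .= $1;          # 01 -> 0, 10 -> 1
    }
    return $out;
}

print von_neumann('0001101101'), "\n";    # prints 010
```

The input splits into the pairs 00, 01, 10, 11, 01. The first and fourth are discarded and the rest map to 0, 1, 0.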

ALPHABET SIZE LIMITATION
Due to a limitation in this module's code, alphabets can't be larger
than 256 characters. Everywhere the above manual says "characters" it
actually means bytes. This isn't a Unicode limitation per se, just the
maximum size of the alphabet. Remember you can easily map bytes to
characters with tr.

use utf8;
$z = Session::Token->new(alphabet => '01', length => 10)->get;
$z =~ tr/01/-λ/;
## -> λλ--λλλλ-λ

Here's an interesting way to generate a uniform random integer between 0
and 999 inclusive:

0 + Session::Token->new(alphabet => ['0'..'9'], length => 3)->get

If you wanted to natively support high code points, there is no point in
hard-coding a limitation on the size of Unicode or some arbitrary
machine word. Instead, arbitrary precision "characters" should be
supported with bigint. Here's an example of something similar in lisp:
<isaac.lisp>.

This module is not however designed to be the ultimate random number
generator and at this time I think changing the design as described
above would interfere with its goal of being secure, efficient, and
simple.

SEEDING
This module is designed to always seed itself from your kernel's secure
random number source. You should never need to seed it yourself.

However if you know what you're doing, you can pass in a custom seed as
a 1024 byte long string. For example, here is how to create a "null
seeded" generator:

my $gen = Session::Token->new(seed => "\x00" x 1024);

This is done in the test-suite to compare against Jenkins' reference
ISAAC output, but obviously don't do this in regular applications
because the generated tokens will always be the same.

One valid reason for seeding is if you have some reason to believe that
there isn't enough entropy in your kernel's randomness pool and
therefore you don't trust "/dev/urandom". In this case you should
acquire your own seed data from somewhere trustworthy (maybe
"/dev/random" or a previously stored trusted seed).

BUGS
It might be a good idea if this library could detect forks and re-seed
in the child process.

There is currently no way to extract the seed from a Session::Token
object. Note when implementing this: The saved seed must either store
the current state of the ISAAC round as well as the 1024 byte "randrsl"
array or else do some kind of minimum fast forwarding in order to
protect against a partially duplicated keystream bug.

SEE ALSO
<The Session::Token github repo>

There are lots of different modules for generating random data.

Like this module, perl's "rand()" function implements a user-space PRNG
seeded from "/dev/urandom". However, perl "rand()" is seeded with a mere
4 bytes from "/dev/urandom" and the perldoc doesn't seem to specify a
PRNG algorithm.

Data::Token is the first thing I saw when I looked around on CPAN. It
has an inflexible and unspecified (?) alphabet. It tries to get its
source of unpredictability from UUIDs and then hashes these UUIDs with
SHA1. I think this is bad design because some standard UUID formats
aren't designed to be unpredictable at all. Knowing a target's MAC
address and the rough time the token was issued may help you predict a
reduced area of token-space to concentrate guessing attacks upon. I
don't know if Data::Token uses these types of UUIDs or the potentially
secure "version 4" UUIDs, but because this wasn't addressed in the
documentation and because of an apparent misapplication of hash
functions (if you really had a good random UUID type, there would be no
need to hash), I don't feel good about using this module.

There are several decent random number generators like
Math::Random::Secure, Crypt::URandom &c, but they usually don't
implement alphabets and some of them require you open "/dev/urandom" for
every chunk of random bytes. Note that Math::Random::Secure does prevent
mod bias in its random integers and could be used to implement
alphabets.

String::Random is a cool module with a neat regexp-like language for
specifying random tokens which is more flexible than alphabets. However,
inspecting the code indicates that it uses perl's "rand()". Also, the
lack of performance, bias, and security discussion in the docs made me
decide to not use this otherwise very interesting module.

String::Urandom has alphabets, but it uses the flawed mod algorithm
described above and opens "/dev/urandom" on every token.

Data::Random is also a pretty nice looking library but it seems to use
"rand()" and the docs don't discuss security.

AUTHOR
Doug Hoyte, "<[email protected]>"

COPYRIGHT & LICENSE
Copyright 2012 Doug Hoyte.

This module is licensed under the same terms as perl itself.

ISAAC code:

By Bob Jenkins. My random number generator, ISAAC. Public Domain