NAME
   Algorithm::PageRank::XS - A Fast PageRank implementation

DESCRIPTION
   This module implements a simple PageRank algorithm in C. The goal is to
   quickly get a vector that is closed to the eigenvector of the stochastic
   matrix of a graph.

   Algorithm::PageRank does some pagerank calculations, but it's slow and
   memory intensive. This module was developed to compute pagerank on
   graphs with millions of arcs. This module will not, however, scale up to
   quadrillions of arcs (see TODO).

SYNOPSYS
       use Algorithm::PageRank::XS;

       my $pr = Algorithm::PageRank::XS->new();

       $pr->graph([
                 'John'  => 'Joey',
                 'John'  => 'James',
                 'Joey'  => 'John',
                 'James' => 'Joey',
                 ]
                 );

       $pr->results();
       # {
       #      'James' => '0.569840431213379',
       #      'Joey'  => '1',
       #      'John'  => '0.754877686500549'
       # }

       #
       #
       # The following simple program takes up arcs and prints the ranks.
       use Algorithm::PageRank::XS;

       my $pr = Algorithm::PageRank::XS->new();

       while (<>) {
           chomp;
           my ($from, to) = split(/\t/, $_);
           $pr->add_arc($from, $to);
       }

       while (my ($name, $rank) = each(%{$pr->results()})) {
           print("$name,$rank\n");
       }

METHODS
 new %PARAMS
   Create a new PageRank object. Possible parameters:

   alpha
       This is (1 - how much people can move from one node to another
       unconnected one randomly). Decreasing this number makes convergence
       more likely, but brings us further from the true eigenvector.

   max_tries
       The maximum number of tries until we give up trying to achieve
       convergence.

   convergence
       The maximum number the difference between two subsequent vectors
       must be before we say we are "convergent enough". The convergence
       rate is the rate at which "alpha^t" goes to 0. Thus, if you set
       "alpha" to 0.85, and "convergence" to 0.000001, then you will need
       85 tries.

 add_arc
   Add an arc to the pagerank object before running the computation. The
   actual values don't matter. So you can run:

       $pr->add_arc("Apple", "Orange");

   and you mean that "Apple" links to "Orange".

 graph
   Add a graph, which is just an array of from, to combinations. This is
   equivalent to calling "add_arc" a bunch of times, but may be more
   convenient.

 results
   Compute the pagerank vector, and return it as a hash.

   Whatever you called the nodes when specifying the arcs will be the keys
   of this hash, where the values will be the vector.

   The result vector is normalized such that the maximum value is 1. This
   is to prevent extremely small values for large data sets. You can
   normalize it any other way you like if you don't like this.

BUGS
   None known.

TODO
   * We may want to support "double" values rather than single floats
   * We may or may not want to adjust the weighting of individual arcs, as
   you cannot do now.
   * At present the indexes are "unsigned int", rather than "size_t". Thus
   this will not scale with 64-bit architectures.
   * It'd be nice to be able to use mmap(2) to efficiently use the hard
   drive to scale to places where memory can't take us.

PERFORMANCE
   This module is pretty fast. I ran this on a 1 million node set with 4.5
   million arcs in 57 seconds on my 32-bit 1.8GHz laptop. Let me know if
   you have any performance tips. It's orders of magnitude faster than
   Algorithm::PageRank, but performance tests will be here shortly.

SEE ALSO
   Algorithm::PageRank

AUTHOR
   Michael Axiak <[email protected]>

COPYRIGHT
   Copyright (C) 2008 by Michael Axiak <[email protected]>

   This package is free software; you can redistribute it and/or modify it
   under the same terms as Perl itself