NAME
Win32-ProcFarm - system for parallelization of code under Win32

OVERVIEW
What is Win32::ProcFarm?

"Win32::ProcFarm" is code I wrote to speed up tasks that are limited by
network latency rather than by network bandwidth or local computing
power. For instance, say you want to ping every address on a subnet. The
simple approach is to ping every address on the subnet (excluding the
broadcast address) sequentially. If only 30% of the addresses are in use
and you wait 1 second before deciding an address is not in use, it will
take roughly 3 minutes to ping a Class C subnet. The limitation here is
obviously not the local CPU or even network bandwidth, but rather
latency. One solution would be to break up the task. Unfortunately, the
thread support in Perl doesn't work with ActivePerl, and in any event
the support is currently experimental. Another approach would be to spin
off 10 processes, have each take roughly 25 addresses, and funnel the
information back into a single process for reporting.
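
The arithmetic behind the "roughly 3 minutes" figure can be checked with
a few lines of Perl. This is only a back-of-the-envelope sketch: the 254
usable host addresses, the helper name "sequential_ping_seconds", and the
assumption that live hosts answer instantly are mine, not part of
"Win32::ProcFarm".

```perl
use strict;
use warnings;

# Assumed figures: a Class C subnet has 254 usable host addresses, 30% of
# them answer, and each silent address costs a full 1 second timeout.
sub sequential_ping_seconds {
    my ($hosts, $fraction_in_use, $timeout) = @_;
    my $silent = $hosts * (1 - $fraction_in_use);
    return $silent * $timeout;    # replies from live hosts are treated as instant
}

my $serial = sequential_ping_seconds(254, 0.30, 1);
printf "sequential: %.0f s (about %.1f minutes)\n", $serial, $serial / 60;

# An ideal split over 10 processes, ignoring startup cost and uneven loads.
printf "10 processes: about %.0f s\n", $serial / 10;
```

The division by 10 is the best case; process startup time and uneven
address ranges eat into it in practice.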

This is the approach "Win32::ProcFarm" takes, but it is somewhat more
sophisticated. A "pool" of child processes is created, each of which
communicates with the parent process over a TCP socket. The parent
process uses an "RPC"-style library to assign tasks to the child
processes and to retrieve the return data from those tasks.

Each child process consists of a library file that includes the
communications routines, as well as whatever subroutines pertain to the
problem at hand. The parent process spins off the child process, which
then connects back to the parent process through a TCP port. The parent
process uses "Data::Dumper" to package up the desired subroutine name
along with any associated parameters and ships it off to the child
process. The child process then executes that subroutine and uses
"Data::Dumper" to package up the return values and send them back to the
parent. What makes the library useful is that the child process can
operate asynchronously from the parent; the parent simply calls
"execute" to instruct the child process to execute a subroutine. The
parent process can then periodically call "get_state", which will return
"wait" while the child process is still executing the subroutine. When
the child process finishes and ships the return values back up the
socket, the "get_state" method call on the parent object will return the
"fin" state. The parent then calls "get_retval" to obtain the returned
values, and the child process can then be used to execute another task.
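
The packaging step can be illustrated with a small round trip through
"Data::Dumper". This is a sketch of the general technique, not the wire
format "Win32::ProcFarm" actually uses: the "%tasks" table and the
"pack_call" and "handle_call" helpers are hypothetical names, and both
sides run in one process here instead of talking over a socket.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical task table standing in for the subroutines in a child script.
my %tasks = (
    add => sub { my ($x, $y) = @_; return $x + $y },
);

# Parent side: serialize the subroutine name and its parameters as one string.
sub pack_call {
    my ($name, @params) = @_;
    local $Data::Dumper::Indent = 0;
    local $Data::Dumper::Terse  = 1;    # emit a bare expression, no "$VAR1 ="
    return Dumper([ $name, @params ]);
}

# Child side: rebuild the request, run it, and serialize the return values.
sub handle_call {
    my ($wire) = @_;
    my $request = eval $wire;           # only acceptable when the sender is trusted
    die "bad request: $@" if $@;
    my ($name, @params) = @$request;
    my $code = $tasks{$name} or die "unknown task $name";
    my @ret = $code->(@params);
    local $Data::Dumper::Indent = 0;
    local $Data::Dumper::Terse  = 1;
    return Dumper(\@ret);
}

my $reply  = handle_call(pack_call('add', 2, 3));
my @retval = @{ eval $reply };
print "returned: @retval\n";
```

Because the receiving side has to "eval" the string, this style of
serialization is only appropriate between processes that trust each
other, such as a parent and the children it started itself.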

The pool system is based upon this simple "RPC" system. To use the
"Win32::ProcFarm::Pool" object, one simply creates a new pool, passing
it the number of child processes to start as well as the name of the
child process and a few other parameters. Once the pool has been
created, one adds jobs to the waiting pool. This might be a list of IP
addresses to ping, for instance. Then one tells the
"Win32::ProcFarm::Pool" object to execute all the jobs. The pool assigns
a job to each of the child processes until all the child processes are
busy. It then checks the child processes periodically to see if they
have finished with the task. If they have, it places the return values
into a hash, identified by an ID passed when the job was created, and
sends the child process another job. When all the jobs have finished,
one simply requests the hash of return values and proceeds.
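
The dispatch loop described above can be sketched without any real
child processes. "FakeChild" below is a hypothetical in-process stand-in
that borrows only the method names ("execute", "get_state",
"get_retval") and the "wait" and "fin" states from this document; the
"run_pool" function and the "idle" state are my own illustration and do
not reproduce the "Win32::ProcFarm::Pool" interface.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a child process: a job "runs" for a number of
# polls and then reports "fin".
package FakeChild;
sub new { return bless { state => 'idle' }, shift }
sub execute {
    my ($self, $ticks, $value) = @_;
    @$self{qw(state ticks retval)} = ('wait', $ticks, $value);
}
sub get_state {
    my $self = shift;
    if ($self->{state} eq 'wait' && --$self->{ticks} <= 0) {
        $self->{state} = 'fin';
    }
    return $self->{state};
}
sub get_retval { my $self = shift; $self->{state} = 'idle'; return $self->{retval} }

package main;

sub run_pool {
    my ($num_children, %jobs) = @_;          # job ID => [ticks, value]
    my @children = map { FakeChild->new } 1 .. $num_children;
    my @waiting  = sort keys %jobs;
    my (%running, %results);                 # child index => job ID; job ID => value

    while (@waiting || %running) {
        for my $i (0 .. $#children) {
            my $child = $children[$i];
            # Collect the return value of a finished job under its ID.
            if (exists $running{$i} && $child->get_state eq 'fin') {
                my $id = delete $running{$i};
                $results{$id} = $child->get_retval;
            }
            # Hand the next waiting job to any child that is free.
            if (!exists $running{$i} && @waiting) {
                my $id = shift @waiting;
                $child->execute(@{ $jobs{$id} });
                $running{$i} = $id;
            }
        }
    }
    return %results;
}

my %jobs;
$jobs{"10.0.0.$_"} = [ $_ % 4 + 1, "checked $_" ] for 1 .. 8;
my %results = run_pool(3, %jobs);
print "$_ => $results{$_}\n" for sort keys %results;
```

A real pool would sleep briefly between polling passes rather than
spinning, since the children do their work in separate processes.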

Process Farm Advantages

Speed
By farming the work out over a large number of processes (I
typically use from 5 to 30), large speedup factors can be achieved
fairly easily.

Reuse
The process farm system is designed to be fairly easy to use. Simply
write the function you need, include it in a child process, and add
roughly 10 lines of boilerplate code to the parent.

Efficiency in the face of variable-length jobs
Because jobs are assigned one-by-one to the child processes as they
come free, jobs are allocated as efficiently as possible given the
constraint that the job execution time cannot be predicted.

Low probability of child process orphaning
Because the code to kill the child processes when everything is over
is implemented in the "DESTROY" for the parent, orphaning is a rare
event.
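
The benefit of one-by-one assignment for variable-length jobs can be
seen in a small simulation. The sketch below compares dividing the job
list into equal chunks up front with handing each job to whichever
worker frees up first. The durations are made-up ping times and the
function names are hypothetical; nothing here calls "Win32::ProcFarm".

```perl
use strict;
use warnings;
use List::Util qw(max sum);

# Static split: deal the jobs out in equal contiguous chunks up front.
sub static_makespan {
    my ($workers, @durations) = @_;
    my $chunk = int((@durations + $workers - 1) / $workers);
    my @totals;
    while (my @slice = splice(@durations, 0, $chunk)) {
        push @totals, sum(@slice);
    }
    return max(@totals);
}

# Dynamic assignment: each job goes to whichever worker becomes free first.
sub dynamic_makespan {
    my ($workers, @durations) = @_;
    my @busy_until = (0) x $workers;
    for my $duration (@durations) {
        my ($first_free) =
            sort { $busy_until[$a] <=> $busy_until[$b] } 0 .. $#busy_until;
        $busy_until[$first_free] += $duration;
    }
    return max(@busy_until);
}

# Made-up ping times in seconds: a run of dead hosts followed by live ones.
my @durations = (1, 1, 1, 1, 0.01, 0.01, 0.01, 0.01);
my $static  = static_makespan(2, @durations);
my $dynamic = dynamic_makespan(2, @durations);
printf "static: %.2f s, dynamic: %.2f s\n", $static, $dynamic;
```

With the static split one worker gets all the slow jobs while the other
sits idle; the dynamic scheme spreads them out without needing to know
the durations in advance.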

Process Farm Limitations

The Process Farm code is very useful in certain situations, but it has a
number of limitations that should be kept in mind.

Child Process Startup Time
On a dual Pentium Pro/200 with 128MB of RAM, child process startup
time is roughly one third of a second. This means spinning off 30
child processes takes about 10 seconds. The code already uses
asynchronous startup, and I believe the major limitation remaining is
the time necessary to start up a Perl process and create the TCP
socket.

Child Process Memory Utilization
By keeping an eye on total memory utilization, it appears that each
bare child process uses roughly 2.3MB of memory. A child process
that also uses "Net::Ping" to implement a ping function uses roughly
2.6MB of memory. If you spin off 30 of these processes, that's roughly
78MB of RAM. If you start swapping, the thrash of 30 processes running
simultaneously is going to kill any speed benefit, so keep memory
utilization in mind when selecting the number of child processes to
use.
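
The two limitations above lend themselves to a quick budget
calculation before choosing a pool size. This is a rough sketch using
the per-child figures quoted in this section, which are specific to the
machine they were measured on; the helper names and the 64MB budget are
hypothetical.

```perl
use strict;
use warnings;

# Per-child figures quoted above; treat them as rough, machine-specific estimates.
my $STARTUP_SECONDS = 1 / 3;
my $MEMORY_MB       = 2.6;

# Total startup time and memory for a pool of the given size.
sub pool_cost {
    my ($children) = @_;
    return ($children * $STARTUP_SECONDS, $children * $MEMORY_MB);
}

# Largest pool that fits in a given amount of free RAM.
sub max_children_for {
    my ($free_mb) = @_;
    return int($free_mb / $MEMORY_MB);
}

my ($startup, $memory) = pool_cost(30);
printf "30 children: %.0f s startup, %.0f MB\n", $startup, $memory;
printf "64 MB free allows about %d children\n", max_children_for(64);
```

Measuring both figures on the target machine and substituting them for
the constants gives a more trustworthy limit.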

Real World Results

Despite the limitations, I have found the Process Farm system to be very
useful. In the previous example of pinging a range of IP addresses, with
roughly 10% coverage on a Class C, and 31 child processes, total ping
time runs roughly 21 seconds, a speedup of roughly a factor of 10 on a
problem that otherwise takes an obnoxious amount of time.

Further Information

Please see the "tutorial" in "Docs/tutorial.pod" for more information,
as well as the POD contained within the actual Perl modules.