* * * * *

                           Adventures in profiling

Last month, Mark [1] hired me on as a consultant to help him profile a
program he's been writing. And while I can't describe the program (or the
project the program is for), I can, however, describe the issues I've had
with profiling this program.

Normally to profile a program, you compile the program using special compiler
options (with gcc this is the -pg option) that instrument the program with
special code to record not only how many times each function is called, but
how much time is spent running each function. This information is saved when
the program terminates, and you run another special program to decode this
output into an easy to read format [2]. Then you use that information to
boost the performance of your program [3].

It's pretty straightforward.

This project, on the other hand, wasn't so straightforward. The primary issue
it's a multiprocess program; that is, it calls fork() to create child
processes that do the actual work. It's the child processes that need to be
profiled, but it's difficult to actually get the profile information from the
child processes due to the way the profiler works (and both the GNU profiler
(which I'm not allowed to use due to licensing fears) and the Sun Studio 12
profiler (which is the development platform for the project) work simularly,
so this issue affects both).

Problem one—the output. The program runs and when it exits, the accumulated
data is written to a file. This file is named mon.out (gmon.out if I were
using GNU). In this case, the main program starts, creates several child
processes. The ouput file is only generated when a process ends, and the only
time a process ends in this project is when you explicitely stop the main
program. This results in mon.out being overwritten by each child as they end,
then overwritten again when the main process ends. So all I end up with is
profile information for the main process, which tells me that the main
processing loop took 99% of the time with only 0.01 seconds of CPU time (in
other words—the parent process did nothing noteworthy). And there's no option
in either the compiler or at runtime, to change the output file.

Or is there?

The file mon.out is generated in the current working directory of the running
process. Change the working directory of the process, and the file ends up in
the new working directory. So I modify the program such that each child
process creates a new working directory based on the child's process ID and
try again.

I ended up with one mon.out file in the working directory of the main
process, and a bunch of empty directories. This leads to the second problem
with profiling—you only get the output when the process calls exit() (or
returns from main()), not _exit() (there is a difference between the two
[4]).

And replacing the calls to _exit() with exit() caused the program to hang
(Mark even had a comment in the code about the C runtime handling fork()
badly in the case of exit()).

So that pretty much killed using the compiler profiler options for this
project.

Or did it?

The code is on a Sun server [5], which means it comes with DTrace [6], which
is an incredible tracing facility [7] that can be used to profile an
application! Without compiling a special version of the program! Heck, you
can profile any running process at any time!

It's a neat facility.

Just by using some sample scripts from the DTraceToolkit [8] and a few
examples from the Solaris Dynamic Tracing Guide [9], I was able to provide
Mark with enough information to nearly double the performance of the program
(major culprit—a ton of pointless strcpy() calls in a third party library
being used, but that's about all I can say about that).

I was fortunate in that DTrace existed; between samping the program counter,
recording the number of standard library calls made, and selectively checking
the call stack to a few questionable calls (for instance, sprintf() calls
ferror() if you can believe it, and tracking down the few hundred thousand
calls to strcpy()) of selected child processes, I was able to profile this
multiprocess program (and each process is multithreaded—a fact I didn't
realize until later).

And if DTrace didn't exist? Well … there's the profiling equivalent to
printf() debugging [10] I could have tried …

[1] http://gladesoft.com/
[2] gopher://gopher.conman.org/0Phlog:2007/10/24.2
[3] gopher://gopher.conman.org/0Phlog:2003/01/12.1
[4] http://www.unixguide.net/unix/programming/1.1.3.shtml
[5] http://www.oracle.com/us/products/servers-storage/servers/
[6] http://www.sun.com/bigadmin/content/dtrace/index.jsp
[7] http://en.wikipedia.org/wiki/DTrace
[8] http://hub.opensolaris.org/bin/view/Community+Group+dtrace/dtracetoolkit
[9] http://docs.sun.com/app/docs/doc/817-6223
[10] http://stackoverflow.com/questions/375913/what-can-i-use-to-profile-c-code-in-linux/378024#378024

Email author at [email protected]