Given these imperfect timing tools, how should you do benchmarking? The answer depends on whether you are trying to measure improvements in the performance of a single program on the same hardware, or trying to compare the performance of different programs and/or different hardware.
For the first use (measuring the effect of program modifications with constant hardware), you should look at both system+user and real time to understand what effect the change had on CPU use and on I/O (including paging). If you are working on a CPU-intensive program, the change in system+user time will give you a moderately reproducible measure of performance across a fairly wide range of system conditions. For a CPU-intensive program, you can think of system+user as ``how long it would have taken to run if I had my own machine.'' So when comparing CPU-intensive programs, system+user time is a reasonably close stand-in for real time, and reasonable to use.
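As a rough illustration of looking at system+user and real time side by side, here is a minimal sketch in Python. Python is used purely for illustration; the helper `measure` and the workload `cpu_bound` are hypothetical names, not part of any timing facility described here. The sketch assumes that `os.times()` reports the process's user and system CPU time and that `time.perf_counter()` is an adequate wall clock.

```python
import os
import time

def measure(fn, *args):
    """Run fn(*args) once and return (result, timings in seconds).

    Hypothetical helper: 'run' is system+user CPU time for this process,
    'real' is elapsed wall-clock time.
    """
    cpu_before = os.times()
    wall_before = time.perf_counter()
    result = fn(*args)
    wall_after = time.perf_counter()
    cpu_after = os.times()

    user = cpu_after.user - cpu_before.user
    system = cpu_after.system - cpu_before.system
    timings = {
        "user": user,
        "system": system,
        "run": user + system,             # system+user time
        "real": wall_after - wall_before  # what you actually wait for
    }
    return result, timings

def cpu_bound(n):
    """Stand-in for a CPU-intensive program: pure arithmetic, no I/O."""
    total = 0
    for i in range(n):
        total += i * i
    return total

result, t = measure(cpu_bound, 200_000)
print("run (system+user): %.3fs   real: %.3fs" % (t["run"], t["real"]))
```

To judge a modification, you would take these numbers before and after the change, on the same machine. For a CPU-intensive workload like this one, the `run` figure should vary less with system load than the `real` figure.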
For programs that spend a substantial amount of their time paging, you really can't predict elapsed time under a given operating condition without benchmarking in that condition. User or system+user time may be fairly reproducible, but it is also relatively meaningless, since in a paging or I/O intensive program, the program is spending its time waiting, not running, and system time and user time are both measures of run time. A change that reduces run time might increase real time by increasing paging.
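The gap between run time and real time for a program that mostly waits can be seen in a small sketch. This is illustrative Python under stated assumptions: `run_and_real` and `waiting_job` are hypothetical names, and `time.sleep` is only a stand-in for time spent blocked on paging or I/O.

```python
import os
import time

def run_and_real(fn):
    """Return (system+user time, real time) in seconds for one call of fn."""
    cpu_before, wall_before = os.times(), time.perf_counter()
    fn()
    wall_after, cpu_after = time.perf_counter(), os.times()
    run = (cpu_after.user - cpu_before.user) + (cpu_after.system - cpu_before.system)
    return run, wall_after - wall_before

def waiting_job():
    # Stand-in for paging or I/O: the process is blocked, not running.
    time.sleep(0.2)

run, real = run_and_real(waiting_job)

# Time not accounted for by run time. On a shared machine this also
# includes time spent running other processes, not only waiting.
not_running = max(0.0, real - run)
print("run=%.3fs real=%.3fs not running=%.3fs" % (run, real, not_running))
```

The run time here is close to zero however long the job waits, which is why it says little about how long such a program takes in practice.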
Another common use for benchmarking is comparing the performance of the same program on different hardware. You want to know which machine to run your program on. For comparing different machines (operating systems, etc.), the only comparison that makes sense is to set up the machines in exactly the way that they will normally be run, and then measure real time. If the program will normally be run along with X, then run X. If the program will normally be run on a dedicated workstation, then be sure nobody else is on the benchmarking machine. If the program will normally be run on a machine with three other Lisp jobs, then run three other Lisp jobs. If the program will normally be run on a machine with 8 megabytes of memory, then run with 8 megabytes. Here, ``normal'' means ``normal for that machine''. If you have the choice of an unloaded RT or a heavily loaded PMAX, do your benchmarking on an unloaded RT and a heavily loaded PMAX.
If you have a program you believe to be CPU-intensive, then you might be tempted to compare ``run'' times across systems, hoping to get a meaningful result even if the benchmarking isn't done under the expected running condition. Don't do this, for two reasons:
In the end, only real time means anything--it is the amount of time you have to wait for the result. The only valid uses for run time are: