Start by profiling large pieces of a program, then carefully choose which functions close to, but not in, the inner loop to profile next. Avoid profiling functions that are called by other profiled functions, since the profiling overhead of the callee may then be included in the times reported for the caller.
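To illustrate how overhead can leak into a caller's reported time, here is a minimal sketch in Python. It is not the profiler this text describes: the `profiled` decorator, the `totals` and `calls` tables, and the functions `inner` and `outer` are all hypothetical names invented for this example. Both functions are profiled, so the bookkeeping done for every call to `inner` happens while the timer for `outer` is still running.

```python
import time
from functools import wraps

totals = {}  # hypothetical table: function name -> accumulated seconds
calls = {}   # hypothetical table: function name -> number of calls

def profiled(func):
    """Wrap func so that each call adds its elapsed time to `totals`."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            name = func.__name__
            # This bookkeeping runs inside the timed region of any
            # profiled caller, so the caller's total absorbs it.
            totals[name] = totals.get(name, 0.0) + elapsed
            calls[name] = calls.get(name, 0) + 1
    return wrapper

@profiled
def inner(x):
    return x * x

@profiled
def outer(n):
    # Every call to inner() pays the wrapper cost, and that cost is
    # charged to outer() because outer's timer is still running.
    return sum(inner(i) for i in range(n))

result = outer(10_000)
print(calls, totals)
```

Removing `@profiled` from `inner` and rerunning would likely show a smaller total for `outer`; the difference is a rough measure of the overhead that had been attributed to it.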
If the reported per-call time is less than 1/10 second, consider the clock resolution and the profiling overhead before you believe the figure. You may need to run your program many times and average the results in order to obtain a finer effective resolution.
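The two checks above can be sketched as follows, again in Python and only as an illustration under stated assumptions. The helper names `clock_resolution`, `average_call_time`, and `short_function` are hypothetical, and `time.perf_counter` stands in for whatever clock your profiler actually uses.

```python
import time

def clock_resolution(samples=1000):
    """Return the smallest nonzero step observed between clock readings."""
    smallest = float("inf")
    for _ in range(samples):
        t0 = time.perf_counter()
        t1 = time.perf_counter()
        while t1 == t0:  # spin until the clock visibly advances
            t1 = time.perf_counter()
        smallest = min(smallest, t1 - t0)
    return smallest

def average_call_time(func, repetitions=100_000):
    """Time many calls together and return the mean seconds per call.

    The mean still includes the cost of the loop itself, so treat it
    as an upper bound on the true per-call time.
    """
    start = time.perf_counter()
    for _ in range(repetitions):
        func()
    elapsed = time.perf_counter() - start
    return elapsed / repetitions

def short_function():
    # Hypothetical stand-in for a function too fast to time in one call.
    return sum(range(10))

resolution = clock_resolution()
per_call = average_call_time(short_function)
print(f"observed clock step: {resolution:.3e} s")
print(f"mean time per call:  {per_call:.3e} s")
```

If the mean time per call is not comfortably larger than the observed clock step, a single-call measurement of that function should not be trusted on its own.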