Thanks, Chuck - my mistake: I concentrated on the body of the message and missed the word "Profiling" in the title. I re-read the body and the rest of the thread before posting, but I still missed that big clue in the title.
It's okay--I just wanted reassurance that I wasn't hallucinating when I wrote my reply.
I wrote a profiler for finding "hot spots" in x86 code back in 1987 or so. I still have the code. It's a TSR that interacts with a utility that calls it to set the sampling range, clear the table, print the results, and so on. The TSR also has an API that a program can call to do any of the above, as well as turn sampling on and off.
The nice part is that you can take a slow program that you don't have the source code for and see where it's spending most of its time. This can be very useful if you're trying to speed up someone else's code or even reverse-engineer it.
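For anyone curious what such a sampler boils down to, here's a minimal sketch of the address-tally table a profiler like that might keep. The names (HotSpotTable, the bucket size, etc.) are purely illustrative - this is not the original TSR's API, just the general technique: on each timer tick, bucket the interrupted program counter, and the fullest buckets are your hot spots.

```python
class HotSpotTable:
    """Tally sampled instruction addresses into fixed-size buckets."""

    def __init__(self, start, end, bucket_size=16):
        # Sampling range [start, end) and bucket granularity in bytes.
        self.start, self.end = start, end
        self.bucket_size = bucket_size
        self.clear()

    def clear(self):
        # One counter per bucket_size-byte slice of the range.
        n = (self.end - self.start + self.bucket_size - 1) // self.bucket_size
        self.counts = [0] * n
        self.total = 0

    def record(self, addr):
        # Called on each timer tick with the interrupted program counter;
        # addresses outside the sampling range are ignored.
        if self.start <= addr < self.end:
            self.counts[(addr - self.start) // self.bucket_size] += 1
            self.total += 1

    def report(self, top=5):
        # The buckets with the most hits are the "hot spots".
        ranked = sorted(enumerate(self.counts), key=lambda kv: -kv[1])
        return [(self.start + i * self.bucket_size, c)
                for i, c in ranked[:top] if c]
```

On real hardware the `record` step would run from a timer interrupt handler reading the saved CS:IP off the stack; the table itself is the same idea.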
This was by no means the first sampling program I ever wrote. One of my jobs on several projects at Control Data was speeding up operating-system-related code (which back then, at least, was regarded as pure overhead). On the old line machines like the 6600, you could install your profiler in a PPU, which could read the CPU's P-counter asynchronously with the CPU itself. On later systems, like the STAR, you could instruct the hardware to generate periodic interrupts. I recall that, on the latter machine, the bulk of the speedup came from modifying the compiler used for OS code to special-case some loops and vectorize others. The yield was something like a 30% performance improvement without changing a line of OS source code.
That was a fluke, though--by far, the biggest improvements came from refining or reworking the algorithm itself.
So that's my story and I'm sticking to it.
(P.S. I still have my old code, but it's in an .ARC archive. Anyone remember that one, and the grief they gave Phil Katz? Phil's been gone for some time now, but .ZIP files continue on...)