Thanks, Chuck - my mistake: I concentrated on the body of the message and missed the word "Profiling" in the title. I re-read the body and the rest of the thread before posting, but I still missed that big clue in the title.
It's okay--I just wanted reassurance that I wasn't hallucinating when I wrote my reply.
I wrote a profiler for finding "hot spots" in x86 code back in 1987 or so. I still have the code. It's a TSR that interacts with a utility that calls it to set the sampling range, clear the table, print the results, and so on. The TSR also has an API that a program can call to do any of the above, as well as turn sampling on and off.
The nice part is that you can take a slow program that you don't have the source code for and see where it's spending most of its time. This can be very useful if you're trying to speed up someone else's code or even reverse-engineer it.
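For anyone curious what such a sampler boils down to, here's a minimal sketch of the address-tally table a profiler like that might keep. The names (HotSpotTable, the bucket size, etc.) are purely illustrative - this is not the original TSR's API, just the general technique: on each timer tick, bucket the interrupted program counter, and the fullest buckets are your hot spots.

```python
class HotSpotTable:
    """Tally sampled instruction addresses into fixed-size buckets."""

    def __init__(self, start, end, bucket_size=16):
        # Sampling range [start, end) and bucket granularity in bytes.
        self.start, self.end = start, end
        self.bucket_size = bucket_size
        self.clear()

    def clear(self):
        # One counter per bucket_size-byte slice of the range.
        n = (self.end - self.start + self.bucket_size - 1) // self.bucket_size
        self.counts = [0] * n
        self.total = 0

    def record(self, addr):
        # Called on each timer tick with the interrupted program counter;
        # addresses outside the sampling range are ignored.
        if self.start <= addr < self.end:
            self.counts[(addr - self.start) // self.bucket_size] += 1
            self.total += 1

    def report(self, top=5):
        # The buckets with the most hits are the "hot spots".
        ranked = sorted(enumerate(self.counts), key=lambda kv: -kv[1])
        return [(self.start + i * self.bucket_size, c)
                for i, c in ranked[:top] if c]
```

On real hardware the `record` step would run from a timer interrupt handler reading the saved CS:IP off the stack; the table itself is the same idea.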
This was by no means the first sampling program I ever wrote. One of my jobs on several projects at Control Data was speeding up operating-system-related code (which back then, at least, was regarded as pure overhead). On the old line machines like the 6600, you could install your profiler in a PPU, which could read the CPU's P-counter asynchronously with the CPU itself. On later systems, like the STAR, you could instruct the hardware to generate periodic interrupts. I recall that, on the latter machine, the bulk of the speedup came from modifying the compiler used for OS code to special-case some loops and vectorize others. The yield was something like a 30% performance improvement without changing a line of OS source code.
That was a fluke, though--by far, the biggest improvements came from refining or reworking the algorithm itself.
So that's my story and I'm sticking to it.
(P.S. I still have my old code, but it's in an .ARC archive. Anyone remember that one, and the grief they gave Phil Katz? Phil's been gone for some time now, but .ZIP files continue on...)