Please help with real iron: I need results from 486 and first-generation Pentium systems.

You know what I mean :) Here's some code from the source that could be optimized:

Code:
     mov bx,ax   ;T2/2/2/2
     mov ax,dx   ;T2/2/2/2
     xor dx,dx   ;T3/3/2/2

Could be:

Code:
     xchg bx,ax ; 1-byte opcode
     xchg dx,ax ; 1-byte opcode
     xor dx,dx

Considering this is part of a macro used in an inner loop, it should be a measurable speedup. (Also, these t-state numbers don't take instruction size into account.)
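Before counting cycles, it is worth checking that the two sequences really are interchangeable. The Python model below is a sketch that tracks only AX, BX and DX as plain integers. Flags are not modelled: MOV and XCHG leave them alone and both versions end with the same XOR, so they should match as well. On size, `xchg bx,ax` and `xchg dx,ax` can use the one-byte 90h+reg encoding, while register-to-register MOV and XOR take two bytes each, so the proposed version is 4 bytes instead of 6.

```python
def mov_variant(ax: int, bx: int, dx: int) -> tuple[int, int, int]:
    """Original sequence: mov bx,ax / mov ax,dx / xor dx,dx."""
    bx = ax          # mov bx,ax
    ax = dx          # mov ax,dx
    dx = 0           # xor dx,dx
    return ax, bx, dx


def xchg_variant(ax: int, bx: int, dx: int) -> tuple[int, int, int]:
    """Proposed sequence: xchg bx,ax / xchg dx,ax / xor dx,dx."""
    bx, ax = ax, bx  # xchg bx,ax: BX gets old AX, AX gets old BX
    dx, ax = ax, dx  # xchg dx,ax: DX gets old BX, AX gets old DX
    dx = 0           # xor dx,dx: old BX is discarded, just as in the MOV version
    return ax, bx, dx


if __name__ == "__main__":
    # Arbitrary example register contents; both lines print the same tuple.
    print(mov_variant(0x1111, 0x2222, 0x3333))
    print(xchg_variant(0x1111, 0x2222, 0x3333))
```

In both cases AX ends up with the old DX, BX with the old AX, and DX is zero, so the substitution is safe as long as nothing later needs the old BX.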
 
Code:
         xor ax,ax
         sub ax,ra
         mov bx,7
         xor dx,dx
         div bx
         and al,0fch
         mov [maxnum],ax
I don't know what this code does but it's all constants and so should be calculated at assembly time, not during runtime.
Thank you. It is quite a good idea: it can make the code about 12 bytes smaller, but it doesn't affect the speed. To affect speed we have to optimize only the code between .l4 and JNE .l4, because that is the main loop.

Even if not reducible to a constant, it's pretty ugly.
It is reducible, but do you know a more beautiful way to write the expression ((65536-ra)/7) and 0FFFCh in assembly? ;)
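Since `ra` is an assemble-time constant, the whole sequence collapses to a single value. The Python sketch below emulates the 16-bit instructions step by step (SUB wrapping modulo 2^16, DIV dividing DX:AX by BX, the AND acting on AL) and compares the result with the closed form quoted above. The actual value of `ra` is not given in the thread, so the usage line relies on a hypothetical `ra = 100`. One caveat the model exposes: the two agree only for 1 <= ra <= 65535; with ra = 0 the runtime code yields 0, not 65536/7.

```python
MASK16 = 0xFFFF  # 16-bit register width


def maxnum_runtime(ra: int) -> int:
    """Emulate the original instruction sequence on 16-bit registers."""
    ax = 0                        # xor ax,ax
    ax = (ax - ra) & MASK16       # sub ax,ra (wraps modulo 2**16)
    bx = 7                        # mov bx,7
    dx = 0                        # xor dx,dx
    dividend = (dx << 16) | ax    # div bx divides DX:AX by BX
    ax, dx = dividend // bx, dividend % bx  # quotient -> AX, remainder -> DX
    ax &= 0xFFFC                  # and al,0fch clears the two low bits of AL, hence of AX
    return ax                     # value stored by mov [maxnum],ax


def maxnum_constant(ra: int) -> int:
    """The same value as the expression ((65536-ra)/7) and 0FFFCh."""
    return ((65536 - ra) // 7) & 0xFFFC


if __name__ == "__main__":
    ra = 100  # hypothetical value; the real ra is defined elsewhere in the source
    print(maxnum_runtime(ra), maxnum_constant(ra))
```

With `ra = 100` both functions give 9348. In the source itself the expression would be written as an assembler constant so that `mov [maxnum],...` stores an immediate; the operator spelling (`and` in MASM/TASM-style expressions, `&` in NASM) depends on the assembler used.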

Thank you very much. I like XCHG; it makes Intel's ISA special and more powerful. However, in this case it is not quite right, because XCHG takes 4 cycles on the 8086/88 or 3 cycles on the 80286/386/486, while MOV takes only 2. So we can reduce the code size by two bytes, but this worsens its speed by 2 cycles (4 on the 8086). Maybe XCHG is faster on the Pentium? IMHO XCHG and MOV should take 1 cycle each on the Pentium.

I still hope to get some info about the 80486 and Pentium for my tables. Thanks.
 
On the 8088, the smallest code is fastest, because it takes 4 cycles to read a byte (even instruction opcodes). So the xchg variant would indeed be faster.
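A rough way to see why: if the 8088's 8-bit bus delivers one byte every 4 clocks, an instruction cannot finish faster than 4 clocks per byte of its encoding. The sketch below charges each instruction the larger of its execution time and that fetch time. It is a crude model that ignores the prefetch queue state and bus contention, and it uses the cycle counts quoted in this thread (2 for MOV, 3 for XOR, 4 for XCHG) rather than independently checked figures.

```python
from dataclasses import dataclass

# Model assumption: one byte fetched per 4-clock bus cycle on the 8088.
CLOCKS_PER_BYTE_8088 = 4


@dataclass
class Insn:
    text: str
    size: int       # encoded length in bytes
    eu_clocks: int  # execution clocks as quoted in this thread


def estimate_8088(seq: list[Insn]) -> int:
    """Each instruction costs the larger of its execution time and the
    time needed to fetch its bytes over the 8-bit bus."""
    return sum(max(i.eu_clocks, i.size * CLOCKS_PER_BYTE_8088) for i in seq)


mov_seq = [Insn("mov bx,ax", 2, 2), Insn("mov ax,dx", 2, 2), Insn("xor dx,dx", 2, 3)]
xchg_seq = [Insn("xchg bx,ax", 1, 4), Insn("xchg dx,ax", 1, 4), Insn("xor dx,dx", 2, 3)]

if __name__ == "__main__":
    print("mov version: ", estimate_8088(mov_seq), "clocks")
    print("xchg version:", estimate_8088(xchg_seq), "clocks")
```

Under this model the MOV version costs about 24 clocks and the XCHG version about 16, because every two-byte instruction is fetch-bound at 8 clocks no matter how quickly it executes.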
 
Yes, the 8086/88 prefetch queue makes timing calculation a real headache, so it would indeed be interesting to get accurate timings for both cases... However, changing these two MOVs to two XCHGs gives no noticeable effect even on the 8088/86, because the jump to .div32 is taken almost every time.
 
I've got a Packard Bell Pack-Mate 28 Plus with an Evergreen 586 DX5-133 processor (16 KB write-back L1 cache) installed. I'll check what happens over the weekend, once I clear some space in my room to hook the system up temporarily.
 
It's not that easy. ;) IMHO it will be very difficult to find a way even to get 1-2 cycles less.

Did some tests last night and could not speed up the program on a 4.77 MHz 8088. That's not to say it can't be sped up -- I believe it can, just not in the inner loop as written. I'd need to rewrite the entire implementation to speed it up and I have more pressing things to take care of.

One dumb way to speed it up is to switch from int 21h,02h for printing characters to int 10h,09h (because all int 21h,02h does is add overhead to calling int 10h,09h). But you have "printing done through the operating system" as a requirement.

One possible way to make it slightly smaller is to switch from parsing individual keystrokes to using int 21h,0Ah.
 
I am still waiting for help with real iron... It is good for hardware to work now and then; some work protects the iron from rust and dust.

Thanks for your efforts, but I repeat: it is very difficult to optimize the code because it is already almost optimal. ;)
 
I am still waiting for help with real iron...

You're in luck, I just tested 3 of my computers: a PS/2 Model 85 with a 33 MHz 486SX, a PS/2e with a 486SX of unknown speed (I'm thinking 25 MHz), and an EduQuest Model 40 with, I believe, a 25 MHz SX. I haven't used these machines all that much or had them apart, so I'm not sure of the exact specs, but I can check. Here are the results:
100 digits- Model 85: 0.00, PS/2e: 0.00, Eduquest: 0.00
1000 digits- Model 85: 1.15, PS/2e: 1.70, Eduquest: 0.88
3000 digits- Model 85: 10.43, PS/2e: 14.11, Eduquest: 7.63

If you want any more computers tested, I have a few XT machines, some Pentium MMXs, and a couple of 286s.
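One way to sanity-check these numbers is to look at how each machine's time grows with the digit count. Assuming the program does work roughly proportional to the square of the number of digits, as spigot-style pi programs typically do (an assumption, not something stated in this thread), tripling the digits should take about 9 times longer. The sketch below uses only the 1000- and 3000-digit figures reported above, since the 100-digit runs are too short to register.

```python
# Timings as reported above (units as printed by the program).
results = {
    "Model 85": {1000: 1.15, 3000: 10.43},
    "PS/2e": {1000: 1.70, 3000: 14.11},
    "EduQuest": {1000: 0.88, 3000: 7.63},
}


def growth_ratio(times: dict[int, float], small: int = 1000, large: int = 3000) -> float:
    """How many times longer the large run took than the small one."""
    return times[large] / times[small]


def quadratic_prediction(small: int = 1000, large: int = 3000) -> float:
    """Expected ratio if run time grows with the square of the digit count (assumed)."""
    return (large / small) ** 2


if __name__ == "__main__":
    for name, times in results.items():
        print(f"{name:9s} ratio {growth_ratio(times):5.2f} "
              f"(quadratic model: {quadratic_prediction():.1f})")
```

All three ratios land between roughly 8.3 and 9.1, which is at least consistent with the machines differing by a constant speed factor (clock, cache, wait states) rather than by a measurement quirk at one size.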
 
Thank you very much. However, it is difficult for me to use your results without knowing the exact CPU frequency. I can't understand how a PS/2 Model 85 with a 486SX @33MHz can be slower than an EduQuest Model 40 with a 486SX @25MHz. Moreover, http://john.ccac.rwth-aachen.de:8000/alf/ps2_85/ claims that this PC uses a 486SX @66MHz. There is also the IBM Personal System/2 Server 85 with a 486SX @33MHz. It is very confusing to me. :( http://www.walshcomptech.com/ps2/eqm40.htm is about the EduQuest 40.
Anyway, I have updated my tables. More results would be useful.
 
There's more to code speed than CPU clock, even with the same processor installed. Memory speed, wait states, interrupt load, etc. can all affect the speed of any given program run. You'd need to average several runs to even out some of it.

EDIT: Also, the PS/2 Model 85 has several possibilities for the processor complex, which would also impact performance, even with the same CPU.
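On the vintage machines, averaging just means running the program several times and combining the times by hand. The same idea as a small Python sketch, where `work` is a hypothetical placeholder for whatever is being timed: it collects wall-clock samples and reports the mean, the minimum (often the least disturbed run) and the spread.

```python
import statistics
import time
from typing import Callable


def time_runs(work: Callable[[], None], runs: int = 5) -> dict:
    """Run `work` several times and summarize the wall-clock timings in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        work()
        samples.append(time.perf_counter() - start)
    return {
        "runs": runs,
        "mean": statistics.mean(samples),
        "min": min(samples),
        # stdev needs at least two samples
        "stdev": statistics.stdev(samples) if runs > 1 else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical workload standing in for the program under test.
    print(time_runs(lambda: sum(range(100_000)), runs=5))
```

A large standard deviation relative to the mean is a hint that something other than the CPU (interrupts, disk, memory refresh) is influencing individual runs.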
 
Is there a program I can run that will tell me what hardware I have? I'm not really sure what's inside, and they're kind of a pain to take apart, at least the 85 is. Maybe something that could shed some light on how a 25 MHz 486 is faster than a 33 MHz one.
 
My first guess would be the amount of cache. A 486 at 25 MHz with any cache will be faster than a 486 at 33 MHz without cache. Cross-referencing the IBM FRU number on the processor complex would give you much of that info.
 
Is there a program that can tell me what it is? The Model 85 especially is pretty hard to take apart, and it feels like I'm going to break it every time I do.
 
IIRC there were a lot of programs to check hardware, like Norton System Info (SI).

My first guess would be the amount of cache. A 486 at 25 with any cache will be faster than a 486 at 33 without cache.

IMHO it depends on the task. The pi calculator accesses memory rarely, so the cache size is not important for it.
 