Innovations in Integer to String (itoa) Methods for 8088?

neilobremski · Jul 19, 2017

This afternoon I've been playing around with writing an itoa() equivalent in x86 assembler. The signature of this method is:

void itoa(int input, char *buffer, int radix)

The general idea is to divide by the radix, write the remainder as the next digit (in a lookup table of [hexa] decimal characters), and then repeat the process on the quotient until it is zero. Then afterwards one merely reverses the string so that the digit values go highest to lowest rather than lowest to highest.

I couldn't find any mention of how this could be improved either by some fancy math or clever use of instructions. My first thought was that there must be some way to start with the highest digit and move to the lowest without the caller specifying the highest base. What I like about the itoa approach is that it doesn't use a fixed number of digits or space; it's very space efficient.

However, the disassembly I've seen from MSC 5.1's version of itoa does access memory 3x per digit: writing the digit, reading it to swap it, and writing it to swap it. Here's the assembler it uses to reverse the string.

Code:

DEC	DI	; 670
LODSB		; 671
XCHG	AL, [DI]; 672
MOV [SI-01], AL	; 674
LEA AX, [SI+01]	; 677
CMP	AX, DI	; 67A
JB	0670	; 67C

I thought the use of LEA was a bit surprising here but there must have been good reason for it?

I don't have any specific question here, just a broad cast: does anyone have any enlightening innovations to share about this concept/method? I'd love to read them

dreNorteR · Jul 20, 2017

Well for one thing the LEA shouldn't be necessary.

If the string has an odd length, SI and DI will be equal in the middle:

Code:

12345  52341 54321
S   D   S D    *

and if the length is even, DI will become less than SI after swapping the middle two digits.

Building the string backwards in the first place would be better - pass in the end of the buffer and have the function return a pointer to the first digit. If you want right-aligned output, initialize the buffer to blanks beforehand and print it completely. You could also write a combined itoa+print routine that writes the string into a temporary buffer on the stack.

carangil · Aug 16, 2017

You can multiply the base by itself counting how many times it gets to become greater than the input number.

ignoring negatives for now, here is C pseudocode to build the number right to left, with no reversing.

Code:

int atou(unsigned int num, char* buffer, unsigned int radix){
    unsigned int m = radix;

    while ( num >= m ) {
      m=m*radix;  //if overflow, need to break.  can't do that in C :-)
      buffer++;
   }

   //buffer now points to last digit
   *(buffer+1) = '\0'; //null terminate
   *buffer= '0';  //zero case
   while(num){
        *buffer = num % radix +'0';
        num = num / radix; 
        buffer--;
   }


}

Also note, this is for decimal and lower, and I didn't bother with the 'A' thru 'F' condition of hex.

retro · Nov 9, 2017

I agree it's better to avoid reversing the string.
Also for efficiency, code separate functions for each radix - I've never needed any other ASCII representation besides decimal or hex.

Here's the (decimal) library function I wrote for PL/M-86

Code:

;***************************************************************************
;*                                                                         *
;*   Int2Dec: PROCEDURE (binary, decimal) ADDRESS PUBLIC;                  *
;*               DECLARE binary  INTEGER,                                  *
;*                       decimal ADDRESS;                                  *
;*            END Int2Dec;                                                 *
;*                                                                         *
;*   Convert INTEGER to ASCII string representing its decimal value.       *
;*   'decimal' ADDRESS should point to the least significant digit.        *
;*   It's up to the caller to ensure that the string is long enough!       *
;*                                                                         *
;*   Returns the address of the leftmost digit, or sign if present.        *
;*                                                                         *
;***************************************************************************


              NAME    Int2Dec


CGROUP        GROUP   CODE
DGROUP        GROUP   CONST,DATA,STACK,MEMORY


CONST         SEGMENT PUBLIC 'CONST'
CONST         ENDS


DATA          SEGMENT PUBLIC 'DATA'
DATA          ENDS


STACK         SEGMENT STACK 'STACK'
STACK         ENDS


MEMORY        SEGMENT MEMORY 'MEMORY'
MEMORY        ENDS




CODE          SEGMENT PUBLIC 'CODE'
              ASSUME  CS:CGROUP
              ASSUME  DS:DGROUP


              PUBLIC  Int2Dec
Int2Dec:
              POP     BP                         ; return address
              POP     DI                         ; decimal
              POP     AX                         ; binary


              MOV     BL,'0'                     ; ASCII conversion base
              SUB     CX,CX                      ; Zero length
              MOV     DX,CX                      ; Zero dividend MSW
              STD                                ; Store chars right-to-left
              MOV     SI,10                      ; Divisor


              TEST    AX,AX                      ; Test sign bit
              JS      Negative
Positive:
              MOV     BH,' '                     ; Prefix with ' '
              JMP     DoDigit
Negative:
              MOV     BH,'-'                     ; Prefix with '-'
              NEG     AX                         ; Make binary unsigned
DoDigit:
              DIV     SI                         ; Divide binary by 10
              XCHG    DX,AX                      ; Swap remainder & quotient
              ADD     AL,BL                      ; Convert remainder to ASCII
              STOSB                              ; Store digit in string
              INC     CX                         ; Bump length counter
              OR      DX,DX                      ; Anything left over?
              JZ      Done                       ; No, all digits done


              MOV     AX,DX                      ; Yes, get new dividend
              SUB     DX,DX                      ; Zero MSW
              JMP     DoDigit                    ; Continue dividing
Done:
              MOV     [DI],BH                    ; Insert prefix char
              MOV     AX,DI                      ; Return start address
              JMP     BP


CODE          ENDS
              END

retro · Nov 9, 2017

Can be smaller/faster if just dealing with unsigned byte values (0..255)

Code:

Byte2Dec:
              POP     BP                         ; return address
              POP     DI                         ; decimal (string address)
              POP     AX                         ; binary


              SUB     CX,CX                      ; Zero length counter
              STD                                ; Store chars right-to-left
DoDigit:
              AAM                                ; Divide by 10
              ADD     AL,'0'                     ; Convert remainder to ASCII
              STOSB                              ; Store digit in string
              INC     CX                         ; Bump length
              MOV     AL,AH                      ; Get quotient
              TEST    AL,AL                      ; Is it zero?
              JNZ     DoDigit                    ; No, continue
Done:
              INC     DI                         ; Yes, return start address
              MOV     AX,DI
              JMP     BP

Krille · Nov 10, 2017

retro said:
Here's the (decimal) library function I wrote for PL/M-86

And here's my improvements.

Code:

Int2Dec:
              POP     BP                         ; return address
              POP     DI                         ; decimal
              POP     AX                         ; binary


              MOV     BL,'0'                     ; ASCII conversion base
              SUB     CX,CX                      ; Zero length
              MOV     DX,CX                      ; Zero dividend MSW
              STD                                ; Store chars right-to-left
              MOV     SI,10                      ; Divisor


              sahf                               ; Test sign bit
              jns     Positive
              MOV     BH,'-'                     ; Prefix with '-'
              NEG     AX                         ; Make binary unsigned
              SKIP2B  f                          ; Macro that inserts a 'db 03Dh' = CMP AX, imm
Positive:
              MOV     BH,' '                     ; Prefix with ' '

DoDigit:
              DIV     SI                         ; Divide binary by 10
              XCHG    DX,AX                      ; Swap remainder & quotient
              ADD     AL,BL                      ; Convert remainder to ASCII    ; I would use OR here but that's just my preference
              INC     CX                         ; Bump length counter
              STOSB                              ; Store digit in string
              xor     ax, ax
              xchg    dx, ax
              cmp     dx, ax                     ; Anything left over?
              jne     DoDigit

              cld                                ; Because leaving the Direction Flag set would be a *BUG*
              MOV     [DI],BH                    ; Insert prefix char
              xchg    di, ax                     ; Return start address
              JMP     BP

BloodyCactus · Nov 10, 2017

to avoid reversing, I after the div, I push the result and increment. when there is nothing left to divide, loop and pop, then it comes out in order.

Scali · Nov 10, 2017

BloodyCactus said:
to avoid reversing, I after the div, I push the result and increment. when there is nothing left to divide, loop and pop, then it comes out in order.

You're not actually avoiding the reversing, you're just using the stack for it. That is still quite a lot of CPU/memory cycles wasted, compared to constructing the string in the target buffer in the correct order right away.

Chuck(G) · Nov 10, 2017

What BC said. This makes for a very compact loop:

Code:

        mov     ax,what
        xor     cx,cx
        mov     bx,10
decm4:
        xor     dx,dx
        div     bx              ; ax = remainder, dx = quotient
        add     dl,'0'          ; digit
        push    dx              ; save
        inc     cx              ; bump count
        test    ax,ax
        jnz     decm4           ; loop until done...

Then CX has the number of digits converted, and you need only pop and store the result.

But real-world conversion for human consumption is rarely this simple. A considerate program would take into account the width of the receiving field, blank- or zero-fill, comma insertion, etc. I can post code for an convert-with-edit routine, if anyone cares.

And then, there's the matter of converting 32-bit integers...

PgrAm · Nov 11, 2017

Here's what I came up with for unsigned integer to string. This assumes ES : DI points to a buffer large enough to hold all strings from "0"-"65535" plus a null-terminator and AX contains the unsigned integer. it returns in ES : DI a pointer to the start of the string.

Code:

Uint16ToStrDEC_ Proc near
	add di, 5
	xor dx, dx
	mov es:[di], dl
	mov bx, 10
	mov cl, 48
loopa:
	div bx
	add dl, cl
	dec di
	mov es:[di], dl
	xor dx, dx
	cmp dx, ax
	jnz loopa
	ret
Uint16ToStrDEC_ Endp

deathshadow · Nov 15, 2017

Interesting to see this bumped, and whilst it's cute to do this for 16 bit, it's not a very useful range... That was the problem I was facing was that for scorekeeping in a game even 16 bit was too slow and I really needed 8 digits, not five.

I just wrote about how my answer was so switch to using BCD a few days ago:

http://www.deathshadow.com/blog/20171110_Why I use BCD for scorekeeping in 8088 games

Even though it takes more overhead for addition/subtraction -- don't even get me STARTED about multiply and divide, in my case since all I was doing is addition and I needed to display it on screen fast, the answer to binary integer to decimal per byte conversion was to do none... and just work in BCD in the first place. At 8 digits accuracy even if it takes ~350+ clocks per addition, it beats the 2800+ clocks to convert a 32 bit unsigned integer into ASCII or what is for all intents and purposes unpacked BCD.

Though... wasn't there a trick for doing conversion to any bit-depth "better" using AAM?

dreNorteR · Nov 15, 2017

Can't beat BCD, but here's my attempt to optimize the binary version.
Timings as documented + 4 clocks per byte, except after DIV:

Code:

	; DX:AX=integer less than 100_000_000
	; ES:DI=>string buffer
	; ret buffer filled (always 8 digits)
	mov cx, 10000	;16
	div cx		;152
	mov bx, dx	;2
	xor dx, dx	;3
	mov cx, 100	;16
	div cx		;152
	mov cl, 10	;4
	div cl		;88
	add ax, '00'	;4
	stosw		;11
	xchg ax, dx	;7
	div cl		;88
	add ax, '00'	;4
	stosw		;11
	xchg ax, bx	;7
	mov cl, 100	;12
	xor dx, dx	;11
	div cx		;152
	mov cl, 10	;4
	div cl		;88
	add ax, '00'	;4
	stosw		;11
	xchg ax, dx	;7
	div cl		;88
	add ax, '00'	;4
	stosw		;11
	; = 957

deathshadow said:
Though... wasn't there a trick for doing conversion to any bit-depth "better" using AAM?

AAM just divides AL by a constant (which is 10 in the documented version) - it isn't any faster than a normal DIV.

Trixter · Nov 15, 2017

dreNorteR said:
AAM just divides AL by a constant (which is 10 in the documented version) - it isn't any faster than a normal DIV.

That's not true; AAM takes 83 cycles whereas DIV can take between 80 and 90 based on the operands (dividend and divisor). So AAM is slightly faster in the general case.

However, I usually recommend against using AAM with divisors other than 10, because NEC V20/V30 processors don't support that.

Chuck(G) · Nov 15, 2017

The problem (or advantage) with the scheme above is that it always takes the same amount of time to get to an answer, whereas the "Chinese remainder theorem" is linear with the number of digits.

Scali · Nov 15, 2017

Trixter said:
That's not true; AAM takes 83 cycles whereas DIV can take between 80 and 90 based on the operands (dividend and divisor). So AAM is slightly faster in the general case.

However, I usually recommend against using AAM with divisors other than 10, because NEC V20/V30 processors don't support that.

Another advantage of AAM might be that it sets the zero flag.
So you wouldn't need an extra compare when looping/early-out.

dreNorteR · Nov 15, 2017

Trixter said:
That's not true; AAM takes 83 cycles whereas DIV can take between 80 and 90 based on the operands (dividend and divisor). So AAM is slightly faster in the general case.

However, I usually recommend against using AAM with divisors other than 10, because NEC V20/V30 processors don't support that.

Also the digits are swapped, so the quotient is in AH and remainder in AL (forgot about that until I tried to put AAM into my code above).
The V20/30 optimizes the AAD instruction with a hardwired multiply-by-10. Can't test it right now but I'm fairly certain that AAM works with other constants as it does on Intel.

Innovations in Integer to String (itoa) Methods for 8088?

neilobremski

Experienced Member

dreNorteR

Experienced Member

carangil

Experienced Member

retro

Member

retro

Member

Krille

Veteran Member

BloodyCactus

Experienced Member

Scali

Veteran Member

Chuck(G)

25k Member

PgrAm

Experienced Member

deathshadow

Veteran Member

dreNorteR

Experienced Member

Trixter

Veteran Member

Chuck(G)

25k Member

Scali

Veteran Member

dreNorteR

Experienced Member

VCF West	Aug 01 - 02 2025,	CHM, Mountain View, CA
VCF Midwest	Sep 13 - 14 2025,	Schaumburg, IL
VCF Montreal	Jan 24 - 25, 2026,	RMC Saint Jean, Montreal, Canada
VCF SoCal	Feb 14 - 15, 2026,	Hotel Fera, Orange CA
VCF Southwest	May 29 - 31, 2026,	Westin Dallas Fort Worth Airport
VCF Southeast	June, 2026	Atlanta, GA