The real downer was that there were languages that used two consecutive nulls as an operator or part of one. You can imagine the kludges that resulted from that (can you say "equivalent digraph"?)
Sadly, such conventions continue in character encoding to this day. It's called UTF-8... you want a kludge, there it is. That it falls back to a single byte for standard 7-bit ASCII is nice and all, but once you get into more complex non-Latin-1 languages it becomes a bloated mess and you're better off using UTF-16... and UTF-8 is a pain in the ASS to process at the per-byte level since a character can have a variable byte length. To be fair, the continuation bytes all have the high bit set, so handy control codes like \r and \n can't actually land mid-character the way they could in older multi-byte encodings -- but you still can't index to the Nth character or split a string at an arbitrary byte without walking the whole thing. I've reached the point in a number of programs where internally I run UTF-16 regardless of what character encoding I'm reading in.
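To illustrate the per-byte pain: even the simplest operations mean decoding a sequence length from each lead byte. A minimal sketch in C -- the function names are mine, and it assumes well-formed input (a real decoder also has to validate continuation bytes and reject overlong forms):

```c
#include <stddef.h>

/* Length of a UTF-8 sequence, judged from its lead byte. */
static size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80)           return 1;  /* plain 7-bit ASCII */
    if ((lead & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((lead & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 0;                             /* continuation or invalid byte */
}

/* Counting characters means walking every byte -- no O(1) indexing. */
static size_t utf8_strlen(const char *s)
{
    size_t count = 0;
    while (*s) {
        size_t n = utf8_seq_len((unsigned char)*s);
        s += (n ? n : 1);   /* resync one byte at a time on bad input */
        count++;
    }
    return count;
}
```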
... Running UTF-16 internally seems to be part of WebKit and Blink's handling of it as well. Makes sense, really, to decode once on read rather than having to brute-force check every character during string processing. "Day job"-wise I'm working on an Electron (similar to nw.js) web crapplet right now, and we're "stuck" using JSON or XML on data that would be SO much simpler and faster to just use FS, GS, RS, and US on, stripping the control codes we don't want from the data, instead of screwing around escaping the data with entities -- taking already-oversized formats and making the data bigger still.
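For reference, FS/GS/RS/US are ASCII 0x1C through 0x1F. Since they can never legally occur inside the payload, parsing needs no escaping at all -- just a byte scan. A quick hypothetical sketch in C (names and sample data made up for illustration):

```c
#include <stdio.h>
#include <string.h>

#define RS 0x1E  /* record separator */
#define US 0x1F  /* unit (field) separator */

/* Walk a buffer of RS/US-delimited data, printing each field.
 * No escaping needed: the separators cannot occur inside the data,
 * unlike quotes and angle brackets in JSON or XML. */
static void walk(const char *buf, size_t len)
{
    size_t rec = 0, fld = 0, start = 0;
    for (size_t i = 0; i <= len; i++) {
        if (i == len || buf[i] == RS || buf[i] == US) {
            printf("record %zu field %zu: %.*s\n",
                   rec, fld, (int)(i - start), buf + start);
            start = i + 1;
            if (i < len && buf[i] == RS) { rec++; fld = 0; }
            else fld++;
        }
    }
}

int main(void)
{
    const char data[] = "Jones\x1F" "42\x1E" "Smith\x1F" "17";
    walk(data, sizeof data - 1);
    return 0;
}
```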
... and yeah, I remember the headaches of 6-bit character encodings from my dalliance with DiBOL and importing data from older systems to new ones... and I imagine it was only worse the further back you go.
Ugh, DiBOL, now there's a language I don't miss.
If you look at old communications about the IBM 1620, you'll find similar concerns expressed by Dijkstra about how badly character handling was accomplished by the hardware. For example, it's possible to read or write a "numeric blank" character, but not to test for the presence of one -- or do any arithmetic or logical operation on one. You can move one around, but that's it. Trying to add one to a number got you an error stop.
That had to make importing data from other systems FUN; pre-processing it before sending it to that platform was likely the best approach.
When writing a business BASIC compiler on an x80 architecture, it was obvious that null-terminated strings were not the way to go.
... and really, that's the conclusion I think Microsoft came to when working on Windows. I know when the culture clash between IBM and M$ came to a head over OS/2, that was very much the type of thing IBM seemed to love shoving their head up their backside to smell their own farts over, insisting on it "because it's how all our other 'big business' client systems work". They were so resistant to change and stuck in the mud when it came to efficient code vs. "this is how we've always written software".
Again though, it's what one can expect from the "paid by the k-loc" scam artists that made up a lot of business programming -- especially in the big iron world -- at the time. Something made all the more scary when you consider that so much of the software then ran on interpreters.
Instead, I settled on a "descriptor" system wherein the size, length, and dimension of a character array were tracked, which made string handling very straightforward. For example, if one defined "+" as a concatenation operator, one could write "A$ = A$+B$+C$+D$" and handle the strings optimally.
Because you could easily sum the space to allocate for the new value from the dimension.
In C, this simple operation involves a lot of "we need to find the length of a string" stuff. But the difference is that strings are a part of the BASIC language, unlike C, where they're mostly a notational convenience for handling arrays of characters. Were strings part of standard K&R C, you'd likely have seen a descriptor (or other metadata) system used.
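Just to make the contrast concrete, here's a rough sketch of the descriptor idea in C -- the struct layout and names are hypothetical, not from any particular BASIC runtime:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical descriptor: length is tracked alongside the data, so
 * concatenation never has to scan for a terminating NUL. */
typedef struct {
    size_t len;   /* current length     */
    size_t cap;   /* allocated size     */
    char  *data;  /* not NUL-terminated */
} dstr;

/* A$ = B$ + C$ + ... : sum the lengths up front, allocate once, copy. */
static dstr dstr_concat(const dstr *parts, size_t n)
{
    dstr out;
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += parts[i].len;          /* O(1) per operand */

    out.len = out.cap = total;
    out.data = malloc(total ? total : 1);

    for (size_t i = 0, pos = 0; i < n; i++) {
        memcpy(out.data + pos, parts[i].data, parts[i].len);
        pos += parts[i].len;
    }
    return out;
}
```

Chained strcat() calls, by comparison, re-scan the growing result for its NUL terminator on every append; the descriptor version sums the lengths once and does a single allocation.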
Which is really part of why I find C extremely limited. If anything, to use your word, which describes it best -- strings in C are a kludge. Kind of like how objects in C++ feel shoe-horned in there any-old-way.
Though at least it's not pervasively object based like Java... more like pervertedly object based.
C is what it is -- basically a semi-portable assembly language.
Except for where it totally isn't, but that's the problem of being TOO portable. Architectures are often a bit too different for a generic language to "one size fits all". As the Air Force quickly learned at the dawn of the jet age, it's the flaw of averages: an average size actually fits nobody.
As far as overhead goes in x86 C compilers, it all depends on what you're after. If you write your prologue in assembly to fit your needs...
... at which point I might as well just write the whole thing in assembly. I was hoping for an easy, off-the-shelf solution to give me something better than what I was getting out of TP7, one I could compile from the host OS instead of from inside DOSBox. Trying the different C compilers, they just weren't giving me that -- at best a step sideways, at worst several steps backwards. It's actually kind of a laugh that MOST of my "problems" disappeared when I gave up on high-level languages altogether. At least two-thirds of Paku Paku 2.0's codebase was already in assembler, so porting it to all be "near" (except the timer ISR, of course) wasn't a big deal... and stripping out the overhead of variable passing and far calls is netting me that extra wedge of speed I need for the fastest game level to run full-out on an unexpanded Junior.
I'm relatively certain most of the issues I have with C for my current set of x86 projects would evaporate if I upped the minimum target to a 16MHz 386SX... but with a 128k DOS 2.x PCjr as the minimum target? Not so much. Even less so if I drop that to a 64k cassette-based machine. No matter what I do it's not going to fit the 40k free after booting DOS 2.x on a 64k rig. Shame... though I'm tempted to see if I can restrict myself to DOS 1.1 compatibility, since the only real offender now would be the file operations for the high-scores table, as I've gone monolithic on the executable -- no more separate data files in the distro for anything but scores.
-- and I'm even tinkering with making it a self-modifying file to get rid of that too!
Real fun is going to be figuring out the best way to load it from tape -- does cassette BASIC blow up in your face if you try to blindly append machine language to a BASIC program (given there's no CLOADM), or am I just best off loading a .BAS that, when run, does a DEF SEG, BLOADs a .M, and CALLs it? Pretty sure DATA and POKE ain't gonna cut it on what's looking to be a ~32k executable.
People often scoffed at tape on the PC, but it was in use on other platforms far past the PC's introduction -- I always felt it would have gotten a lot more use if they had provided a proper way to simply load machine language programs from BASIC in one easy pass. I think that lack would have sunk the Commodore side too if the trick of appending the machine language raw to the .BAS file hadn't caught on early. Once I kitbash a cable for the Junior (assuming I can track down a proper connector) I'm going to try that method first just to see what it does, since done "properly" it would be a single load-and-run. Of course, CREATING that .BAS file is gonna be tons of fun... the 5150 and Junior are tape-file compatible with each other, right?
Cassette based PC software is really not something I've seen a lot of people try to do anything with, hence my interest in doing it.
Either way, I'm likely going to pull support for the fancier sound cards from the "tape" version -- just stick to Junior/Tandy/speaker... especially with my 120Hz arpeggio two-voice speaker sound being a lot nicer than the priority-based single voice I was using. I can axe the high-score saving code for that build too, since where would you put it? Still working on the Junior-only build as well, using the linear 160x100 mode instead of the tweaked text mode; that one can have a much smaller executable when I'm done and will have even more overhead to spare -- that's the version I'm tinkering with the idea of making a cartridge out of (assuming I ever get that far).
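For anyone who hasn't tried the arpeggio trick: it's just reprogramming PIT channel 2 on every tick and alternating which voice owns the speaker. A rough sketch, assuming a Borland-style dos.h with outportb()/inportb() -- the function names and note values here are mine:

```c
#include <dos.h>  /* Borland-style outportb()/inportb(); adjust for your compiler */

#define PIT_CTRL 0x43
#define PIT_CH2  0x42
#define PPI_B    0x61
#define PIT_HZ   1193182UL

/* Program PIT channel 2 (mode 3, square wave) to the given frequency. */
static void speaker_tone(unsigned freq)
{
    unsigned divisor = (unsigned)(PIT_HZ / freq);
    outportb(PIT_CTRL, 0xB6);                /* ch2, lo/hi byte, mode 3 */
    outportb(PIT_CH2, divisor & 0xFF);
    outportb(PIT_CH2, divisor >> 8);
    outportb(PPI_B, inportb(PPI_B) | 0x03);  /* gate + speaker data on  */
}

/* Called ~120 times a second (e.g. from a reprogrammed timer ISR):
 * alternate the two voices fast enough and the ear hears both. */
static unsigned voice[2] = { 440, 554 };     /* hypothetical A4 + C#5   */
static unsigned which;

void arpeggio_tick(void)
{
    speaker_tone(voice[which]);
    which ^= 1;
}
```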
Though I'm also working on a proper 1.5-bit speaker driver for systems fast enough to handle it... it doesn't need to be too fancy, as for Paku Paku and most of the games I'm working on for said target platforms I really don't need more than two voices anyway.
Crazy idea: has anyone ever tried using the cassette port on the 5150 or Jr. as a second voice, the way TRS-80 users used theirs for audio? Might be fun... though I don't think I'd have the time to service the timer at that speed if it's a strictly on/off affair.
Oh, and @Scali, interesting result... not what I got here, but I've moved on from C and high-level languages for this purpose entirely. In a way, what Chuck(g) said about languages is the problem: it really doesn't matter which high-level language I try, they all give me the same results, and those results don't line up with my needs... even though TP7 remains the closest in size and speed compared to the alternatives I've tried... at least not without brute-forcing large sections of the build process, at which point, again, I might as well just use assembly... and lots of macros.
Again: at best, C ended up being a step sideways from what I had, not forwards. As always,
ever forward.