• Please review our updated Terms and Rules here

Strings and parentheses in Turbo Pascal

Ruud

Veteran Member
Joined
Nov 30, 2009
Messages
1,369
Location
Heerlen, NL
Hello,

I'm writing my own pascal compiler that should be at least TP4 compatible one day. The why: just for fun. FYI: it doesn't output an executable but a file with macros. With special directives, that just look like comment to TP, my assembler knows for what CPU and what computer it is meant and translates the macros into ML. So far I was able to write little programs for the PC and for the Commodore 64.

Please have a look at this:

i1 := (i2 * (i4 - i3)) * (i4 * (i5 + i6));

The above leads to in between results and I use the Stack to store them

Now let's do the above but for strings:

s1 := (s2 + (s4 + s3)) + (s4 + (s5 + s6));

As far as I can remember I have never seen or written such a line but TP accepts it so I must be able to handle it as well. If I would use the same routine as I used for the first statement, I have to add s4 and s3 first and push the result on the stack. And not knowing how big the results will be, I have to push 256 bytes every time. Not only time consuming but that can also be very costly in needed RAM.
But my own logic says that in case of strings only plus can be used as an operator and in that case I can forget about the parentheses and treat the whole as one big addition.

Can anybody prove me wrong? (I hope not)

Thank you in advance!
 
Yes, ‘+’ is the string catenation operator, so using other ‘mathematical’ operators is invalid.

Doesn’t Pascal store the length of the string before the string itself? In this case, the runtimes ‘know’ how long each string is, and (therefore) how long the result is.

I haven’t got time to test it at the moment, but it is an interesting question regarding the use of the string catenation operator and parentheses. Of course, the result should be the same as left to right evaluation - shouldn’t it...

Dave
 
Store the pointers on the stack not the values, then when it's time to allocate the memory / string side just walk those pointers to add together their lengths before copying their values to the result.

That or just always allocated 255 bytes as a scratch area for building the result, since that's the max string size in TP anyways. It might be faster to allocate 255 bytes, copy all the strings into it, and then copy the result over when done, than it would be to predetermine the total lengths.
 
And not knowing how big the results will be, I have to push 256 bytes every time.
Actually, you don't. Your string ops can use a work area (either a static area, or simply "above" the top of the stack).

What that means is this.

push String1
push String2
the concatenate operation concatenates String1 and String2 into the work area, pops them off the stack, and pushes the work area as a new string. This new string will be as long as it needs to be (up to 256).

Now, as noted above, the parentheses are redundant, but I wouldn't change the logic.

Code:
s := s1 + (s2 + s3)

push s1
push s2
push s3
concat
concat

vs

Code:
s:= s1 + s2 + s3

push s1
push s2
concat
push s3
concat

The former impacts the stack more heavily, do it with too many big enough strings, you blow the stack and trap the program.

This is filed under "Bad Luck" with notes in the "Doc it hurts" section.

Basically, yes they can do that, there's no reason to do that, so don't do that. I wouldn't complicate the expression logic to try and simplify this. On small cases, it doesn't much matter. On large cases, the programmer can remove the parens and solve their own problem.
 
Thank you all for your comments! I'm glad that nobody proved me wrong. And Frenkel just confirmed what I thought.

@whartung: I have to disagree with you. I only need ONE temporary variable. I fill it with s1 in the first place and then I have to add s2 and s3 to it. If that viable will be part of the stack, I will think about that.
 
You can’t use a ‘static’ - or anything similar.

Consider a function call that returns a string that is concatenated with other strings. The called function itself uses string concatenation etc. If a static is used, this will all go horribly wrong.

Dave
 
A function that needs strings as input, I see what you mean: the resulting operand has to be stored first before the function can be applied. Like:

s1 := f1(s2) + f2( f3(s3 + s4) + f4(s5 + s6) );

So then I cannot use this one temporary value, the stack it has to be. But if I have to use the stack anyway, I think I'm not going to spend time in finding a way to ignore any used parentheses for things like the example I gave in my first message.

Many thanks, Dave!
 
No problem Ruud.

If course, the function parameters can be anything at all (they don't have to be strings of course). In turn, the function could call other functions...

As TP is limited to 256-byte strings (the length of one byte used as the string length) you could get away with allocating a fixed number of bytes (257?) from the stack for each and every string variable and temporary. The 'length' byte would then indicate how many 'used bytes' of the buffer there are. A bit wasteful of memory - but the easiest potential solution.

Dave
 
In the statements I gave I used variables so the length of the result is unknown. So I always have to push the full length of a string on the Stack, which is 256 bytes, including the length byte.
The only thing I can do to avoid the unavoidable "waste" of memory is to let the compiler check if there isn't a push and a pop of a string in a row. If that is the case, just use the last result as-is.
And it is not just the "waste", pushing and popping all that memory will cost time so if we can avoid it in one way, the program will gain in speed as well.
 
It depends if you are passing the string itself or a pointer to the string on the stack.

If you pass a pointer to the string (if it is used as a parameter) then the length of these strings are known at runtime. However, the parameter itself could be a temporary string e.g. Func( string1 + string2 + string3 ) - so the memory for this should be allocated and then deallocated when it goes out of scope. The net result is the same if you pass a pointer to the string as an input parameter to the function on the stack.

The result is a string that starts out as an undefined length (but we know what the maximum length could be = 256 bytes). But the concatenation operator knows (absolutely) how large both of the input strings are, so can allocate exactly the correct amount of memory to hold either the temporary or final string result.

The destination string starts off as 'empty', so there should be minimal overhead in allocating space on the stack - as it should just involve moving the stack pointer (apart from potentially allocating a byte of 0 for the initial length of course).

I think it all depends upon your variable and temporary allocation/deallocation (scope) process; and whether you are using the stack or heap for the location of the string data itself.

Dave
 
I haven't thought about pointers yet, that is something for the future.

But passing the string itself, I think I understand what you mean: the moment you have to push it to the stack, you know its size, being the first byte of the string. I have thought about that but at that time I thought it was not simple to implement. But writing this and thinking about it, another solution just popped into my mind. A bit more instructions but I think that will be compensated by the fact that most strings are even near 255 bytes long, thus less stack and less pushes and pops needed.
 
Pascal is "pass by value". When you pass a string, it's kind of expected that you are being passed a COPY of the string, not a REFERENCE to the string (if you want a reference, use VAR). Same with records. Even C is like this (save that C does not have a "string" type, so it's typically always pointers).

When you push a string, there's no reason to push 256 bytes. You can simply push the string. You "know" how long the string is because pascal strings start with a length byte. So you know how much to pop off the stack when you're ready.

Performance sensitive developers will typically just pass VARs around, to use pointers. There's no reason for the compiler to do this automatically, especially a simple compiler.

This is perfectly legal Turbo Pascal:
Code:
function strfun(p1 : str[20]; p2:str[20]) : str[20];
begin
    p1 := p1 + p2;
    strfun := p1;
end;
Since p1 and p2 are passed by value, this is safe. If p1 were a pointer, it would stomp on the original string. This is something that a compiler could detect with data flow analysis, but simple compilers are simple. So, you can't just assume that you can pass pointers where values are normally used.

You know the length of the destination string because strings are defined as character arrays of a specific length. Normal pascal, and Turbo Pascal, do not have "generic" strings. They have built in operations that are "generic string" aware. But the compiler "knows" how long strings are.

A string[10] is NOT a string[20]. They're different types and incompatible. You can can't have a procedure that takes a string[20] and pass a string[10] to it.

You CAN pass a string constant to it, but not a typed variable. Same with pointers. If you pass a constant as a VAR parameter, pretty sure TP complains.

That said, you can plays games with generic memory (getmem) and pointer casting, and things will mostly work.

But you have to be careful of 3 string operations. INSERT, DELETE, and CONCAT.

This will work just fine.
Code:
procedure test(var s1:string[20]);
begin
    insert('123', s1, 1);
end;

But if you do something like this:
Code:
procedure breakit;
var
    p1 : ^string[20];
begin
    GetMem(p1,10); /* generic memory allocator in contrast to New */
    p1^ := '12345678';
    test(p1^);
end;

Here, you have a string[20] pointer, and 10 bytes of memory that Pascal THINKS is 21 (number of character + 1 for the length) bytes of memory.

If you call that test procedure, it will happily insert the 3 letter (123) in to what it thinks is a 20 byte buffer, meaning it's going to be moving 17 bytes to make room for the 3. That's 7 more bytes than you have allocated.

Similarly, If you did this:
Code:
procedure breakit2;
var
    p1 : ^string[20];
begin
    GetMem(p1,10);
    p1^ := concat('ABCDEFGHIJKLMNOP','0123456789');
end;
This is going to shove 'ABCDEFGHIJKLMNOP0123' in to your 10 byte buffer. It "knows" it's only 20 bytes, so it truncates the string. It just doesn't know you didn't give it 20 bytes.

Recall, the string operations in Turbo Pascal are NOT functions like in C. They're special operators, just like + and /. They "know things" (notably extended type information) that standard procedures and functions do not.

If you're writing code more C like, then you can make a str255 type, use GetMem for all of your memory allocation, and pass VARs around.

Just don't use INSERT or DELETE, since they're going to work on the entire 255 byte buffer, which won't work for you if you're working with mixed length strings. CONCAT will work fine if you "know" that your destination string will hold the values.

Pretty sure this is safe:
Code:
procedure okeydokey;
var
    p1 : ^string[255];
begin
    GetMem(p1,10);  /* generic memory allocator in contrast to New */
    p1^ := concat('ABCD','0123'); /* This is ok, it fits in the 10 byte buffer */
    p1^ := concat('ABCDEF', '012345'); /* This is not, stomp stomp */
    /* this works too */
    p1^ := 'ABCD';
    p1^ := concat(p1^, p1^); /* pushes two copies of the 4 character string, the copies the 8 character result back in to p1 */
end;

Mind, none of this has anything to do with the compiler. The point is that you can operate with string this way in a TP environment, just have to be careful.

The compiler is pretty simple, and the language is designed for the compiler to be simple.
 
First: thank you for the above. I'm not a fan of pointers and that is one of the reasons I prefer Pascal above C. Pointers can be a good thing and in my new compiler I use them as well but IMHO C is using them in a quite confusing way some times.

I'm working already for 36 years with TP. And now the funny/weird thing: I didn't know of CONCAT and INSERT. I immediately checked them in TP7. CONCAT is the same as string + string and that is already enough good reason never having used it.
But INSERT is another matter. Knowing it now I sure know where I can improve some programs.

So you can see again: one is never too old to learn some thing new.
 
Another deduction of my own I hope that it is correct: all statements can be reduced to groups of operations with just two operands. I hope you understand what I mean because I even don't know how to say it in Dutch. I'll give you a simple example with strings:

s1 := s2 + s3 + s3;

That can be reduced to:

result1 := s2 + s3;

and

s1 := result1 + s3

Let's use reals and functions and use the stack for saving ALL results:

r1 := f1(r2) + f2(r3) * f3(r4);

This will result in:

Result1 := f1(r2) which then is pushed to the stack

Result1 := f2(r3) which then is pushed to the stack

But now a multiplication is encounter and things are lifted to a higher level

Result1 := f3(r4) which then is pushed to the stack

The last two results have to be popped from the stack and multiplied:

pop Result2
pop Result1
Result1 := Result1 * Result2 and this result is pushed to the stack again

pop Result2
pop Result1
Result1 := Result1 + Result2 and this result is pushed to the stack again

pop Result1
r1 := Result1

Things can be optimized by checking if a push of Result1 is followed by a pop of it. In that case we can ignore both. But that is another matter.

So the main question: can all statements be brought back to little sub-statements that result in just the need of two temporary results at the max?

The reason for this question: to handle bytes, word, integers and longints, on a PC I can use CPU registers. This is not the case for strings and 8-byte reals. That means that before I can handle them I have to store them first so a macro that is pointed to these results can handle them. If the above assumption is correct, I only need to reserve 256 or 512 bytes at the max for temporary strings. If I am wrong....... unfortunately the sky isn't the limit.
Just popped up: when compiling for a 6502, it is quite possible that I even need to create temporaries for words, integers etc.
 
all statements can be reduced to groups of operations with just two operands.
Well, this can be seen here.

Consider the TP 7 grammar talking about expressions.
Screen Shot 2022-02-08 at 12.42.25 PM.png

I mean that's it right there in a nutshell. An expression has 3 parts, 2 of which are optional. A Simple Expression, and, optionally, an Operator and another Simple Expression.

This defines every expression in the language. As you can imagine, things start getting more interesting as you dig in to the definition of a "simple expression". You'll notice, there's no math here. Those are defined deeper. It's a recursive definition that will eventually come back in to wanting, potentially, more expressions. Since Pascal historically uses recursive descent parsers, the different syntax elements automatically handle issues like precedence. A Pascal expression is built of of Simple Expressions, Terms, Factors, Variables, Constants, and Identifiers. Out of 6 pages of syntax diagrams for Pascal, 3 of them are dedicated to expressions.

But, as you can see, at the top level, it's 2 elements and an operator, just like you see.

If you follow it to its conclusion, it's straightforward to convert a normal expression in to an "RPN" (Reverse Polish Notation) style expression, also know as a "postfix" expression, which is ideal for a stack based evaluator (in contrast to a register based one).

i := 1 + 2 * 3 + 4

becomes
Code:
push 1
push 2
plus
push 3
push 4
multiply

Why is multiply last? That's operator precedence kicking in. With a properly defined grammar, you get that "for free". Historically, for me, getting the expressions right in a language is the hardest part. Once that's done, the rest is easy.

HOWEVER, there is a detail to be noted here (which is, technically not an issue with Pascal, but perhaps, nowadays, unexpected.)

Consider the following program:
Code:
program test;

function f1:boolean;
begin
    writeln('f1');
    f1 := true;
end;

function f2:boolean;
begin
    writeln('f2');
    f2 := false;
end;

begin
    if f2 and f1 then
        writeln('win');
    else
        writeln('boo');
end.

What do you think will be printed with this?

In Pascal, you get both f2 and f1.

In C, you'd get just f2.

Why?

C boolean expressions short circuit. Pascal does not.

Short circuiting in boolean expressions is when the rest of the expression is cut off when the answer is already known. Since f2 returns false, we "know" that the AND will return false. In C, f1 will not be executed. In Pascal, it is.

(I don't know if this applies to modern Pascal.).

The point being that in C, a boolean expression is evaluated different from an arithmetic expression.

So, in C you can't just do:
Code:
call f2
call f1
and

But in Pascal you can.

Anyway, the fundamental point back to you, is "yes" in Pascal, you can just do a straightforward "infix" to "postfix". Every expression is, indeed, inevitably, just a chain of 2 arguments and operator.
 
If I remember correctly Turbo Pascal supported Boolean Short Circuit Evaluation since version 4.0.
 
Hallo Whartung,

First: I second Noppa's comment. And even tested it in TP7 to be sure.

i := 1 + 2 * 3 + 4

becomes
Code:
push 1
push 2
plus
push 3
push 4
multiply

Shouldn't that be:

Code:
 push 1
 push 2
 push 3
 multiply    ; 2 * 3
 plus          ; 1 + 6
 push 4
 plus          ; 7 + 4
 
First: I second Noppa's comment. And even tested it in TP7 to be sure.
Yea, thats not surprising. Its a very powerful feature. TP3 does not short circuit, nor did original Pascal. 99% of the time, it shouldn't matter.

Though it is a common idiom in, for example, Perl.
Code:
x = do_something || die;

If the 'do_something' returns an error (0), the process quits. Here they're firing functions for side effects and relying on short circuiting as a bit of syntax sugar.
Shouldn't that be:
And, yes, my mistake.
 
Back
Top