Notes about a PDP-8 version of small-C.  (These notes are about a
theoretical optimizing compiler, not yet the one implemented here.)

Peephole optimization:

        TAD X                   TAD Y
        CIA            to       CIA
        TAD Y                   TAD X
        CIA

likely generated by simple subtraction.

Peephole optimization:

        TAD X                   TAD Z
        DCA Y          to       DCA Y
        TAD Z
        DCA Y

likely generated by expression parsing.

Peephole optimization:

        CDF n          to       CDF n
        CDF n

likely generated a lot if we assume an indirect referencing style for
non-pointer data.  Possibly also by pointers, if we can infer a
constant "n".

Location contents are generally represented as a constant with a
symbol relocation; thus both FOO and FOO+4 can be represented.  MRI
instructions require an alternative representation, as their treatment
of the relocation of the operand is different.

Recursion is quite difficult to do efficiently.  It will be necessary
to eliminate it, or at the least to require that it be explicitly
declared.  Explicit declaration may not be sufficient: if x calls y,
which calls z, and x and z are declared recursive, then any call path
from z back to x forces y (and any other function on the cycle) to be
inferred as recursive too.  This would not be needed for subtrees
containing no recursive functions.

A minimal type structure that would be subset-ANSI would be chars at
one word and ints at two words.  Longs would require a minimum of
three words, if implemented.  If a pointer is represented as a
(CDF field, offset) pair, there should be no problem with pointers
fitting into the two-word int.

If ints are two words, fast extended precision may be worth the code
size penalty of inlining.  (Int is expected to be fast, after all.)
A more "threaded code" model would conserve space:

        JMS ADD
        X               ; need far pointers here?
        Y

About far calls -- should the generated code assume all functions are
near, and let the "linker" optimize function placement and call code?
A similar question applies to data: local data can probably be
presumed near, and data reached through pointers might be presumed
far, but what about global data?
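The first rewrite above could be sketched as a textual pattern match
over the emitted instruction list.  This is a minimal illustration in
C, not part of small-C; the function name and the string-based
instruction representation are hypothetical.  Both sequences leave
X - Y in AC (assuming AC starts clear), and the rewrite saves a word:

```c
#include <string.h>

/* Hypothetical peephole pass for the subtraction pattern:
   TAD X / CIA / TAD Y / CIA  ->  TAD Y / CIA / TAD X
   Instructions are matched purely as text here. */
static int is_tad(const char *op) { return strncmp(op, "TAD ", 4) == 0; }

int peep_sub(const char *in[], int n, const char *out[])
{
    int i, j = 0;
    for (i = 0; i < n; ) {
        if (i + 3 < n &&
            is_tad(in[i])   && strcmp(in[i+1], "CIA") == 0 &&
            is_tad(in[i+2]) && strcmp(in[i+3], "CIA") == 0) {
            out[j++] = in[i+2];   /* TAD Y */
            out[j++] = "CIA";
            out[j++] = in[i];     /* TAD X */
            i += 4;               /* four words in, three words out */
        } else {
            out[j++] = in[i++];   /* anything else passes through */
        }
    }
    return j;                     /* length of rewritten sequence */
}
```

A real pass would work on a decoded instruction representation (the
constant-plus-relocation form described below) rather than strings,
but the matching logic is the same.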
Can the linker rewrite code to remove extra CDF instructions?

As far as development goes, would it be more advantageous to work out
an assembler/linker tool chain, or to go whole hog with a single
optimizing compiler?  I'm thinking the "linker" would like to control
the page-ination of the code and data.  As such, I am thinking that a
"procedure" declaration should include material marked as "code" or
"data", so the linker can keep that material together, optimize the
sharing of data, etc.:

        .proc   main
        .text
        .globl  main
main,   jms     printf
        hello
        jms     exit
        .data
hello,  .asciz  /Hello World/
        .endp   main

Here the linker would see that exit, and possibly printf, will fit in
the current page, and can look for opportunities to share data with
the "Hello World" string.

Should there be explicit "extern" declarations here?  That would let
one inform the linker that "printf" and "exit" are both expected to
be code, not data.  (Then again, they are the targets of JMS, which
would allow the linker to infer as much.)

It would be advantageous to have a ".proc" per compound statement,
instead of per procedure.  Then the linker can more easily repack
code around the branches of "if", "while", "switch", etc.  Doing so
at least partially decouples code from its data, though.  Is it
better to consider all the data for co-location with the procedure?

Is it better to model as "code with continuation", or just to have
instruction sequences ending with a JMP?  What about skip OPR
instructions?  Should they have a pair of continuations, with the
linker able to re-order them by adjusting the "skip complement" bit?

Generally, I'm assuming the linker is permitted to re-order data as
well as code.  Is there to be any way to create "records" -- that is,
data which must be kept together?  If "FOO" can be an alias for
"FIE+n", that should suffice, since FIE can presumably be of
arbitrary size (up to 4K, anyway).
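The pair-of-continuations idea could be sketched as follows.  This is
an illustrative C fragment, not an actual implementation; the struct
and function names are invented.  On the PDP-8, the group-2 OPR
"reverse sense" bit (0010 octal) complements a skip condition, e.g.
SZA (7440) versus SNA (7450), so the linker can exchange a block's
two successors by toggling that bit:

```c
/* Sketch of a basic block ending in a group-2 OPR skip.
   Such a block has two successors; which one is the fall-through
   is the linker's choice. */
struct block {
    unsigned opr;        /* trailing group-2 OPR, e.g. 07440 = SZA */
    struct block *skip;  /* taken when the skip condition holds */
    struct block *fall;  /* taken otherwise (fall-through) */
};

/* Complement the skip sense and exchange the continuations.
   Control still reaches the same successor under the same machine
   conditions, but the fall-through successor changes, letting the
   linker place either block immediately after this one. */
void flip_skip(struct block *b)
{
    struct block *t = b->skip;
    b->opr ^= 00010;     /* PDP-8 group-2 "reverse sense" bit */
    b->skip = b->fall;
    b->fall = t;
}
```

With this representation, repacking code around "if" and "while"
branches reduces to choosing which continuation to emit in line and
which to reach by JMP.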