Notes about a PDP-8 version of small-C.  (These notes are about a
theoretical optimizing compiler, not yet the one implemented here.)

Peephole optimization:

        TAD X                   TAD Y
        CIA            to       CIA
        TAD Y                   TAD X
        CIA

likely generated by simple subtraction.

Peephole optimization:

        TAD X                   TAD Z
        DCA Y          to       DCA Y
        TAD Z
        DCA Y

likely generated by expression parsing.

Peephole optimization:

        CDF n          to       CDF n
        CDF n

likely generated a lot if we assume an indirect referencing style for
non-pointer data.  Possibly also by pointers, if we can infer a
constant "n".

Location contents are generally represented as a constant with a
symbol relocation; thus both FOO and FOO+4 can be represented.  MRI
instructions require an alternative representation, as their treatment
of the relocation of the operand is different.

Recursion is quite difficult to do efficiently.  It will be necessary
to eliminate it, or at the least to require that it be explicitly
declared.  Explicit declaration may not be sufficient: if x calls y,
which calls z, and x and z are declared recursive, then any call path
from z back to x forces y (and any other function on the cycle) to be
inferred as recursive too.  This would not be needed for subtrees
containing no recursive functions.

A minimal type structure that would be subset-ANSI would be chars at
one word and ints at two words.  Longs would require a minimum of
three words, if implemented.  If a pointer is represented as a
(CDF field, offset) pair, there should be no problem with pointers
fitting into the two-word int.

If ints are two words, fast extended precision may be worth the code
size penalty of inlining.  (Int is expected to be fast, after all.)
A more "threaded code" model would conserve space:

        JMS ADD
        X               ; need far pointers here?
        Y

About far calls -- should the generated code assume all functions are
near, and let the "linker" optimize function placement and call code?
A similar question applies to data: local data can probably be
presumed near, and data reached through pointers might be presumed
far, but what about global data?
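The first rewrite above could be sketched as a textual pattern match
over the emitted instruction list.  This is a minimal illustration in
C, not part of small-C; the function name and the string-based
instruction representation are hypothetical.  Both sequences leave
X - Y in AC (assuming AC starts clear), and the rewrite saves a word:

```c
#include <string.h>

/* Hypothetical peephole pass for the subtraction pattern:
   TAD X / CIA / TAD Y / CIA  ->  TAD Y / CIA / TAD X
   Instructions are matched purely as text here. */
static int is_tad(const char *op) { return strncmp(op, "TAD ", 4) == 0; }

int peep_sub(const char *in[], int n, const char *out[])
{
    int i, j = 0;
    for (i = 0; i < n; ) {
        if (i + 3 < n &&
            is_tad(in[i])   && strcmp(in[i+1], "CIA") == 0 &&
            is_tad(in[i+2]) && strcmp(in[i+3], "CIA") == 0) {
            out[j++] = in[i+2];   /* TAD Y */
            out[j++] = "CIA";
            out[j++] = in[i];     /* TAD X */
            i += 4;               /* four words in, three words out */
        } else {
            out[j++] = in[i++];   /* anything else passes through */
        }
    }
    return j;                     /* length of rewritten sequence */
}
```

A real pass would work on a decoded instruction representation (the
constant-plus-relocation form described below) rather than strings,
but the matching logic is the same.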
Can the linker rewrite code to remove extra CDF instructions?

As far as development goes, would it be more advantageous to work out
an assembler/linker tool chain, or to go whole hog with a single
optimizing compiler?  I'm thinking the "linker" would like to control
the page-ination of the code and data.  As such, I am thinking that a
"procedure" declaration should include material marked as "code" or
"data", so the linker can keep that material together, optimize the
sharing of data, etc.:

        .proc   main
        .text
        .globl  main
main,   jms     printf
        hello
        jms     exit
        .data
hello,  .asciz  /Hello World/
        .endp   main

Here the linker would see that exit, and possibly printf, will fit in
the current page, and can look for opportunities to share data with
the "Hello World" string.

Should there be explicit "extern" declarations here?  That would let
one inform the linker that "printf" and "exit" are both expected to
be code, not data.  (Then again, they are the targets of JMS, which
would allow the linker to infer as much.)

It would be advantageous to have a ".proc" per compound statement,
instead of per procedure.  Then the linker can more easily repack
code around the branches of "if", "while", "switch", etc.  Doing so
at least partially decouples code from its data, though.  Is it
better to consider all the data for co-location with the procedure?

Is it better to model as "code with continuation", or just to have
instruction sequences ending with a JMP?  What about skip OPR
instructions?  Should they have a pair of continuations, with the
linker able to re-order them by adjusting the "skip complement" bit?

Generally, I'm assuming the linker is permitted to re-order data as
well as code.  Is there to be any way to create "records" -- that is,
data which must be kept together?  If "FOO" can be an alias for
"FIE+n", that should suffice, since FIE can presumably be of
arbitrary size (up to 4K, anyway).
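The pair-of-continuations idea could be sketched as follows.  This is
an illustrative C fragment, not an actual implementation; the struct
and function names are invented.  On the PDP-8, the group-2 OPR
"reverse sense" bit (0010 octal) complements a skip condition, e.g.
SZA (7440) versus SNA (7450), so the linker can exchange a block's
two successors by toggling that bit:

```c
/* Sketch of a basic block ending in a group-2 OPR skip.
   Such a block has two successors; which one is the fall-through
   is the linker's choice. */
struct block {
    unsigned opr;        /* trailing group-2 OPR, e.g. 07440 = SZA */
    struct block *skip;  /* taken when the skip condition holds */
    struct block *fall;  /* taken otherwise (fall-through) */
};

/* Complement the skip sense and exchange the continuations.
   Control still reaches the same successor under the same machine
   conditions, but the fall-through successor changes, letting the
   linker place either block immediately after this one. */
void flip_skip(struct block *b)
{
    struct block *t = b->skip;
    b->opr ^= 00010;     /* PDP-8 group-2 "reverse sense" bit */
    b->skip = b->fall;
    b->fall = t;
}
```

With this representation, repacking code around "if" and "while"
branches reduces to choosing which continuation to emit in line and
which to reach by JMP.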