PDM #: 200-205-036-01
File: QPRV2.MEM

Specification for: DTE20 Queued Protocol Version 2

Date: 10 December 1976

Copyright (C) 1976
Digital Equipment Corporation, Maynard, Mass.

This software is furnished under a license for use only on a single computer system and may be copied only with the inclusion of the above copyright notice. This software, or any other copies thereof, may not be provided or otherwise made available to any other person except for use on such system and to one who agrees to these license terms. Title to and ownership of the software shall at all times remain in DEC.

The information in this document is subject to change without notice and should not be construed as a commitment by Digital Equipment Corporation.

DEC assumes no responsibility for the use or reliability of its software on equipment which is not supplied by DEC.


1.0 INTRODUCTION

This represents Version 2 of the Queued Protocol and Version 3 of the Communications Region.

1.1 Definitions

1.  Packet: A logical group of data (i.e. a single NSP message).

2.  Transfer: A group of data transferred by the DTE20 upon which the sender has acted exactly once.


2.0 COMMUNICATIONS REGION

2.1 Format

         !=======================================================================!
-11 BITS !DEXWD1 !            DEXWD2             !            DEXWD3             !
         !3!2!  0! 14! 12! 10!  8!  6!  4!  2!  0! 14! 12! 10!  8!  6!  4!  2!  0!
         !-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-!
-10 BITS !0!1!  3!  5!  7!  9! 11! 13! 15! 17! 19! 21! 23! 25! 27! 29! 31! 33! 35!
         !=======================================================================!
         \                                                                       \  ;START OF COMMUNICATIONS HEADER
         \                     HEADERS FOR OTHER PROCESSORS                      \
         \                                                                       \
         !-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-!
         !                      HEADER WORD FOR PROCESSOR 2                      !
         !-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-!
         !                      HEADER WORD FOR PROCESSOR 1                      !
         !-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-!
         ! MBZ ! PROC. NUMBER ! ! RELATIVE ADDRESS OF THIS PROCESSORS AREA !  ;END OF COMMUNICATIONS HEADER
         !=======================================================================!
PIDENT   !X! VR ! !PROTO. VER.! # PROC. !SIZE ! NAME !  ;START OF COMM REGION
         !-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-!  ; FIRST COMM AREA, OWNING SECTION
CHNPNT   !                       POINTER TO NEXT COMM AREA                       !
         !-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-!
         !                                                                       !
         !-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-!
         !                                                                       !
         !-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-!
         !                                                                       !
         !-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-!
KPALIV   !0!                          KEEP ALIVE COUNT                           !
         !-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-!
         \                                                                       \
         \                                UNUSED                                 \
         \                                                                       \  ;END OF OWNER'S SECTION OF AREA
         !=======================================================================!
FORPRO   !X!D!DT#! UNASSIGNED ! PROTYP !BSIZE! "TO" PROCESSOR NUMBER !  ;START OF FIRST "TO" SECTION OF AREA
         !-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-!
PROPNT   !              POINTER TO "TO" PROCESSOR'S OWNED COMM AREA              !
         !-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-!
STATUS   ! !L!I!V! UNUSED !R! CPQCNT ! QCOUNT !
         !-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-!
QSIZE    ! TMODE ! PSIZE ! CSIZE !
         !-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-!
RELOAD   !0!                 RELOAD PARAMETER FOR "TO" PROCESSOR                 !
         !-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-!
CPKPLV   !0!     OWNING PROCESSOR'S COPY OF "TO" PROCESSORS KEEP ALIVE COUNT     !  ;END OF FIRST "TO" SECTION OF AREA
         !=======================================================================!
         \                                                                       \
         \                  OTHER SECTIONS FOR "TO" PROCESSORS                   \
         \                                                                       \  ;END OF FIRST AREA
         !=======================================================================!
         \                                                                       \
         \                      OTHER COMMUNICATIONS AREAS                       \
         \                                                                       \
         !=======================================================================!

2.2 Description Of Communications Region

All processors have read access to the entire communications "region". There is a negative extension to the communications region called the header, which exists so that the -11 processors represented in the communications region can determine what their protocol processor number is.

The communications region is composed of sequential "areas", each processor (both -10s and -11s) owning one and only one area, which is the only part of the communications region the processor is allowed to write into. During comm region initialization, however, a -10 processor is responsible for setup of the entire region, which means that this -10 processor must have write access to the entire comm region. After this operation, the rules for accessing the region apply.

Each area has a single "section" for the processor owning the area, and a section for each processor that the processor owning the area will communicate to. Thus, each pair of processors that communicate with each other using the queued protocol utilizes four sections of the comm region: the processors' own sections of their respective areas (this is where the keep alive count is kept), and each processor's section in its area designated for communication to the other processor.

2.3 HEADER Definitions

The header is considered to be a negative extension to the communications region. It exists to provide a means for the -11 processors to determine what protocol processor number they have been assigned. The -11 can simply examine the first word of its relocated examine space to determine what its protocol processor number is and where in the communications region its own area is located.
The -11 also knows that the first word of the communications region itself is at location N+1 in its relocated examine space, where N is the -11's protocol processor number. From there, it can scan the areas in the communications region to find any areas owned by processors with which the -11 will communicate. Communications region word 0 starts immediately after the last header word.

2.3.1 MBZ - Must be zero

2.3.2 PROC. NUMBER - Processor number

2.3.3 RELATIVE ADDRESS - Offset, relative to communications region word 0, of the area owned by the processor to which this header word belongs.

2.4 PIDENT Definitions

2.4.1 X - One if this area belongs to a 10

2.4.2 VR - Communications area version number -- set to value 3

2.4.3 PROTO. VER. - Protocol version number -- set to value 2

2.4.4 # PROC. - Number of processors represented in this area including the owning processor

2.4.5 SIZE - Size of owner's section in multiples of 8 word blocks

2.4.6 NAME - Name of processor owning this area (serial number)

2.5 CHNPNT Definitions

2.5.1 CHNPNT - Pointer (relative to word 0 of entire comm region) to next communications area, circular list

2.6 KPALIV Definitions

2.6.1 KPALIV - Keep alive count, which the owning processor increments. The processor responsible for monitoring this count examines the count periodically to make sure it has changed. A word in the owner's section of each area is reserved to remember the last value of the keep alive, although it is not required that this word be used. It would be impractical for an -11 to remember the last count in the -10's memory, since it would have to do examines and deposits through the DTE20 to access the cell. The keep alive count should be incremented at least once a second by the processor owning the area, and the monitoring processor should allow the keep alive count to remain the same for at least 6 seconds before declaring the monitored processor dead.
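The keep alive check described above can be sketched as follows. This is only an illustration in Python, not part of the protocol definition: the class name, the method name, and the way elapsed time is passed in are assumptions, and last_count simply plays the role of the monitoring processor's remembered copy of the count. The 6 second limit is the figure given in 2.6.1.

```python
from dataclasses import dataclass

# Section 2.6.1: the count may stay unchanged for at least 6 seconds
# before the monitored processor is declared dead.
KEEP_ALIVE_TIMEOUT_S = 6.0


@dataclass
class KeepAliveMonitor:
    """Hypothetical monitor for one peer's KPALIV word."""

    last_count: int          # remembered copy of the peer's keep alive count
    stalled_s: float = 0.0   # how long the count has gone without changing

    def poll(self, kpaliv: int, elapsed_s: float) -> bool:
        """Record one examine of KPALIV; return True while the peer looks alive.

        kpaliv    -- value just examined from the peer's owned section
        elapsed_s -- seconds since the previous poll (assumed supplied by caller)
        """
        if kpaliv != self.last_count:
            # The count moved: remember it and restart the stall timer.
            self.last_count = kpaliv
            self.stalled_s = 0.0
            return True
        # The count did not move: accumulate the stall time.
        self.stalled_s += elapsed_s
        return self.stalled_s < KEEP_ALIVE_TIMEOUT_S


if __name__ == "__main__":
    monitor = KeepAliveMonitor(last_count=0)
    # Peer increments once, then stops; poll once a second.
    for second, count in enumerate([1, 1, 1, 1, 1, 1, 1], start=1):
        alive = monitor.poll(count, elapsed_s=1.0)
        print(f"t={second}s count={count} alive={alive}")
```

Comparing for inequality rather than for "greater than" keeps the check valid when the count wraps around.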

2.7 FORPRO Definitions

FORPRO is the first location in each "to" processor section.

2.7.1 X - Set to 1 if this block will be used to communicate to a 10

2.7.2 D - Set to one if there is a DTE connection between this and owning processor

2.7.3 DT - DTE number if D bit is set

2.7.4 PROTYP - Protocol type index. Defined types:

    0 - RSX-20F
    1 - NSP

2.7.5 BSIZE - Size of this block in multiples of 8

2.7.6 "TO" PROC. NUM. - Number of the processor which owning processor will communicate through this block

2.8 PROPNT Definitions

2.8.1 PROPNT - Pointer to "to" processor's own comm area

2.9 STATUS Definitions

The status word is set up so that it is the only word a processor has to examine when it receives a doorbell interrupt. This is especially useful to the 11, for whom an examine of 10 memory is painful.

2.9.1 L - If a processor wishes to be reloaded by a second processor, the first processor sets the L bit in the second processor's section of the first processor's comm area (a processor may write only into its own area) and rings the second processor's doorbell. The second processor then examines the STATUS word, sees the L bit set, and performs a reload operation for the first processor. The L bit gets set by a processor whenever it knows that it has crashed (i.e. inside its crash reporting routine).

2.9.2 INIT (I) - Initialize - controls protocol initialization. See the section on "Initialization" for details of this operation.

2.9.3 V - Valid examine bit - If the examine protection word of a DTE is zero and the -11 connected to the other end of the DTE is either privileged or non-privileged with the PI0 enable bit on, any examine done by the connected -11 will appear to succeed, returning a data value of zero. The valid examine bit is a bit that is always guaranteed to be non-zero so that the -11 can be sure that its examines are succeeding. If this bit is zero, the owning processor wishes to leave the Queued Protocol.
For restricted -11's, this implies that the -10 has crashed. For non-restricted -11's, this implies that the other processor should enter Secondary Protocol.

2.9.4 RCV (R) - Set to 1 by the receiver in sender's section of receiver's comm area after sender sets the @ bit or increments the queue count, and the receiver processes the doorbell. Cleared by receiver after receipt of to-receiver done. This bit exists so that the sender can make sure the receiver has not lost a done interrupt.

2.9.5 CPQCNT - Owning processor's copy of sending processor's QCOUNT

2.9.6 QCOUNT - A wrap around counter which is incremented by the owning processor every time a transfer request is initiated by the owning processor. The To-processor (receiver) keeps the last known value of QCOUNT in CPQCNT of the To-sender section of its own COMM area, and if this value differs from the To-receiver QCOUNT, then the receiver starts the transfer. This difference should be either zero or one. If this difference is greater than one, the sender has tried to send a packet before the previous packet was finished. This counter is also useful in case a doorbell interrupt is missed by the To-processor. If a doorbell is missed, the differing counters will be noticed at the next doorbell.

2.10 QSIZE Definitions

2.10.1 TMODE - Transfer mode index. Defined modes:

    0 - 8 bit bytes, byte mode
    1 - 8 bit bytes, word mode
    2 - 16 bit bytes, word mode

2.10.2 PSIZE - Number of bytes remaining in the current packet

2.10.3 CSIZE - Number of bytes in the current transfer. Receiver should initiate a transfer of CSIZE bytes whenever it notices that QCOUNT has changed. If CSIZE is equal to PSIZE, then this is the last transfer in the packet. Note that the DTE20 allows the receiver to segment a transfer in as many pieces as he/she desires.

2.11 RELOAD Definitions

2.11.1 RELOAD - Copy of "to" processor's reload word, saved by owning processor in case "to" processor crashes.
Currently this word is sent to the -11 inside the -11 bootstrap code to select the device.

2.12 CPKPLV Definitions

2.12.1 CPKPLV - Owning processor's copy of "to" processor's keep alive count


3.0 INITIALIZATION

3.1 Communications Region Initialization

One of the -10's using this communications region must initially set up all the pointers and sizes, and clear all the STATUS words. When a particular protocol processor wishes to initiate protocol operation, it must:

1.  Set up its owned region, and all "to" regions with which it desires to communicate.

2.  Set the Valid Examine and INIT bits in each "to" area to be used.

3.  If this is a -10, set up the DTE20 EPT locations (examine, deposit relocation, protection).

4.  If this is a non-restricted Front End running Secondary Protocol, issue a "Leave Secondary Protocol" command.

3.2 Queued Protocol Initialization

Initialization is done by a handshaking process using the INIT bit in STATUS and the doorbells. At any time, a processor may request initialization of the Queued Protocol. This may imply data loss. Initialization may occur while the protocol is running; however, the normal case will be a processor setting Valid Examine and INIT at the same time.

The following describes the interactions required to initialize, with S as the sender (initiator) and R as the receiver.

1.  S and R idle.

2.  S sets the To-R INIT bit.

3.  S clears the To-R RCV, QCOUNT, and CPQCNT.

4.  S aborts any To-R transfers in progress.

5.  S resets its To-R and To-S queues.

6.  S rings the To-R doorbell.

7.  R detects the doorbell.

8.  R reads the To-R STATUS word.

9.  R clears the doorbell.

10. R notes that the To-R INIT bit is set, and the To-S INIT bit is not set.

11. R sets the To-S INIT bit.

12. R clears the To-S RCV, QCOUNT, and CPQCNT.

13. R aborts any To-S transfers in progress.

14. R resets its To-S and To-R queues.

15. R rings the To-S doorbell.

16. S detects the doorbell.

17. S reads the To-S STATUS word.

18. S clears the doorbell.

19. S notes that the To-S INIT bit is set, and the To-R INIT bit is set.

20. S clears the To-R INIT bit.

21. S rings the To-R doorbell.

22. R detects the doorbell.

23. R reads the To-R STATUS word.

24. R clears the doorbell.

25. R notes that the To-R INIT bit is not set, but the To-S INIT bit is set.

26. R clears the To-S INIT bit.

27. R rings the To-S doorbell.

28. R can now start normal protocol operation, as described under "Operation".

29. S detects the doorbell.

30. S reads the To-S STATUS word.

31. S clears the doorbell.

32. S notes that the To-S INIT bit is not set, and the To-R INIT bit is also not set.

33. S can now start normal protocol operation, as described under "Operation".

The states and transitions involved in this protocol initialization can be described by two conditions:

1.  The owning processor's INIT bit (my To-him INIT).

2.  The other processor's INIT bit (his To-me INIT).

Also, to avoid races, the following considerations must be observed:

1.  No action may be taken while my To-him doorbell is still set.

2.  The initiator of the initialization (S) must wait until the other processor's To-S INIT bit has been set.

These states are summarized in the following table.

  My      His
To-him   To-me    State or Action
 INIT    INIT

   0       0      Normal operation

   0       1      Initialize request: Initialize and set my To-him INIT

   1       0      Sender: No action -- wait for his To-me INIT to go to 1
                  Receiver: Initialization complete: Clear my To-him INIT

   1       1      Initialization complete: Clear my To-him INIT


4.0 OPERATION

The procedure to transfer a complete logical packet is as follows, with the sender represented as S and the receiver as R.

1.  S and R idle.

2.  S sets up a data packet.

3.  S sets the To-R PSIZE to the size of the packet.

4.  S sets up the following, in any order:

    1.  The To-R CSIZE to the size of the packet, or the first part, or the next part, or the last part.
        CSIZE must be less than or equal to PSIZE (equal in the case of the entire packet or last part of a packet).

    2.  The To-R TMODE to the desired transfer mode.

    3.  If S is an -11, the DTE20 byte mode bit set/clear.

    4.  The DTE20 send address or byte pointer.

5.  S increments its To-R QCOUNT.

6.  S rings the To-R doorbell.

7.  R detects the doorbell.

8.  R reads the To-R STATUS word.

9.  R clears the doorbell.

10. R notes that the To-R QCOUNT is exactly one greater than R's copy of it in the To-S CPQCNT.

11. R sets its To-S RCV bit.

12. R moves the new To-R QCOUNT into the To-S CPQCNT.

13. If R is an -11, it sets the DTE20 byte mode bit according to the To-R TMODE field.

14. R sets up the DTE20 receive address or byte pointer.

15. R reads the To-R CSIZE.

16. If this is the last part of the transfer, R sets the DTE20 I bit to indicate "done" to both processors.

17. R sets the DTE20 byte or word count to any transfer size less than or equal to the number of bytes left in the current transfer, which starts the transfer.

18. R gets To-R "done".

19. If there is more to the current transfer, R continues at (14).

20. S gets To-R "done".

21. R clears the To-S RCV bit.

22. S decrements the To-R PSIZE by the current To-R CSIZE.

23. If there is more to the current packet (PSIZE greater than zero), S continues at (4).

24. Packet complete.


5.0 PACKET CONTENTS

There are no restrictions on packet contents.


6.0 REDUNDANCY AND ERROR DETECTION

Some amount of redundancy has been incorporated in the Queued Protocol, where it would not severely impair performance. The DTE20 is intended to provide an error-free path between the two processors, but both hardware and software have a history of never reaching this goal.

6.1 Error Detection

6.1.0.1 Hardware Detected Errors - The DTE20 provides us with several error conditions.

1.  Transfer Error (To-11 or To-10 error)

2.  E-Buss parity error

3.  Clock Error Stop

4.  Halt

Also, the system hardware will tell us about Power Restart.

6.1.0.2 Software Detected Errors - The software can detect some errors due to either non-cooperating software on the other processor or hardware malfunction.

1.  Deposit or examine failure

2.  Transfer did not transfer requested number of bytes

3.  QCOUNT not equal to CPQCNT or CPQCNT+1

4.  Re-boot request from other processor (LOAD set to 1 in STATUS)

6.1.0.3 Periodic Hung Checks And Timeouts - The software can periodically check the status of transfers and the COMM region to ensure that both the hardware and software have performed requested operations even though acknowledgement was not received.

1.  Missed doorbell:

    1.  Valid Examine zero

    2.  INIT set to 1

    3.  LOAD set to 1

    4.  QCOUNT not equal to CPQCNT

2.  Sender missed "done" (To-R QCOUNT equal to To-S CPQCNT and To-S RCV zero)

3.  Receiver missed "done" (by timeout)

4.  Keep Alive count not incremented (KPALIV equal to CPKPLV)

5.  Initialize sequence timeout

The missed-doorbell conditions should be checked once a second; the Initialize, Keep Alive, and transfer timeouts should be on the order of 5 seconds.

6.2 Error Correction

A variety of ways exist to treat error conditions. However, in an environment as closely controlled as the DTE20, it is quite reasonable to provide destructive error processing in many cases. The possible ways to treat these conditions are detailed below.

6.2.0.1 Ignore Error - The only error which should be ignored is Keep Alive ceased on restricted Front Ends.

6.2.0.2 Assume Event Occurred - For all errors detected by periodic checking or timeout, this method of recovery is suggested. These types of errors usually indicate some form of more complex condition, which will (hopefully) be detected by another method. A warning message should be printed and/or the error logged.
A threshold counter should be incremented, such that repeated timeout errors will cause a fatal error condition.

6.2.0.3 Attempt Recovery Or Retry - This is a complicated solution, and, considering the environment, not worth the extra code and time.

6.2.0.4 Re-initialize Protocol - This is a reasonable, though not required, recovery for protocol and transfer errors. This is the only reasonable recovery for Power Restart. As above, a warning message should be printed and/or the error logged, and a threshold counter incremented and checked to prevent indefinite retrying of this recovery attempt.

6.2.0.5 Re-boot Other Processor - This recovery procedure is not available on restricted Front Ends, which should print a warning message and/or log the error in these cases. The following errors should be handled in this manner:

1.  Keep Alive ceased (non-restricted Front Ends only)

2.  Re-boot request from other processor

3.  Hardware DTE20 or KL10 errors:

    1.  Clock Error Stop

    2.  Halt

    3.  Deposit or examine failure

    4.  E-Buss parity error

6.2.0.6 Request Re-boot For Self - This method of recovery should be used when all else fails, or an internal redundancy check has detected an illegal condition.

6.3 Crash Detection

"Crash" is defined as the irrecoverable state of a processor which requires re-boot and may exist for a long period of time.

6.3.1 Re-boot Conditions - All re-boot conditions described above are defined as entering a crash state. Note that restricted Front Ends cannot perform a re-boot, but this state can be broadcast to other network nodes.

6.3.2 Valid Examine - The Valid Examine bit in the STATUS word indicates that a processor is using the Queued Protocol. If a processor clears the Valid Examine bit, the Queued Protocol is assumed to be turned off. For restricted Front Ends, this is a crash condition: the other processor should treat it as such. For non-restricted Front Ends, this implies entering Secondary Protocol.
No other action should be taken.

6.3.3 Keep Alive - Note that Keep Alive ceased is a re-boot condition for non-restricted Front Ends only. Keep Alive will not be maintained or checked while Valid Examine is clear (for either processor).

[End of QPRV2.RNO]