Part V: Timer Queues


Writing VMS Privileged Code


Hunter Goatley

Edward A. Heinrich

Keeping track of the time is an integral part of any computer operating system. The actual time-keeping implementation varies based upon factors such as the type of clocks present in the computer hardware and the environment that the operating system is designed to support. A real-time operating system supporting robotic manufacturing applications generally needs more granularity and flexibility in its time-keeping subsystem than a single-user PC-based operating system.

In the VAX family of processors, different hardware implementations are used to provide the two architecturally required hardware clocks: the time-of-year, or TOY, clock and the interval timer. In the Alpha AXP architecture the time-of-year clock is referred to as the “battery-backed watch”. The AXP architecture also implements process (PCC) and system (SCC) cycle counters that can be used to measure short intervals. The OpenVMS AXP executive routines EXE$DELAY, EXE$TIMEDWAIT_SETUP, and EXE$TIMEDWAIT_COMPLETE all make use of the SCC to provide nanosecond granularity.

The VAX architecture stipulates that the interval timer must generate a hardware interrupt every 10 milliseconds, or 100 times per second. The AXP architecture requires a clock with finer granularity; all current AXP systems interrupt at IPL 22 1024 times per second. Since “VMS is VMS” on both platforms, OpenVMS AXP emulates VAX-style 10-millisecond interrupts with a counter, stored in the per-processor database in field CPU$L_SOFT_TICK, that is decremented every time the hardware clock generates an interrupt. The emulated interrupt is known as a SOFT TICK. Note that the AXP implementation does NOT guarantee that the interval timer routine, fired when the CPU$L_SOFT_TICK entry reaches zero, will run at precise 10 ms intervals. Rather, over time the average interval will be 10 ms. (The actual algorithms, along with the rationale for choosing them, are very well documented in the source code found in module TIMESCHD_MON.MAR or TIMESCHD_MIN.MAR on the AXP source listing CD.)
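The "aggregate" behaviour can be illustrated with a small model. The following Python sketch is not the algorithm used in TIMESCHD; it assumes a hypothetical accumulator scheme (the names HW_HZ, SOFT_HZ, and soft_tick_gaps are invented for illustration) simply to show how 1024 hardware interrupts per second can yield 100 soft ticks per second even though 1024 is not a multiple of 100.

```python
HW_HZ = 1024     # hardware clock interrupts per second (current AXP systems)
SOFT_HZ = 100    # desired VAX-style soft ticks per second (one per 10 ms)


def soft_tick_gaps(hw_interrupts):
    """Return the number of hardware interrupts between successive soft ticks.

    Assumed scheme (illustration only): every hardware interrupt adds SOFT_HZ
    to an accumulator, and a soft tick fires whenever the accumulator
    reaches HW_HZ.
    """
    gaps = []
    accumulator = 0
    since_last_tick = 0
    for _ in range(hw_interrupts):
        accumulator += SOFT_HZ
        since_last_tick += 1
        if accumulator >= HW_HZ:
            accumulator -= HW_HZ          # keep the remainder for the next tick
            gaps.append(since_last_tick)
            since_last_tick = 0
    return gaps


gaps = soft_tick_gaps(HW_HZ)              # simulate one second of interrupts
print(len(gaps), sorted(set(gaps)))       # soft tick count, distinct gap sizes
```

In this model an individual soft tick arrives after either 10 or 11 hardware interrupts (roughly 9.77 ms or 10.74 ms), yet exactly 100 soft ticks occur in each second.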

When the hardware interrupt is generated on a VAX, or the soft tick counter has reached zero on an AXP, OpenVMS, inside the interrupt service routine EXE$HWCLKINT, performs the following actions: the system time is updated; the Timer Queue is checked to see if any TQEs have come due; process and CPU accounting is performed; and the SMP sanity timer is checked, if there are multiple CPUs in the system.

The Internals and Data Structures manual, in the chapter on Time Support, contains details on the implementation of the timer support, including the use of privileged registers and OpenVMS internal routines that are used to update the time.

As systems programmers, we often need to interface with the time-keeping routines and data structures for one of two reasons: to obtain the current time, or to request that the OpenVMS operating system perform a function on our behalf at some calculated future point in time. For example, a utility such as MONITOR that collects statistics may need to “wake up” at some regularly defined interval to display the accumulated statistics or write them to secondary storage.

To obtain the current system date and time, one can call the SYS$GETTIM system service or, if the processor is running in kernel mode, a MACRO-32 program can use the READ_SYSTIME macro. The current time is stored in the executive as a quadword in EXE$GQ_SYSTIME, as the number of 100-nanosecond intervals since November 17, 1858. Since this is a quadword value and is updated via multiple hardware instructions, the READ_SYSTIME macro acquires the HWCLK spinlock to prevent the hardware clock interrupt service routine, EXE$HWCLKINT, from executing and updating the time while it is being read. The READ_SYSTIME macro takes one argument, the address of a quadword into which the time is to be written. If you are writing in VAX MACRO and specify a VAX register as the destination, READ_SYSTIME will write the value into that register and the next higher register. For example, the following macro invocation copies the 64-bit time representation from EXE$GQ_SYSTIME into R2 and R3:

           READ_SYSTIME R2            ; Put current time in R2/R3
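To make the time format concrete, here is a minimal Python sketch that interprets such a quadword outside of OpenVMS. The function and constant names are invented for illustration; only the base date and the 100-nanosecond unit come from the description above.

```python
from datetime import datetime, timedelta

VMS_EPOCH = datetime(1858, 11, 17)   # base date of the OpenVMS system time
TICKS_PER_SECOND = 10_000_000        # one tick is 100 nanoseconds


def vms_time_to_datetime(quadword):
    """Convert a 64-bit absolute system time to a datetime.

    Python datetimes resolve microseconds, so the last decimal digit of the
    100 ns count is dropped.
    """
    return VMS_EPOCH + timedelta(microseconds=quadword // 10)


def seconds_to_ticks(seconds):
    """Express an interval in whole seconds as a count of 100 ns ticks."""
    return seconds * TICKS_PER_SECOND


one_day = seconds_to_ticks(24 * 60 * 60)
print(vms_time_to_datetime(one_day))   # one day after the base date
```

The same multiplication by 10,000,000 is what a kernel routine performs when it converts an interval given in seconds into the units used by the timer queue.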

For keeping track of events that must be executed at a future time, OpenVMS maintains a doubly linked list of Timer Queue Entries (TQEs), with the list head located at EXE$GL_TQFL. TQEs are allocated from non-paged pool; access to the queue is synchronized by the acquisition of the TIMER spinlock. The TQE list can be viewed as the list of events that OpenVMS must process in the future. Non-privileged code threads can allocate a TQE by calling the SYS$SCHDWK or SYS$SETIMR system services. Kernel mode code can directly access the TQE queue to insert and remove TQEs as necessary. From kernel mode, a TQE can be allocated by calling the executive routine EXE$ALLOCTQE via the JSB linkage. A TQE can also be embedded in another data structure, which allows the address of that structure to be easily passed to the routine to be executed when the TQE comes due.

The format of a TQE is as follows:

       |        Forward link           |       TQE$L_TQFL
       |       Backwards link          |       TQE$L_TQBL
       |RQType| Type   |      Size     |       TQE$W_SIZE,TQE$B_TYPE,TQE$B_RQTYPE
       |           PID / FPC           |       TQE$L_PID / TQE$L_FPC
       |           AST / FR3           |       TQE$L_AST / TQE$L_FR3
       |        ASTPRM / FR4           |       TQE$L_ASTPRM / TQE$L_FR4
       |           Absolute            |       TQE$Q_TIME
       | - - - - - - - - - - - - - - - |
       |       Expiration time         |
       |         Delta Repeat          |       TQE$Q_DELTA
       | - - - - - - - - - - - - - - - |
       |            Time               |


  [AXP]
       |            RMOD               |       TQE$L_RMOD
       |            EFN                |       TQE$L_EFN

  [VAX]
       |   Reserved    | EFN   | RMOD  |       TQE$B_RMOD, TQE$B_EFN

       |         Requester PID         |       TQE$L_RQPID
       |        Process CPU time       |       TQE$L_CPUTIM

The fields TQE$L_TQFL and TQE$L_TQBL are the standard queue linkages associated with OpenVMS doubly-linked queue entries. TQE$W_SIZE holds the size of the TQE, which is TQE$K_LENGTH plus any additional fields specified by the user. TQE$B_TYPE contains the standard OpenVMS structure type, DYN$C_TQE.

TQE$B_RQTYPE tells OpenVMS what type of timer queue entry this is. There are currently three bit mask values defined symbolically: TQE$M_REPEAT for a repeating TQE; TQE$M_ABSOLUTE for a TQE that comes due at an absolute time, as opposed to a delta time interval; and TQE$M_CHK_CPUTIM, which was introduced in VMS Version 5.0 to allow the expiration to occur as a result of accumulated CPU time. In addition to these bit mask values, the low-order two bits of TQE$B_RQTYPE specify the context for which the TQE is destined. (These are defined as constant values instead of bit mask/value pairs.) A TQE can describe either a scheduled wake-up request for a process, by setting bit 1 (TQE$C_WKSNGL) in TQE$B_RQTYPE, or a request to execute a subroutine at a predefined time, by setting bit 0 (TQE$C_SSSNGL). If the value contained in these bits is zero, TQE$C_TMSNGL, then the TQE is a process-based timer request and is a one-shot TQE. (It is illegal to set TQE$M_REPEAT if bits 0 and 1 are both clear.) If bit 0 is set, then the TQE is for a system subroutine, and it is assumed that the code executed as a result of the TQE coming due executes in system context. This is often the type of TQE requested by systems programs, with the TQE$M_REPEAT bit set if a recurring timer request is desired. $TQEDEF also defines TQE$C_SSREPT and TQE$C_WKREPT, which are simply the TQE$C_SSSNGL and TQE$C_WKSNGL values combined with TQE$M_REPEAT.

The symbolic values defined in $TQEDEF that can be specified in TQE$B_RQTYPE are listed in Table 1-1.



     Symbol             Value     Meaning

     TQE$M_REPEAT       4         Requeue TQE after expiration
     TQE$M_ABSOLUTE     8         Absolute time specified, as opposed
                                  to DELTA
     TQE$M_CHK_CPUTIM   16        Base calculation upon accumulated CPU
                                  time
     TQE$C_TMSNGL       0         Timer entry is a single shot request
     TQE$C_SSSNGL       1         Timer is system subroutine single
                                  shot request
     TQE$C_WKSNGL       2         Timer is wake entry single shot
     TQE$C_SSREPT       5         Timer is a system subroutine repeat
     TQE$C_WKREPT       6         Timer is wake entry repeat
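The way the context value and the bit masks combine can be sketched in a few lines of Python. This is only a model of the encoding described above, not OpenVMS code; decode_rqtype and the context names are invented for illustration, and the numeric values are the ones listed in the table.

```python
TQE_M_REPEAT = 4
TQE_M_ABSOLUTE = 8
TQE_M_CHK_CPUTIM = 16

# Low-order two bits select the context of the request.
CONTEXT_NAMES = {0: "process timer", 1: "system subroutine", 2: "process wake"}


def decode_rqtype(rqtype):
    """Split a TQE$B_RQTYPE value into its context and option flags."""
    context = rqtype & 0x3
    if context not in CONTEXT_NAMES:
        raise ValueError("no context is described for low-order bits %d" % context)
    repeat = bool(rqtype & TQE_M_REPEAT)
    if repeat and context == 0:
        # The text notes that REPEAT is illegal when bits 0 and 1 are clear.
        raise ValueError("TQE$M_REPEAT is illegal for a process timer request")
    return {
        "context": CONTEXT_NAMES[context],
        "repeat": repeat,
        "absolute": bool(rqtype & TQE_M_ABSOLUTE),
        "check_cpu_time": bool(rqtype & TQE_M_CHK_CPUTIM),
    }


print(decode_rqtype(5))   # TQE$C_SSREPT = TQE$C_SSSNGL combined with TQE$M_REPEAT
```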

Table 1-2 summarizes the TQE fields commonly initialized by a program.



     TQE$W_SIZE       Holds the size of the TQE structure. This is
                      either TQE$K_LENGTH, or a larger value if extensions
                      to the standard TQE are made.

     TQE$B_TYPE       Standard OpenVMS structure type code, DYN$C_TQE.

     TQE$B_RQTYPE     Tells OpenVMS the type of TQE entry this is:

                      TQE$C_TMSNGL is a process-based single shot timer

                      TQE$C_SSSNGL is a system routine single shot
                      timer request.

                      TQE$C_WKSNGL specifies that the process is to be
                      awakened once.

                      TQE$C_SSREPT is a repeating system routine TQE.

                      TQE$C_WKREPT specifies this is a repeating TQE to
                      wake the process at every interval.

     TQE$L_PID        For wake-up and process-based requests, this
                      field contains the PID of the process that either
                      is awakened or receives the AST associated with
                      the TQE.

     TQE$L_FPC        For system routine requests, this field contains
                      the address of the routine to be executed, via a
                      VAX JSB instruction, when the TQE expires.

     TQE$L_AST        For process-based requests, this field contains
                      the address of the AST executed as a procedure.

     TQE$L_FR3        For system routines, TQE$L_FR3 contains the value
                      to be copied to R3 prior to invoking the routine
                      contained in TQE$L_FPC.

     TQE$L_ASTPRM     For process-based requests, contains the parameter
                      passed to the AST procedure, at offset 4(AP).

     TQE$L_FR4        For system routines, TQE$L_FR4 contains the
                      value to be loaded into R4 prior to invoking
                      the routine contained in TQE$L_FPC.

     TQE$Q_TIME       This field contains the 64-bit time when the
                      TQE is to be removed from the list and the AST
                      or system routine specified in TQE$L_FPC/TQE$L_AST
                      is executed.

     TQE$Q_DELTA      This field contains the 64-bit delta time interval
                      that is used for repeating requests to
                      compute the next absolute expiration time of the
                      TQE.

     TQE$B_RMOD [VAX] For process-based TQE entries, this field contains
     TQE$L_RMOD [AXP] the mode that the AST procedure is executed in.

     TQE$B_EFN  [VAX] For process-based requests, TQE$[B/L]_EFN contains
     TQE$L_EFN  [AXP] the event flag to be set when the request comes due.

     TQE$L_RQPID      PID of requesting process.


Typically, applications want either a one-shot TQE, in which the timer request comes due once and is then discarded, or a repeating TQE, in which the timer repeats at the same interval until it is cancelled or the machine crashes or is shut down (systems programmers should always hope a shutdown occurs before a crash occurs!). For example, in a program such as a performance monitoring utility, where statistics are displayed or recorded once per interval, a repeating TQE is preferred since we want to record the accumulated statistics after every interval. In a program that performs network connections, we may write data to a peripheral device, an Ethernet controller in this example, and then wait either for a response from our network partner or until a certain number of seconds has elapsed without receiving a response from the network. In this case, a one-shot TQE is preferred since there is no guarantee that a network connection request will be pending at every interval. OpenVMS supports both types of timer requests and distinguishes between the two based upon the information contained in TQE$B_RQTYPE.

If the TQE is for a process-based timer request, the requesting code thread MUST charge the destination process for a TQE by debiting the TQE quota contained in the JIB. This is necessary as the code in EXE$SWTIMINT will credit a TQE to the process when the TQE expires by incrementing JIB$W_TQCNT. In addition, process-based requests that desire an AST to execute as a result of the TQE coming due must set the ACB$M_QUOTA bit in TQE$[B/L]_RMOD and charge the destination process for an ACB by decrementing PCB$W_ASTCNT. (Specifying a process-based TQE request and not setting ACB$M_QUOTA causes OpenVMS to declare the AST resource available to the destination process.)

In order to insert a TQE on the timer queue, a program invokes EXE$INSTIMQ via the JSB linkage. The 64-bit absolute expiration time is passed in the R0/R1 register pair, and the address of the TQE is passed in R5. (IPL must be at IPL$_TIMER or lower.) EXE$INSTIMQ acquires the TIMER spinlock and copies the absolute expiration time from R0/R1 into TQE$Q_TIME. The routine then traverses the existing pending TQE list, whose listhead is at EXE$GL_TQFL, and compares the R0/R1 expiration time with the values contained in the TQE$Q_TIME field of the already on-queue TQE entries. When the value contained in an on-queue TQE is greater than or equal to the R0/R1 value, the new TQE is inserted in front of the existing TQE entry. The TQE list is maintained as an absolute queue and manipulated with the VAX INSQUE and REMQUE hardware instructions or the AXP PALcode pseudo-instructions. Synchronization of the TQE queue is enforced via ownership of the TIMER spinlock. Entries are maintained in expiration order to optimize the checks required for determination of an expired entry.

OpenVMS also keeps a record of the absolute due time of the first TQE element in the global location EXE$GQ_1ST_TIME. If the new TQE is inserted at the head of the queue, then EXE$INSTIMQ updates EXE$GQ_1ST_TIME with the 64-bit absolute time contained in R0/R1.
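The ordering rule can be modelled in a few lines of Python. This is a sketch of the behaviour described above, not of the EXE$INSTIMQ code itself: the TimerQueue class and its attribute names are invented for illustration, a Python list stands in for the doubly linked queue, and no spinlock is modelled.

```python
import bisect


class TimerQueue:
    """Toy model of a timer queue kept in expiration order."""

    def __init__(self):
        self.times = []         # absolute due times, ascending
        self.names = []         # label of the entry at the same index
        self.first_time = None  # plays the role of EXE$GQ_1ST_TIME

    def insert(self, due_time, name):
        # bisect_left finds the first queued entry whose due time is greater
        # than or equal to the new one; the new entry goes in front of it.
        index = bisect.bisect_left(self.times, due_time)
        self.times.insert(index, due_time)
        self.names.insert(index, name)
        if index == 0:
            self.first_time = due_time  # new head: update the cached first time


queue = TimerQueue()
for due, name in [(300, "a"), (100, "b"), (200, "c"), (100, "d")]:
    queue.insert(due, name)
print(queue.names, queue.first_time)
```

Because the queue is always sorted, checking for expired entries only requires comparing the current time with the cached due time of the head entry.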

The following MACRO-32 code fragment allocates a TQE, by invoking EXE$ALLOCTQE, for repeatedly executing a system routine. Note that the routine does not charge any quotas.

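       $TQEDEF                                 ; Timer Queue Entry

       SECONDS         = 1000*1000*10          ; 100 ns ticks in 1 second (positive,
                                               ; since the delta is added to a time)

               JSB     G^EXE$ALLOCTQE          ; Allocate a TQE
               MOVB    #TQE$C_SSREPT, TQE$B_RQTYPE(R2)
                                               ; Repeating system subroutine
               MOVAB   TQE_EXPIRATON, TQE$L_FPC(R2)
                                               ; Address of routine to execute
.IF     DEFINED AXP                            ; Architectural differences
               MOVL    #PSL$C_KERNEL, TQE$L_RMOD(R2)
                                               ; Kernel mode desired
               CLRL    TQE$L_EFN(R2)           ; No event flag necessary
.IFF                                           ; VAX byte-sized fields
               MOVB    #PSL$C_KERNEL, TQE$B_RMOD(R2)
                                               ; Kernel mode desired
               CLRB    TQE$B_EFN(R2)           ; No EFN specified
.ENDC                                          ; AXP vs. VAX offsets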

The following is a MACRO-32 code fragment that demonstrates one way of allocating a TQE and charging the calling process for an AST, a TQE, and the number of bytes of non-paged pool required to contain the actual TQE. Note that the code is assumed to be executing in kernel mode.

       ;       Since this TQE will execute on behalf of the calling process, we
       ;       charge the process for all the system resources it uses.  We begin
       ;       by charging the process for both a TQE and an AST.
               MOVL    G^CTL$GL_PCB, R4        ; Obtain PCB address in R4
               MOVL    PCB$L_JIB(R4), R0       ; Copy address of JIB to R0
               ADAWI   #-1, JIB$W_TQCNT(R0)    ; Charge TQE against the process
               BLSS    200$                    ; Branch if no quota left
               ADAWI   #-1, PCB$W_ASTCNT(R4)   ; Decrement # of AST's left
               BGEQ    400$                    ; Branch if we got one
       ;       Add back quotas subtracted if we encountered insufficient quotas.
       ;       When we arrive from the pool allocation failure below, R0 holds a
       ;       status rather than the JIB address, so reload it first.  R4 is
       ;       assumed to still contain the PCB address.
         100$: MOVL    PCB$L_JIB(R4), R0       ; Reload address of JIB into R0
               ADAWI   #1, PCB$W_ASTCNT(R4)    ; Put back the AST quota we took
         200$: ADAWI   #1, JIB$W_TQCNT(R0)     ; Add back TQE quota deducted
               MOVZWL  #SS$_EXQUOTA, R0        ; Set final status code
               BRW     ABT                     ; And return to caller w/ error
       ;       Allocate non-paged pool to contain the TQE.  The EXE$DEBIT_BYTCNT_ALO
       ;       routine handles debiting and checking of BYTLIM quota.
         400$: MOVZWL  #TQE$K_LENGTH, R1       ; Size of a TQE entry
               PUSHL   R3                      ; Save R3 contents
               JSB     G^EXE$DEBIT_BYTCNT_ALO  ; Some non-paged pool please
               POPL    R3                      ; Restore previous R3 value
               BLBC    R0, 100$                ; Return deducted quotas & exit if none
       ;       R1      now contains the size of pool that was allocated
       ;       R2      contains the address of the pool to use as a TQE
               CLRQ    (R2)                    ; Say not on queue yet
               MOVW    R1, TQE$W_SIZE(R2)      ; Set size in TQE field
               MOVB    #DYN$C_TQE, TQE$B_TYPE(R2)
                                               ; Type is TQE
               MOVB    #TQE$C_SSREPT, TQE$B_RQTYPE(R2)
                                               ; Repeating system subroutine TQE
               MOVAB   TQE_EXPIRATON, TQE$L_FPC(R2)
                                               ; Address of routine to execute
               MOVL    R6, TQE$L_FR3(R2)       ; EPB address into R3, TQE$L_FR3
               CLRL    TQE$L_FR4(R2)           ; No AST address, TQE$L_FR4
               CLRQ    TQE$Q_TIME(R2)          ; Skip over TQE$Q_TIME (absolute time)
               EMUL    P4(AP), #SECONDS, #0, R9; Convert seconds to ticks, in R9/R10
               MOVQ    R9,  TQE$Q_DELTA(R2)    ; Delta time to TQE$Q_DELTA

.IF     DEFINED AXP                            ; Architectural differences
               MOVL    #PSL$C_KERNEL, TQE$L_RMOD(R2)
                                               ; Mode is kernel
               CLRL    TQE$L_EFN(R2)           ; Event flag field
.IFF                                           ; VAX byte-sized fields
               MOVB    #PSL$C_KERNEL, TQE$B_RMOD(R2)
                                               ; Mode is kernel
               CLRB    TQE$B_EFN(R2)           ; Event flag field
               CLRW    TQE$B_RMOD+2(R2)        ; Reserved word after RMOD and EFN
.ENDC                                          ; AXP vs. VAX offsets

               CLRL    TQE$L_RQPID(R2)         ; Clear rest of TQE
               CLRL    TQE$L_CPUTIM(R2)
       ;       Enqueue TQE. R5 will contain TQE address, R0/R1 quadword expiration time
               PUSHL   R5                      ; Preserve UCB address
               MOVL    R2, R5                  ; Copy TQE address to R5
               READ_SYSTIME R0                 ; Obtain current 64 bit time in R0/R1
               ADDL2   TQE$Q_DELTA(R5), R0     ; Add in delta time, low-order 32 bits
               ADWC    TQE$Q_DELTA+4(R5), R1   ; Add high-order 32 bits plus the carry

        1000$: JSB     G^EXE$INSTIMQ           ; Insert in the queue (Start the clock)
               POPL    R5                      ; Restore UCB address to R5

TQE Expiration

When the hardware clock interrupts, the processor dispatches to EXE$HWCLKINT to service the interrupt. As part of its logic, this interrupt handler updates the system time maintained in EXE$GQ_SYSTIME and compares the new time with EXE$GQ_1ST_TIME. If the comparison determines that one or more TQEs have come due, a software timer interrupt is requested.

The interrupt service routine for the software timer is EXE$SWTIMINT, which removes the first TQE that has expired (by definition, at the list head) and, based upon the low-order two bits in TQE$B_RQTYPE, dispatches to process the TQE. Wakeup requests simply modify the process state by calling SCH$WAKE. Process requests cause EXE$SWTIMINT to copy TQE$[B/L]_RMOD to ACB$B_RMOD to reformat the TQE into an ACB, and issue a call to SCH$QAST to execute the routine specified in TQE$L_AST as an AST in the process. NOTE: Since the TQE$Q_TIME field overlays ACB$L_KAST, a process-based request cannot request a special kernel mode AST.

The more interesting case is that of a system subroutine request. When either TQE$C_SSSNGL or TQE$C_SSREPT is set in TQE$B_RQTYPE, EXE$SWTIMINT copies the values from TQE$L_FR3 and TQE$L_FR4 to R3 and R4 respectively, and then transfers control to the routine contained in TQE$L_FPC via a JSB instruction. Note that all spinlocks are released prior to issuing the JSB and that the address of the TQE is left in R5. Upon completion, the subroutine must relinquish any spinlocks it acquired and RSB back to EXE$SWTIMINT with the address of a TQE in R5. If the TQE$M_REPEAT bit is set in TQE$B_RQTYPE upon return to EXE$SWTIMINT, then EXE$SWTIMINT adds the value contained in TQE$Q_DELTA to TQE$Q_TIME to compute the next expiration time and calls EXE$INSTIMQ to requeue the TQE. Adding TQE$Q_DELTA to TQE$Q_TIME, instead of to EXE$GQ_SYSTIME, ensures that the desired interval is achieved regardless of the time required to execute the routine specified in TQE$L_FPC. It also avoids the overhead of acquiring the HWCLK spinlock.
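The benefit of adding the delta to the previous expiration time, rather than to the current time, is easy to see in a small model. The Python sketch below is an illustration only; expiration_times and its parameters are invented names, and service_delay stands for however many ticks elapse between an entry coming due and its being requeued.

```python
def expiration_times(delta, service_delay, count, rebase_on_current_time):
    """Return the first `count` due times of a repeating entry.

    delta                  -- repeat interval, in ticks
    service_delay          -- ticks that pass between the entry coming due
                              and the moment it is requeued
    rebase_on_current_time -- True: next due time = current time + delta
                              False: next due time = previous due time + delta
    """
    due = delta
    times = []
    for _ in range(count):
        times.append(due)
        now = due + service_delay            # time at which we requeue
        base = now if rebase_on_current_time else due
        due = base + delta
    return times


print(expiration_times(100, 7, 4, rebase_on_current_time=False))
print(expiration_times(100, 7, 4, rebase_on_current_time=True))
```

With the previous due time as the base, the entries stay on an exact 100-tick grid; with the current time as the base, every repetition slips by the service delay and the error accumulates.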

Cancelling the AST

If the routine determines that it does not want the TQE to be re-queued, or the TQE is not for a repeating event and is to be deallocated or used as another data structure, the routine can substitute the address contained in EXE$AR_TQENOREPT for the TQE address in R5. EXE$AR_TQENOREPT contains the address of a permanently defined TQE that never comes due and has the TQE$M_REPEAT bit clear, and is the supported mechanism for not returning with the original TQE contents in R5.

The following MACRO-32 code fragment handles the case of deallocating the TQE and performing the substitution of EXE$AR_TQENOREPT:

       ;       We have a TQE that we want to deallocate.  First add back
       ;       the quotas we deducted when we started up.
               MOVL    G^CTL$GL_PCB, R4        ; Get current process PCB address
               MOVL    PCB$L_JIB(R4), R0       ; Address of JIB to R0
               ADAWI   #1, PCB$W_ASTCNT(R4)    ; Put back the AST quota we took
               ADAWI   #1, JIB$W_TQCNT(R0)     ; And the TQE quota we deducted

       ;       Next delete the TQE entry and substitute the system-wide norepeat TQE
       ;       address for the TQE address in R5.  This is done as we must return to
       ;       EXE$SWTIMINT with a valid TQE address in R5.  There is no support in
       ;       OpenVMS for removing a system routine TQE other than this substitution.
               MOVB    #DYN$C_IRP, IRP$B_TYPE(R5)
                                               ; Set type code to something OpenVMS knows
               MOVL    R5, R0                  ; Get TQE address in R0
               JSB     G^EXE$DEANONPAGED       ; And toss TQE back in the pool
               MOVL    G^EXE$AR_TQENOREPT, R5  ; Replace R5 contents w/ permanent TQE
               RSB                             ; Return to EXE$SWTIMINT

Removing Unexpired TQE Elements

There may be instances where it is necessary to cancel an enqueued TQE. The OpenVMS routine EXE$RMVTIMQ exists to remove one or more TQEs that meet certain criteria. EXE$RMVTIMQ will dequeue TQEs that match the PID passed in R5, the entry type contained in R4, the request id specified in R3, and that were enqueued with an access mode greater than or equal to that specified in R2. If the request id field, passed in R3, is zero, then all TQEs that match on the PID and access mode checks will be dequeued.

The following is a VAX MACRO code fragment that demonstrates how to remove a previously enqueued TQE:

       ;       We must now cancel the TQE we have outstanding.  We call the
       ;       EXE$RMVTIMQ routine to do it for us.
       ;       R2      access mode
       ;       R3      request id
       ;       R4      type of entry to remove
       ;       R5      process id
               MOVL    G^CTL$GL_PCB, R4        ; Get PCB address in R4
               MOVPSL  R0                      ; Get PSL into R0
               EXTZV   #PSL$V_PRVMOD, #PSL$S_PRVMOD, R0, R0
               JSB     G^EXE$MAXACMODE         ; Maximize access modes
               MOVL    PCB$L_PID(R4), R5       ; Get current PID
               MOVL    #TQE$C_TMSNGL, R4       ; Indicate a single request
               MOVL    #GENLOCK_C_REQIDT, R3   ; Specify the TQE number
               MOVL    R0, R2                  ; Copy access mode
               JSB     G^EXE$RMVTIMQ           ; Delete timer queue entry
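The selection criteria can be summarized with a short Python model. It is a sketch of the matching rules as described in the text, not of the executive routine: the Entry class, its field names, and remove_matching are invented for illustration, access modes are assumed to be numbered 0 (kernel) through 3 (user), and comparing only the low-order two bits of the request type is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    pid: int      # process the entry belongs to
    rqtype: int   # request type, as in TQE$B_RQTYPE
    reqid: int    # request id supplied when the entry was queued
    mode: int     # access mode: 0 = kernel ... 3 = user (assumed numbering)


def remove_matching(queue, pid, rqtype, reqid, mode):
    """Split `queue` into (kept, removed) using the criteria in the text."""
    kept, removed = [], []
    for entry in queue:
        matches = (
            entry.pid == pid
            and (entry.rqtype & 0x3) == rqtype      # assumed: compare context bits
            and entry.mode >= mode
            and (reqid == 0 or entry.reqid == reqid)  # zero means "any request id"
        )
        (removed if matches else kept).append(entry)
    return kept, removed


queue = [
    Entry(pid=0x21, rqtype=0, reqid=7, mode=3),
    Entry(pid=0x21, rqtype=0, reqid=9, mode=3),
    Entry(pid=0x21, rqtype=0, reqid=7, mode=0),
    Entry(pid=0x42, rqtype=0, reqid=7, mode=3),
]
kept, removed = remove_matching(queue, pid=0x21, rqtype=0, reqid=7, mode=3)
print(len(kept), len(removed))
```

In this model a user-mode caller cannot remove the entry queued from kernel mode, and passing a request id of zero would remove both user-mode entries belonging to the process.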

In this installment, we have examined the mechanisms inherent in the OpenVMS operating system that allow privileged code threads to request time-related operations. The functionality provided by these routines allows us to request that code threads be scheduled to execute either at regularly scheduled intervals or at one specific point in the future.

Hunter Goatley, Western Kentucky University, Bowling Green, KY.
Edward A. Heinrich, Vice-President, LOKI Group, Inc., Sandy, UT.
