Writing VMS Privileged Code
Part II: The Fundamentals, Part 2
Edward A. Heinrich
This article covers a number of topics that all OpenVMS systems programmers should be familiar with on some level. While this series’ goal is to help educate applications programmers so they can write systems programs, not all of the details will be covered in the sections below. For the details on these topics, please consult the VAX/VMS Internals And Data Structures book, written by Ruth Goldenberg and published by Digital Press. This book will be referred to later as the I&DS book.
Note that many of the examples in this series will be written in MACRO-32. Unless otherwise designated, all MACRO-32 examples are valid for both OpenVMS VAX and OpenVMS AXP. The only difference is that MACRO-32 refers to the assembler on the VAX and the MACRO compiler on the AXP.
Synchronized access to memory and device components by multiple processes is very important to the successful operation of OpenVMS. Without it, the validity of data structures could not be guaranteed. The need for synchronization can be demonstrated with a short example.
Suppose you have a piece of code that needs to write data to two different memory locations. Assume the code was able to write data to one location, but was interrupted before it could write to the second location. If the interrupting code needed the data in those locations, the data would no longer be valid, since only part of it was written out. Additionally, assume the data is also available for reading and writing by another process on the system. If the second process tried to read the data before the first process had written both pieces out, the second process would retrieve corrupted information.
There are four primary methods used for synchronizing privileged code: IPL, spinlocks, mutexes, and system locks via the lock manager. Of the four, the lock manager is the least-used in privileged systems code.
IPL – Interrupt Priority Level
IPL (Interrupt Priority Level) controls access to various system components, especially data structures and devices on the system. There are 32 levels of IPL, numbered from IPL 0 to IPL 31; normally, code executes at IPL 0. Synchronization is achieved by raising IPL to block other code executing at lower IPLs. For example, a device driver may raise IPL to level 23; all code running below IPL 23 is blocked from execution until the IPL is lowered.
IPL works only on uniprocessor systems (systems with one CPU) or on one processor in a multiprocessor system. The IPL scheme assumes that there is only one code thread that can be running at any given time; this is not true on a multiprocessor system, where multiple code threads can execute concurrently on the multiple CPUs. To handle multiprocessor systems, VMS V5.0 introduced the concept of spinlocks discussed in the next section.
This does not mean that IPL is not used on multiprocessor systems. Instead, IPL is used in tandem with spinlocks to provide proper synchronization. Proper use of IPL still blocks other, lower-IPL code from executing on the current CPU while the elevated-IPL code is executing.
Proper usage of IPL for synchronization mandates that you always raise IPL and never lower it unless you are lowering IPL back to the original level. For example, assume you have code running at IPL 2 and you need to elevate IPL to 11, then access a structure synchronized at IPL 8, then raise IPL back to 11. The proper sequence to guarantee synchronization is:
- raise IPL to 11
- lower IPL to 2 (the original IPL) or lower
- raise IPL to 8
- raise IPL to 11
- lower IPL to 2 (the original IPL) or lower
If the code had changed IPL from 11 directly to 8, the synchronization of the IPL 8 structure could be compromised because an IPL 8 code thread might have been interrupted by the IPL 11 thread.
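Using the DSBINT and ENBINT macros described later in this article, the sequence above might be sketched as follows. This is only a sketch: it assumes the thread enters at IPL 2, uses the stack to save each previous IPL, and uses IPL$_MAILBOX (11) and IPL$_SYNCH (8) from the table below purely as convenient symbols for those levels.

```
        ; A sketch only -- assumes entry at IPL 2
        DSBINT  #IPL$_MAILBOX,ENVIRON=UNIPROCESSOR  ; Raise IPL to 11
           [....]                                   ; IPL 11 work
        ENBINT                                      ; Lower back to 2
        DSBINT  #IPL$_SYNCH,ENVIRON=UNIPROCESSOR    ; Raise IPL to 8
           [....]                                   ; Access IPL 8 structure
        DSBINT  #IPL$_MAILBOX,ENVIRON=UNIPROCESSOR  ; Raise IPL to 11 again
           [....]                                   ; More IPL 11 work
        ENBINT                                      ; Pop back to IPL 8
        ENBINT                                      ; Pop back to IPL 2
```

Note that each DSBINT pushes the previous IPL on the stack and each ENBINT pops one level off, so the raises and restores must nest in LIFO order.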
Page faults are not allowed above IPL 2. Code executing above IPL 2 that generates a page fault causes a PGFIPLHI bugcheck, which causes the system to crash. Methods for ensuring that data is locked into memory will be discussed throughout this series; the most common methods are the uses of nonpaged pool and the $LKWSET system service.
If interrupt code that executes at elevated IPL must access data structures that are synchronized at a lower IPL, it must create a fork process, a mechanism that delays the access until a later time when IPL has been lowered. For example, if a device driver must access a process’s PCB (Process Control Block, which is synchronized at IPL 8), it cannot simply lower IPL; it should instead create a fork block that describes the code thread to be executed when IPL drops to the desired level. Forks will be described in more detail in a future article.
Similarly, interrupt code that elevates IPL should never lower IPL below the level at which it was entered. The REI instruction generates a reserved operand fault if the current IPL is lower than the IPL in the PSL saved on the stack.
IPL is raised by writing the level number into processor register PR$_IPL. The symbolic names of the processor registers are defined in module $PRDEF, in SYS$LIBRARY:STARLET.MLB, and the symbolic IPL names are defined in $IPLDEF, in SYS$LIBRARY:LIB.MLB. This is normally accomplished by using the DSBINT, ENBINT, and SETIPL macros, which generate the needed MTPR (Move To Processor Register) instruction on the VAX (it’s a PALcode call on AXP).
The following table shows some of the various IPLs and their uses:
    ___________________________________________________________________
    Symbol           IPL    Description
    ___________________________________________________________________
    IPL$_HWCLK        24    Blocks clock and device interrupts
    IPL$_MAILBOX      11    Blocks mailbox interrupts
    IPL$_POWER        31    Disables all interrupts
    IPL$_QUEUEAST      6    Driver fork IPL to go to IPL 8
    IPL$_TIMER         8    Blocks access to timer queue
    IPL$_SYNCH         8    Synchronizes access to PCB, etc.
    IPL$_SCHED         8    IPL for scheduling structures
    IPL$_JIB           8    IPL for JIB access
    IPL$_MMG           8    IPL for Memory Management access
    IPL$_ASTDEL        2    Blocks delivery of ASTs (prevents process
                            deletion)
    ___________________________________________________________________
The DSBINT macro accepts three optional parameters: IPL (the new IPL), DST (the address of a longword to receive the old IPL), and ENVIRON (a keyword indicating the environment, either UNIPROCESSOR or MULTIPROCESSOR). If no new IPL is specified, DSBINT defaults to IPL 31, which blocks every other interrupt on the system (including hardware interrupts from clocks and devices). If the DST parameter is not specified, the old IPL value is pushed onto the stack; it is assumed that it will later be popped off the stack by the ENBINT macro. If the ENVIRON parameter is not specified when assembling under OpenVMS VAX V5.x or higher, or under OpenVMS AXP, the assembler or compiler will generate the following warning:
Raising IPL to XX provides no multiprocessing synchronization
(You can think of the ENVIRON parameter as OpenVMS’s way to force you to say, “Yes, I know what I’m doing.”)
A typical call to DSBINT would look like the following line:
DSBINT #IPL$_SYNCH,ENVIRON=MULTIPROCESSOR ; Set IPL to 8
Conversely, the ENBINT macro simply restores the IPL to its previous value. It accepts one optional parameter, SRC (the address of the longword containing the previous IPL). If SRC is not specified, ENBINT pops the top longword off the stack and stores that as the new IPL. A typical call to ENBINT would look like the following line:
ENBINT ; Reset IPL to original value
Note that proper use of DSBINT and ENBINT when using the stack to store the original IPL means that the stack must be properly maintained so the original IPL is on top when ENBINT is called.
The SETIPL macro is not normally used; it does not save the current IPL, so it is useful only when your code segment wants to explicitly raise and lower IPL. When writing driver-level code, your code may get called from IPLs higher than 0, so it’s better to use DSBINT and ENBINT so IPL is properly restored. A call to SETIPL is almost identical to the DSBINT call:
        SETIPL  #IPL$_SYNCH             ; Set IPL to 8
           [....]
        SETIPL  #0                      ; Lower IPL to 0
As noted earlier, raising IPL is not sufficient for synchronizing multiple code threads executing concurrently on multiple CPUs. To allow OpenVMS to work with multiprocessor VAXes and AXPs, DEC introduced spinlocks with VMS V5.0. A spinlock is, essentially, a flag that indicates whether or not a resource is currently being accessed by another code thread. When a spinlock is held by a code thread, any other code thread that tries to acquire the spinlock will “spin” until the lock is released. For this reason, as with IPL, it is imperative that code holding a spinlock execute as quickly as possible to avoid unnecessarily blocking important system events.
When a code thread acquires a spinlock, it also raises IPL. Elevated IPL is still needed to block access on any single CPU, while the spinlock blocks access from threads executing on other CPUs. On single-processor systems, the full spinlock code is normally bypassed: the macro-generated code determines at run time that spinlocks are not needed, so the only action taken is raising IPL. This check is made by testing whether SMP$V_ENABLED is set in the SMP$GL_FLAGS global longword. (Full SMP checking is not normally enabled on a single-processor system; you can enable it by setting the SYSGEN parameter MULTIPROCESSING to 2.)
NOTE: If you enable the SMP code, OpenVMS will not load any device drivers that have not been modified for SMP operation. Drivers that have been modified have the DPT$M_SMPMOD bit set; note that the DECwindows/Motif drivers have not been modified for SMP, so you cannot use DECwindows while MULTIPROCESSING is enabled. This will be discussed in more detail in a future article.
There are various spinlocks used by OpenVMS to control access to data structures; the following table shows some of the more commonly used spinlocks:
    SCHED        Scheduler
    MMG          Memory Management
    JIB          Job Information Block
    TIMER        Timer queue
    QUEUEAST     Queue AST to a process
In addition, each device on a system can have its own spinlock, but usually all devices in a certain class share the same spinlock. The spinlock data structures are allocated from the system’s nonpaged pool. Each spinlock has a specific IPL associated with it; when a spinlock is acquired, the IPL is raised to the level specified for that spinlock.
There are a number of macros defined in LIB.MLB for locking and unlocking spinlocks. The general purpose macros are LOCK and UNLOCK; for device spinlocks, the DEVICELOCK and DEVICEUNLOCK macros are used. The LOCK and UNLOCK macros are called much like DSBINT and ENBINT: LOCK the spinlock, perform the necessary operations, and UNLOCK the spinlock.
LOCK and UNLOCK take a number of optional parameters; the following example demonstrates common calls to these macros:
        LOCK    LOCKNAME=SCHED,-        ; Grab the scheduler spinlock
                SAVIPL=-(SP),-          ; ... Save current IPL on stack
                PRESERVE=NO             ; ... Don't save R0
           [....]                       ; Do priv code here
        UNLOCK  LOCKNAME=SCHED,-        ; Release the scheduler spinlock
                NEWIPL=(SP)+,-          ; ... Restore orig. IPL
                CONDITION=RESTORE       ; ... Only release our lock
The LOCKNAME parameter specifies the name of the spin lock (SCHED in this case). The LOCK and UNLOCK macros do not automatically save and restore the IPL, so it is necessary to tell LOCK where to place the original IPL value and UNLOCK where to find it—the stack is used in this example. Both macros use R0 as a work register; by default, they save and restore the contents of R0. If you don’t care about the contents, you can specify PRESERVE=NO to skip the save and restore; this eliminates a couple of extra instructions. Before discussing CONDITION=RESTORE, a discussion of how code can acquire multiple spinlocks is needed.
When a spinlock has been acquired, it is “legal” to acquire another spinlock only if the second spinlock has the same IPL or higher. Additionally, each spinlock has a rank associated with it. Spinlocks must be acquired in order of increasing rank only; the lower the rank value, the higher the rank priority.
For example, assume a thread has acquired the MMG spinlock, which raises IPL to 8 and has rank 10 (hex). This same thread can also acquire the SCHED spinlock, because it also synchronizes at IPL 8 and has rank 0F (hex), and then the MAILBOX spinlock, because it raises IPL to 11 and has rank 08 (hex). However, the spinlocks must be released in LIFO (last in, first out) order, where the most recently acquired spinlock is released first. As you can see in this example, if the SCHED spinlock were released first, IPL would be dropped from 11 to 8, invalidating the MAILBOX synchronization.
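The acquisition and LIFO release order described above might be sketched with the LOCK and UNLOCK macros like this. This is only a sketch; PRESERVE and CONDITION handling are omitted, and the ranks shown in the comments are those given in the text.

```
        ; A sketch only -- acquire in rank order, release in LIFO order
        LOCK    LOCKNAME=MMG,-          ; MMG: IPL 8, rank 10 (hex)
                SAVIPL=-(SP)            ; ... Save entry IPL
        LOCK    LOCKNAME=SCHED,-        ; SCHED: IPL 8, rank 0F (hex)
                SAVIPL=-(SP)            ; ... Save IPL 8
        LOCK    LOCKNAME=MAILBOX,-      ; MAILBOX: IPL 11, rank 08 (hex)
                SAVIPL=-(SP)            ; ... Save IPL 8
           [....]                       ; All three structures now safe
        UNLOCK  LOCKNAME=MAILBOX,-      ; Release MAILBOX first
                NEWIPL=(SP)+            ; ... IPL drops from 11 to 8
        UNLOCK  LOCKNAME=SCHED,-        ; Then SCHED
                NEWIPL=(SP)+            ; ... IPL stays at 8
        UNLOCK  LOCKNAME=MMG,-          ; Finally MMG
                NEWIPL=(SP)+            ; ... Back to entry IPL
```

Because each SAVIPL pushes the previous IPL on the stack, releasing the spinlocks in LIFO order also restores the IPLs in the correct sequence.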
Similarly, it is valid for a code thread to acquire a spinlock and call another routine that acquires that same spinlock. The UNLOCK macro, by default, will release all acquisitions of a spinlock. Specifying CONDITION=RESTORE on the UNLOCK macro above causes only the most recent acquisition to be released. Without it, when the original thread tried to release the spinlock, a fatal bugcheck would be generated because the spinlock would have already been released.
The DEVICELOCK and DEVICEUNLOCK macros are special versions of LOCK and UNLOCK for use with device spinlocks. The spinlock for a device is stored in its UCB at offset UCB$L_DLCK. If the LOCKADDR is not specified, the macros will use the contents of UCB$L_DLCK(R5) as the address of the device’s spinlock. The following example demonstrates typical uses of DEVICELOCK and DEVICEUNLOCK:
        ; Assume UCB address is in R5
        DEVICELOCK -                    ; Grab the device's spinlock
                SAVIPL=-(SP),-          ; ... Save original IPL
                PRESERVE=NO             ; ... Don't save R0
           [...]
        DEVICEUNLOCK -                  ; Release the spinlock
                NEWIPL=(SP)+,-          ; ... Restore the orig. IPL
                CONDITION=RESTORE,-     ; ... Release only this lock
                PRESERVE=NO             ; ... Don't save R0
The NEWIPL, CONDITION, and PRESERVE parameters are virtually identical to those for LOCK and UNLOCK. When specified for DEVICELOCK, CONDITION can be set to NOSETIPL to prevent the IPL from being changed.
If the address of a device spinlock is not stored in the UCB, but is instead stored in another structure, you could specify the LOCKADDR in the macro calls:
        DEVICELOCK -                    ; Grab the device's spinlock
                LOCKADDR=MY_DLCK(R6),-  ; ... Device spinlock address
                SAVIPL=-(SP),-          ; ... Save original IPL
                PRESERVE=NO             ; ... Don't save R0
The System Dump Analyzer (SDA) features a SHOW SPINLOCKS command that displays information about all of the spinlocks on a system. To invoke SDA, type ANALYZE/SYSTEM at the DCL prompt; at the SDA> prompt, type SHOW SPINLOCKS. SDA will show a number of structures like the following:
    System static spinlock structures
    ---------------------------------
    SCHED                           Address         80224960
    Owner CPU ID        None        IPL                   08
    Ownership Depth     0000        Rank                  0F
    CPUs Waiting        0000        Index                 2F
Some data structures are too complex or too large to allow privileged code to traverse them while running at elevated IPL (system interrupts would be blocked for too long). Also, data in pageable memory is not suitable for code at elevated IPL because page faults are not allowed above IPL 2. OpenVMS handles these structures through mutexes, short for “mutual exclusion semaphores.” Some examples of structures that are synchronized via mutexes include the logical name table, paged dynamic memory, the global section descriptor list, and one of the most commonly used, the I/O database.
Like a spinlock, a mutex is, essentially, a flag that indicates whether or not a code thread is accessing the structure. A mutex is a longword: the low-order word holds the ownership count, and the high-order word is a status word indicating whether or not the mutex has been acquired for writing. In VAX VMS, the mutex count field is initialized to -1 so that a single TSTW instruction can be used to determine if the mutex is unowned, has exactly one owner, or is being accessed by multiple code threads. The global longword for the I/O database mutex has the symbolic name IOC$GL_MUTEX; similarly, the logical name table mutex is LNM$AL_MUTEX, the paged dynamic memory mutex is EXE$GL_PGDYNMTX, etc.
Read access to a mutex is acquired by passing the address of the mutex to the system routine SCH$LOCKR. If the mutex is unowned or is owned by another thread for read access, SCH$LOCKR increments the ownership in the mutex and bumps the number of mutexes owned by the process, which is stored in the Process Control Block (PCB) at PCB$W_MTXCNT. If this is the first mutex owned by the process, the process current and base priorities are saved in the PCB and the priority is boosted to 16 to hasten the access to the mutex.
If the mutex is currently unavailable (another process has write access to the mutex), the calling process is placed in a mutex wait state (IPL is raised to 2 to prevent process deletion while waiting for the mutex). When the mutex is released, the process in the mutex wait queue with the highest priority acquires the mutex and control returns to the caller. If the desired mutex is never released, a process will never return from a mutex wait. This is not supposed to happen, but sometimes does if code doesn’t handle mutexes correctly.
When a process needs write access to a mutex, it calls SCH$LOCKW, which does essentially the same thing as SCH$LOCKR, except that the mutex is acquired only if no other process has acquired it. Once again, if the mutex is already owned by another process, the requesting process is placed in a mutex wait state. A second entry point, SCH$LOCKNOWAIT, returns immediately if the mutex is not available, letting code take alternate action if it can’t access the structure at that time.
The system routine SCH$UNLOCK is called to release a mutex. Once again, the address of the mutex is passed to the routine.
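A typical read access to the I/O database mutex might look like the following sketch. This is a sketch only: it assumes the register conventions described in the I&DS book (mutex address in R0, current process PCB address in R4) and assumes R4 already contains the PCB address, as it does at many driver entry points.

```
        ; A sketch only -- assumes R4 = current process PCB address
        MOVAL   G^IOC$GL_MUTEX,R0       ; R0 = address of the mutex
        JSB     G^SCH$LOCKR             ; Lock the mutex for read access
           [....]                       ; Read the I/O database here
        MOVAL   G^IOC$GL_MUTEX,R0       ; R0 = address of the mutex again
        JSB     G^SCH$UNLOCK            ; Release the mutex
```

For write access, the same pattern applies with SCH$LOCKW in place of SCH$LOCKR.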
Because the I/O database is frequently accessed by device drivers, separate entry points exist for gaining ownership of IOC$GL_MUTEX. Routines can call SCH$IOLOCKR and SCH$IOLOCKW, which simply move the address of IOC$GL_MUTEX into R0 and then drop into SCH$LOCKR and SCH$LOCKW, respectively.
To unlock the I/O database mutex, code can JSB to (in addition to SCH$UNLOCK) SCH$IOUNLOCK and IOC$UNLOCK. IOC$UNLOCK causes R0 to be saved over the subroutine call, while SCH$IOUNLOCK does not. If the code that processes the I/O database places a status in R0, a call to IOC$UNLOCK automatically saves the status.
System Locks and the Lock Manager
The OpenVMS lock management system services allow cooperating processes to share access to a resource. The resource can be any entity on the system: a process, a device, a file, a data structure, etc. As with the other synchronization techniques, system locks are not effective unless all programs that need access to the resource use the lock management services.
The services that make up the lock manager are $ENQ (ENQueue lock request), $ENQW (ENQueue lock request and wait), $GETLKI and $GETLKIW (GET LocK Information), and $DEQ (DEQueue lock request). A lock is a data structure allocated from the system dynamic memory. It can be given a resource name that other processes can specify when they try to acquire the lock. The lock is created and requested using the $ENQ system service. $ENQ accepts a number of parameters, including a lock mode that specifies the type of lock that the process wants to take out. Using the lock mode, a process can acquire the lock exclusively or it can acquire it for reading or writing and allow others to read or write to the resource. Resource locks can have a parent lock so that different parts of a resource can be protected by different locks. For example, you might have a resource lock on a data structure and other “child” locks that control access to particular fields within the structure.
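As a simple illustration, a process might take out an exclusive lock on a named resource as sketched below. This is a hedged sketch: the resource name MY_RESOURCE and the ERROR label are invented for the example, and the lock mode symbols are those defined in $LCKDEF.

```
RESNAM: .ASCID  /MY_RESOURCE/           ; Resource name (example only)
LKSB:   .BLKQ   1                       ; Lock status block

        $ENQW_S LKMODE=#LCK$K_EXMODE,-  ; Request an exclusive lock and wait
                LKSB=LKSB,-             ; ... Status and lock ID returned here
                RESNAM=RESNAM           ; ... Name other processes will use
        BLBC    R0,ERROR                ; Branch if the service call failed
        BLBC    LKSB,ERROR              ; Branch if the lock request failed
           [....]                       ; Access the shared resource
        $DEQ_S  LKID=LKSB+4             ; Release the lock (ID is at LKSB+4)
```

The lock status block receives the completion status in its first word and the lock ID in the longword at offset 4, which is why the $DEQ call references LKSB+4.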
For a detailed explanation of the lock management services, please see the manual Introduction to VMS System Services in the OpenVMS documentation set.
In the next issue, we’ll continue our look at kernel-mode fundamentals with discussions of memory management.
Hunter Goatley, Western Kentucky University, Bowling Green, KY.
Edward A. Heinrich, Vice-President, LOKI Group, Inc., Sandy, UT.