Writing VMS Privileged Code

Part I: The Fundamentals, Part 1

Hunter Goatley

Edward A. Heinrich

There are a lot of books for OpenVMS applications programmers, and there are now a number of books that explain various details of OpenVMS internals. But few sources actually describe *how* to write privileged code under OpenVMS. This series will attempt to fill that gap, providing the documentation needed to make the transition from OpenVMS applications programmer to OpenVMS systems programmer.

This article covers a number of topics that all OpenVMS systems programmers should be familiar with on some level. While this series' goal is to help educate applications programmers so they can write systems programs, not all of the details will be covered in the sections below. For the details on these topics, please consult the _VAX VMS Internals And Data Structures_ book, written by Ruth Goldenberg and published by Digital Press. This book will be referred to later as the _I&DS_ book.

The VAX Architecture and OpenVMS

OpenVMS was originally designed to run on the VAX family of computers. The operating system and the computer were actually developed at the same time, and until the early 1990s, VAX/VMS referred explicitly to the hardware (VAX) and the VAX software (OpenVMS). For many years, it was felt that OpenVMS could only be run on a VAX because the VAX processor was designed to let the operating system perform many complex tasks with a single instruction. Tasks such as process context-switching were performed by hardware micro-code, which was faster than it would have been for OpenVMS to swap processes in software. While other operating systems could be run on VAXes (ULTRIX, BSD, VAXELN), OpenVMS could not run on anything except a VAX because of its VAX dependencies.

All of this changed in 1992 when Digital announced its ultimate replacement for the VAX line: the Alpha AXP RISC system. Using a simple instruction set, the Alpha system allows multiple instructions to be executed in a single CPU cycle. Coupled with pipelining and other RISC methods, the Alpha AXP system is expected to debut at performance levels that the older CISC technology in the VAX line will never match.

Because OpenVMS relies so heavily on such VAX architecture features as processor modes and IPL, the only way OpenVMS could be ported to the Alpha system was to make Alpha emulate those features of the VAX architecture. The Alpha designers accomplished this through the use of PALcode (Privileged Architecture Library) routines, which can be thought of as run-time library routines at the hardware level. Through the PALcode support, Alpha provides those VAX features upon which OpenVMS is most dependent: the four processor modes and IPL.

Processor Access Modes

The VAX processor is capable of executing in one of four modes: User, Supervisor, Executive, and Kernel. These processor modes are used to control access to memory locations and also to control access to certain privileged instructions. User mode is the least privileged mode, followed by supervisor, executive, and kernel. Code streams executing in kernel mode can access any memory location and can execute any of the privileged instructions for performing such tasks as process context switching, blocking interrupts, etc. Executive and kernel mode are required to access many of the system data structures; OpenVMS systems programs execute in one of these inner-access modes.

Most applications code executes in user mode. User-mode code must rely on system services and run-time library routines to perform any functions that are considered to be privileged. The DCL Command Line Interpreter (CLI) executes in supervisor mode; it is about the only piece of OpenVMS code that does. Most of the RMS file services code executes in executive mode, as do several of the system services. The majority of the system services and the OpenVMS kernel execute in kernel mode. The OpenVMS kernel comprises three major components: the I/O subsystem, the memory management subsystem, and the scheduling subsystem.

The current processor mode is determined by the setting of the mode bits in a special register called the Processor Status Longword (PSL). Under OpenVMS AXP, this register is just called the PS (Processor Status). There are four mode bits in the PS: two for the current access mode and two for the previous access mode. The value of each two-bit field selects the mode: 0 is kernel, 1 is executive, 2 is supervisor, and 3 is user. The previous mode bits are set whenever the access mode is changed; they determine the access mode to which the processor returns when the mode is reset. The CHMx and REI instructions (PALcode under OpenVMS AXP) change the access mode by loading a new PS for the processor.

For more details on the operation of these instructions, please consult the appropriate architecture manual.
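The mode fields can be pulled out of a PSL value with a shift and a mask. The short Python sketch below is only a model for experimenting on a workstation, not OpenVMS code. It assumes the VAX PSL layout, in which the current mode occupies bits 25:24 and the previous mode bits 23:22 (the Alpha PS places its mode bits elsewhere); the names psl_modes and MODE_NAMES are made up for illustration.

```python
# Model only: decode the access-mode fields of a VAX PSL value.
# Bit positions are the VAX layout (an assumption not stated in the article).
MODE_NAMES = ("kernel", "executive", "supervisor", "user")

CUR_MODE_SHIFT = 24   # current access mode, PSL bits 25:24
PRV_MODE_SHIFT = 22   # previous access mode, PSL bits 23:22


def psl_modes(psl):
    """Return (current, previous) access-mode names for a PSL value."""
    cur = (psl >> CUR_MODE_SHIFT) & 0x3
    prv = (psl >> PRV_MODE_SHIFT) & 0x3
    return MODE_NAMES[cur], MODE_NAMES[prv]


# Both fields set to 3: an ordinary user-mode image.
print(psl_modes(0x03C00000))
```

A PSL of 00C00000 (hex), by contrast, would describe kernel-mode code that was entered from user mode, which is the state a system service typically sees.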

Each process has separate stacks for each of the access modes. These stacks are allocated in the process's P1 space (described below); the sizes of all but the user-mode stack are static. The user stack is automatically expanded under certain conditions; this process is described in detail in the _I&DS_ manual.

There is also a system-wide stack known as the interrupt stack. This stack is accessible only from kernel mode and is used by code that services interrupts and by other code that runs in system context. It is described in more detail later in this article.

Exceptions and interrupts

When reading almost any OpenVMS kernel documentation, you will often encounter the terms exception and interrupt. As prospective systems programmers, you must understand the difference between the two. An exception occurs synchronously as the direct result of the processor executing an instruction, and is serviced by either a user-defined handler or an OpenVMS routine in the context of the process that caused the exception. Exceptions cannot be blocked, although some can be ignored by clearing the appropriate bit in the PSL of the process. In addition, executing the exact code sequence repeatedly will consistently generate the identical exception. The OpenVMS exception handler code normally executes on the kernel stack, since it has process context. If an exception occurs while in system context, however, it is serviced on the interrupt stack. Examples of exceptions are access violations and page faults.

Interrupts are events that occur asynchronously to the instruction stream being executed by the processor. Unlike exceptions, interrupts can be temporarily blocked by raising IPL and/or acquiring a spinlock. An interrupt is generated by the occurrence of an external event, such as a power failure or the completion of an I/O transfer. Unlike exceptions, interrupts cannot be reliably predicted or generated within a particular code path. When the OpenVMS kernel detects an interrupt, it switches to system context and services the interrupt on the system-wide interrupt stack. AST delivery is the only interrupt that executes in process context. Note that exceptions are not allowed while processing an interrupt.

The $CMKRNL and $CMEXEC System Services

Programs enter other access modes through the use of the CHMx and REI instructions (or PALcode routines under OpenVMS AXP). For example, a program executing in user mode changes to kernel mode by, ultimately, executing a CHMK (CHange Mode Kernel) instruction. The CHMK instruction accepts one operand; this operand is used as an index into a table of offsets to various system services. CHME (CHange Mode Executive) works the same way. CHMU and CHMS are rarely used; in fact, neither exists under OpenVMS AXP. Code usually enters user and supervisor modes through the REI instruction.

While the CHMx instruction is ultimately executed, user code actually enters one of the two inner-access modes through two system services: $CMKRNL (Change Mode KeRNeL) and $CMEXEC (Change Mode EXECutive). These routines accept as parameters the address of a routine to be called in the appropriate access mode and the address of an argument list to be passed to the target routine. For example, the MACRO-32 code to execute routine KERNEL_STUFF in kernel mode would look like:

        $CMKRNL_S -                   ;Change mode to kernel to call
                ROUTIN=KERNEL_STUFF,- ;... routine KERNEL_STUFF
                ARGLST=KERNEL_STUFF_ARGS

The target routine must be declared as a .CALL_ENTRY routine under Alpha OpenVMS and as a .ENTRY routine under OpenVMS VAX. The $CMxxxx system services change to the desired access mode and then call the specified routine. When that routine exits back to the system services, the processor is placed back into the original access mode and control returns to the routine that called the system service.
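The mode changes behind these services follow two simple rules that are easy to model. The Python sketch below is a toy model, not a description of the actual service code. It relies on two facts taken from the VAX architecture rather than from this article: a CHMx instruction never lowers privilege (the mode entered is the more privileged of the current and requested modes), and REI never raises it. The function names are hypothetical.

```python
# Toy model of access-mode transitions; smaller number = more privileged.
KERNEL, EXECUTIVE, SUPERVISOR, USER = range(4)


def chm_target(current, requested):
    """Mode entered by CHMx: the more privileged of the two modes."""
    return min(current, requested)


def rei_allowed(current, new):
    """REI may stay in the same mode or drop to a less privileged one."""
    return new >= current


def call_in_mode(current, requested, routine):
    """Enter the inner mode, call the routine, then return to the caller's mode."""
    inner = chm_target(current, requested)
    result = routine(inner)
    if not rei_allowed(inner, current):
        raise RuntimeError("REI cannot raise privilege")
    return result, current


# A user-mode caller asking for kernel mode: the routine sees mode 0,
# and control comes back in mode 3.
print(call_in_mode(USER, KERNEL, lambda mode: mode))
```

The last call mirrors the $CMKRNL sequence described above: the target routine runs in the inner mode, and the caller resumes in its original mode.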

The choice of executive mode or kernel mode is determined by a number of factors that will be discussed throughout this series. Normally, the decision is based on the protection of the target addresses that are to be accessed. The protection of most of the OpenVMS address space is set to ERKW (Executive Read, Kernel Write), which means that the structure can be read from executive mode and kernel mode, but can only be modified from kernel mode. For such structures, executive mode may be sufficient. However, privileged instructions and PALcode routines such as MTPR and MFPR may only be executed from kernel mode.
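Protection codes such as ERKW can be read mechanically. The Python sketch below is a model of that reading, not OpenVMS code. It assumes the usual VAX convention that each two-letter pair names the least privileged mode granted read (R) or write (W) access, that more privileged modes inherit the access, and that write access implies read access; parse_protection and can_access are hypothetical names.

```python
# Model of VAX-style page protection codes such as "ERKW", "UW", "KR", "NA".
MODES = {"K": 0, "E": 1, "S": 2, "U": 3}   # smaller number = more privileged


def parse_protection(code):
    """Return (read_mode, write_mode); None means no mode has that access."""
    if code == "NA":
        return None, None
    read_mode = write_mode = None
    for i in range(0, len(code), 2):
        mode, access = MODES[code[i]], code[i + 1]
        if access == "W":
            write_mode = mode
            if read_mode is None:      # write access implies read access
                read_mode = mode
        elif access == "R":
            read_mode = mode
        else:
            raise ValueError("unknown access letter: " + access)
    return read_mode, write_mode


def can_access(code, mode_letter, want_write):
    """True if code running in the given mode may read (or write) the page."""
    read_mode, write_mode = parse_protection(code)
    limit = write_mode if want_write else read_mode
    return limit is not None and MODES[mode_letter] <= limit


print(can_access("ERKW", "E", False), can_access("ERKW", "E", True))
```

Under this model, executive-mode code can read an ERKW structure but must change to kernel mode to modify it, which is the trade-off described above.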

If your application can be written using only executive mode, it is usually preferable to do so. Access violations in executive mode will result only in process deletion, while kernel mode access violations that are not properly handled will result in a system crash. This fact alone usually makes executive mode code easier to write and debug.

Memory Management Summary

The OpenVMS operating system is a virtual memory system (which is, of course, where the name OpenVMS came from). A virtual memory system can access more memory than is actually physically present. This is accomplished through the use of page files, shared memory, and several other methods. Accessing a page of memory that is not in physical memory results in a page fault, a process by which the page is moved into memory. There are two types of page faults: a soft page fault means that the data was still located in a cache buffer, and a hard page fault means that the page had to be faulted in from disk.

The total amount of virtual memory that OpenVMS VAX can access is 4 gigabytes (the largest address space provided by 32-bit addresses). As OpenVMS AXP grows to use 64-bit addresses, this limit significantly increases. OpenVMS divides this 4-gigabyte virtual address space into four regions: P0, P1, S0, and S1 space. Each process has its own P0 and P1 address space and shares S0 and S1 space with other processes.

                   +-------------------------+
                   |                         | 00000000
                   |        P0 space         |
                   |                         | 3FFFFFFF
                   +-------------------------+
                   |                         | 40000000
                   |        P1 space         |
                   |                         | 7FFFFFFF
                   +-------------------------+
                   |                         | 80000000
                   |        S0 space         |
                   |                         | BFFFFFFF
                   +-------------------------+
                   |                         | C0000000
                   |        S1 space         |
                   |                         | FFFFFFFF
                   +-------------------------+
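Because each region is exactly one gigabyte, the two high-order bits of a 32-bit virtual address identify its region. The Python sketch below illustrates that arithmetic using the boundaries in the diagram; region_of is a hypothetical helper name and the code is a workstation model only.

```python
# Model: map a 32-bit VAX virtual address to its region name.
REGIONS = ("P0", "P1", "S0", "S1")


def region_of(address):
    """Return the region containing a 32-bit virtual address."""
    if not 0 <= address <= 0xFFFFFFFF:
        raise ValueError("not a 32-bit address")
    return REGIONS[address >> 30]      # top two bits select the region


for addr in (0x00000200, 0x7FFE0000, 0x80001000, 0xC0000000):
    print(format(addr, "08X"), region_of(addr))
```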

The VAX architecture defined a page of memory as 512 bytes, which is the same size as a disk block under the OpenVMS file system. The Alpha AXP architecture achieves some of its speed through the use of larger page sizes. Page sizes range from 8K to 64K bytes, depending on the Alpha processor. Fortunately for OpenVMS programmers, most system services in OpenVMS AXP have been modified to work with pagelets, which are 512-byte chunks of a page; much of the OpenVMS application code that works with 512-byte pages will still work under OpenVMS AXP. However, systems code that bypasses the system services when working with pages will have to be rewritten for OpenVMS AXP.
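The relationship between pagelets and pages is simple arithmetic. The Python sketch below shows it under the sizes given above (512-byte pagelets, page sizes from 8K to 64K bytes); the function name and the 8K default are assumptions made for illustration, and the code is a model rather than anything OpenVMS provides.

```python
# Model: how many CPU pages are needed to hold a number of pagelets.
PAGELET_BYTES = 512


def pagelets_to_pages(pagelets, page_bytes=8192):
    """Whole pages needed for the given count of 512-byte pagelets."""
    if page_bytes % PAGELET_BYTES:
        raise ValueError("page size must be a multiple of 512 bytes")
    per_page = page_bytes // PAGELET_BYTES
    return -(-pagelets // per_page)    # ceiling division


print(pagelets_to_pages(17))           # 17 pagelets on an 8K-page system
print(pagelets_to_pages(17, 65536))    # the same request with 64K pages
```

A request that used to mean "17 pages" on a VAX therefore occupies a different number of physical pages on each Alpha page size, which is why code that counts pages directly needs attention.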

P0 Address Space (The Program Region)

P0 space (the program region) is the memory in the address range 00000000-3FFFFFFF (hexadecimal). It holds user programs and any shareable images with which a given program is linked. Additional memory may be allocated from the program region using the run-time library (RTL) routines LIB$GET_VM and LIB$GET_VM_PAGE and the system services $EXPREG and $CRETVA.

Not all of P0 space is actually mapped at any given time. Only the amount needed to run the current program is mapped, unless additional memory is created using $EXPREG or $CRETVA (and deleted using $CNTREG and $DELTVA). When the program region is expanded, it grows toward P1 space (toward higher addresses).

Programs are actually mapped beginning at address 00000200 (hexadecimal); the range 00000000-000001FF is never defined. This architectural feature helps programmers locate programming errors, since one of the most common errors is the accidental reference of data using an offset from 0. Because the first 512-byte block of memory is invalid, any such references generate access violations.

P0 space is volatile in the sense that all of the P0 memory is released (or re-initialized) at image rundown. Any data stored in a P0 location (e.g., in memory allocated using LIB$GET_VM) will be lost when the image exits.

P1 Address Space (The Control Region)

P1 space (also known as the control region) consists of the memory in the address range 40000000-7FFFFFFF (hexadecimal). P1 memory grows from the higher addresses toward the lower addresses. Conceptually, P0 space and P1 space grow toward each other as each region is expanded. The control region is the home of the CLI (Command Line Interpreter) and all of the process-permanent data structures used by the CLI. The control region can be expanded using $EXPREG and $CRETVA.

P1 space is not volatile in that all data stored there is retained across image activations. The memory containing the CLI (usually DCL) is mapped to P1 addresses of all the processes that use it. Other data stored in P1 space includes the per-process stacks for the various access modes, the CLI command recall buffer, global and local symbol definitions, key definitions, process-permanent file information (SYS$INPUT, SYS$OUTPUT, and SYS$ERROR), and memory management structures.

More information on P1 space and DCL will be discussed in future articles.

S0 and S1 Address Space (System Space)

S0 space is the space used by OpenVMS itself. It extends from 80000000 to BFFFFFFF (hexadecimal). S1 space is the region from C0000000 to FFFFFFFF; it is currently undefined by OpenVMS and is reserved for future use.

S0 space is shared among all processes; there is only one copy of OpenVMS maintained in memory. S0 space contains, among other things, the OpenVMS image(s), nonpaged and paged pool for memory allocation, process context and memory management data structures for all the processes on the system, and space for logical name tables. Much of the mapped S0 space is nonpaged, which means it is in physical memory at all times. Such data structures as the logical name tables and some memory management structures are stored in paged memory.

OpenVMS Executive Overview

The OpenVMS executive is the code and data needed for OpenVMS to run. Virtually all of the OpenVMS data structures are implemented as linked lists. An understanding of the structures and process and system context is necessary for any OpenVMS systems programming.

OpenVMS Data Structure Overview

The successful design and implementation of any computer-based system relies heavily upon its underlying data structures. The structures used need to provide easy and efficient access to the data, and allow for the addition or deletion of fields as modifications to the original design and implementation become necessary.

Naming Conventions

The _I&DS_ manual describes the OpenVMS naming conventions in great detail in Appendix D. A brief description is included here for completeness.

The OpenVMS designers used a standard naming convention for routines, global symbols, and error statuses. Most Digital symbols contain a dollar sign ($); Digital recommends that user-written code use the underscore character (_) to distinguish user names from Digital names.

The symbols are designed to convey as much information as possible about the object named, while still retaining some brevity. To this end, most OpenVMS data structures have symbolic names for individual fields that follow the form structure$t_field-name. The letter t is the data type of that field. The most common data types are B (byte), W (word), L (longword), Q (quadword), and T (text string). The symbol itself is the byte offset from the beginning of the structure to the named field. For example, the value of the symbol PCB$L_EPID is the offset of the longword EPID field in a Process Control Block (PCB).

Data structure symbols are defined in MACRO-32 by macros named like $structureDEF, where structure is the name of the data structure. For example, the PCB symbols are defined in the macro $PCBDEF, which can be found in the system macro library LIB.MLB in SYS$LIBRARY. Most of the system data structures are defined in LIB, while a few are defined in the STARLET.MLB library, and a few are not defined at all.

Other common symbols used by systems programmers have a data type prefixed with the letter "G", which indicates that the location is a "global" location. For example, the symbol CTL$GQ_LASTLOGIN_I is the name of the global quadword location in the CTL region (described in a future article) that contains the last interactive login time. References to global symbols are usually resolved by linking images with a system symbol table.
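The convention is regular enough that a symbol name can be taken apart mechanically. The Python sketch below is a small illustration that handles only the forms described above (prefix$t_name and prefix$Gt_name) and the five type letters listed; the describe function and its result keys are invented for the example.

```python
import re

# Type letters named in the text; other letters are passed through unchanged.
FIELD_TYPES = {"B": "byte", "W": "word", "L": "longword",
               "Q": "quadword", "T": "text string"}

# prefix $ optional-G type-letter _ name
SYMBOL = re.compile(r"^([A-Z0-9]+)\$(G?)([A-Z])_([A-Z0-9_]+)$")


def describe(symbol):
    """Split a symbol such as PCB$L_EPID into its parts, or return None."""
    match = SYMBOL.match(symbol.upper())
    if match is None:
        return None
    prefix, is_global, type_code, name = match.groups()
    return {
        "prefix": prefix,
        "global": bool(is_global),
        "type": FIELD_TYPES.get(type_code, type_code),
        "name": name,
    }


print(describe("PCB$L_EPID"))
print(describe("CTL$GQ_LASTLOGIN_I"))
```

The first symbol decodes as a longword field offset within the PCB, the second as a global quadword location, matching the two examples in the text.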

A symbol table is a special type of object module that contains the symbol definitions of all the global symbols defined in an image. Symbol tables are created by the linker using the /SYMBOL_TABLE qualifier on the LINK command line. The global symbol definitions can then be shared with other images by linking those images with the symbol table module.

There are a few symbol tables provided with OpenVMS. Any systems program referencing an OpenVMS VAX global symbol must be linked with one or more of the following tables:

      SYS$SYSTEM:SYS.STB     Contains all of the symbols found
                             in SYS.MAP in SYS$SYSTEM.

      SYS$SYSTEM:SYSDEF.STB  Contains many of the symbol definitions
                             defined in LIB.MLB, including such symbols
                             as those defined by $UCBDEF, $PCBDEF, etc.

      SYS$SYSTEM:DCLDEF.STB  Contains global symbols defined by
                             and used by DCL.

User programs can link with one or more symbol tables to resolve global references by specifying the table file name on the DCL LINK command line:

        $ LINK file,SYS$SYSTEM:SYS.STB/SELECTIVE_SEARCH

Under OpenVMS AXP, SYS.STB no longer exists. Instead, the symbols are included in the shareable image SYS$BASE_IMAGE.EXE in SYS$LOADABLE_IMAGES:. The other .STB files are also located in SYS$LOADABLE_IMAGES: instead of SYS$SYSTEM: on an OpenVMS AXP system.

To link an AXP program with the system symbol table, the LINK qualifier /SYSEXE should be specified:

	$ LINK/SYSEXE file,SYS$LOADABLE_IMAGES:DCLDEF.STB/SELECTIVE_SEARCH

This qualifier tells the linker that the image is to be linked against SYS$BASE_IMAGE.EXE.

Common Structure Types

Most of the OpenVMS data structures are implemented as quadword-aligned, doubly-linked queues, though there are a few singly-linked lists. Figure 1 describes some of the structures most commonly accessed by OpenVMS systems programmers. It is not intended to be a complete list; see the _I&DS_ manual for a more complete list. Some of these structures are described in more detail in later articles.

Figure 1: Common Internal OpenVMS Data Structures

Mnemonic  Name                 Description

PCB       Process Control      Contains process-specific information that is
          Block                used primarily for scheduling purposes.
                               Allocated from nonpaged pool.

PHD       Process HeaDer       Contains process-specific information that is
                               primarily used for memory management.
                               Allocated from paged pool and may not always
                               be resident.

JIB       Job Information      Contains quotas and information that applies
          Block                to all processes within a certain job.
                               Allocated from nonpaged pool.

UCB       Unit Control         Contains device-specific information about
          Block                devices on the system. Each device has a UCB
                               associated with it. Allocated from nonpaged
                               pool.

DDB       Device Data Block    Contains information pertaining to a class of
                               devices. There is one DDB for each class.
                               Allocated from nonpaged pool.

ORB       Object Rights        Contains information about the protection of
          Block                an object. Allocated from paged pool.

Queue Manipulation Instructions

In the OpenVMS operating system, the basic data structure format is both simple and elegant; it allows for efficient access algorithms and easy expansion of data fields. Most of the internal OpenVMS data structures are quadword-aligned, doubly-linked queue entries. The VAX architecture provides inherent hardware support for two types of queues:

Absolute, in which each link field contains the address of the next element in the queue
Self-relative, in which each link field contains an offset that specifies the distance from the current entry to the next entry

The VAX INSQUE and REMQUE hardware instructions allow for enqueueing and dequeueing of absolute queue entries, while the INSQTI, INSQHI, REMQTI, and REMQHI instructions are used to manipulate self-relative queues. Absolute queues support the addition or removal of an entry at any place in the queue, whereas self-relative queue entries can only be inserted or removed at either the head or tail of the queue. One other fundamental difference between the two queue types is that the absolute queue instructions are interruptible and are not synchronized between processors, while the self-relative queue instructions are non-interruptible and are interlocked among all processors in an SMP environment. Prior to OpenVMS V5.0, many of the internal OpenVMS data structures were linked via absolute queues. With the addition of SMP support in V5.0, these queues were changed to be self-relative.
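The pointer manipulation performed by the absolute queue instructions can be modelled in a few lines. The Python sketch below simulates memory as a dictionary of longwords and mimics the link updates of INSQUE and REMQUE; it is a single-threaded model with made-up addresses and says nothing about interruptibility or interlocking. The final helper shows the same forward links as a self-relative queue would store them, assuming each link holds the displacement from the entry's own address.

```python
# Simulated memory: longword address -> longword value (addresses are made up).
mem = {}


def init_queue(head):
    """An empty absolute queue: the header points to itself both ways."""
    mem[head] = head          # forward link
    mem[head + 4] = head      # backward link


def insque(entry, pred):
    """Model of INSQUE: insert entry after pred."""
    succ = mem[pred]
    mem[entry] = succ
    mem[entry + 4] = pred
    mem[succ + 4] = entry
    mem[pred] = entry


def remque(entry):
    """Model of REMQUE: unlink entry from anywhere in the queue."""
    succ, pred = mem[entry], mem[entry + 4]
    mem[pred] = succ
    mem[succ + 4] = pred
    return entry


def walk(head):
    """Entry addresses in forward order, excluding the header."""
    entries, addr = [], mem[head]
    while addr != head:
        entries.append(addr)
        addr = mem[addr]
    return entries


def self_relative_links(head):
    """Forward links rewritten as offsets from each entry's own address."""
    return {addr: mem[addr] - addr for addr in [head] + walk(head)}


HEAD = 0x1000
init_queue(HEAD)
insque(0x2000, HEAD)             # insert at the head
insque(0x3000, mem[HEAD + 4])    # insert at the tail (after the last entry)
print([hex(a) for a in walk(HEAD)])
```

Note that remque takes only the entry's address, which is why absolute queues allow removal from any position, as described above.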

Each queue entry contains twelve bytes of fixed overhead at the beginning of the entry. One longword each holds the forward and backward links, a word describes the size of the structure, and a byte holds the structure type. As the number of structures has increased to where it now exceeds 127, a subtype field has been added. OpenVMS data structures with a type code greater than 95 contain a subtype. For structures that do not use a subtype, the 12th byte is either unused or contains information relevant to the structure. Pictorially, the queue overhead looks as follows:

                   +-------------------------------+
                   |          Forward link         |
                   |-------------------------------|
                   |         Backward link         |
                   |-------------------------------|
                   |Subtype| Type  |     Size      |
                   +-------------------------------+

OpenVMS has reserved types 120-127 for user-defined data structures.
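The twelve-byte header can be described compactly with a packed layout. The Python sketch below is an illustration only. It assumes the diagram follows the usual VAX convention of drawing the lowest address at the right, so that the size word sits at offset 8, the type byte at offset 10, and the subtype byte at offset 11, all little-endian; the helper names and the sample addresses are hypothetical, and type 120 is used because it falls in the user-defined range.

```python
import struct

# forward link, backward link, size (word), type (byte), subtype (byte)
HEADER = struct.Struct("<LLHBB")


def pack_header(flink, blink, size, type_code, subtype=0):
    """Build the 12-byte fixed overhead for a structure."""
    return HEADER.pack(flink, blink, size, type_code, subtype)


def unpack_header(data):
    """Decode the fixed overhead from the start of a structure."""
    flink, blink, size, type_code, subtype = HEADER.unpack_from(data)
    return {"flink": flink, "blink": blink, "size": size,
            "type": type_code, "subtype": subtype}


raw = pack_header(0x80001000, 0x80002000, 64, 120)
print(len(raw), unpack_header(raw))
```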

The $DYNDEF macro, found in SYS$LIBRARY:LIB.MLB or SYS$LIBRARY:LIB.REQ, defines the symbolic type codes for the various OpenVMS data structures.

System Context and Process Context

To say that a code thread executes in process context means that the code can rely on access to P0, P1, and S0 address space, including access to the per-process stacks. Threads that execute in process context may be subjected to interruption due to the delivery of ASTs and scheduling changes. In addition, code executing in process context may be restricted by privileges and quotas.

All non-kernel code executes in the context of a process. Process context implies access to P0 and P1 address space. Non-kernel code cannot raise IPL or acquire spinlocks. Non-kernel code can depend upon all the comforts of process context.

Kernel-mode code may or may not execute in process context. If your kernel code cannot ensure it has process context, it must not reference any process-specific data, nor any system code or data structures that cannot be guaranteed to be memory resident, such as those in paged pool. For example, if you write code that is executed in response to an interrupt, the code thread executes asynchronously to the entire system and cannot rely on anything that is not resident in S0 space.

When the processor is running in system context, it is executing at elevated IPL and page faults are not allowed; therefore, you can only reference data structures, defined by either the operating system or your code, that are located in nonpaged pool. ASTs are not delivered while the processor is executing system context code. In addition, code that runs in system context uses the system-wide interrupt stack; care must be taken when using the stack because it is a finite structure. The size of the interrupt stack is determined by the SYSGEN parameter INTSTKPAGES, which is 4 pages by default. This value is stored in global location SGN$GW_ISPPGCT.

Next Issue

In the next issue, we'll continue our look at kernel-mode fundamentals with discussions on synchronization techniques and memory management.

Hunter Goatley, Western Kentucky University, Bowling Green, KY.

Edward A. Heinrich, Vice-President, LOKI Group, Inc., Sandy, UT.