Part VII: More Debugging Techniques

Writing VMS Privileged Code

by

Hunter Goatley

Edward A. Heinrich

Bugchecks and Error Logging

In development code, forcing bugchecks and logging errors in the system error log is often a good idea. A bugcheck is a system exception that can result in a forced crash, depending on the type of bugcheck. Non-fatal bugchecks cause entries to be recorded in the system error log, SYS$ERRORLOG:ERRLOG.SYS. Such bugchecks can show what went wrong while still allowing the systems programmer to use SDA to examine memory with the system running. Once code is in production, the code that generates bugchecks should be removed to avoid crashing the system; one of the options described in the last issue for controlling whether XDELTA is called can also be used to enable and disable bugcheck code.

Bugchecks

Bugchecks can be produced in VAX MACRO-32 code by using the BUG_CHECK macro. This macro accepts two parameters: the bugcheck error code and a flag indicating whether or not the bugcheck is fatal. The bugcheck error code names begin with the prefix “BUG$_”; for example, BUG$_NOTPCB.

Unfortunately, the BUG$_ symbols are not defined in the LIB library; instead, they are part of the system symbol table (SYS$SYSTEM:SYS.STB on OpenVMS VAX). The types of bugchecks and their values can be examined by looking at the system map file in SYS$SYSTEM. The only description of all the bugcheck codes is in the OpenVMS source listings. The bugcheck codes are defined in module [SYS]BUGCHECK_CODES.LIS.

The following MACRO-32 code shows how to invoke the BUG_CHECK macro. Note that the last bugcheck code is CUSTOMER, which is reserved for customer use.

          BUG_CHECK       NOTPCB,FATAL          ; Not a PCB bugcheck
          BUG_CHECK       PGFIPLHI,FATAL        ; High-IPL pagefault bugcheck
          BUG_CHECK       CUSTOMER		; Non-fatal bugcheck

The BUG_CHECK macro prefixes the specified bugcheck name with “BUG$_”. Note that the FATAL parameter is case-sensitive; it must be specified as all uppercase.

On OpenVMS AXP, you can also specify a third parameter (REBOOT) to force a cold boot after a fatal bugcheck. The format is:

	BUG_CHECK	CUSTOMER,FATAL,COLD_BOOT

The default is WARM_BOOT. The reboot parameter is ignored for non-fatal bugchecks.

Under OpenVMS AXP, the BUG$_ symbols are not included in any of the loadable executive images but are instead located in SYS$LIBRARY:VMS$VOLATILE_PRIVATE_INTERFACES.OLB. They do appear in several of the .STB files in SYS$LOADABLE_IMAGES:, but those files were created by the linker and cannot be used as input to the linker.

In order to locate the BUG$_ symbols, you must specify the object library either on the LINK command or in a linker options file:

     $ LINK BUGCHECK,SYS$LIBRARY:VMS$VOLATILE_PRIVATE_INTERFACES.OLB/LIBRARY

To view the various BUG$_ symbol names, you can use the System Dump Analyzer (SDA). This is done using the READ/IMAGE command:

	$ ANALYZE/SYSTEM		!Invoke SDA

	OpenVMS AXP (TM) System analyzer

	SDA> read/image sys$loadable_images:exception.stb
	456 symbols read from SYS$COMMON:[SYS$LDR]EXCEPTION.STB;2
	SDA> sho sym bug$_customer
	BUG$_CUSTOMER = 000008E0
	SDA>

The SDA command SHOW SYMBOL BUG$_/ALL can be used to see all the BUG$_ symbols. In this case, BUG$_CUSTOMER equates to ^X8E0, so you could use that in a program like:

--------------------
	.LIBRARY	/SYS$LIBRARY:LIB.MLB/

	.PSECT	_CODE_,EXE,NOWRT,SHR
	.ENTRY	BUGCHECK,^M<>

	BUG_CHECK	CUSTOMER	; A customer-specific bugcheck
;
; The macro above expands to these instructions
;
;	EVAX_STQ	R16,-(SP)	; Save the contents of R16
;	EVAX_BUGCHK	#^X8E0		; BUG$_CUSTOMER is ^X8E0 (from SDA)
;	EVAX_LDQ	R16,(SP)+	; Restore the contents of R16

	RET				; Return to VMS
	.END	BUGCHECK
--------------------

The ANALYZE/ERROR command is used to check for bugcheck entries in the system error log:

--------------------
AXP$ analyze/error/include=bugchecks/sin=19:30
Error Log Report Generator                                     
Version V6.1

 ******************************* ENTRY    7782. *******************************
 ERROR SEQUENCE 3167.                            LOGGED ON:  CPU_TYPE 00000002
 DATE/TIME 22-JUN-1994 19:36:57.46                            SYS_TYPE 00000004
 SYSTEM UPTIME: 0 DAYS 00:51:03
 SCS NODE: ALPHA                                            OpenVMS AXP V6.1

 HW_MODEL: 0000040B Hardware Model = 1035.

 USER BUGCHECK DEC 3000 Model 400

 CUSTOMER, Reserved for customer use

       PROCESS NAME    Polter Goat
       PROCESS ID      00010032
[...]
--------------------

As shown in the example above, BUG_CHECKs can be issued from user-mode code if the running process has the BUGCHK privilege. Code executing in executive- or kernel-mode can also issue BUG_CHECKs.

Error Logging

Device drivers also have the option of calling the error log routines directly to create entries in the system error log. The error log entries can be defined to dump data relevant to debugging the application. Error log reporting should be considered for complex code that can be affected by external events within the operating system.

The error log routines ERL$DEVICERR, ERL$DEVICTMO, and ERL$DEVICEATTN are all documented in the _OpenVMS Device Support Reference_ manual. Of the three, ERL$DEVICEATTN is the only one not associated with a particular I/O request, allowing it to be used for general error logging for a device. The other two routines expect the address of an IRP to be passed in R3.

Code that is not executing at elevated IPL can also use the $SNDERR system service to produce an entry in the system error log. The $SNDERR system service accepts the address of a string descriptor containing the text of the message to write to the error log. The $SNDERR system service can be called from any access mode, but the process must have the BUGCHK privilege enabled.

The following example is a simple C program that demonstrates a call to $SNDERR.

--------------------
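  #include <descrip.h>       /* $DESCRIPTOR macro */
  #include <starlet.h>       /* SYS$SNDERR prototype */

  int main(void)
  {
     $DESCRIPTOR(msg, "This is a test message to the system error log");

     /* Requires the BUGCHK privilege; the status is passed back to DCL */
     return (SYS$SNDERR(&msg));
  }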
--------------------

And this is a sample error log entry from the $SNDERR system service. It was located using the following command:

	$ ANALYZE/ERROR/INCLUDE=CONTROL_ENTRIES

_____________________________________________________________________
 ******************************* ENTRY    7794. *******************************
 ERROR SEQUENCE 3180.                            LOGGED ON:  CPU_TYPE 00000002
 DATE/TIME 22-JUN-1994 19:48:11.86                            SYS_TYPE 00000004
 SYSTEM UPTIME: 0 DAYS 01:02:17
 SCS NODE: ALPHA                                            OpenVMS AXP V6.1

 HW_MODEL: 0000040B Hardware Model = 1035.

 $SNDERR MESSAGE DEC 3000 Model 400

 MESSAGE TEXT

       This is a test message to the system error log

_____________________________________________________________________

Code running at elevated IPL within process context can call the system service executive routine, EXE$SNDERR, in the same fashion. The error log routines acquire the EMB spinlock, which has an IPL of 31 and a rank of 0; it can be acquired with any other spinlocks already held. The EXE$SNDERR routine assumes the address of the current process PCB is in R4; it uses the PCB address to make sure the process has the BUGCHK privilege enabled. The routine must be invoked on the VAX via a CALLx instruction, with the address of the string descriptor passed in the argument list.

A code thread executing without process context will not be able to call EXE$SNDERR unless a suitable PCB address is placed in R4. As of OpenVMS VAX V5.5-1, the SWAPPER process has all privileges enabled, so a driver could load the SWAPPER PCB address into R4 and then call EXE$SNDERR. The SWAPPER PCB address is stored in SCH$AR_SWPPCB; the following VAX MACRO-32 code shows one way to make the call:

--------------------
	.
	.
	MOVL    G^SCH$AR_SWPPCB,R4	; Move SWAPPER PCB into R4
	PUSHAQ  MSG			; Push message descriptor address
	CALLS   #1,G^EXE$SNDERR		; Send the error to the error log
	.
	.
--------------------

Writing Messages to the Console

Invoking XDELTA while debugging systems code works very well, but there are cases when it is not practical to use XDELTA. One such case is a device driver that interacts with an external peripheral device. If the device driver starts some operation on the peripheral and then invokes XDELTA, the system will stop, but the peripheral may not. By the time the system is allowed to continue, the peripheral may or may not still be in the desired state, which can skew testing with the peripheral. In cases like this, messages to the system error log may be useful, but they cannot be examined in “real-time”; the error log buffers must be flushed to disk before they can be examined.

A time-honored method of debugging is the writing of informational messages to a video or hardcopy terminal. Such messages could indicate which subroutine is executing or the status of certain operations. For example, it is not uncommon when writing applications code to write a routine like the following:

--------------------
     .
     .
  int do_the_work (int x)
  {  int i;

     printf("Inside do_the_work()\n");
     for (i = 0; i < x; i++)
	printf("%d\n", i);

     printf("Leaving do_the_work()\n");

     return 1;
  }
--------------------

When the program calling this routine is run, the informational messages “Inside do_the_work()” and “Leaving do_the_work()” will be written to the current output device. This can speed the debugging process—if the second message is never printed, there must be a problem inside the loop that keeps it from exiting.

Obviously, any method that uses RMS for output is of limited use in privileged code, since a kernel-mode routine cannot call the executive-mode RMS services. $QIO system service calls would work for process-context code that is not executing at elevated IPL. But what about real systems code, which may have no process context and executes at elevated IPL? Such code threads can write output directly to the system console using some little-known executive routines.

The Console I/O Routines

OpenVMS module [SYS]CONSOLIO.LIS contains I/O routines for the console terminal. On a VAX, the routines all expect the address of a terminal device’s CSR (Control Status Register) to be passed in general register R11. If R11 contains 0, the system console device is used, by way of CPU-specific routines whose names begin with CON$.

Under OpenVMS AXP, the CSR address is ignored by the EXE$ console I/O routines. The system console device is used for all I/O operations. Also, the OpenVMS AXP source module [OPDRIVER] contains the source code for OPDRIVER, the console device driver.

The characters are written to and read from the physical device registers, bypassing the normal device drivers for the device. The console I/O routines should be called at device IPL or higher, and they can be called with any spinlock held. This makes them ideal for use in device drivers and other elevated-IPL code. The routines also work at lower IPL as long as the system console is not being used for other purposes at the time.

Table 1-2 describes the EXE$ routines for writing to the console terminal. Before the routines can be called, however, interrupts must be disabled on the console, and they must be re-enabled afterward. This is accomplished by calling two CON$ routines: CON$SAVE_CTY to save the current settings, and CON$RESTORE_CTY to restore the saved settings. These routines are defined in OPDRIVER (found in [SYSLOA] in the VAX listings and [OPDRIVER] in the AXP listings). CON$SAVE_CTY returns the current console state in R0 and R1; CON$RESTORE_CTY expects the console state to be passed in R0 and R1.

Table 1-2:  EXE$ Console I/O Routines
_____________________________________________________________________

Routine               Description
_____________________________________________________________________

EXE$OUTBYTE           Convert and output hex byte (value passed in R1)
EXE$OUTHEX            Convert and output hex longword (value passed in R1)
EXE$OUTBLANK          Output blank character
EXE$OUTCHAR           Output character (character passed in R0)
EXE$OUTCRLF           Output carriage return/line feed pair
EXE$OUTCSTRING        Output counted string (address passed in R1)
EXE$OUTZSTRING        Output zero-terminated string (address passed in R1)
_____________________________________________________________________

Example 1-5 shows a VAX MACRO-32 module that defines two subroutines, PUT_CONSOLE_ASCIC and PUT_CONSOLE_HEX. These routines save the current console state, perform the specified output with some additional linefeeds and carriage returns, then restore the saved console state. The following fragment shows how these routines would be called:

--------------------
  MSG:    .ASCIC  /This is a test console message/
          .
          .
          MOVAQ   MSG,R0     ; Get address of ASCIC message
          JSB     PUT_CONSOLE_ASCIC       ; Write it to the console
          .
          .
          MOVL    #^XDEAD00ED,R0          ; Move the value to R0
          JSB     PUT_CONSOLE_HEX         ; Write it to the console
          .
          .
--------------------
Example 1-5:  PUT_CONSOLE Subroutines
_____________________________________________________________________

	.TITLE  PUT_CONSOLE - Write messages to console
	.IDENT  /01-000/
;+
;
;  File:        PUT_CONSOLE.MAR
;
;  Author:      Hunter Goatley
;
;  Date:        November 10, 1992
;
;  Description:
;
;       The module contains two routines for performing device I/O
;       to the system console: PUT_CONSOLE_ASCIC and PUT_CONSOLE_HEX.
;
;  Modified by:
;
;       01-000          Hunter Goatley          10-NOV-1992 13:09
;  Genesis.
;
;-
	.DSABL  GLOBAL

	.EXTRN  CON$RESTORE_CTY    ;* Restore console state
	.EXTRN  CON$SAVE_CTY       ;* Save console state
	.EXTRN  EXE$OUTCHAR        ;* Print a character
	.EXTRN  EXE$OUTCRLF        ;* Print <CR><LF>
	.EXTRN  EXE$OUTCSTRING     ;* Print ASCIC string
	.EXTRN  EXE$OUTHEX         ;* Dump a HEX string

;+
;
;  PUT_CONSOLE_ASCIC    Expects ASCIC address in R0, preserves all regs
;
;-
PUT_CONSOLE_ASCIC::
	PUSHR   #^M<R0,R1,R2,R3,R4,R11>		; Save registers
	CLRL    R11				; Show I/O to console
	PUSHL   R0				; Save address of string
	JSB     G^CON$SAVE_CTY			; Save console state
	MOVQ    R0,R3				; Save the state in R3/R4
	MOVL    #10,R0				; Move a <LF> to R0
	JSB     G^EXE$OUTCHAR			; Send the <LF>
	POPL    R1				; Restore address of string
	JSB     G^EXE$OUTCSTRING		; Write it out
	MOVL    #13,R0				; Move a <CR> to R0
	JSB     G^EXE$OUTCHAR			; Send the <CR>
	MOVQ    R3,R0				; Restore console state data
	JSB     G^CON$RESTORE_CTY		; Restore console state
	POPR    #^M<R0,R1,R2,R3,R4,R11>		; Restore registers
	RSB					; Return to caller
;+
;
;  PUT_CONSOLE_HEX      Expects hex value in R0, preserves all regs
;
;-
PUT_CONSOLE_HEX::
	PUSHR   #^M<R0,R1,R2,R3,R4,R11>		; Save registers
	CLRL    R11				; Show I/O to console
	PUSHL   R0				; Save the hex value
	JSB     G^CON$SAVE_CTY			; Save console state
	MOVQ    R0,R3				; Save the state in R3/R4
	JSB     G^EXE$OUTCRLF			; Send a <CR><LF>
	MOVL    #10,R0				; Move a <LF> to R0
	JSB     G^EXE$OUTCHAR			; Send the <LF>
	POPL    R1				; Restore hex value
	JSB     G^EXE$OUTHEX			; Write it out
	MOVL    #13,R0				; Move a <CR> to R0
	JSB     G^EXE$OUTCHAR			; Send the <CR>
	MOVQ    R3,R0				; Restore console state data
	JSB     G^CON$RESTORE_CTY		; Restore console state
	POPR    #^M<R0,R1,R2,R3,R4,R11>		; Restore registers
	RSB					; Return to caller

	.END

_____________________________________________________________________

Using the $SNDOPR System Service

Non-privileged code can send messages to the operator console by calling the $SNDOPR system service to queue a message to OPCOM, the operations communication manager. Messages received by OPCOM are sent to the appropriate operator terminals, including the console, and are recorded in the system operator log (OPERATOR.LOG in SYS$MANAGER). Unfortunately, $SNDOPR cannot be called from kernel mode because it initially executes in executive mode. $SNDOPR uses the executive-mode stack as temporary storage for the message text, then changes mode to kernel to actually write the message to the OPCOM mailbox, whose address is stored in SYS$AR_OPRMBX.

The system routine EXE$SNDOPR actually writes the message to the OPCOM mailbox by calling the system routine EXE$SNDMSG. This routine ensures that the memory holding the message text is faulted in and calls EXE$WRTMAILBOX to write the message to the OPCOM mailbox. In order to ensure that all the necessary pages have been faulted into memory, the address of the message text is rounded down to the previous page boundary. If the message is small and near the top of the executive stack, this can cause an overrun into the kernel stack, which works because the routine is executing in kernel mode. If the message were allowed to reside on the kernel stack, the rounding down of the address could result in the access of the page preceding the kernel stack, which is not a valid page. The system would then generate an access violation exception.

Still, $SNDOPR can be useful in non-privileged portions of systems code as a means of notifying the system manager of a particular event. Even at the driver level, if a code thread needs to produce an alarm associated with a particular process, it may be able to queue an executive-mode AST to that process (to be discussed in a future article); the executive-mode AST could then call the $SNDOPR system service.

Using the POOLCHECK SYSGEN Parameter

A common problem when writing systems code is the accidental misuse of system pool memory. Typical problems include using pool memory after it has been deallocated, using uninitialized fields in a data structure allocated from nonpaged pool, and using unallocated pool. Depending on how the affected pool is being used, these problems can result in an immediate system crash, corruption of some structure that goes undetected for some period of time, or no noticeable effect at all.

To help detect these problems as early as possible, a new SYSGEN parameter, POOLCHECK, was added in VAX/VMS v5.0. POOLCHECK is a bitmask; when it is set to a nonzero value, the system bugchecks when it detects pool memory corruption. Setting POOLCHECK to 0 disables the pool corruption checks.

Like many of the features described in this book, POOLCHECK is reserved for use by Digital, though the authors have used it successfully ever since it was introduced. Certain settings can, however, cause problems with normal OpenVMS routines; specifically, some lexical functions can be adversely affected when POOLCHECK is used to check pool in a process’s P1 region.

POOLCHECK works by “poisoning” pool memory when it is deallocated; the deallocated memory is overwritten with a pattern specified in the POOLCHECK bitmask. When that memory is later allocated, the poolcheck code checks to make sure the memory is still filled with the poison pattern. If it’s not, the system bugchecks with a POOLCHECK bugcheck code. Determining the cause of the pool corruption is easier because the system crashes before much time has passed since the corruption.

Note that enabling POOLCHECK does add some overhead to normal OpenVMS operations. The performance hit is usually negligible, but it could have a noticeable impact on busy systems.

It’s a good idea to enable POOLCHECK during all debugging of systems code, because it can detect problems that might never result in a crash but could still cause corruption. POOLCHECK is a dynamic SYSGEN parameter, which means it can be enabled and disabled without rebooting the system. The format of the POOLCHECK value is:

          31        23       15       7       0
          +--------+--------+--------+--------+
          | allo   | free   | mbz    | flags  |
          +--------+--------+--------+--------+

Table 1-3 describes the flags field. The “free” byte specifies the fill pattern that is written to memory when it is deallocated to free pool. The “allo” byte specifies the fill character for pool memory that is allocated; if bit 2 in the flags byte is set, the allocated memory will be overwritten with the “allo” byte before being returned to the caller. Bits 8-15 must be zero (“mbz”).

Table 1-3:  POOLCHECK Flag Descriptions
_____________________________________________________________________

Bit value (hex)  Description
_____________________________________________________________________

01               Fill pool (after header) with "free" on deallocate.
02               Check packet for "allo" pattern on allocate.
04               Fill and check SRP packets.
08               Fill and check IRP packets.
10               Fill and check LRP packets.
20               Unused.
40               Unused.
80               Fill and check P1-space pool regions.
_____________________________________________________________________

The POOLCHECK flag values are simply OR’ed together to produce the appropriate mask. The following example shows how POOLCHECK could be enabled to check all system addresses:

  $ MCR SYSGEN
  SYSGEN>  USE ACTIVE
  SYSGEN>  SET POOLCHECK %XABCD001F
  SYSGEN>  WRITE ACTIVE

Checking P1-space addresses can negatively affect DCL lexical functions, so it’s not enabled in the example above. When memory is deallocated, it will be filled with “CD” and all allocated memory will be filled with “AB” before being given to the user.

For more information on POOLCHECK, including a thorough description of how the checks are made, please see the _OpenVMS ID&S_ manual.

WPDRIVER

The Watchpoint utility, hereafter referred to as WP, is an “Internal Use Only” debugging tool that can be used to record modifications of addresses in OpenVMS system space (S0). WP sets up a watchpoint that works much like those in the OpenVMS debugger: each time a watched address is modified, WP performs some designated action. For each address being watched, WP can be directed to log the information in a system buffer, invoke XDELTA, or force a system crash.

WP has been supplied with OpenVMS since VAX/VMS v4.0. It has never been documented, but there is a help library for it in SYS$HELP (WP.HLB).

WP was not included with OpenVMS AXP V1.0. It did show up in OpenVMS AXP V1.5, but it doesn’t appear to work correctly yet, as of OpenVMS AXP V6.1.

Using the WP Interface

WP is implemented as a pseudo-device driver, WPDRIVER, and a user interface, WP.EXE, located in SYS$SYSTEM. WP.EXE communicates with the driver by assigning a channel to the pseudo-device WPA0: and using $QIO system service calls to tell WP which addresses to watch or stop watching. Before WP can be used, the driver must be loaded into the system and the WPA0: device must be created; both of these tasks can be performed with a single SYSGEN command:

  $ MCR SYSGEN
  SYSGEN>  CONNECT WPA0:/NOADAPTER
  SYSGEN>  EXIT
  $

The /NOADAPTER qualifier on the CONNECT command tells SYSGEN that the device is a pseudo-device and not a physical device connected to the system. The file specification for the device driver can be specified using the /DRIVER qualifier. If the qualifier is omitted, the driver file defaults to xxDRIVER in the SYS$LOADABLE_IMAGES: directory, where “xx” is the first two letters of the device name.

Once the WPA0: device exists, a process with PHY_IO privilege can enable watchpoints using WP.EXE, found in SYS$SYSTEM. When WP is run, an identification banner is displayed:

  $ mcr wp
  Watch Point Utility Version X-4

  WP>

If the device WPA0: does not exist, an error message will be displayed and the program will exit.

The primary WP commands are WATCH, SHOW, IGNORE, SET, and HELP. The WATCH command starts watching a system address. Qualifiers specify the size of the watched location and the action to be taken when it is modified. The format of the command is:

  WP> WATCH address-to-be-watched [/qualifiers]

The address to be watched must be a valid hexadecimal system address (addresses within the WPDRIVER code cannot be watched). WP uses the same symbolic representation for system addresses that SDA uses: “G” causes %X80000000 to be added to the specified value to produce a system address. For example, %X80004448 could be specified as simply G4448.

The qualifiers for WATCH specify the size to be watched (/BYTE, /WORD, /LONG, and /QUAD) and the action to be taken when the location is modified (/SILENT, /FATAL, /XDELTA). The default qualifiers are /LONG and /SILENT, which cause WP to maintain a modification history in memory for the longword watchpoint. This history can be examined with the SHOW command, described below. When a silent-mode watchpoint is modified, WP saves the time of the modification, the previous and modified contents, the contents of all registers at that time, and 15 bytes of the instruction stream that modified the watchpoint.

Assuming the XDELTA debugger is loaded, /XDELTA causes the system to stop in XDELTA when the watchpoint is modified; any XDELTA command can be executed at that point. If XDELTA is not loaded, the call is dismissed and the system continues.

The /FATAL qualifier causes WP to force a system crash, saving the current contents of memory to the system dump file for later analysis. The bugcheck message displayed by SDA when analyzing the dump is:

  WATCHPOINT, Watchpoint encountered by the watchpoint driver (WPDRIVER)

The address of the instruction that modified the watchpoint will be located at the position pointed to by the stack pointer (SP) in the SDA SHOW STACK display.

The following examples demonstrate setting up watchpoints:

  WP> WATCH 80002024/SILENT             !Watch longword in silent mode
  WP> WATCH G45672/XDELTA/QUAD          !Watch quadword & invoke XDELTA
  WP> WATCH 803A32C1/BYTE/FATAL         !Crash system if byte is modified

The IGNORE command deletes a watchpoint so that WP stops watching the address. It has no qualifiers and takes only the address as a parameter:

  WP> IGNORE G45672

Exiting from WP.EXE does not delete the watchpoints; the driver will continue to monitor the watchpoints until they are deleted using the IGNORE command.

The SHOW command is used to display watchpoints set with the WATCH command. The format of the command is:

  WP> SHOW watchpoint-address [/qualifiers]

The default qualifier is /CONTROL_BLOCK, which causes WP to dump out a brief summary of the watchpoint, as shown in Example 1-6. Once an address has been modified, the display changes to include the contents of the registers at the time of the modification and 15 bytes of the instruction stream that modified the watchpoint.

The SHOW command supports two more qualifiers: /TRACE_TABLE, which causes WP to display the maintained history of modifications to the watchpoint, and /FULL, which displays both the /CONTROL_BLOCK and the /TRACE_TABLE information.

Example 1-6:  Sample Output from WP SHOW Command
_____________________________________________________________________
WP> SHOW 80008138

   Base Address    = 80008138 Length         = 04
   Address Touched = 00000000 Type           = SILENT
   Time Touched    = 00:00:00.00           Touched Count  = 00000000

Watch Point Contents

Initial = 0000000000000000  Previous = 0000000000000000  Post = 0000000000000000
WP>
_____________________________________________________________________

The $QIO Interface to WP

In addition to the WP interface, any program can set and delete watchpoints by assigning a channel to WPA0: and using the $QIO system service to communicate with the driver. WPDRIVER supports the QIO function codes shown in Table 1-4.

Table 1-4:  WP QIO Function Codes
_____________________________________________________________________

IO$_ACCESS      Sets a watchpoint. The following function modifiers
                are accepted:

                  None          Sets a silent watchpoint.
                  IO$M_ACCESS   Sets up a read-only watchpoint (not
                                implemented).
                  IO$M_CTRL     Invokes XDELTA when the address is
                                modified.
                  IO$M_ABORT    Causes a fatal system bugcheck.

IO$_DEACCESS    Clears a watchpoint.
IO$_RDSTATS     Returns the modification history for a particular
                watchpoint.
_____________________________________________________________________

The function code IO$_ACCESS requires two parameters in the $QIO call: the size of the memory to watch (1, 2, 4, or 8 bytes), which is passed as parameter P2, and the address to watch, which is passed as parameter P3. IO$_DEACCESS accepts only parameter P3, the address to stop watching. The following MACRO-32 code shows how a watchpoint whose address is stored in WP_ADDR would be created:

     .
     .
     $ASSIGN_S -   			; Assign an I/O channel to
		DEVNAM=WPA0_NAME,-	; ...  WPA0:
		CHAN=WP_CHAN		; ...
	BLBC    R0,10$			; Branch on error
;
;  Set up the watchpoint like /SILENT.  Could use the following, too:
;
;       IO$_ACCESS!IO$M_CTRL    - Invoke XDELTA
;       IO$_ACCESS!IO$M_ABORT   - Crash the system
;
	$QIOW_S CHAN=WP_CHAN,-		; Issue QIO to start watching
		FUNC=#IO$_ACCESS,-	; ...  the address.  Just keep
		IOSB=WP_IOSB,-		; ...  a history of the changes.
		P2=#4,-			; ...  Watch a longword at the
		P3=WP_ADDR		; ...  address stored in WP_ADDR
	BLBC    R0,10$			; Branch on error
	BLBC    WP_IOSB,10$		; Branch on error
     .
     .

The I/O function IO$_RDSTATS returns watchpoint information in a user-specified buffer; the buffer address is passed as parameter P1, the length of the buffer as P2, and the watchpoint address, again, as P3. For information about the format of this buffer, please consult the WP sources in the OpenVMS source listings kit.

The System Interface to WP

Code executing at elevated IPL can make use of the WP utility through calls to two (undocumented, of course) system routines, WP$CREATE_WATCHPOINT and WP$DELETE_WATCHPOINT. On the VAX, both routines are called with JSB (Jump SuBroutine) instructions. As with XDELTA, if WPDRIVER has not been loaded into system memory, the WP$ routines just point to RSB (Return from SuBroutine) instructions, dismissing any calls to them. The WPDRIVER initialization routine is responsible for setting these global addresses to point to routines within WPDRIVER itself.

The WP$ routines raise IPL to 31 when creating or deleting a watchpoint, so they can be called from any IPL and with any spinlock held. This makes them ideal for use in driver-level code.

Parameters to WP$CREATE_WATCHPOINT are passed in general registers R0-R2:

  • R0 = State bits to tell WP the action to take. These bits are defined in $WPCBDEF, found in LIB. The valid values are 0 (equivalent to /SILENT), WPCB_STATE$M_FATAL (/FATAL) and WPCB_STATE$M_BPT (/XDELTA).
  • R1 = The length of the watchpoint (1, 2, 4, or 8 bytes).
  • R2 = The address of the watchpoint.

WP$DELETE_WATCHPOINT takes only one parameter: the address of the watchpoint to be deleted, which is passed in R2.

When writing device drivers or other code that uses OpenVMS data structures, a common debugging problem is finding that some field in some structure is getting trashed. For example, assume the PID field in the UCBs for a particular driver is being zeroed by some unknown code. The WP user interface could be used to set a watchpoint on the PID field, but its address is not constant; every UCB has its own PID field, at its own address.

One solution to this problem would be to add a code segment similar to the following in the unit initialization routine for the driver. Each time a device is created, a watchpoint would be set for the PID field:

  ;
  ; Assume the UCB address is in R5
  ;
         MOVAL   UCB$L_PID(R5),R2        ; Get the watchpoint address
         MOVL    #4,R1      ; The field is 4 bytes long
         CLRL    R0         ; Just log the modifications
  ;      MOVL    #WPCB_STATE$M_BPT,R0    ; We could invoke XDELTA
  ;      MOVL    #WPCB_STATE$M_FATAL,R0  ; ... or crash the system
         JSB     G^WP$CREATE_WATCHPOINT  ; Go create the watchpoint

The next time the PID field in that UCB is modified, WP will record the change, which can be examined using the SHOW command in the WP user interface.

Similarly, the watchpoint could be deleted from within another routine in the driver:

         MOVAL   UCB$L_PID(R5),R2        ; Get the watchpoint address
         JSB     G^WP$DELETE_WATCHPOINT  ; Go delete the watchpoint

NOTE: Extreme caution should be exercised when setting watchpoints on dynamic addresses allocated from nonpaged pool, as in the example above. The watchpoint must be deleted before that memory is released back to free pool, otherwise WP will continue to be called each time that address is re-used. If the watchpoint were set to invoke XDELTA or to bugcheck, the results would be undesirable, at best.


How WP Works

WP works by forcing access violations on watched addresses and trapping the exception. When WPDRIVER is loaded, it modifies the system control block (SCB) so that the entry for the access violation exception handler points to a routine within WPDRIVER. From that point on, when an access violation occurs, WPDRIVER is called to handle the exception. The WP exception handler compares the target address with a table of known watchpoints. If the address is not a watchpoint, WPDRIVER calls the original access violation exception handler to process the error.

When a watchpoint is set, WP changes the protection on that page of memory to disallow writes to that page. For example, a page whose protection is set to allow kernel-mode write access is modified to only allow kernel-mode read access with no write access. When an instruction tries to modify an address on that page, an access violation occurs and OpenVMS calls the WP exception handler through the modified SCB vector. If the target address that caused the access violation is set up as a watchpoint address, WP performs whatever action has been established for that watchpoint: recording the information, invoking XDELTA, or generating a fatal system bugcheck. Once WP has performed the designated action, the page protection is temporarily restored to its original setting and the access is allowed to occur.

Note that because WP modifies page protections to perform its work, the target address cannot span a page boundary.


Hunter Goatley, Western Kentucky University, Bowling Green, KY.
Edward A. Heinrich, Vice-President, LOKI Group, Inc., Sandy, UT.
