DAVID W. DELEY

Here are a few examples of how to examine a system crash dump file using
the System Dump Analyzer (SDA).  This may come in handy if you're trying
to debug kernel mode code.  The first two examples occurred while I was
developing the DEALLOC program on VMS 5.3.

The following commands may be used to obtain information about a system
crash:
     $ ANALYZE/CRASH SYS$SYSTEM:SYSDUMP.DMP
     SDA> SHOW CRASH
     SDA> SHOW STACK

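Scan the stack for the value FFFFFFFD (a depth of -3, meaning the last
chance exception vector).  That longword is part of the MECHANISM ARRAY,
and from there you can identify both the MECHANISM ARRAY and the SIGNAL
ARRAY.  In the SIGNAL ARRAY, the exception value is the longword that
follows the argument count.  Now we must determine what exception this
is.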
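If you have the SHOW STACK output captured as text, this scan is easy to automate on another machine. Here is a minimal Python sketch, not part of SDA; the helper names (`parse_stack`, `locate_arrays`) are made up for illustration, and the offset from the MECHANISM ARRAY to the SIGNAL ARRAY is an assumption taken from the three dumps in this article rather than from any VMS documentation.

```python
import re

# One longword line of SHOW STACK output: "<address>  <value>  comment".
# The first line may start with an "SP =>" marker.
LINE_RE = re.compile(r"\s*(?:SP\s*=>\s*)?([0-9A-F]{8})\s+([0-9A-F]{8})")

DEPTH_LAST_CHANCE = 0xFFFFFFFD  # depth -3: last chance exception vector


def parse_stack(text):
    """Return {address: value} for every longword line in the listing."""
    stack = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            stack[int(m.group(1), 16)] = int(m.group(2), 16)
    return stack


def locate_arrays(stack):
    """Return (mechanism_array_addr, signal_array_addr).

    ASSUMPTION (from the dumps in this article): the depth is the third
    longword of the MECHANISM ARRAY (after the count and the establisher
    frame), and the SIGNAL ARRAY starts one longword past the end of the
    MECHANISM ARRAY (a signal/stop code sits in between).
    """
    hits = [a for a, v in sorted(stack.items()) if v == DEPTH_LAST_CHANCE]
    if not hits:
        raise ValueError("no FFFFFFFD depth longword on this stack")
    mech = hits[0] - 8          # back up over establisher frame and count
    count = stack[mech]         # longwords that follow the count (4 here)
    signal = mech + 4 * (count + 1) + 4
    return mech, signal


# The first few lines of Example #2, pasted as text.
SAMPLE = """\
 SP =>  7FFE7788  00000004 MECHANISM ARRAY.  4 arguments follow
        7FFE778C  7FFE77C0      establisher frame
        7FFE7790  FFFFFFFD      depth (-3, last chance exception vector)
        7FFE7794  00000042      R0 at time of exception
        7FFE7798  00000470      R1 at time of exception
        7FFE779C  00000001      signal/stop code
        7FFE77A0  00000005      SIGNAL ARRAY.  5 arguments follow
"""

mech, signal = locate_arrays(parse_stack(SAMPLE))
print(f"MECHANISM ARRAY at {mech:08X}, SIGNAL ARRAY at {signal:08X}")
```

The same offset also holds for the listings in Examples #1 and #3, but check it against your own dump before trusting it.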

Create a list of all SS$_xxxx symbolic values as follows:

    1.  Create the following five-line file, GET.MAR:

             .TITLE GET
             .LIBRARY     /SYS$LIBRARY:LIB/
             .LINK        'SYS$SYSTEM:SYS.STB' /SELECTIVE_SEARCH
             $SSDEF GLOBAL
             .END

    2.  Assemble and link it with a full map:

            $ MACRO GET.MAR
            $ LINK/NOEXE/MAP=SSDEF/FULL GET.OBJ

    3.  Search SSDEF.MAP for the exception value to find its SS$_xxxx symbol.
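If you need to do this more than once, the map can be turned into a lookup table. The Python sketch below is hypothetical: it assumes the part of SSDEF.MAP you search prints each symbol as an SS$_ name followed by its value in eight hex digits, which you should check against your own map, since linker map sections are laid out differently. The sample input is built from values quoted in this article, not copied from a real map.

```python
import re

# ASSUMPTION: the map lines of interest look like "SS$_NAME   0000000C".
# Check this against your own SSDEF.MAP and adjust the pattern if needed.
SYMBOL_RE = re.compile(r"(SS\$_[A-Z0-9_$]+)\s+([0-9A-F]{8})\b")


def build_ss_table(map_text):
    """Return {value: [SS$_ names]} for every name/value pair found."""
    table = {}
    for name, value in SYMBOL_RE.findall(map_text):
        names = table.setdefault(int(value, 16), [])
        if name not in names:  # a symbol may appear in several map sections
            names.append(name)
    return table


# Illustrative input only, built from values quoted in this article;
# this is not real linker output.
sample = """\
SS$_ACCVIO                      0000000C
SS$_PAGRDERR                    00000444
SS$_NOHANDLER                   000008F8
"""

table = build_ss_table(sample)
print(table[0x0C])   # ['SS$_ACCVIO']
```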

Then look up the SS$_xxxx symbol in the "Introduction to VMS System
Services" manual, chapter "Condition-Handling Services", subsection
"Types of Exception" (Section 10.1, Table 10-1, in the VMS 5.0 manual).
The table shows which arguments, if any, follow the exception value.
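Before going to the manual, you can also split the exception value into the standard fields of a VMS condition value: bits 0-2 hold the severity, bits 3-15 the message number, bits 16-27 the facility number (0 for the system facility that the SS$_ codes in these examples belong to), and bits 28-31 are control bits. This Python sketch does only that decoding; it does not know the symbol names, so it complements the map lookup rather than replacing it. The function name is hypothetical.

```python
# Field layout of a 32-bit VMS condition value:
#   bits 0-2   severity
#   bits 3-15  message number
#   bits 16-27 facility number
#   bits 28-31 control bits
SEVERITY = {0: "W (warning)", 1: "S (success)", 2: "E (error)",
            3: "I (informational)", 4: "F (severe/fatal)"}


def decode_condition(value):
    """Split a condition value into its fields (codes 5-7 are reserved)."""
    return {
        "severity": SEVERITY.get(value & 0x7, "reserved"),
        "message": (value >> 3) & 0x1FFF,
        "facility": (value >> 16) & 0xFFF,
        "control": (value >> 28) & 0xF,
    }


# Exception values seen in this article.
for code in (0x0000000C, 0x00000444, 0x000008F8):
    print(f"{code:08X}", decode_condition(code))
```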

After the arguments (if any) come the exception PC and PSL in the SIGNAL
ARRAY.  The PC tells where the program was when the exception occurred.
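Because the PC and PSL are always the last two longwords, the count at the start of the SIGNAL ARRAY is enough to find them, even before you know what the exception is. A small Python sketch of that split (the function name is hypothetical), applied to the SIGNAL ARRAY from Example #1:

```python
def split_signal_array(signal):
    """Split a SIGNAL ARRAY (list of longwords) into its parts.

    signal[0] counts the longwords that follow it, signal[1] is the
    exception value, the last two longwords are the exception PC and PSL,
    and anything between is the exception-specific argument list.
    """
    n = signal[0]
    if n < 3 or len(signal) != n + 1:
        raise ValueError("count longword does not match the array length")
    return {
        "exception": signal[1],
        "args": signal[2:n - 1],
        "pc": signal[n - 1],
        "psl": signal[n],
    }


# SIGNAL ARRAY from Example #1 (7FFE77A8 .. 7FFE77BC).
example1 = [0x00000005, 0x0000000C, 0x00000004, 0x00000018,
            0x000006E6, 0x00C20008]
parts = split_signal_array(example1)
args = ", ".join(f"{a:08X}" for a in parts["args"])
print(f"exception {parts['exception']:08X}  args [{args}]  PC {parts['pc']:08X}")
```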

Example #1:

     $ ANALYZE/CRASH SYS$SYSTEM:SYSDUMP.DMP
     SDA> SHOW STACK

Current operating stack (KERNEL):

 SP =>  7FFE776C  00000000 handler
	7FFE7770  00000000	mask/psw
	7FFE7774  00000000	AP
	7FFE7778  7FFE77C0	FP
	7FFE777C  80000014	PC   SYS$CALL_HANDL+004

	7FFE7780  801302F2	PC   (JSB @#SYS$CALL_HANDL)

	7FFE7784  00000002	signal argument list.  2 arguments follow
	7FFE7788  7FFE77A8	address of signal array
	7FFE778C  7FFE7790	address of mechanism array

	7FFE7790  00000004	MECHANISM ARRAY
	7FFE7794  7FFE77E4	establisher frame
	7FFE7798  FFFFFFFD	depth (-3, last chance exception vector)
	7FFE779C  00000001	R0 at time of exception
	7FFE77A0  0000008C	R1 at time of exception

	7FFE77A4  000008F8	signal/stop code.  (SS$_NOHANDLER)

	7FFE77A8  00000005	SIGNAL ARRAY
	7FFE77AC  0000000C	exception.  value = SS$_ACCVIO
	7FFE77B0  00000004	reason mask.  00=read, 04=write
	7FFE77B4  00000018	inaccessible virtual address
	7FFE77B8  000006E6	exception PC
	7FFE77BC  00C20008	exception PSL

	7FFE77C0  00000000	handler    (call frame placed on stack by call
	7FFE77C4  003C0000	mask/psw    to kernel mode routine FORCE_DEALLOCATE)
	7FFE77C8  7FEF2178	AP
	7FFE77CC  7FFE77E4	FP
	7FFE77D0  80157E7D	PC   (PROCESS_MANAGEMENT+0087D)

	7FFE77D4  00000004	R2   (registers saved as specified by entry
	7FFE77D8  7FF77185	R3    mask to kernel mode routine.)
	7FFE77DC  80248FE0	R4
	7FFE77E0  7FFE5EC4	R5

	7FFE77E4  00000000	handler
	7FFE77E8  00000000	mask/psw
	7FFE77EC  7FEF2178	AP  (arguments are on user stack)
	7FFE77F0  7FEF215C	FP  (top of user stack)
	7FFE77F4  8012E3D0	PC

	7FFE77F8  7FFEDE96	PC   SYS$CMKRNL+006
	7FFE77FC  03C00000	old PSL

Example #2:

In this example, the exception PC in the SIGNAL ARRAY is above 80000000
(system space), indicating that the exception happened somewhere in a
system routine.  To find where we were in our own program, we have to
examine more of the stack.  You can scan downward in the listing (toward
higher addresses) from the SIGNAL ARRAY for a value that is in the right
range to be a return PC in your program, or you can work upward from the
bottom of the stack and try to identify how it was built.  In this case,
the program executed a JSB instruction to a system routine; the system
routine pushed one value onto the stack and then crashed with an access
violation.

     $ ANALYZE/CRASH SYS$SYSTEM:SYSDUMP.DMP
     SDA> SHOW STACK

Current operating stack (KERNEL):

 SP =>  7FFE7788  00000004 MECHANISM ARRAY.  4 arguments follow
	7FFE778C  7FFE77C0	establisher frame
	7FFE7790  FFFFFFFD	depth (-3, last chance exception vector)
	7FFE7794  00000042	R0 at time of exception
	7FFE7798  00000470	R1 at time of exception

	7FFE779C  00000001	signal/stop code

	7FFE77A0  00000005	SIGNAL ARRAY.  5 arguments follow
	7FFE77A4  0000000C	exception.  value = SS$_ACCVIO
	7FFE77A8  00000000	reason mask.  00=read, 04=write
	7FFE77AC  00000060	inaccessible virtual address
	7FFE77B0  8014EA38	exception PC  (somewhere in a system routine)
	7FFE77B4  00C80004	exception PSL

	7FFE77B8  00000002	(pushed on stack by system routine)

	7FFE77BC  00000668	PC (return address pushed on stack by JSB call.
                                    This is the value we need.  It tells us
                                    where in our program code we were when
                                    the crash happened.)

	7FFE77C0  00000000	handler    (call frame placed on stack by call
	7FFE77C4  003C0000	mask/psw    to kernel mode routine FORCE_DEALLOCATE)
	7FFE77C8  7FEF2178	AP
	7FFE77CC  7FFE77E4	FP
	7FFE77D0  80157E7D	PC   (PROCESS_MANAGEMENT+0087D)

	7FFE77D4  00000004	R2   (registers saved as specified by entry
	7FFE77D8  7FF77185	R3    mask to kernel mode routine.)
	7FFE77DC  802356E0	R4
	7FFE77E0  7FFE5EC4	R5

	7FFE77E4  00000000	handler
	7FFE77E8  00000000	mask/psw
	7FFE77EC  7FEF2178	AP   (arguments are on user stack)
	7FFE77F0  7FEF215C	FP   (top of user stack)
	7FFE77F4  8012E3D0	PC

	7FFE77F8  7FFEDE96	PC   SYS$CMKRNL+006
	7FFE77FC  03C00000	old PSL
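The scan described at the start of this example can be sketched in Python as follows. It classifies addresses by their top two bits (P0, P1, S0, S1), then walks from just past the PSL toward higher addresses until it finds a longword that falls inside the program's code. The bounds 0x200 and 0x10000 are hypothetical; take the real ones from your program's link map. Any data value that happens to land in that range will also match, so treat a hit as a candidate to check against the listing, not as proof.

```python
# VAX virtual address space is split into four regions by the top two bits.
REGIONS = ("P0 (program)", "P1 (control)", "S0 (system)", "S1 (reserved)")


def region(addr):
    """Name the address-space region a 32-bit address falls in."""
    return REGIONS[(addr >> 30) & 0x3]


def find_return_pc(stack, signal_addr, image_lo, image_hi):
    """Scan from just past the SIGNAL ARRAY toward higher addresses.

    Returns (address, value) of the first longword whose value lies in
    [image_lo, image_hi), the (hypothetical) code range of our program,
    or None if the scan runs off the end of the captured stack.
    """
    count = stack[signal_addr]
    addr = signal_addr + 4 * (count + 1)  # first longword after the PSL
    while addr in stack:
        value = stack[addr]
        if image_lo <= value < image_hi:
            return addr, value
        addr += 4
    return None


# Longwords 7FFE77A0 .. 7FFE77C0 from Example #2.
example2 = {
    0x7FFE77A0: 0x00000005, 0x7FFE77A4: 0x0000000C, 0x7FFE77A8: 0x00000000,
    0x7FFE77AC: 0x00000060, 0x7FFE77B0: 0x8014EA38, 0x7FFE77B4: 0x00C80004,
    0x7FFE77B8: 0x00000002, 0x7FFE77BC: 0x00000668, 0x7FFE77C0: 0x00000000,
}

print(region(example2[0x7FFE77B0]))  # S0 (system)
hit = find_return_pc(example2, 0x7FFE77A0, image_lo=0x200, image_hi=0x10000)
print(f"return PC {hit[1]:08X} found at {hit[0]:08X}")
```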

Example #3:

Analysis of system crash:

Time of system crash:  3-SEP-1992 21:58:45.20

CPU bugcheck codes:
	CPU 01 -- SSRVEXCEPT, Unexpected system service exception


$ ANALYZE/CRASH SYS$SYSTEM:SYSDUMP.DMP

VAX/VMS V5.4     -- System Dump Analysis  --  4-SEP-1992 09:50:29.83
System crash information
Time of system crash:  3-SEP-1992 21:58:45.20
Version of system: VAX/VMS VERSION V5.4
System type: VAX 6000-210
CPU bugcheck codes:
	CPU 01 -- SSRVEXCEPT, Unexpected system service exception

SDA> SHOW STACK

Current operating stack (KERNEL):

   SP =>  7FFE7694  00000000 handler
		7FFE7698  00000000	mask
		7FFE769C  7FFE773C	AP
		7FFE76A0  7FFE76F4	FP
		7FFE76A4  80000014	PC  (sys$call_handl+004)

		7FFE76A8  803046C4	PC  (JSB @#SYS$CALL_HANDL)

		7FFE76AC  00000002	signal argument list.  (2 arguments follow)
		7FFE76B0  7FFE76D0	address of signal array
		7FFE76B4  7FFE76B8	address of mechanism array

		7FFE76B8  00000004	MECHANISM ARRAY    (4 arguments follow)
		7FFE76BC  7FFE77CC	establisher frame
		7FFE76C0  FFFFFFFD	depth (-3, last chance exception vector)
		7FFE76C4  7FFA3F60	R0 at time of exception
		7FFE76C8  7FFE2684	R1 at time of exception

		7FFE76CC  000008F8	signal/stop code.  (SS$_NOHANDLER)

		7FFE76D0  00000005	SIGNAL ARRAY    (5 arguments follow)
		7FFE76D4  00000444	exception.  Value = SS$_PAGRDERR
		7FFE76D8  00000004	translation not valid reason
		7FFE76DC  7FFA3B70	virtual address of referenced page
		7FFE76E0  80318AA8	exception PC
		7FFE76E4  00000009	exception PSL

We identify the exception 00000444 as SS$_PAGRDERR.  From the
"Introduction to VMS System Services" chapter on "Condition-Handling
Services", subsection "Types of Exception", Table 10-1 (Section 10.1 in
the VMS 5.0 manual), we find that two arguments follow this exception:
1) the translation-not-valid reason mask, and 2) the virtual address of
the referenced page.  After the arguments come the exception PC and PSL.

We thus determine:

Cause of crash:  SS$_PAGRDERR.  Read error occurred during an attempt
                 to read a faulted page from disk.

Translation not valid reason:  Specified virtual address not valid.

Virtual address of referenced page:  7FFA3B70
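The Example #3 numbers can be checked the same way. This Python sketch unpacks an SS$_PAGRDERR SIGNAL ARRAY using only the argument order from Table 10-1 described above, reports which address region the referenced address and the exception PC fall in, and rounds the address down to its 512-byte VAX page. The function name is made up for illustration. The referenced address lands in P1, the same region as the stacks in these listings, and the PC is in system space.

```python
PAGRDERR = 0x444  # SS$_PAGRDERR
PAGE_SIZE = 512   # VAX pages are 512 bytes
REGIONS = ("P0 (program)", "P1 (control)", "S0 (system)", "S1 (reserved)")


def describe_pagrderr(signal):
    """Unpack an SS$_PAGRDERR SIGNAL ARRAY: [5, 444, reason, va, pc, psl]."""
    count, code, reason, va, pc, psl = signal
    if count != 5 or code != PAGRDERR:
        raise ValueError("not an SS$_PAGRDERR SIGNAL ARRAY")
    return {
        "reason_mask": reason,
        "va": va,
        "va_region": REGIONS[(va >> 30) & 0x3],
        "page_base": va & ~(PAGE_SIZE - 1),  # start of the 512-byte page
        "pc": pc,
        "pc_region": REGIONS[(pc >> 30) & 0x3],
        "psl": psl,
    }


# SIGNAL ARRAY from Example #3 (7FFE76D0 .. 7FFE76E4).
example3 = [0x00000005, 0x00000444, 0x00000004, 0x7FFA3B70,
            0x80318AA8, 0x00000009]
info = describe_pagrderr(example3)
print(info["va_region"], f"page at {info['page_base']:08X}")
print("exception PC in", info["pc_region"])
```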

A check of the system error log SYS$ERRORLOG:ERRLOG.SYS using the DCL
command $ ANALYZE/ERROR revealed that an unrecoverable disk read error
occurred just before the system crashed.  A further check of the error
log showed that all of our DUB_ disk drives had logged sporadic errors
over the previous few days.  The cause of the crash was a faulty disk
controller for the DUB_ disk drives.
