DAVID W. DELEY
Here are a few examples of how to examine a system crash dump file using
the Symbolic Dump Analyzer. This may come in handy if you're trying to
debug some kernel mode code. The following examples occurred while I was
developing the DEALLOC program on VMS 5.3.
The following commands may be used to obtain information about a system
crash:
$ ANALYZE/CRASH SYS$SYSTEM:SYSDUMP.DMP
SDA> SHOW CRASH
SDA> SHOW STACK
Scan the stack for the value FFFFFFFD. From there identify the
MECHANISM ARRAY and the SIGNAL ARRAY. From the SIGNAL ARRAY get the
exception value which follows. Now we must determine what exception
this is.
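The scan described above can be sketched in Python, working on a SHOW
STACK listing saved as text. This is only an illustration: the function
names are hypothetical, and the offset from the depth longword to the
SIGNAL ARRAY (four longwords) is an assumption taken from the three
dumps in this note, not a documented guarantee.

```python
import re

# One stack line: optional "SP =>", an 8-digit hex address, an 8-digit hex value.
STACK_LINE = re.compile(r"^\s*(?:SP\s*=>\s*)?([0-9A-Fa-f]{8})\s+([0-9A-Fa-f]{8})\b")
LAST_CHANCE_DEPTH = 0xFFFFFFFD  # depth of -3 stored as a 32-bit longword


def parse_stack(listing):
    """Map each stack address to the longword stored there."""
    stack = {}
    for line in listing.splitlines():
        match = STACK_LINE.match(line)
        if match:
            stack[int(match.group(1), 16)] = int(match.group(2), 16)
    return stack


def locate_arrays(stack):
    """Return (mechanism_array, signal_array) addresses, or None if not found."""
    for address in sorted(stack):
        if stack[address] != LAST_CHANCE_DEPTH:
            continue
        mechanism = address - 8  # argument count and establisher frame precede depth
        signal = address + 16    # ASSUMPTION: offset observed in this note's dumps
        if stack.get(mechanism) == 4 and signal in stack:
            return mechanism, signal
    return None


# Excerpt of the Example #1 listing from this note.
listing = """\
7FFE7790 00000004 MECHANISM ARRAY
7FFE7794 7FFE77E4 establisher frame
7FFE7798 FFFFFFFD depth (-3, last chance exception vector)
7FFE779C 00000001 R0 at time of exception
7FFE77A0 0000008C R1 at time of exception
7FFE77A4 000008F8 signal/stop code
7FFE77A8 00000005 SIGNAL ARRAY
7FFE77AC 0000000C exception
"""
mechanism, signal = locate_arrays(parse_stack(listing))
print(f"mechanism array at {mechanism:08X}, signal array at {signal:08X}")
# prints: mechanism array at 7FFE7790, signal array at 7FFE77A8
```

When the signal argument list is present on the stack (as in Examples
#1 and #3), the addresses it holds are a more reliable way to find both
arrays than the fixed offset used here.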
Create a list of all SS$_xxxx symbolic values as follows:
1. Create the five-line file GET.MAR below:
.TITLE GET
.LIBRARY /SYS$LIBRARY:LIB/
.LINK 'SYS$SYSTEM:SYS.STB' /SELECTIVE_SEARCH
$SSDEF GLOBAL
.END
2. Assemble and link with a full map:
$ MACRO GET.MAR
$ LINK/NOEXE/MAP=SSDEF/FULL GET.OBJ
3. Examine SSDEF.MAP and find the exception value to get the SS$_xxxx symbol.
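Searching the map by eye is tedious, so step 3 could be sketched as a
small Python lookup. The exact column layout of the linker map is an
assumption here: the sketch only expects each SS$_ symbol name to be
followed by whitespace and an 8-digit hex value. The sample text is a
made-up stand-in for SSDEF.MAP, using only values that appear in this
note.

```python
import re
from collections import defaultdict

# ASSUMPTION: a symbol name followed by whitespace and an 8-digit hex value.
SYMBOL = re.compile(r"(SS\$_\w+)\s+([0-9A-Fa-f]{8})\b")


def symbols_by_value(map_text):
    """Build a table from numeric value to the SS$_ names that carry it."""
    table = defaultdict(list)
    for name, value in SYMBOL.findall(map_text):
        table[int(value, 16)].append(name)
    return table


# Hypothetical excerpt standing in for the real SSDEF.MAP contents.
sample_map = """\
SS$_ACCVIO        0000000C    SS$_NOHANDLER     000008F8
SS$_PAGRDERR      00000444
"""
table = symbols_by_value(sample_map)
print(table[0x0000000C])
# prints: ['SS$_ACCVIO']
```

On a real map, read the file contents into a string and pass that in
place of `sample_map`; adjust the pattern if the map uses a different
layout.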
Look up the SS$_xxxx symbol in the "Introduction to VMS System Services"
chapter on "Condition-Handling Services", subsection "Types of
Exception" (Section 10.1, Table 10-1 in the VMS 5.0 manual). The table
indicates which arguments, if any, follow the exception value.
After the arguments (if any) come the exception PC and PSL in the SIGNAL
ARRAY. This tells where the program was when the exception occurred.
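The layout just described (argument count, exception value, optional
arguments, then PC and PSL) can be written out as a short Python sketch.
The function name and the dictionary keys are hypothetical; the sample
longwords are the SIGNAL ARRAY from Example #1 in this note.

```python
def decode_signal_array(longwords):
    """Split a SIGNAL ARRAY into exception value, arguments, PC and PSL.

    longwords[0] is the argument count; that many longwords follow it.
    """
    count = longwords[0]
    body = longwords[1:1 + count]
    if count < 3 or len(body) != count:
        raise ValueError("not a complete signal array")
    return {
        "condition": body[0],     # exception value, e.g. SS$_ACCVIO
        "arguments": body[1:-2],  # exception-specific arguments, may be empty
        "pc": body[-2],           # exception PC
        "psl": body[-1],          # exception PSL
    }


# SIGNAL ARRAY from Example #1, starting at 7FFE77A8.
decoded = decode_signal_array(
    [0x00000005, 0x0000000C, 0x00000004, 0x00000018, 0x000006E6, 0x00C20008]
)
print(f"exception {decoded['condition']:08X} at PC {decoded['pc']:08X}")
# prints: exception 0000000C at PC 000006E6
```

The number of exception-specific arguments falls out of the count: it is
the count minus three (the exception value, the PC and the PSL).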
Example #1:
$ ANALYZE/CRASH SYS$SYSTEM:SYSDUMP.DMP
SDA> SHOW STACK
Current operating stack (KERNEL):
SP => 7FFE776C 00000000 handler
7FFE7770 00000000 mask/psw
7FFE7774 00000000 AP
7FFE7778 7FFE77C0 FP
7FFE777C 80000014 PC SYS$CALL_HANDL+004
7FFE7780 801302F2 PC (JSB @#SYS$CALL_HANDL)
7FFE7784 00000002 signal argument list. 2 arguments follow
7FFE7788 7FFE77A8 address of signal array
7FFE778C 7FFE7790 address of mechanism array
7FFE7790 00000004 MECHANISM ARRAY
7FFE7794 7FFE77E4 establisher frame
7FFE7798 FFFFFFFD depth (-3, last chance exception vector)
7FFE779C 00000001 R0 at time of exception
7FFE77A0 0000008C R1 at time of exception
7FFE77A4 000008F8 signal/stop code. (SS$_NOHANDLER)
7FFE77A8 00000005 SIGNAL ARRAY
7FFE77AC 0000000C exception. value = SS$_ACCVIO
7FFE77B0 00000004 reason mask. 00=read, 04=write
7FFE77B4 00000018 inaccessible virtual address
7FFE77B8 000006E6 exception PC
7FFE77BC 00C20008 exception PSL
7FFE77C0 00000000 handler (call frame placed on stack by call
7FFE77C4 003C0000 mask/psw to kernel mode routine FORCE_DEALLOCATE)
7FFE77C8 7FEF2178 AP
7FFE77CC 7FFE77E4 FP
7FFE77D0 80157E7D PC (PROCESS_MANAGEMENT+0087D)
7FFE77D4 00000004 R2 (registers saved as specified by entry
7FFE77D8 7FF77185 R3 mask to kernel mode routine.)
7FFE77DC 80248FE0 R4
7FFE77E0 7FFE5EC4 R5
7FFE77E4 00000000 handler
7FFE77E8 00000000 mask/psw
7FFE77EC 7FEF2178 AP (arguments are on user stack)
7FFE77F0 7FEF215C FP (top of user stack)
7FFE77F4 8012E3D0 PC
7FFE77F8 7FFEDE96 PC SYS$CMKRNL+006
7FFE77FC 03C00000 old PSL
Example #2:
In this example, the exception PC in the SIGNAL ARRAY is above 80000000,
indicating that the exception happened somewhere in a system routine. To
find where we were in our program, further examination of the stack is
required. You can try scanning downwards from the SIGNAL ARRAY for a
value which looks like it's in the proper range to be a PC return
address, or you can try scanning upwards from the bottom and attempt to
identify how the stack was formed. In this case, the program executed a
JSB instruction to a system routine, the system routine pushed one value
onto the stack, and then crashed with an access violation.
$ ANALYZE/CRASH SYS$SYSTEM:SYSDUMP.DMP
SDA> SHOW STACK
Current operating stack (KERNEL):
SP => 7FFE7788 00000004 MECHANISM ARRAY. 4 arguments follow
7FFE778C 7FFE77C0 establisher frame
7FFE7790 FFFFFFFD depth (-3, last chance exception vector)
7FFE7794 00000042 R0 at time of exception
7FFE7798 00000470 R1 at time of exception
7FFE779C 00000001 signal/stop code
7FFE77A0 00000005 SIGNAL ARRAY. 5 arguments follow
7FFE77A4 0000000C exception. value = SS$_ACCVIO
7FFE77A8 00000000 reason mask. 00=read, 04=write
7FFE77AC 00000060 inaccessible virtual address
7FFE77B0 8014EA38 exception PC (somewhere in a system routine)
7FFE77B4 00C80004 exception PSL
7FFE77B8 00000002 (pushed on stack by system routine)
7FFE77BC 00000668 PC (return address pushed on stack by JSB call.
This is the value we need. It tells us
where in our program code we were when
the crash happened.)
7FFE77C0 00000000 handler (call frame placed on stack by call
7FFE77C4 003C0000 mask/psw to kernel mode routine FORCE_DEALLOCATE)
7FFE77C8 7FEF2178 AP
7FFE77CC 7FFE77E4 FP
7FFE77D0 80157E7D PC (PROCESS_MANAGEMENT+0087D)
7FFE77D4 00000004 R2 (registers saved as specified by entry
7FFE77D8 7FF77185 R3 mask to kernel mode routine.)
7FFE77DC 802356E0 R4
7FFE77E0 7FFE5EC4 R5
7FFE77E4 00000000 handler
7FFE77E8 00000000 mask/psw
7FFE77EC 7FEF2178 AP (arguments are on user stack)
7FFE77F0 7FEF215C FP (top of user stack)
7FFE77F4 8012E3D0 PC
7FFE77F8 7FFEDE96 PC SYS$CMKRNL+006
7FFE77FC 03C00000 old PSL
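The downward scan used in Example #2 can be sketched in Python. The
function names are hypothetical, and so is the address range given for
the program's code (200 to 800 hex); in practice that range would come
from the program's own link map. The stack contents are taken from the
Example #2 listing.

```python
SYSTEM_SPACE = 0x80000000  # addresses at or above this are in system space


def in_system_space(address):
    """True when the address lies in system space rather than in the program."""
    return address >= SYSTEM_SPACE


def candidate_return_pcs(stack, signal_array, code_range):
    """List (stack address, value) pairs past the SIGNAL ARRAY whose value
    falls inside the program's code range."""
    low, high = code_range
    count = stack[signal_array]
    first = signal_array + 4 * (count + 1)  # first longword after the array
    return [
        (address, stack[address])
        for address in sorted(stack)
        if address >= first and low <= stack[address] < high
    ]


# Part of the Example #2 stack, from the SIGNAL ARRAY onward.
stack = {
    0x7FFE77A0: 0x00000005, 0x7FFE77A4: 0x0000000C, 0x7FFE77A8: 0x00000000,
    0x7FFE77AC: 0x00000060, 0x7FFE77B0: 0x8014EA38, 0x7FFE77B4: 0x00C80004,
    0x7FFE77B8: 0x00000002, 0x7FFE77BC: 0x00000668, 0x7FFE77C0: 0x00000000,
    0x7FFE77C4: 0x003C0000,
}
exception_pc = stack[0x7FFE77B0]
if in_system_space(exception_pc):
    # ASSUMPTION: the program's code occupies 00000200..000007FF.
    for address, value in candidate_return_pcs(stack, 0x7FFE77A0, (0x200, 0x800)):
        print(f"{address:08X} holds possible return PC {value:08X}")
# prints: 7FFE77BC holds possible return PC 00000668
```

A value in the right range is only a candidate. It still has to be
checked against the program listing to confirm that it follows a JSB or
CALL instruction.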
Example #3:
Analysis of system crash:
Time of system crash: 3-SEP-1992 21:58:45.20
CPU bugcheck codes:
CPU 01 -- SSRVEXCEPT, Unexpected system service exception
$ ANALYZE/CRASH SYS$SYSTEM:SYSDUMP.DMP
VAX/VMS V5.4 -- System Dump Analysis -- 4-SEP-1992 09:50:29.83
System crash information
Time of system crash: 3-SEP-1992 21:58:45.20
Version of system: VAX/VMS VERSION V5.4
System type: VAX 6000-210
CPU bugcheck codes:
CPU 01 -- SSRVEXCEPT, Unexpected system service exception
SDA> SHOW STACK
Current operating stack (KERNEL):
SP => 7FFE7694 00000000 handler
7FFE7698 00000000 mask
7FFE769C 7FFE773C AP
7FFE76A0 7FFE76F4 FP
7FFE76A4 80000014 PC (sys$call_handl+004)
7FFE76A8 803046C4 PC (JSB @#SYS$CALL_HANDL)
7FFE76AC 00000002 signal argument list. (2 arguments follow)
7FFE76B0 7FFE76D0 address of signal array
7FFE76B4 7FFE76B8 address of mechanism array
7FFE76B8 00000004 MECHANISM ARRAY (4 arguments follow)
7FFE76BC 7FFE77CC establisher frame
7FFE76C0 FFFFFFFD depth (-3, last chance exception vector)
7FFE76C4 7FFA3F60 R0 at time of exception
7FFE76C8 7FFE2684 R1 at time of exception
7FFE76CC 000008F8 signal/stop code. (SS$_NOHANDLER)
7FFE76D0 00000005 SIGNAL ARRAY (5 arguments follow)
7FFE76D4 00000444 exception. Value = SS$_PAGRDERR
7FFE76D8 00000004 translation not valid reason
7FFE76DC 7FFA3B70 virtual address of referenced page
7FFE76E0 80318AA8 exception PC
7FFE76E4 00000009 exception PSL
We identify the exception 00000444 as being SS$_PAGRDERR. From the
"Introduction to VMS System Services" chapter on "Condition-Handling
Services" subsection "Types of Exception" table 10-1 (Section 10.1 in
the VMS 5.0 manual) we find there are two arguments which follow this
error: 1) the translation not valid bit mask code, 2) the virtual
address of referenced page. After the arguments follow the exception PC
and PSL.
We thus determine:
Cause of crash: SS$_PAGRDERR. Read error occurred during an attempt
to read a faulted page from disk.
Translation not valid reason: Specified virtual address not valid.
Virtual address of referenced page: 7FFA3B70
A check of the system error log SYS$ERRORLOG:ERRLOG.SYS using the DCL
command $ ANALYZE/ERROR revealed that an unrecoverable disk read error
occurred just before the system crashed. A further check of the system
error log showed that all of our DUB_ disk drives had logged sporadic
errors in the last few days. The cause of the crash was a faulty disk
controller for the DUB_ disk drives.