moshix.mvs/SVCDumpsGuide.txt

A BEGINNER'S GUIDE TO SVC DUMPS
by Anne Peticolas

Anne Peticolas is a systems programmer at the Veteran's
Administration Data Processing Center in Austin, Texas.

To the novice systems programmer, system dumps are baffling indeed.
It's often very unclear what the problem was, and even where, among
all those sheets of paper, to find what task failed and which are
its registers.  The best guide, of course, is an experienced and
patient co-worker.  But, failing that, this article attempts to
help a beginner start to read system dumps and extract meaningful
information.

There are two kinds of system dumps:  SVC dumps and standalone
dumps.  Techniques for reading them differ somewhat, and this
article will deal only with SVC dumps.

Print Parameters

So your system has produced a SVC dump.  You want to look at it,
so you're going to format and print it with AMDPRDMP.  What
parameters should you use?

A good set to begin with is:

SUMMARY, SUMDUMP, LPAMAP, EPA, NUCMAP EPA, LOGDATA, TRACE, MTRACE,
PRINT CURRENT

(Of course you will want to add JES2, JES3, or VTAMMAP to these
when appropriate.)  You can always reprint the dump using other
parameters should you need more later--but most times you will not.

On the top right-hand side of most pages in your printed dump, you
will have, reading from the right, a page number, a time, a date,
and a module name.  Very frequently the module name will be
IEAVTSDT.  This is not because IEAVTSDT is IBM's worst coded module
and has lots of problems.  Rather, IEAVTSDT is the dump task, and
it is often the current task when an SVC dump is taken.

Scheduled Vs. Synchronous Dumps

When you see IEAVTSDT, the dump you have in hand was a "scheduled
dump", and was produced by issuing an SDUMP macro with a BRANCH=YES
option.  The macro saved some information, scheduled an SRB for the
dump task, and returned to the dispatcher.

When BRANCH=NO is specified, on the other hand, an SVC 33 is issued
directly by the failing task.  Your system "freezes" (no work is
dispatched) while the SVC dump is produced.  This type of dump is
called a "synchronous dump".  It is easy to see that "current"
information is more likely to be relevant to your problem in a
synchronous than in a scheduled dump.

If information in a synchronous dump is more immediate, why would
a scheduled dump be taken?  One common reason is that for some
reason the failing task is not able to issue an SVC so the code
must be branched to.  (For instance, a task can't issue an SVC when
locked.)

The Dump Title

On the top left-hand side of most pages you will see the dump
title, for instance:

TITLE FROM DUMP:  SMF ABEND,ERRMOD=IFAPCWTR,
                  RECVMOD=IFAPCWTR

The title itself is somewhat informative, but you can get still
more information by looking at Appendix B of the MVS/XA Diagnostic
Techniques manual.  There the titles are listed alphabetically,
along with an explanation, tips on which areas to pay attention to
in the dump, and whether "a software record is written to
SYS1.LOGREC".  The component and the issuing module associated with
the dump title will be given, but since these modules are
frequently recovery routines (ESTAEs or FRRs), this does not
necessarily mean that the source of the problem has been located.

The Dump Summary Page

This is page 1 and usually 2 of the dump, immediately following the
print dump index.

Look at the area following the words MVS SYMPTOM STRING.  Here is
listed the abend code and some other information.  FI stands for
FAILING INSTRUCTION AREA.  This shows the code for six bytes below
and above where the failing PSW points.  Often, the instruction
that caused the trouble can be seen.

The REGS information shows which registers at the time of failure
point into that area, and what the displacement from the register
is.  For instance, "0B008" would mean that the failing PSW can be
obtained by adding 8 to register 11.  On pages 3 and 4 you can find
the registers and PSW at the time of failure, and ASID number, and
the SDWA.

MTRACE

Now is a good time to look at the output from the MTRACE verb.
Here will be seen the system log data, which obviously can be quite
helpful in discovering what was going on before the dump was taken.
Previous failures may be seen in the log (debugging should start
with the dump from the first problem to occur) along with clues to
what combination of circumstances caused a problem.  Of course, any
information the operators have provided or that has been observed
must also be taken into account.
LOGDATA

After a brief look at MTRACE output, the LOGDATA output is a good
place to go.  SYS1.LOGREC is used as a place to record information
about both hardware and software errors.  This verb will show the
most recently recorded information, but there will not necessarily
be anything related to the problem at hand.  Look carefully at the
time stamp.  If an error occurred some time before (and several
minutes is a long time to a computer), it's  most likely
irrelevant.

Look for a real connection to the problem.  Naturally, if the dump
title information indicated a record would be found here, one can
be expected.  In this case the errorid (right before the jobname
in the formatted LOGREC record) will match the errorid right below
the dump title on the dump summary page.

The LOGREC record will have information about registers, completion
code, PSW, and the RTM2 work area.  Often, in fact, it will tell
all that is needed to locate the problem.

Much of the information obtainable from the RTM2WA is the same as
that in the SDWA.  They can be cross-checked against each other for
verification.

PRINT CURRENT Output

Another avenue to pursue is the PRINT CURRENT output.  This prints
TCB's for current tasks in the system.  The completion codes of the
TCB's (at the far left of the first line of the formatted TCB) can
be scanned for non-zero completion codes.  Beware, however, of
assuming that any abnormal completion code is necessarily related
to the current problem.  After all, usually when a problem program
abends, normal recovery occurs and there is no system problem
whatsoever.

However, when a relevant failing TCB is located, valuable
information may be obtained, particularly if for some reason this
information is not available elsewhere in the dump.  The RTWA
pointer will point to the RTM2WA if it exists, and the RB chain
will give information that is similar to what might be found in an
application dump, with the WLIC field showing SVC's invoked.
(Remember SVC numbers can be looked up in Volume 1 Chapter 5 of the
Debugging Handbook.)  The RSV area at PRB+60 will give a program
name in EBCDIC, and the RTPSW field on the first line of the
formatted PRB will preserve information about the program check
that caused the abend (refer to page 23 of the SEARS card).  The
OPSW field in the PRB will give you information about the resume
PSW; at times this may be necessary.  The CDE's, as in an
application dump, will give the information about programs actually
loaded by this task.

LPAMAP And NUCMAP

The LPAMAP and NUCMAP parameters map out locations of modules in
the LPA and nucleus.  This can sometimes be useful to locate a
failure that can't otherwise be found.  Does the PSW (or another
address that's become of interest for good reason) point to an area
in one of these maps?

TRACE

The TRACE parameter prints the most recent entries in the system
trace table.  This table should always be read from the bottom up
as the most recent entries are last.  The system trace table
contains the CPUID, the ASID, the TCB address, the type, and the
unique fields mapped in the Debugging Handbook.  As mentioned
before, program checks can be looked up in the SEARS card.

Near an I/O interrupt, a SRB should be seen being dispatched.
Frequently, an entry will be seen which says:  "TRACE DATA IS NOT
AVAILABLE FROM ALL PROCESSORS BEFORE (or AFTER) THIS TIME."  Ignore
it.  An "*" indicates a significant entry.  Frequently this tells
why the dump was produced.  In any case, the trace table can give
a good idea of what was going on in the system.

Conclusion

So, now you know some places to look for meaningful information in
that tall heap of paper.  In closing, remember two things:

o Don't look for something interesting; look for something relevant
to your problem.  MVS is very interesting, and you can chase things
that are not related to your problem all day.

o Don't be over-awed and over-complicate your task.  Most often you
may not need to look at all the elements I've mentioned.

True, system dumps can be really tough; but with an idea of where
to start and a few techniques, they are often not much harder to
shoot than the average applications dump