mirror of
https://github.com/PDP-10/stacken.git
synced 2026-02-10 01:59:56 +00:00
4356 lines
164 KiB
Plaintext
@@ran_quest_help
|
|
|
|
Part of Instruct consists of a series of sixty questions. The questions
|
|
pertain to the system event files (ERROR.SYS and ERRLOG.SYS), the Spear
|
|
Library dialogs, and the Spear Library reports.
|
|
|
|
The Random Question feature is primarily a Course Administrator's tool.
|
|
It allows the Course Administrator to randomly select a few questions
|
|
that will help determine a student's progress. If the student is able
|
|
to answer 8 out of 10 random questions correctly, then chances are he
|
|
(or she) understands how to use the Spear Library. If not, then perhaps
|
|
a little more study time is needed.
|
|
|
|
Students can also use the Random Question feature as a self-evaluation
|
|
tool. To do so, enter a random number in the range of 1 to 50. Instruct
|
|
will dispatch to a corresponding random question. Answer the question to
|
|
the best of your knowledge. Instruct will evaluate your answer and print
|
|
an appropriate message.
|
|
|
|
At that point you can type: RANDOM and select another random question.
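The two rules in this frame (a question number in the range of 1 to 50, and the 8 out of 10 guideline) can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical and are not part of Instruct; only the range and the threshold come from the text.

```python
# Hypothetical helpers; only the 1-50 range and the 8-of-10 guideline
# are taken from the help text.
QUESTION_RANGE = range(1, 51)   # 1 to 50 inclusive


def is_valid_question_number(text):
    """Return True if text is a whole number in the range 1 to 50."""
    text = text.strip()
    return text.isascii() and text.isdigit() and int(text) in QUESTION_RANGE


def shows_understanding(correct, asked):
    """Apply the 8-out-of-10 guideline using integer arithmetic."""
    if asked <= 0:
        return False
    return correct * 10 >= asked * 8
```

A number outside the range would lead to the Random Question response error message rather than a question.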
|
|
@@quest_help
|
|
|
|
You are participating in a teaching dialog informally referred to as the
|
|
"Rhetorical Approach to Learning". The approach involves a statement
|
|
about a subject, in this case the Spear Library. You are to determine
|
|
whether the statement is True or False.
|
|
|
|
If your answer is correct you will receive a short message and then go
|
|
on to the next statement. If your answer is incorrect, then the correct
|
|
answer will be explained and the statement will be repeated. If you are
|
|
not sure whether the statement is True or False, you can press the
|
|
RETURN key and the correct answer will be explained.
|
|
|
|
In addition to the True, False, and RETURN key response, you can type
|
|
NEXT if you want to skip to the next statement. You can also press the
|
|
BACKSPACE key if you want to return to the menu.
|
|
@@ans_help
|
|
|
|
You have just answered a question either correctly or incorrectly. You
|
|
now have three choices. You can:
|
|
|
|
1. Press the RETURN key. If you answered the question correctly you
|
|
will continue on to the next sequential question. If, however, you
|
|
answered the question incorrectly, then the question will be
|
|
repeated.
|
|
|
|
2. Type NEXT to continue on to the next sequential question regardless
|
|
of whether you answered the last question correctly or not.
|
|
|
|
3. Press the BACKSPACE key to repeat the last question regardless of
|
|
whether your answer was correct or not.
|
|
|
|
Your response please:
|
|
@@no_help
|
|
|
|
There really isn't any way that we can help you at this point. Press the
|
|
BACKSPACE key and try reading the text again. If it still doesn't make
|
|
sense, then contact:
|
|
|
|
The Spear Team
|
|
MRO1-1 / M2
|
|
|
|
Sorry
|
|
@@text_help
|
|
|
|
Instruct is frame oriented. That is, it displays one frame or block of
|
|
information at a time. After you have read the frame you can:
|
|
|
|
1. Press the RETURN key to proceed to the next frame of information.
|
|
|
|
2. Press the BACKSPACE key to review the previous frame of information.
|
|
|
|
3. Type MENU if you want to go back to the subject menu.
|
|
@@menu_help
|
|
|
|
Instruct is organized around a hierarchy of subject menus. The menus
|
|
allow you to use Instruct as a reference tool. The top item on the
|
|
menu (item 0) introduces the subjects and explains their relationship.
|
|
The remaining items are subjects. You can select any item on the menu
|
|
by typing the number that corresponds to the item. You can also press
|
|
the RETURN key to automatically proceed to the first subject on the
|
|
menu. If you want to go back to the previous menu in the hierarchy you
|
|
can type MENU.
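The menu behavior described in this frame can be pictured as a small tree of menus. The Python sketch below is a guess at the shape of that logic, not Instruct's actual implementation: the class name, the return values, the item 0 label, and the parent menu title are all hypothetical, and it assumes that pressing RETURN arrives as an empty line and that commands are not case sensitive.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Menu:
    title: str
    items: List[str]                  # items[0] is the introduction (item 0)
    parent: Optional["Menu"] = None   # previous menu in the hierarchy

    def respond(self, response):
        """Return a (kind, target) pair for one response typed at the menu."""
        text = response.strip().upper()
        if text == "":
            # RETURN proceeds to the first subject on the menu.
            return ("item", self.items[1])
        if text == "MENU":
            # Assumption: MENU at the top of the hierarchy stays put.
            target = self.parent.title if self.parent else self.title
            return ("menu", target)
        if text.isascii() and text.isdigit() and int(text) < len(self.items):
            return ("item", self.items[int(text)])
        return ("error", None)


course = Menu("Course Map", ["Introduction", "Troubleshooting"])
troubleshooting = Menu(
    "Troubleshooting",
    ["Introduction", "Attitude vs. Approach", "The Formal Approach",
     "The Systematic Approach", "The Variable Approach"],
    parent=course,
)
```

Any other response falls through to the error case, which corresponds to the menu response error message.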
|
|
@@fwd_trans_help
|
|
|
|
Instruct is organized around a hierarchy of subject menus. You can use
|
|
the RETURN key feature to sequence through the subjects listed on the
|
|
menu. Each time you move from one subject to another you will be notified.
|
|
At this point you can choose to go on by pressing the RETURN key, or you
|
|
can choose to go back to the menu and select a different subject by typing
|
|
MENU.
|
|
@@rev_trans_help
|
|
|
|
Instruct is designed in such a way that you can go forward and backward
|
|
through the subject matter. Each time you move from one subject to
|
|
another you will be notified. In this case you were notified that you
|
|
were about to back into the previous subject on the menu. At this point
|
|
you can:
|
|
|
|
1. Type MENU to go back to the subject menu.
|
|
|
|
2. Press the RETURN key to go back to where you came from.
|
|
|
|
3. Press the BACKSPACE key, or type /REVERSE, to continue backing up.
|
|
However, if the subject that you are backing into required multiple
|
|
frames of text to explain, then you will back into the last frame.
|
|
|
|
4. Type BEGIN to back up to the first frame of the subject that you
|
|
are backing into.
|
|
@@ran_quest_res_error_msg
|
|
This is the Random Question response error message. The number that you
|
|
entered is not within the range of 1 to 50.
|
|
@@text_res_error_msg
|
|
This is the text response error message. Instruct displays one page of
|
|
text at a time. After you have read the text you can:
|
|
|
|
1. Press the RETURN key to go on to the next page.
|
|
|
|
2. Press the BACKSPACE key or type /R to go back to the previous page.
|
|
|
|
3. Type MENU to go back to the menu and select another subject.
|
|
|
|
4. Type /B to return to the Spear prompt. If you are using a student
|
|
ID, and if you specify that ID at the Instruct prompt, you will
|
|
return to the page that you were at when you typed /B.
|
|
|
|
5. Type anything else and you will get this message.
|
|
@@menu_res_error_msg
|
|
This is the menu response error message. Instruct uses a hierarchy of
|
|
menus. The menus allow you to use Instruct as a quick reference tool.
|
|
At a menu you can:
|
|
|
|
1. Type the number on the menu that corresponds to the subject that you
|
|
are interested in.
|
|
|
|
2. Type MENU to go back to the previous (higher level) menu.
|
|
|
|
3. Type /B to return to the Spear prompt. If you are using a student
|
|
ID, and if you specify that ID at the Instruct prompt, you will
|
|
return to the page that you were at when you typed /B.
|
|
|
|
4. Press the BACKSPACE key or type /R. You will get a message stating
|
|
that you are about to back into the Introduction to the menu.
|
|
|
|
5. Type anything else and you will get this message.
|
|
@@fwd_trans_res_error_msg
|
|
This is the forward response error message. You can sequence through
|
|
Instruct by pressing the RETURN key. If you do so, you will sequence
|
|
through an Introduction, followed by a menu, followed by the first
|
|
subject, followed by the second subject, etc. You will be notified each
|
|
time you move from one subject to another. At that point you can:
|
|
|
|
1. Press the RETURN key to continue sequencing through Instruct.
|
|
|
|
2. Press the BACKSPACE key or type /R to repeat the last page of text.
|
|
|
|
3. Type MENU to go back to the menu and select another subject.
|
|
|
|
4. Type /B to return to the Spear prompt. If you are using a student
|
|
ID, and if you specify that ID at the Instruct prompt, you will
|
|
return to the point that you were at when you typed /B.
|
|
|
|
5. Type anything else and you will get this message.
|
|
@@rev_trans_res_error_msg
|
|
This is the reverse-transition prompt/response error message. You are
|
|
sequencing through Instruct in a reverse direction. You were notified
|
|
that you are about to move in a reverse direction from one
|
|
subject to another. You can:
|
|
|
|
1. Press the RETURN key to begin sequencing in a forward direction.
|
|
|
|
2. Press the BACKSPACE key to continue going in a reverse direction.
|
|
|
|
3. Type BEGIN to go to the beginning of the subject.
|
|
|
|
4. Type MENU to go back to the menu and select another subject.
|
|
|
|
5. Type /B to return to the Spear prompt.
|
|
|
|
6. Type anything else and you will get this message.
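The six responses listed above amount to a lookup table with a default. Here is a minimal Python sketch of that idea. The action names are hypothetical, and the way the RETURN and BACKSPACE keys arrive (an empty line and a single backspace character) is an assumption made for illustration; only the responses themselves come from the text.

```python
# Assumptions: RETURN arrives as an empty line, BACKSPACE as "\b",
# and typed commands are not case sensitive.
RETURN_KEY = ""
BACKSPACE_KEY = "\b"

REVERSE_TRANSITION_ACTIONS = {
    RETURN_KEY: "sequence_forward",
    BACKSPACE_KEY: "continue_reverse",
    "BEGIN": "go_to_beginning_of_subject",
    "MENU": "go_to_menu",
    "/B": "return_to_spear_prompt",
}


def classify_response(raw):
    """Map one response to an action; anything else gets the error message."""
    key = raw.strip().upper()
    return REVERSE_TRANSITION_ACTIONS.get(key, "show_error_message")
```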
|
|
@@ans_res_error_msg
|
|
This is a response error message. Your response does not match the list
|
|
of acceptable responses. For further information press the RETURN key,
|
|
then type: ? or HELP.
|
|
@@farewell
|
|
Instruct bids you farewell.
|
|
Type /Break to return to Spear.
|
|
@@course_admin
|
|
Spear Course Administrator and Student Guide
|
|
|
|
Course Description
|
|
|
|
The Instruct course consists of four main modules:
|
|
|
|
1. Fault Isolation Techniques - This module describes the nature of
|
|
intermittent faults and discusses some of the most common methods
|
|
used to isolate intermittent system and subsystem failures.
|
|
|
|
2. System Event File Organization and Content - This module describes
|
|
the overall organization and content of TOPS-10, TOPS-20, and
|
|
VAX/VMS system event files.
|
|
|
|
3. Spear Library Functions - This module explains how to use each
|
|
of the Spear maintenance functions: Retrieve, Summarize,
|
|
and Compute.
|
|
|
|
4. Guaranteed Uptime Program/NOTIFY - This module describes the GUP
|
|
service which ensures the highest level of reliability for your
|
|
system. This module also explains how to use NOTIFY to calculate
|
|
statistics and to log information related to system uptime.
|
|
@@course_admin_a
|
|
|
|
Each module consists of an introduction and a menu of subordinate
|
|
subjects. When appropriate, the subordinate subjects are further
|
|
broken down into introductions and menus. Thus, Instruct can be
|
|
used as both a tutorial and a reference tool.
|
|
|
|
If you want to use Instruct as a tutorial (i.e., sequence through
|
|
the course much as you would read a book) you can do so using the
|
|
RETURN key. You will proceed to the module introduction, then the
|
|
menu, then the first subject on the menu, followed by the next
|
|
subject, etc.
|
|
|
|
If you want to use Instruct as a reference tool, then instead of
|
|
pressing the RETURN key at the menu, select the subject number that
|
|
interests you. You will proceed directly to that subject. If, after
|
|
investigating the subject you want to return to the menu, type MENU.
|
|
@@course_map
|
|
Course Map
|
|
|
|
 ______________________________________
|   Guaranteed Uptime Program/NOTIFY   |
 ______________________________________
                 ^                         |--Applications
                 |                         |--Summarize
 _________________________________         |--Compute
|     Using the Spear Library     |--------|--Retrieve
 _________________________________         |--Klerr
                 ^
                 |
 _________________________________
|       System Event Files        |
 _________________________________
                 ^
                 |
 ______________________________________     ___________________________
|  Course Administrator/Student Guide  |-->|      Troubleshooting      |
 ______________________________________     ___________________________
|
|
@@course_map_a
|
|
The course map suggests a sequence to follow to learn about
|
|
Spear. This sequence reflects the following factors:
|
|
|
|
Spear processes the system event file and generates a number
|
|
of reports which are useful in supporting the system.
|
|
|
|
Spear allows the user to produce the following reports:
|
|
Summary of the system faults by device and time.
|
|
System reliability and uptime reports.
|
|
Dump of event log entries in multiple formats.
|
|
|
|
Spear also allows the user to maintain the event file, and
|
|
includes its own instruction package for its use.
|
|
@@feedback
|
|
Feedback is an important part of any system design. Technically,
|
|
feedback is defined as a representative sample of the output used
|
|
to control or correct the process.
|
|
|
|
The process, in this case, is The Spear Library. The output is the
|
|
ability of the Spear Library to help you evaluate system performance
|
|
and solve service related system problems. If you have any ideas or
|
|
suggestions for improving the usefulness of The Spear Library, please
|
|
contact:
|
|
|
|
|
|
Digital Equipment Corporation
|
|
The SPEAR Team MRO1-1 / M2
|
|
200 Forest Street
|
|
Marlboro, Mass. 01752
|
|
|
|
|
|
Thank you,
|
|
The Spear Team
|
|
@@random_question
|
|
The Random Question feature allows you to enter a random number in the
|
|
range of 1 to 50. Instruct will respond by presenting you with a random
|
|
question based on the course content.
|
|
|
|
This feature can be used by anyone who has a few minutes, and who would
|
|
like to pick up a few tidbits about the use of The Spear Library. The
|
|
feature can also be used by The Course Administrator as a tool to spot
|
|
check student progress.
|
|
|
|
After being informed that you have correctly answered a question, you
|
|
may select another random question by typing "RANDOM".
|
|
|
|
Type <return> if you wish to enter the random question mode.
|
|
@@spear_man
|
|
Using The Spear Manual
|
|
|
|
You can use The Spear Manual as a learning aid, a user's guide, or a
|
|
reference tool.
|
|
|
|
As a Learning Aid: Chapters 1, 2, and 3 provide an overview of the
|
|
Spear Library. They also provide background information required to
|
|
understand and use the Spear Library.
|
|
|
|
As a User's Guide: Chapters 4 and 5 provide step-by-step procedures for
|
|
using the Spear functions: Retrieve, Summarize, and Compute.
|
|
The chapters explain, in detail, the command syntax and the response
|
|
parameters associated with each function.
|
|
|
|
As a Reference Tool: Chapter 6 and the appendices provide reference
|
|
material such as system event file formats, event record descriptions,
|
|
and examples of the report formats. This chapter and the appendices are
|
|
for reference only. They are not meant to be read from beginning to end.
|
|
@@R.T.cou_ovr_a
|
|
STOP - You are moving in a reverse direction through the menu. You are
|
|
about to back into the Course Administrator/Student Guide.
|
|
|
|
@@1.M.
|
|
|
|
Troubleshooting
|
|
|
|
Topic menu:
|
|
|
|
1. Attitude vs. Approach
|
|
|
|
2. The Formal Approach
|
|
|
|
3. The Systematic Approach
|
|
|
|
4. The Variable Approach
|
|
@@1.1.
|
|
Attitude vs. Approach
|
|
|
|
|
|
First and foremost, your success as a problem solver depends more on
|
|
your attitude than it does on your approach. Quite simply, if you
|
|
believe that you can (solve a particular problem), then you probably
|
|
will; if you believe that you can't, then you probably won't.
|
|
|
|
|
|
The only thing that a problem has going for it is your attitude.
|
|
Therefore, with the right attitude, you can solve almost any problem.
|
|
It's just a matter of time. Never give up and you'll never lose.
|
|
@@1.1.A.
|
|
Approach
|
|
|
|
The way you approach the solution to a problem will also, to a large
|
|
extent, determine your success as a problem solver. The more logical
|
|
and systematic your approach, the more successful you're likely to be.
|
|
|
|
Next on the menu are a couple of systematic problem solving approaches
|
|
that I think you will find to be both interesting and quite effective.
|
|
@@R.T.1.1.A.
|
|
STOP - You are moving in a reverse direction through the menu. You are
|
|
about to back into the Attitude vs. Approach section of the course.
|
|
|
|
Your response please:
|
|
@@1.2.
|
|
The Formal Approach
|
|
|
|
|
|
The Formal Approach consists of seven steps:
|
|
|
|
|
|
1. RESEARCH and DEFINE the problem (what is, or is not, happening)
|
|
|
|
2. VENTURE a testable educated guess (as to the cause of the problem)
|
|
|
|
3. SET UP a practical experiment (to test the educated guess)
|
|
|
|
4. PREDICT the result (before you conduct the experiment)
|
|
|
|
5. CONDUCT the experiment (keep an accurate set of notes)
|
|
|
|
6. EVALUATE the result (compare the actual and predicted results)
|
|
|
|
7. REFINE the definition and REPEAT the process (beginning with step 2)
|
|
@@1.2.A.
|
|
Step 1 - RESEARCH and DEFINE the problem - If you're not familiar with
|
|
the system, begin your research at the Branch office. Look over the
|
|
records for the last couple of weeks. Try to get an idea of the size
|
|
and the application of the system. Also, find out when the system was
|
|
last serviced, by whom, and why.
|
|
|
|
When you first arrive on site, take five or ten minutes to talk with the
|
|
customer, the operator, or anyone else that may be able to explain the
|
|
problem. Here's a partial list of the type of questions that you should
|
|
ask:
|
|
|
|
How serious is the problem?
|
|
|
|
How long has it been going on?
|
|
|
|
Has the system ever had a problem like this before?
|
|
|
|
How has the system been performing lately?
|
|
|
|
Have there been any recent hardware or software changes?
|
|
@@1.2.B.
|
|
You can define the problem at the same time that you are doing the
|
|
research. Ask yourself three questions:
|
|
|
|
|
|
1. What is happening that shouldn't?
|
|
|
|
2. What is not happening that should?
|
|
|
|
3. What are the surrounding conditions?
|
|
|
|
The first two questions will help you identify the main error symptom.
|
|
The third question will help you identify the context or circumstances
|
|
that surround the symptom. That's important, because it's practically
|
|
impossible to solve a problem out of context.
|
|
@@1.2.C.
|
|
Once again, the questions to ask yourself when defining a problem are:
|
|
|
|
|
|
What is happening that shouldn't?
|
|
|
|
What is not happening that should?
|
|
|
|
What are the surrounding conditions?
|
|
|
|
|
|
The definition should be as complete as possible. It should also state,
|
|
in clear and concise terms, the major symptom and the conditions or
|
|
circumstances that surround that symptom. One more thing, and this is
|
|
important, you should write the definition down, at least in note form.
|
|
|
|
For example:
|
|
|
|
Def - 4 days/2020/256K/cache/TOPS-20(4.1)/UBANXM/freq:12-14 hrs.
|
|
@@1.2.D.
|
|
Or more formally:
|
|
|
|
During the last four days, the system, a 2020 with 256K and cache,
|
|
running TOPS-20 (4.1) has crashed about every 12 or 14 hours with
|
|
a UBANXM Bug Halt.
|
|
|
|
|
|
Note that the definition states only one main error symptom, UBANXM Bug
|
|
Halt. The rest of the information describes the conditions that surround
|
|
the error symptom (i.e., the context of the problem).
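The note-form definition is really a small record: one main symptom plus the conditions that surround it. As a hedged illustration, the Python sketch below stores the example from this course in such a record and prints it in note form. The class and field names are hypothetical; only the values and the note layout come from the text.

```python
from dataclasses import dataclass


@dataclass
class ProblemDefinition:
    duration: str    # how long the problem has been going on
    system: str      # processor model
    memory: str      # memory size
    options: str     # notable options, e.g. cache
    software: str    # operating system and version
    symptom: str     # the one main error symptom
    frequency: str   # how often the symptom occurs

    def as_note(self):
        """Format the definition in the note form used in this course."""
        fields = [self.duration, self.system, self.memory, self.options,
                  self.software, self.symptom, f"freq:{self.frequency}"]
        return "Def - " + "/".join(fields)


definition = ProblemDefinition("4 days", "2020", "256K", "cache",
                               "TOPS-20(4.1)", "UBANXM", "12-14 hrs.")
print(definition.as_note())
```

Keeping the symptom in its own field mirrors the rule that each definition states only one main error symptom; a system with several symptoms would get several records.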
|
|
@@1.2.E.
|
|
Sometimes, however, a system will exhibit multiple error symptoms. In
|
|
such a case, each error symptom (including the surrounding conditions)
|
|
should be stated separately. This is important because, when you first
|
|
start working, you have no way of knowing, for sure, whether or not the
|
|
system actually has multiple problems.
|
|
|
|
|
|
Therefore, assume the worst case. If a system exhibits multiple error
|
|
symptoms, treat each symptom separately. That way you will eliminate the
|
|
possibility of multiple errors compounding the problem solving process.
|
|
Also, if you separate multiple error symptoms, then you can investigate
|
|
the most obvious symptom first, which is sound troubleshooting practice.
|
|
@@1.2.F.
|
|
Review - The key points discussed so far are:
|
|
|
|
1. Talk to anyone who may know something about the problem.
|
|
|
|
2. DEFINE the problem. Find out exactly:
|
|
|
|
What is happening that shouldn't?
|
|
What is not happening that should?
|
|
What are the surrounding conditions?
|
|
|
|
3. Remember to get all the conditions and circumstances. It's next to
|
|
impossible to solve a problem out of context.
|
|
|
|
4. Write down the definition, at least in note form. Be clear, concise,
|
|
and as complete as possible.
|
|
|
|
5. Treat each error symptom as if it were a separate problem.
|
|
|
|
6. Attempt to solve the most obvious problem first.
|
|
@@1.2.G.
|
|
Step 2 - VENTURE a testable educated guess (TEG) as to the cause of the
|
|
problem. The truth of the matter is, when you first start out to solve
|
|
a problem, you can't know (for sure) what the cause is. Therefore, you
|
|
really don't have much of a choice; you have to begin with a guess.
|
|
|
|
Fortunately, if the guess is testable, it does not have to be accurate.
|
|
In fact, your first few guesses probably won't be accurate. But, if you
|
|
use this approach and your guesses are testable, then they will quickly
|
|
become accurate. In other words, they will either:
|
|
|
|
a) lead you directly to the cause of the problem, or
|
|
|
|
b) they will lead you to the realization that you could use some help.
|
|
|
|
Either way, you win.
|
|
@@1.2.H.
|
|
Here are a couple of testable educated guesses (TEGs) to go along with the
|
|
problem that was identified and defined earlier:
|
|
|
|
|
|
Def - 4 days/2020/256K/cache/TOPS-20(4.1)/UBANXM/freq:12-14 hrs.
|
|
|
|
|
|
TEG #1. A low voltage condition exists at one of the UBAs.
|
|
|
|
TEG #2. One of the Unibus cables is improperly seated.
|
|
@@1.2.I.
|
|
|
|
REMEMBER
|
|
|
|
TEGs don't have to be earth shattering.
|
|
|
|
But they do have to be testable.
|
|
@@1.2.J.
|
|
Step 3 - SET UP an experiment that will prove, or disprove, your TEG.
|
|
The experiment should be carefully thought out. You should make every
|
|
effort to ensure that it is a true and accurate test of your guess.
|
|
Take your time. Make sure that your experiment is not inadvertently
|
|
testing something other than your TEG.
|
|
|
|
Here's why. If your experiment turns out to test something other than
|
|
your TEG, and you don't realize it, then you are liable to misinterpret
|
|
the result. Consequently, you may find yourself tripping down the Old
|
|
Garden Path.
|
|
@@1.2.K.
|
|
The Old Garden Path, by the way, is an expression that refers to a
|
|
troubleshooting tangent, a lesson in pure frustration. The path or
|
|
tangent leads you away from the real cause of the problem, contributes
|
|
very little useful information, and consumes lots of valuable time
|
|
and effort.
|
|
|
|
|
|
So, give yourself a break. Don't take a chance on a trip down the garden
|
|
path. Instead, use the time to carefully think out your experiment.
|
|
@@1.2.L.
|
|
The experiment doesn't have to be complex or elaborate. Let's go back to
|
|
the problem definition and TEGs that we used earlier, and see if we can
|
|
devise a couple of simple experiments that will prove, or disprove, the
|
|
TEGs.
|
|
|
|
|
|
Def - 4 days/2020/256K/cache/TOPS-20(4.1)/UBANXM/freq:12-14 hrs.
|
|
|
|
TEG #1. A low voltage condition exists at one of the UBAs.
|
|
Exp #1. Use a DVM to test the voltage at each UBA.
|
|
|
|
TEG #2. One of the Unibus cables is improperly seated.
|
|
Exp #2. Clean and reseat each cable in the Unibus.
|
|
@@1.2.M.
|
|
Review - The key points discussed so far are:
|
|
|
|
1. Research and Define the problem (in writing). Find out exactly:
|
|
a) What is happening that shouldn't?
|
|
b) What is not happening that should?
|
|
c) What are the surrounding conditions?
|
|
|
|
2. Treat each error symptom as if it were a separate problem. Then,
|
|
select the most obvious problem and work on it.
|
|
|
|
3. Venture a testable educated guess (TEG) as to what might be causing
|
|
the problem.
|
|
|
|
4. Set up an experiment that will either prove or disprove your guess.
|
|
Take your time. Make sure the experiment is a valid test. If it's
|
|
not, you may waste a lot of time chasing a tangent.
|
|
|
|
If you've been following this course, right around now you should be
|
|
getting some idea of how effective a problem solving approach such as
|
|
this can be. Essentially, it is a systematic process of elimination.
|
|
Properly used it will isolate and ultimately eliminate virtually any
|
|
problem a system can develop. It's just a matter of time.
|
|
@@1.2.N.
|
|
Step 4 - PREDICT the result of the experiment before you conduct it.
|
|
The purpose of this step is to double check the validity of your
|
|
experiment. The prediction should be based on the assumption that:
|
|
|
|
1. Your TEG or guess is absolutely correct.
|
|
|
|
2. Your experiment is a true and valid test of your TEG.
|
|
|
|
|
|
Both of these assumptions will be verified later in Step 6.
|
|
@@1.2.O.
|
|
As trivial as this step may seem, it should never be skipped. Nor should
|
|
you ever leave it up to "maybe" type thinking:
|
|
|
|
Maybe...this will happen (or) Maybe...that will happen
|
|
|
|
|
|
When it comes to your experiment and the predicted result, "maybe" type
|
|
thinking leads to: "Gee.. that's interesting; wonder what it means" type
|
|
curiosity. And that, my friend, will lead you right down the Old Garden Path.
|
|
|
|
|
|
Therefore, if you decide to use this problem solving approach, keep in
|
|
mind that your prediction should be explicitly stated and well thought
|
|
out. Don't get tricked into going off on a wild turkey chase.
|
|
@@1.2.P.
|
|
Getting back to our example, let's add a couple of predictions:
|
|
|
|
|
|
Def - 4 days/2020/256K/cache/TOPS-20(4.1)/UBANXM/freq:12-14 hrs.
|
|
|
|
TEG #1. A low voltage condition exists at one of the UBAs.
|
|
Exp #1. Use a DVM to test the voltage at each UBA.
|
|
Pre #1. The voltage at one of the UBAs will be out of tolerance.
|
|
|
|
TEG #2. One of the Unibus cables is improperly seated.
|
|
Exp #2. Clean and reseat each cable in the Unibus.
|
|
Pre #2. One of the cables will be loose or dirty.
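Each TEG in the example travels with its experiment and its prediction, so the three can be kept together as one record. The Python sketch below is one possible way to hold them and print them in the note layout shown above. The class and method names are hypothetical; the wording of the two examples is taken from this course.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    teg: str          # the testable educated guess
    experiment: str   # how the guess will be tested
    prediction: str   # the result expected if the guess is correct

    def is_explicit(self):
        """A record is usable only if all three parts are written down."""
        return all(part.strip() for part in
                   (self.teg, self.experiment, self.prediction))

    def as_notes(self, number):
        return [f"TEG #{number}. {self.teg}",
                f"Exp #{number}. {self.experiment}",
                f"Pre #{number}. {self.prediction}"]


hypotheses = [
    Hypothesis("A low voltage condition exists at one of the UBAs.",
               "Use a DVM to test the voltage at each UBA.",
               "The voltage at one of the UBAs will be out of tolerance."),
    Hypothesis("One of the Unibus cables is improperly seated.",
               "Clean and reseat each cable in the Unibus.",
               "One of the cables will be loose or dirty."),
]
notes = [line
         for number, hypothesis in enumerate(hypotheses, start=1)
         for line in hypothesis.as_notes(number)]
print("\n".join(notes))
```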
|
|
@@1.2.Q.
|
|
Well, that's it for the hard part. The last three steps are relatively
|
|
simple and straightforward. But before we go on, let's quickly review
|
|
the main points:
|
|
|
|
1. Research and Define the problem. Find out:
|
|
|
|
a) What is happening that shouldn't?
|
|
b) What is not happening that should?
|
|
c) What are the surrounding conditions?
|
|
|
|
2. Treat each error symptom as if it were a separate problem.
|
|
|
|
3. Venture a testable educated guess (TEG) as to the cause of the
|
|
problem.
|
|
|
|
4. Set up an experiment that will either prove or disprove your TEG.
|
|
|
|
5. Predict the result of the experiment in advance. Assume that your
|
|
TEG is correct and your experiment is valid. Be explicit. State (in
|
|
writing) exactly what you expect to happen.
|
|
|
|
6. Avoid "maybe" type thinking. It's liable to get you into trouble.
|
|
@@1.2.R.
|
|
Step 5 - CONDUCT the experiment - This is the most exciting step in the
|
|
formal problem solving process. Here's all you have to do. Either:
|
|
|
|
1. Check the voltage at each UBA.
|
|
|
|
2. Clean and reseat each UBA cable.
|
|
|
|
|
|
Unfortunately, this is where a lot of people fall down. They're
|
|
overwhelmed by the task. So they tend to put it off. After all, checking
|
|
the voltage at each UBA, or cleaning and reseating each UBA cable is
|
|
not a five-minute job.
|
|
@@1.2.S.
|
|
But if you properly set up the experiment, then half the job is done.
|
|
Now if it's going to take a while to conduct the experiment, set a time
|
|
limit. Don't rush, but try to estimate how long it will take. You might
|
|
be surprised to find that, once you are set up, it only takes a few minutes
|
|
to check the voltage at each UBA. So if you have five UBAs to check, you
|
|
could easily be done in ten minutes. That's not so bad.
|
|
@@1.2.T.
|
|
When it comes to cleaning and reseating cables, however, that can be a
|
|
large undertaking. At two or three minutes per connector, that could
|
|
require forty-five minutes or an hour to complete. At this point you
|
|
might want to revise your experiment: you might decide to clean and
|
|
reseat a third of the UBA cables, and see if that corrects the problem.
|
|
|
|
There is a trade-off involved here. You must consider the seriousness
|
|
of the problem, the frequency of recurrence, and the amount of time
|
|
and effort necessary to prove (or disprove) your TEG. The decision is
|
|
subjective, and entirely up to you. The rule of thumb here is: Do what
|
|
you think is right.
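The time estimates in this step are simple multiplication, and writing them out can make the trade-off easier to judge. In the Python sketch below, the function name and the connector count of 21 are hypothetical (the course does not say how many connectors there are); the two to three minutes per connector and the five UBAs in about ten minutes come from the text.

```python
def estimate_minutes(item_count, low_per_item, high_per_item):
    """Return the (low, high) time estimate in minutes for a repetitive task."""
    return item_count * low_per_item, item_count * high_per_item


CONNECTOR_COUNT = 21   # hypothetical number of connectors

# Five UBAs at about two minutes each.
voltage_check = estimate_minutes(5, 2, 2)

# Every connector versus a third of them, at two to three minutes each.
all_cables = estimate_minutes(CONNECTOR_COUNT, 2, 3)
third_of_cables = estimate_minutes(CONNECTOR_COUNT // 3, 2, 3)

print(voltage_check, all_cables, third_of_cables)
```

Under these assumptions the full job takes 42 to 63 minutes, while reseating a third of the cables takes 14 to 21 minutes. Whether the shorter experiment is still a fair test of the TEG remains your call.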
|
|
@@1.2.U.
|
|
Step 6 - EVALUATE the result - After conducting the experiment, compare
|
|
the predicted result with the actual result. If they match, then you have
|
|
accomplished one of two things:
|
|
|
|
1. You have either identified the cause of the problem, or
|
|
|
|
2. You have gathered some new, fairly reliable, information that
|
|
you can use to refine the problem definition.
|
|
|
|
|
|
|
|
If the predicted result and the actual result do not match, however,
|
|
then there is a conflict. Either the experiment tested something other
|
|
than your TEG, or your understanding of the experiment (the prediction)
|
|
was incorrect. In either case you should STOP IMMEDIATELY.
|
|
@@1.2.V.
|
|
You must figure out which was in error: the experiment or the predicted
|
|
result. If, after some thought, you decide that the predicted result was
|
|
in error, that's OK. It means that the experiment was, in fact, a valid
|
|
test of your TEG. And, therefore, the result can be used with confidence
|
|
to refine the definition of the problem.
|
|
|
|
If, however, you discover that the experiment was in error; that is, the
|
|
experiment was not a valid test of your TEG, then be very careful. You
|
|
should reconsider the entire situation and either revise the experiment
|
|
in such a way that it is a valid test of your hypothesis, or scrap the
|
|
whole thing and start over again.
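The evaluation rules in Step 6 form a short decision tree. The Python sketch below is one way to write that tree down. The names are hypothetical, and the question of whether the experiment was valid is left as an input, because the text makes clear that only you can decide it after some thought.

```python
from enum import Enum


class Evaluation(Enum):
    MATCH = "cause identified, or new information to refine the definition"
    STOP = "stop and decide which was in error: experiment or prediction"
    PREDICTION_IN_ERROR = "experiment was valid; use the result to refine the definition"
    EXPERIMENT_IN_ERROR = "revise the experiment or start over"


def evaluate(predicted, actual, experiment_was_valid=None):
    """Compare the predicted and actual results and name the next move.

    experiment_was_valid is None until the technician has worked out
    whether the experiment really tested the TEG.
    """
    if predicted == actual:
        return Evaluation.MATCH
    if experiment_was_valid is None:
        return Evaluation.STOP
    if experiment_was_valid:
        return Evaluation.PREDICTION_IN_ERROR
    return Evaluation.EXPERIMENT_IN_ERROR
```

For example, `evaluate("out of tolerance", "in tolerance")` returns `Evaluation.STOP`, which is the point where the course says to stop immediately.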
|
|
@@1.2.W.
|
|
Now you can see the importance of predicting the result of an experiment
|
|
before you conduct it. If you are unable to determine whether or not the
|
|
experiment was, in fact, a valid test of your TEG, then you're liable to
|
|
"assume" that it was. And that kind of an assumption may lead you right
|
|
down the Old Garden Path.
|
|
|
|
The point here is: if you know what you expect to happen, then you are
|
|
much more likely to recognize cases where the experiment is not testing
|
|
what you think it is.
|
|
@@1.2.X.
|
|
CHANGES - If an experiment requires that you change the system in any
|
|
way (swap a cable, perform an adjustment, exchange a module, etc.) be
|
|
sure that you can restore the system to its original state should you
|
|
need to. One foolproof way of doing that is to keep notes. Notes don't
|
|
forget.
|
|
|
|
Surely, in the past at least, some very successful technicians didn't
|
|
keep any notes at all. But that doesn't mean that they shouldn't have;
|
|
it only means that they didn't. And that's too bad, because that means
|
|
that they were using part of their brain muscle to recall facts;
|
|
therefore, less of their brain muscle was available to think about solving
|
|
the problem. Besides, once you get used to it, thinking is much more fun
|
|
than recalling facts. Don't you think?
|
|
@@1.2.Y.
|
|
Either you disagree, or you're not thinking.
|
|
@@1.2.Z.
|
|
So do I.
|
|
@@1.2.A1.
|
|
Back to CHANGES. If you change the system and the change doesn't correct
|
|
the problem, then you should restore the system to its original state
|
|
as soon as possible. If you don't, then you should realize that you are
|
|
running the risk of introducing new problems into the system and thus,
|
|
compounding the situation.
|
|
@@1.2.B1.
|
|
NEW SYMPTOMS - Finally, if you restore the system to its original state
|
|
and find that the symptoms have changed, STOP. Don't go on until you are
|
|
satisfied that you know the reason WHY the symptoms changed.
|
|
|
|
|
|
Remember
|
|
|
|
Symptoms Change For A Reason
|
|
@@1.2.C1.
|
|
Step 7 - REFINE the definition and REPEAT the process beginning with
|
|
Step 2 - Venture a TEG. This is the last step. Append the TEG, the
|
|
experiment, and the result of the experiment to the problem definition.
|
|
Even if the experiment disproved the TEG, append the information to
|
|
the definition. At least you know one thing that is not causing the
|
|
problem.
|
|
|
|
Then, once again, ask yourself:
|
|
|
|
1. What is happening that shouldn't?
|
|
|
|
2. What is not happening that should?
|
|
|
|
3. What are the surrounding conditions?
|
|
@@1.2.D1.
|
|
Take your time appending the new information to the problem definition.
|
|
Follow the same guidelines that you followed when you first constructed
|
|
the definition; be clear, be concise, and be as accurate and complete as
|
|
possible. That's one of the keys to using this problem solving approach
|
|
successfully.
|
|
|
|
|
|
Finally, close the loop. In other words, venture a new TEG, set up a new
|
|
experiment to test the TEG, predict the result, conduct the experiment,
|
|
evaluate the result, refine the definition, and continue to close the
|
|
loop. Eventually, if you use this approach, one of two things will
|
|
happen. Either:
|
|
|
|
1. you will identify and ultimately eliminate the problem, or
|
|
|
|
2. you will flat run out of TEGs, time, or both and end up calling for
|
|
support. But even a call for support is a TEG of sorts, because
|
|
you won't know until the end, whether or not you needed support from
|
|
the beginning.
|
|
@@1.2.E1.
|
|
One last word before we go on to the final summary. Earlier, we talked
|
|
about Attitude vs. Approach. During the discussion the statement was
|
|
made "never give up and you will never lose". The statement does not
|
|
mean never call support.
|
|
|
|
Support is a tool. It's there to help you do your job more efficiently.
|
|
Don't be afraid to use it. But, please be prepared to describe the exact
|
|
problem, what you've done, why, and what the results were. It will save
|
|
a lot of time, and you will get much better service.
|
|
@@1.2.F1.
|
|
Never give up really means, never let a problem go without finding out
|
|
what the cause was, and how the cause was finally isolated. Even if you
|
|
have to leave a problem (i.e., let someone else take over) always follow
|
|
up. Get back to the individual that solved the problem and find out what
|
|
the cause was and how he or she arrived at that conclusion.
|
|
|
|
That way, in your mind, no problem will go unsolved. And that's where
|
|
the solution takes place, in the mind. So never give up, never let a
|
|
problem go unsolved, and you'll never lose. It's as simple as that.
|
|
@@1.2.G1.
|
|
Final Summary:
|
|
|
|
1. RESEARCH and DEFINE the problem. Find out exactly:
|
|
a) What is happening that shouldn't?
|
|
b) What is not happening that should?
|
|
c) What are the surrounding conditions?
|
|
|
|
2. VENTURE a testable educated guess (TEG) as to the cause.
|
|
|
|
3. SET UP a practical experiment that will prove, or disprove, your TEG.
|
|
|
|
4. PREDICT the result before conducting the experiment. Know what you
|
|
expect to happen. Don't leave it up to "maybe" type thinking.
|
|
|
|
5. CONDUCT the experiment (keep an accurate set of notes). If you change
|
|
the system, restore it to its original state before you go on.
|
|
|
|
6. EVALUATE the result (predicted vs. actual). If the symptoms changed,
|
|
they changed for a reason. Find out why before you go on.
|
|
|
|
7. REFINE the definition and REPEAT the process (beginning with step 2).
|
|
Tighten the loop. It's simply a matter of time and TEGs.
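The seven steps above can be sketched as a loop. This is only an
illustration in Python, not part of Instruct: the names Teg,
ProblemDefinition, and troubleshoot are hypothetical, and the sample
TEGs are made up. Restoring the system after a change (step 5) is not
modeled here.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Teg:
    """A testable educated guess plus the experiment that checks it."""
    cause: str                      # step 2: what you suspect
    predicted: str                  # step 4: result expected if the TEG is right
    experiment: Callable[[], str]   # steps 3 and 5: run the test, return the result


@dataclass
class ProblemDefinition:
    """Step 1: the problem definition, refined with notes as you go."""
    symptoms: str
    notes: List[str] = field(default_factory=list)


def troubleshoot(definition: ProblemDefinition, tegs: List[Teg]) -> Optional[str]:
    """Work through the TEGs in order; return the confirmed cause, or None."""
    for teg in tegs:
        actual = teg.experiment()               # step 5: conduct the experiment
        confirmed = actual == teg.predicted     # step 6: predicted vs. actual
        verdict = "confirmed" if confirmed else "disproved"
        # Step 7: append the TEG, the experiment result, and the verdict,
        # even when the TEG was disproved.
        definition.notes.append(
            f"{teg.cause}: predicted {teg.predicted!r}, got {actual!r} ({verdict})"
        )
        if confirmed:
            return teg.cause
    return None  # out of TEGs: time to call for support


definition = ProblemDefinition("lights out while the saw was under heavy load")
tegs = [
    Teg("saw motor burned out", "saw dead on another circuit",
        lambda: "saw runs on another circuit"),
    Teg("blown fuse", "lights return after fuse swap",
        lambda: "lights return after fuse swap"),
]
cause = troubleshoot(definition, tegs)
```

Note that the disproved TEG still ends up in the notes, so the
definition records at least one thing that is not causing the problem.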
|
|
@@F.T.1.3.
|
|
That concludes the explanation of the Formal Troubleshooting Approach.
|
|
Next on the menu is the Systematic Substitution Approach.
|
|
@@R.T.1.2.G1
|
|
STOP - You are moving in a reverse direction through the menu. You are
|
|
about to back into the Formal Troubleshooting Approach.
|
|
|
|
Your response please:
|
|
@@1.3.
|
|
Systematic Substitution
|
|
|
|
Some old school hard-line purist technicians may not agree, but under
|
|
certain circumstances systematic substitution (of spare parts) is a
|
|
perfectly valid troubleshooting approach.
|
|
|
|
For example, let's assume that you are at home, working
|
|
in your cellar. Furthermore, let's assume that you are using a circular
|
|
saw to cut a 2 x 8 piece of oak planking.
|
|
@@1.3.A.
|
|
Suddenly, the saw binds, the lights dim and then they go out. From the
|
|
symptoms (the lights are out) and the conditions (at the time of failure
|
|
the saw was operating under a heavy load) you might logically conclude
|
|
that a fuse had blown.
|
|
|
|
Now, let's assume that you light a match and find your way to the fuse
|
|
box. Upon opening the box you discover a package of spare fuses and a
|
|
wiring diagram of the house. You decide to light another match. This
|
|
time you discover that the fuse box contains six 15-amp fuses - two rows
|
|
of three fuses. At the same time, however, you also realize that the
|
|
match does not provide enough light for you to determine which fuse is
|
|
blown.
|
|
@@1.3.B.
|
|
At this point you have, roughly, six options:
|
|
|
|
1. You can stall for time hoping that the problem will disappear. This,
|
|
however, is not a very practical solution because: problems don't
|
|
just happen, they are caused; and, although some problems may go
|
|
away temporarily, very rarely do they just disappear. Therefore, the
|
|
best approach is to identify and eliminate the cause of the problem.
|
|
So much for the wishful thinking approach.
|
|
@@1.3.C.
|
|
2. You can call an electrician, but that could be very expensive.
|
|
@@1.3.D.
|
|
3. You can go get a flashlight (if you can find one that works) and use
|
|
it to identify the blown fuse.
|
|
@@1.3.E.
|
|
4. You can light a couple more matches, study the wiring diagram and
|
|
attempt to figure out which fuse is blown. But let's say that your
|
|
ability to read an electrical wiring diagram is a bit rusty. So this
|
|
could be quite time consuming, and the results are not certain.
|
|
@@1.3.F.
|
|
5. You can use the spares and randomly substitute fuses until the
|
|
lights come on (i.e., the symptoms go away). This is a very risky
|
|
approach, however, because you could lose track of which fuses you
|
|
did and did not substitute. Thus, you could accidentally overlook the
|
|
blown fuse and conclude that something else was causing the problem.
|
|
@@1.3.G.
|
|
6. You can use the spares and systematically substitute each fuse until
|
|
the lights come on. You might choose to begin with the upper left
|
|
most fuse and substitute left-to-right top-to-bottom. If, in fact,
|
|
the problem is being caused by a blown fuse, then sooner or later the
|
|
lights will come back on.
|
|
@@1.3.H.
|
|
Now let's say that, after careful consideration of all six options, you
|
|
reject wishful thinking and random substitution because they are both
|
|
risky and impractical. Next, you dismiss the possibility of calling an
|
|
electrician because it seems unnecessary and it could be very expensive.
|
|
|
|
Finally, you eliminate using the wiring diagram to figure out exactly
|
|
which fuse is blown. The idea is feasible and even tempting, but under
|
|
the circumstances it's just too time consuming. Remember, you want to
|
|
get the saw back on line so you can finish cutting that piece of wood.
|
|
@@1.3.I.
|
|
That leaves you with two options; either go get a working flashlight,
|
|
or try the systematic substitution approach. If you opt for the
|
|
flashlight, that's a trip upstairs, a few minutes to locate one, a trip
|
|
back down cellar and 30 seconds to replace the blown fuse. Total time
|
|
expended approximately five minutes. (That does not include the time
|
|
required to return the flashlight so that you can find it the next time
|
|
you need it.)
|
|
|
|
But suppose that instead of opting for the flashlight, you opted for the
|
|
systematic substitution approach and, on the fourth try you locate the
|
|
blown fuse. Total time expended (at 30 seconds per fuse): 2 minutes with
|
|
no trips involved. Not bad.
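The arithmetic above can be checked with a short sketch. This is a
hypothetical Python illustration, not part of Instruct: the function
name, the (row, column) numbering of the fuse box, and the position of
the blown fuse are all assumptions chosen to match the example.

```python
from typing import Callable, List, Optional, Tuple

Position = Tuple[int, int]  # (row, column) in the fuse box


def systematic_substitution(
    positions: List[Position],
    fixed_after_swap: Callable[[Position], bool],
    seconds_per_swap: int = 30,
) -> Tuple[Optional[Position], int, List[Position]]:
    """Swap each position once, in a fixed order, until the symptom clears.

    Returns the position that cleared the symptom (or None), the elapsed
    time in seconds, and the list of positions tried so far.
    """
    tried: List[Position] = []
    elapsed = 0
    for pos in positions:
        tried.append(pos)            # the order itself is the record of work done
        elapsed += seconds_per_swap
        if fixed_after_swap(pos):
            return pos, elapsed, tried
    return None, elapsed, tried      # every position tried: the cause is elsewhere


# Two rows of three fuses, visited left-to-right, top-to-bottom.
fuse_box = [(row, col) for row in range(2) for col in range(3)]
blown = (1, 0)  # assumed: first fuse of the second row, i.e. the fourth try
found, seconds, tried = systematic_substitution(fuse_box, lambda pos: pos == blown)
```

With the blown fuse in the fourth position, the loop stops after four
swaps and 120 seconds, the 2 minutes quoted above. Because the order is
fixed, no fuse can be skipped or tried twice.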
|
|
@@1.3.J.
|
|
One last supposition; suppose that instead of working in your cellar,
|
|
you had just arrived on site. Instead of a dead power line you're faced
|
|
with a failing subsystem. Instead of six fuses in a box, the subsystem
|
|
consists of four modules, a cable and a power supply. Finally, instead
|
|
of a box of spare fuses and a wiring diagram you have a spares kit, a
|
|
scope, and a set of prints. But the diagnostics that you need are not
|
|
on site. The same six options apply:
|
|
|
|
1. You can stall for time wishing the problem will go away.
|
|
2. You can call for support.
|
|
3. You can go back to the office and get the diagnostics that you need.
|
|
4. You can study the print set and try to figure out what's wrong.
|
|
5. You can randomly substitute the spares and hope to solve the problem.
|
|
6. You can systematically substitute the spares and quickly identify the
|
|
cause. There are, however, some things that you should be aware of:
|
|
@@1.3.K.
|
|
A. You must approach the substitution process systematically. If you
|
|
don't, you'll become confused and end up resorting to the random
|
|
method of substitution. The random method is so prone to error that
|
|
it's just not worth it.
|
|
|
|
B. If there are more than a few modules involved, keep notes. You may
|
|
not always need them, but when you do you'll find that they're worth
|
|
their weight in gold.
|
|
|
|
C. If you substitute a module and the problem doesn't go away, replace
|
|
the original module immediately. If you don't, you'll run the risk
|
|
of introducing new problems into the system. Spares tend to have a
|
|
higher failure rate than modules that have been in use for a while.
|
|
|
|
D. If you substitute a module and the symptoms change, STOP. Replace the
|
|
original module. If the original symptoms return, then chances are
|
|
you have come upon a bad spare. Try it one more time. If the results
|
|
are the same, tag the spare right away. If you don't, you're likely
|
|
to forget and reliable spares are a must.
|
|
@@1.3.L.
|
|
E. If you substitute a module and the symptoms change, and they remain
|
|
changed even after you replace the original, STOP. Chances are you
|
|
inadvertently changed something and didn't realize it. Retrace every
|
|
step. Symptoms change for a reason. Find the reason. Don't run the
|
|
risk of compounding the problem.
|
|
|
|
F. If you substitute a module and it seems to solve the problem, don't
|
|
stop. Confirm the fix. Return the original module. The symptoms
|
|
should reappear. If they don't, then you can't be sure that you found
|
|
the problem. If they do, then you can be pretty sure that you got it.
|
|
But, don't stop yet. Run the diagnostics one more time. Make sure
|
|
that no new problems have crept into the system. Finally, hang around
|
|
a few minutes, make sure that the equipment comes back on line ok.
|
|
|
|
G. Back to the case where the spare seemed to correct the problem, but
|
|
when you replaced the original module to confirm the fix, everything
|
|
seemed to work fine. In this case you may, or may not have identified
|
|
the cause of the problem. You don't know. So, leave the spare in the
|
|
system, tag the suspect module as potentially intermittent, and save
|
|
your notes. Such situations call for a different type of confirmation
|
|
technique.
|
|
@@1.3.M.
|
|
The technique is called the subjective time window. To use it, you must
|
|
establish a period of time during which you will monitor the problem.
|
|
Usually a week is adequate, if the problem was solid. If the problem was
|
|
intermittent, however, then you must determine the typical interval
|
|
between failures, triple it (at least), and use that as the period of time
|
|
during which you will monitor the problem.
|
|
|
|
If the problem does not recur during the time window that you set up,
|
|
then you can assume that you solved it. Tag the suspect module as
|
|
intermittent, return it for repair, file your notes, and close out
|
|
the paper work. If, however, the problem does recur, then you're all
|
|
set, replace the original module, update your notes, and pick up
|
|
where you left off. That's all there is to it. As some of the old
|
|
school hard-liners would say: Hey, at least you know what it's not,
|
|
and that's worth something.
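The time window rule can be written down as a small calculation. This
is a hypothetical Python sketch, not part of Instruct. It assumes that
the figure to triple is the typical time between failures; the function
and parameter names are made up for the illustration.

```python
from datetime import timedelta
from typing import Optional


def monitoring_window(
    solid: bool,
    interval_between_failures: Optional[timedelta] = None,
    factor: int = 3,
) -> timedelta:
    """Return how long to monitor before assuming the problem is solved."""
    if solid:
        # A solid problem would show up again quickly; a week is usually adequate.
        return timedelta(weeks=1)
    if interval_between_failures is None:
        raise ValueError("an intermittent problem needs an observed failure interval")
    if factor < 3:
        raise ValueError("the window must be at least triple the failure interval")
    return interval_between_failures * factor


solid_window = monitoring_window(solid=True)
# Assumed example: the failure used to show up about every two days.
intermittent_window = monitoring_window(False, timedelta(days=2))
```

For a failure that used to appear about every two days, the window is
six days. A larger factor only makes the confirmation stronger.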
|
|
@@F.T.1.4.
|
|
That concludes the explanation of the Systematic Substitution Approach.
|
|
Next on the menu is the Variable Approach.
|
|
@@R.T.1.3.M.
|
|
STOP - You are moving in a reverse direction through the menu. You are
|
|
about to back into the Systematic Substitution Approach.
|
|
|
|
Your response please:
|
|
@@1.4.
|
|
The Variable Approach
|
|
|
|
This short story was once told by a senior field service engineer to
|
|
illustrate a VERY important point about using the variable approach
|
|
to isolate the cause of an intermittent failure. The story is about a
|
|
telephone conversation he had while working for another company (not
|
|
to be mentioned).
|
|
@@1.4.A.
|
|
"At first, the diagnostic only failed every hour or two. So I performed
|
|
all the standard checks and adjustments. The problem got a little worse,
|
|
but it still wasn't solid. So then, I decided to vary the voltage and
|
|
clock margins awhile. That helped some. I pulled out a marginal module
|
|
and the symptoms changed so I knew I was getting closer.
|
|
|
|
Then I thought, maybe the problem had something to do with temperature,
|
|
so I blocked the fans for a few minutes. I just wanted to see if varying
|
|
the temperature would have any effect. Finally, I tapped around with the
|
|
back of my screwdriver for a while. That really helped. I found a couple of
|
|
vibrational modules. But now I seem to have a new problem - I can't even
|
|
load the diagnostics. What do you think is wrong?"
|
|
@@1.4.B.
|
|
At that point, the senior engineer would bellow; "Now that's what I call
|
|
a dumb question - Obviously the guy beat the poor thing to death."
|
|
|
|
|
|
The story served its purpose. Clearly, it illustrates the problem with
|
|
indiscriminately using the variable approach. That is, if you're not
|
|
really careful, you're likely to cause more problems than you solve.
|
|
@@1.4.C.
|
|
The reason is: systems frequently operate in a controlled environment
|
|
for long periods of time. As a result the environmental operating range
|
|
of the system narrows. Normally, this is not a problem. As long as the
|
|
environment remains relatively stable the system will run indefinitely.
|
|
|
|
Keep in mind that if an intermittent problem is, in fact, being caused
|
|
by an environmentally sensitive component then, just a slight variation
|
|
in voltage, temperature or clock speed should be enough to aggravate it.
|
|
The rule of thumb is: BE CAREFUL.
|
|
@@1.4.D.
|
|
After all, if you had an intermittent problem, would you want a doctor
|
|
to double your heart rate in an effort to determine whether or not the
|
|
problem had something to do with your circulatory system? Probably not,
|
|
because a lot of working parts could get damaged in the process.
|
|
|
|
Keep that in mind next time you use the variable approach to isolate an
|
|
elusive intermittent system problem. It will, in the long run, save you
|
|
a lot of unnecessary grief and irritation. And don't forget the rule of
|
|
thumb:
|
|
|
|
BE CAREFUL
|
|
@@ts_end
|
|
Well, that concludes the Troubleshooting section of the course. We hope
|
|
that you found it useful. Also, if you have any comments or know of any
|
|
other troubleshooting approaches that you think should be added to this
|
|
section please get in touch with us. We're listed under FEEDBACK on the
|
|
main course menu.
|
|
Thank You
|
|
@@2.0.
|
|
System Event Files
|
|
(Overview)
|
|
|
|
Most operating systems maintain a system event file. The event file is
|
|
used to record information about certain events that happen within the
|
|
system (e.g., system reloads, configuration changes, hardware and
|
|
software detected errors, etc.).
|
|
|
|
The classification and type of information that is recorded in a system
|
|
event file is unique to the operating system maintaining the event file.
|
|
For example:
|
|
|
|
TOPS-10 supports approximately 55 event categories.
|
|
|
|
TOPS-20 supports approximately 25 event categories.
|
|
|
|
VAX/VMS supports approximately 20 event categories.
|
|
|
|
@@2.0.A.
|
|
The event categories are listed on the back of the Spear Reference card.
|
|
File Structures - There is nothing special about the file structure
|
|
associated with a system event file.
|
|
|
|
a. If the event file is maintained by a TOPS-10 operating system,
|
|
then it conforms to the standard TOPS-10 file structure. For
|
|
further information about the TOPS-10 file structure refer to
|
|
The TOPS-10 Software NoteBook 17 (Monitor Table Descriptions).
|
|
|
|
b. If the event file is maintained by a TOPS-20 operating system,
|
|
then it conforms to the standard TOPS-20 file structure. For
|
|
further information about the TOPS-20 file structure refer to
|
|
The TOPS-20 Software NoteBook 16 (Monitor Table Descriptions).
|
|
|
|
c. If the event file is maintained by a VAX/VMS operating system,
|
|
then it conforms to the standard VAX/VMS file structure. For
|
|
further information about the VAX/VMS file structure refer to
|
|
The VAX/VMS Software Support Notebook.
|
|
@@R.T.2.0.A.
|
|
STOP - You are moving in a reverse direction through the menu. You are
|
|
about to back into the System Event File Overview.
|
|
@@2.M.
|
|
System Event Files
|
|
|
|
Topic Menu:
|
|
|
|
1. Overview
|
|
|
|
2. TOPS-10 System Event Files
|
|
|
|
3. TOPS-20 System Event Files
|
|
|
|
4. VAX/VMS System Event Files
|
|
|
|
5. DEFINE.LIS
|
|
@@define_lst
|
|
DEFINE.LIS is a text file that describes the hardware and/or software
|
|
status that is saved for each entry type in both the TOPS-10 and the
|
|
TOPS-20 system event file. DEFINE.LIS is normally stored in the system
|
|
documentation area. To obtain a copy of the file type:
|
|
|
|
PRINT<DOC>DEFINE.LIS<cr>
|
|
|
|
If DEFINE.LIS is not in the system documentation area you can get a
|
|
copy from the Spear distribution tape. There are two procedures; one
|
|
for TOPS-10, the other for TOPS-20.
|
|
@@pri_define_tops_10
|
|
TOPS-10 procedure to copy DEFINE.LIS from the Spear tape to your area.
|
|
|
|
Assign a magtape (xxx), mount the Spear tape, run BACKUP, and type:
|
|
/TAPE MTxxx:<cr>
|
|
/REWIND<cr>
|
|
/INTERCHANGE<cr>
|
|
/FILES<cr>
|
|
/SUPERSEDE ALWAYS<cr>
|
|
/SKIP 1<cr>
|
|
Note: BACKUP will print "DONE" and reprompt. Type:
|
|
|
|
/RESTORE DEFINE.LIS = DEFINE.LIS<cr>
|
|
Note: BACKUP will print the following message and reprompt. Type:
|
|
!
|
|
"DEFINE LST"
|
|
"DONE"
|
|
|
|
/UNLOAD<cr>
|
|
/EXIT<cr>
|
|
Note: Remove and return the Spear distribution tape. Then type:
|
|
|
|
PRINT DEFINE.LIS<cr>
|
|
@@pri_define_tops_20
|
|
TOPS-20 procedure to copy DEFINE.LIS from the Spear tape to your area.
|
|
|
|
Assign a magtape (xxx), mount the Spear tape, run DUMPER, and type:
|
|
DUMPER> TAPE MTxxx:<cr>
|
|
DUMPER> REWIND<cr>
|
|
DUMPER> INTERCHANGE<cr>
|
|
DUMPER> FILES<cr>
|
|
DUMPER> SUPERSEDE ALWAYS<cr>
|
|
DUMPER> SKIP 1<cr>
|
|
Note: DUMPER will print two information messages and reprompt. Type:
|
|
|
|
DUMPER> RESTORE PS:<*>DEFINE.LIS PS:<your directory><cr>
|
|
Note: DUMPER will print the following message and reprompt. Type:
|
|
|
|
% RESTORING FILES TO PS:<your directory>
|
|
PS:<*>DEFINE.LIS => DEFINE.LIS [OK]
|
|
|
|
DUMPER> UNLOAD<cr>
|
|
|
|
Note: Remove and return the Spear distribution tape. Then type:
|
|
|
|
PRINT DEFINE.LIS<cr>
|
|
@@tops_10_ef
|
|
TOPS-10 System Event Files
|
|
|
|
This section of Instruct consists of a series of questions that pertain
|
|
to the TOPS-10 System Event File (ERROR.SYS). Before you attempt to
|
|
answer the questions you should review Chapter 2 of the Spear Manual.
|
|
|
|
Don't forget, you can use the /BREAK feature and return via your ID.
|
|
@@tops_10_ef_a
|
|
Press the RETURN key when you are ready.
|
|
@@tops_10_ef_q1
|
|
TOPS-10 System Event Files - Q1 of 10
|
|
|
|
True or False - Many of the questions that pertain to the TOPS-10
|
|
system event file also pertain to the TOPS-20 system event file.
|
|
@@tops_10_ef_q1_at
|
|
That's correct.
|
|
|
|
In fact, the questions are practically identical. In many cases so are
|
|
the answers. Therefore, if you have already answered the questions as
|
|
they pertain to the TOPS-20 system event file, then you can probably
|
|
afford to skip this section of Instruct. Of course, on the other hand,
|
|
you may want to answer the questions anyway. If that's the case, then
|
|
don't be confused by the redundancy.
|
|
@@tops_10_ef_q1_af
|
|
The statement is TRUE. The TOPS-10 system event file (ERROR.SYS) and the
|
|
TOPS-20 system event file (ERROR.SYS) are very similar. Therefore, it
|
|
stands to reason that many of the questions that pertain to one event
|
|
file will also pertain to the other event file.
|
|
@@tops_10_ef_q2
|
|
TOPS-10 System Event Files - Q2 of 10
|
|
|
|
True or False - The TOPS-10 System Event File is called ERROR.SYS.
|
|
@@tops_10_ef_q2_at
|
|
That's correct.
|
|
|
|
Both the TOPS-10 and the TOPS-20 system event file are called ERROR.SYS.
|
|
The VAX/VMS system event file is called ERRLOG.SYS.
|
|
@@tops_10_ef_q2_af
|
|
The statement is TRUE. The idea of a system event file (ERROR.SYS) was
|
|
first implemented in the early 1970's for TOPS-10. Initially, the file
|
|
was used only to record main memory, channel, and disk errors. The idea
|
|
proved to be a good one and new entries were added to the file until now
|
|
ERROR.SYS is the main source of information for solving intermittent
|
|
system failures.
|
|
|
|
In the mid 1970's the idea of a system event file along with the file
|
|
name ERROR.SYS was carried over to TOPS-20. Thus, both the TOPS-10 and
|
|
the TOPS-20 system event file are called ERROR.SYS.
|
|
@@tops_10_ef_q3
|
|
TOPS-10 System Event Files - Q3 of 10
|
|
|
|
True or False - Prior to the Spear library, TOPS-10 used a program
|
|
called SYSERR to record entries in the system event files.
|
|
@@tops_10_ef_q3_at
|
|
The statement is FALSE. Neither SYSERR nor the Spear library have any-
|
|
thing to do with the recording of entries in the system event file. That
|
|
is strictly a function of the operating system.
|
|
|
|
Both SYSERR and the Spear library are designed to process the contents
|
|
of the system event file. SYSERR was a report generator. Basically, it
|
|
allowed the user to select and translate specific entries in the event
|
|
file. The SPEAR library (SYSERR's replacement) is more sophisticated. In
|
|
addition to translating event file entries it also attempts to localize
|
|
the cause of intermittent disk and tape subsystem failures. Note however
|
|
that neither SYSERR nor Spear have anything to do with recording the
|
|
system event file.
|
|
@@tops_10_ef_q3_af
|
|
That is correct.
|
|
|
|
Both SYSERR and its replacement, the SPEAR library, are designed to
|
|
process the contents of the system event file. They have nothing to do
|
|
with recording the entries. That is strictly a function of the operating
|
|
system.
|
|
@@tops_10_ef_q4
|
|
TOPS-10 System Event Files - Q4 of 10
|
|
|
|
True or False - All hardware detected failures are recorded in the
|
|
system event file.
|
|
@@tops_10_ef_q4_at
|
|
The statement is FALSE. Only failures that require operating system
|
|
intervention are recorded in the system event file. Failures that do
|
|
not require operating system intervention are not recorded in the event
|
|
file. For example, some subsystems attempt error recovery locally. In
|
|
most cases, if the recovery is successful then the operating system is not
|
|
notified. Thus, those kinds of errors are normally not recorded in the
|
|
system event file.
|
|
@@tops_10_ef_q4_af
|
|
That's correct.
|
|
|
|
Only errors that require operating system intervention are recorded in
|
|
the system event file.
|
|
@@tops_10_ef_q5
|
|
TOPS-10 System Event Files - Q5 of 10
|
|
|
|
True or False - Every record in a TOPS-10 system event file consists of
|
|
a header section and a body section.
|
|
@@tops_10_ef_q5_at
|
|
That's correct.
|
|
|
|
Furthermore, the header and body section of each entry type is described
|
|
in a file called DEFINE.LIS. To obtain a copy of DEFINE.LIS, refer to
|
|
Appendix A on the Event File Menu.
|
|
@@tops_10_ef_q5_af
|
|
The statement is TRUE. Each entry in the TOPS-10 system event file
|
|
consists of a header section and a body section. The header identifies
|
|
the entry type (i.e., event code), the date and time that the entry was
|
|
recorded, the processor serial number, the length of the header section
|
|
and the length of the body section. Currently, the header section is set
|
|
at four words, the body section varies in size depending on the type of
|
|
entry.
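The header fields listed above can be pictured as a simple structure.
This is a hypothetical Python sketch, not the actual record layout: the
field names, types, and sample values are assumptions, and no bit
positions are implied. DEFINE.LIS remains the authority on the format.

```python
from dataclasses import dataclass
from typing import List, Tuple

HEADER_WORDS = 4  # the text above says the header is currently four words


@dataclass
class EventHeader:
    """Fields the text says every header carries (names are illustrative)."""
    event_code: int      # the entry type
    recorded_at: str     # date and time the entry was recorded
    cpu_serial: int      # processor serial number
    header_length: int   # length of the header section, in words
    body_length: int     # length of the body section, in words

    def record_length(self) -> int:
        """Total size of the entry: header plus body."""
        return self.header_length + self.body_length


def split_record(words: List[int], header: EventHeader) -> Tuple[List[int], List[int]]:
    """Split a record's words into header words and body words."""
    head = words[:header.header_length]
    body = words[header.header_length:header.record_length()]
    return head, body


# Made-up sample entry: a four-word header followed by a three-word body.
header = EventHeader(event_code=0o11, recorded_at="(date and time)",
                     cpu_serial=1234, header_length=HEADER_WORDS, body_length=3)
head, body = split_record(list(range(7)), header)
```

Because the header states both lengths, a reader can step from one
entry to the next without knowing every body format in advance.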
|
|
@@tops_10_ef_q6
|
|
TOPS-10 System Event Files - Q6 of 10
|
|
|
|
True or False - Each record in a TOPS-10 system event file represents
|
|
one complete system event.
|
|
@@tops_10_ef_q6_at
|
|
The statement is true with one exception, KLERR. KLERR entries are built
|
|
by the console front-end whenever the KL10 crashes. When the system is
|
|
restarted the entry is transferred via the DTE to KL main memory and then
|
|
recorded in the system event file.
|
|
|
|
Because the buffer area set aside for communications between the console
|
|
and KL main memory is significantly smaller than a typical KLERR entry,
|
|
the entry is divided into segments. Each segment is given a unique sequence
|
|
number and recorded as a separate record in the event file. Technically,
|
|
therefore, the statement is FALSE.
|
|
@@tops_10_ef_q6_af
|
|
That's correct.
|
|
|
|
The KLERR entry consists of multiple records. Each record has a separate
|
|
sequence number. When a KLERR entry is translated, however, only the
|
|
first sequence number is used to identify the entry. The other sequence
|
|
numbers are masked-out to avoid confusion.
|
|
@@tops_10_ef_q7
|
|
TOPS-10 System Event Files - Q7 of 10
|
|
|
|
True or False - The synchronization word is used to recover from hard
|
|
read errors that occur while reading the system event file.
|
|
@@tops_10_ef_q7_at
|
|
That's correct.
|
|
|
|
Whenever Spear uses the synchronization word to recover from a hard
|
|
read error it will print the message "Bad header found - RESYNCing".
|
|
@@tops_10_ef_q7_af
|
|
The statement is TRUE. The first word in each system event file data
|
|
block is a synchronization pointer. The pointer points to the starting
|
|
location of the next record in the file. Thus, if a hard read error
|
|
occurs while reading a record Spear skips to the next data block, reads
|
|
the sync word, finds the starting location of the next record, and
|
|
continues reading the file.
|
|
|
|
The idea of adding a synchronization word to each data block in a system
|
|
event file was incorporated in the mid 1970's. Prior to that time, if a
|
|
hard read error occurred while reading the event file, the remaining records
|
|
in the file were lost. Now only the records affected by the read error are
|
|
lost.
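The recovery described above can be sketched in a few lines. This is a
hypothetical Python illustration, not Spear's code. It assumes each data
block is a list of words whose first word is the sync pointer, expressed
as an offset within that block, and it marks an unreadable block with
None. The real pointer format may differ.

```python
from typing import List, Optional, Tuple

Block = Optional[List[int]]  # None stands for a block that cannot be read


def resync(blocks: List[Block], failed_index: int) -> Optional[Tuple[int, int]]:
    """After a hard read error in blocks[failed_index], find where to resume.

    Returns (block_index, word_offset) of the next record that can be
    read, or None if no readable block follows.
    """
    for index in range(failed_index + 1, len(blocks)):
        block = blocks[index]
        if block is None:
            continue                 # this block is unreadable too; keep looking
        return index, block[0]       # word 0 is the sync pointer
    return None


# Made-up file: block 1 is unreadable, block 2 says its next record starts at word 3.
blocks = [[1, 10, 11, 12], None, [3, 20, 21, 30, 31]]
resume_at = resync(blocks, failed_index=1)
```

Only the records that touch the unreadable block are skipped. Reading
picks up again at word 3 of block 2.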
|
|
@@tops_10_ef_q8
|
|
TOPS-10 System Event Files - Q8 of 10
|
|
|
|
True or False - When the TOPS-10 operating system detects a device error
|
|
the following occurs:
|
|
|
|
1. Normal operation is suspended and applicable hardware and/or software
|
|
status is captured (at error) and saved in the Unit Data Block (UDB).
|
|
|
|
2. If applicable, an error recovery algorithm is applied.
|
|
|
|
3. Regardless of whether the recovery algorithm is successful or not,
|
|
the applicable hardware and/or software status is captured again
|
|
(at end) and appended to the UDB.
|
|
|
|
4. The error status stored in the UDB is formatted, assigned a sequence
|
|
number, and appended to the system event file.
|
|
|
|
5. If the system was able to recover from the error normal operation
|
|
continues. If, however, the system was unable to recover from
|
|
the error, then the job affected by the error is notified and it
|
|
handles the error.
|
|
@@tops_10_ef_q8_at
|
|
That's correct.
|
|
|
|
The action outlined in the question is typical of the way TOPS-10
|
|
handles most device errors. Non-device errors (e.g., CPU errors) and
|
|
errors that affect the operating system itself are also handled in a
|
|
similar manner. If, however, there is no recovery algorithm or if the
|
|
recovery algorithm is unsuccessful, then those errors may result in a
|
|
user job or system crash.
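The five steps in the question can be sketched as one routine. This is
a hypothetical Python illustration, not TOPS-10 code: the function name,
the dictionary standing in for the UDB, and the status strings are all
made up for the example.

```python
from typing import Callable, Dict, List, Optional


def handle_device_error(
    read_status: Callable[[], str],
    recover: Optional[Callable[[], bool]],
    event_file: List[Dict],
) -> bool:
    """Capture status, attempt recovery, log the entry; return True if recovered."""
    udb = {"at_error": read_status()}              # step 1: status at error
    recovered = recover() if recover else False    # step 2: recovery, if applicable
    udb["at_end"] = read_status()                  # step 3: status at end, either way
    entry = {"sequence": len(event_file) + 1}      # step 4: assign a sequence number,
    entry.update(udb)                              #         format the saved status,
    event_file.append(entry)                       #         and append to the file
    return recovered                               # step 5: caller resumes or notifies the job


# Made-up status values: error bits set at error time, clear after recovery.
statuses = iter(["error bits set", "error bits clear"])
event_file: List[Dict] = []
ok = handle_device_error(lambda: next(statuses), lambda: True, event_file)
```

The entry is written whether or not recovery succeeds, which is why the
event file shows both the "at error" and the "at end" status.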
|
|
@@tops_10_ef_q8_af
|
|
The statement is TRUE. Most TOPS-10 device errors are handled this way.
|
|
@@tops_10_ef_q9
|
|
TOPS-10 System Event Files - Q9 of 10
|
|
|
|
True or False - The exact content and format of each TOPS-10 event
|
|
record is described in the Spear Manual.
|
|
@@tops_10_ef_q9_at
|
|
The statement is FALSE. The Spear Manual does describe the report
|
|
formats generated by Retrieve, but it does not describe the content
|
|
and format of the actual event records.
|
|
@@tops_10_ef_q9_af
|
|
That's correct.
|
|
|
|
The event records are described in a file called DEFINE.LIS.
|
|
@@tops_10_ef_q10
|
|
TOPS-10 System Event Files - Q10 of 10
|
|
|
|
True or False - The fifth word in a 011 type record is used to save the
|
|
results of the DATAI performed at the time of the failure.
|
|
|
|
Note: Refer to DEFINE.LIS. If you do not have a copy of DEFINE.LIS and you
|
|
want one, refer to Appendix A on the Event File Menu.
|
|
@@tops_10_ef_q10_at
|
|
The statement is FALSE. Open the DEFINE.LIS to the 011 entry. It starts
|
|
some place around line number 00450. The line numbers are listed at the
|
|
left of the page.
|
|
|
|
Now skipping over the word, byte, and bit definitions, go down the
|
|
center, or word number column, until you get to word number 5. To the
|
|
left you will see that word number 5 is defined as CONI_INITIAL. To the
|
|
right you will see that CONI_INITIAL is described as "controller status
|
|
at error".
|
|
|
|
Now find word 16. You will see that it is defined as "RH_DATA_BAR_ERR",
|
|
and described as: DATAI from RH10 block address register at error time.
|
|
@@tops_10_ef_q10_af
|
|
That's correct.
|
|
|
|
Word 5 is used to save the CONI status word. The DATAI status is saved
|
|
in word 16. Any time you want to know exactly what hardware and software
|
|
status is saved in an entry type you can consult DEFINE.LIS.
|
|
|
|
Now, if you haven't already done so, take a few minutes to look over the
|
|
contents of the file. The introduction explains the overall organization
|
|
and format of an event file record. Following the introduction, each of
|
|
the event types are described in detail.
|
|
|
|
When you are finished, take a few more minutes and compare the reports
|
|
listed in the Spear Manual with the corresponding record descriptions
|
|
listed in DEFINE.LIS. As a result, you will have a better understanding
|
|
of the system event file and the reports that are generated from it.
|
|
@@tops_10_ef_lq
|
|
That's it. There are only 10 questions about TOPS-10 System Event Files.
|
|
Press the RETURN key to return to the System Event File Menu.
|
|
@@tops_20_ef
|
|
TOPS-20 System Event Files
|
|
|
|
This section of Instruct consists of a series of questions that pertain
|
|
to the TOPS-20 System Event File (ERROR.SYS). Before you attempt to
|
|
answer the questions you should review Chapter 2 of the Spear Manual.
|
|
|
|
Don't forget, you can use the /BREAK feature and return via your ID.
|
|
@@tops_20_ef_a
|
|
Press the RETURN key when you are ready.
|
|
@@tops_20_ef_q1
|
|
TOPS-20 System Event Files - Q1 of 10
|
|
|
|
True or False - Many of the questions that pertain to the TOPS-10
|
|
system event file also pertain to the TOPS-20 system event file.
|
|
@@tops_20_ef_q1_at
|
|
That's correct.
|
|
|
|
In fact, the questions are practically identical. In many cases so are
|
|
the answers. Therefore, if you have already answered the questions as
|
|
they pertain to the TOPS-10 system event file, then you can probably
|
|
afford to skip this section of Instruct. Of course, on the other hand,
|
|
you may want to answer the questions anyway. If that's the case, then
|
|
don't be confused by the redundancy.
|
|
@@tops_20_ef_q1_af
|
|
The statement is TRUE. The TOPS-10 system event file (ERROR.SYS) and the
|
|
TOPS-20 system event file (ERROR.SYS) are very similar. Therefore, it
|
|
stands to reason that many of the questions that pertain to one event
|
|
file will also pertain to the other event file.
|
|
@@tops_20_ef_q2
|
|
TOPS-20 System Event Files - Q2 of 10
|
|
|
|
True or False - The TOPS-20 System Event File is called ERROR.SYS.
|
|
@@tops_20_ef_q2_at
|
|
That's correct.
|
|
|
|
Both the TOPS-10 and the TOPS-20 system event files are called ERROR.SYS.
|
|
The VAX/VMS system event file is called ERRLOG.SYS.
|
|
@@tops_20_ef_q2_af
|
|
The statement is TRUE. The idea of a system event file (ERROR.SYS) was
|
|
first implemented in the early 1970's for TOPS-10. Initially, the file
|
|
was used only to record main memory, channel, and disk errors. The idea
|
|
proved to be a good one and new entries were added to the file until now
|
|
ERROR.SYS is the main source of information for solving intermittent
|
|
system failures.
|
|
|
|
In the mid 1970's the idea of a system event file along with the file
|
|
name ERROR.SYS was carried over to TOPS-20. Thus, both the TOPS-10 and
|
|
the TOPS-20 system event files are called ERROR.SYS.
|
|
@@tops_20_ef_q3
|
|
TOPS-20 System Event Files - Q3 of 10
|
|
|
|
True or False - Prior to the Spear library, TOPS-20 used a program
|
|
called SYSERR to record entries in the system event files.
|
|
@@tops_20_ef_q3_at
|
|
The statement is FALSE. Neither SYSERR nor the Spear library has
anything to do with the recording of entries in the system event file.
That is strictly a function of the operating system.
|
|
|
|
Both SYSERR and the Spear library are designed to process the contents
|
|
of the system event file. SYSERR was a report generator. Basically, it
|
|
allowed the user to select and translate specific entries in the event
|
|
file. The SPEAR library (SYSERR's replacement) is more sophisticated. In
|
|
addition to translating event file entries it also attempts to localize
|
|
the cause of intermittent disk and tape subsystem failures. Note however
|
|
that neither SYSERR nor Spear has anything to do with recording the
|
|
system event file.
|
|
@@tops_20_ef_q3_af
|
|
That is correct.
|
|
|
|
Both SYSERR and its replacement, the SPEAR library, are designed to
|
|
process the contents of the system event file. They have nothing to do
|
|
with recording the entries. That is strictly a function of the operating
|
|
system.
|
|
@@tops_20_ef_q4
|
|
TOPS-20 System Event Files - Q4 of 10
|
|
|
|
True or False - All hardware detected failures are recorded in the
|
|
system event file.
|
|
@@tops_20_ef_q4_at
|
|
The statement is FALSE. Only failures that require operating system
|
|
intervention are recorded in the system event file. Failures that do
|
|
not require operating system intervention are not recorded in the event
|
|
file. For example, some subsystems attempt error recovery locally. In
|
|
most cases, if the recovery is successful then the operating system is not
|
|
notified. Thus, those kinds of errors are normally not recorded in the
|
|
system event file.
|
|
@@tops_20_ef_q4_af
|
|
That's correct.
|
|
|
|
Only errors that require operating system intervention are recorded in
|
|
the system event file.
|
|
@@tops_20_ef_q5
|
|
TOPS-20 System Event Files - Q5 of 10
|
|
|
|
True or False - Every record in a TOPS-20 system event file consists of
|
|
a header section and a body section.
|
|
@@tops_20_ef_q5_at
|
|
That's correct.
|
|
|
|
Furthermore, the header and body sections of each entry type are described
|
|
in a file called DEFINE.LIS. To obtain a copy of DEFINE.LIS, refer to
|
|
Appendix A on the Event File Menu.
|
|
@@tops_20_ef_q5_af
|
|
The statement is TRUE. Each entry in the TOPS-20 system event file
|
|
consists of a header section and a body section. The header identifies
|
|
the entry type (i.e., event code), the date and time that the entry was
|
|
recorded, the processor serial number, the length of the header section
|
|
and the length of the body section. Currently, the header section is set
|
|
at four words, the body section varies in size depending on the type of
|
|
entry.
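The header/body split described above can be illustrated with a minimal Python sketch. It assumes a record is already available as a list of 36-bit words and relies only on the four-word header length stated in the text. The function name `split_record` and the sample words are hypothetical; the layout of the fields inside the header is the one documented in DEFINE.LIS, not anything shown here.

```python
HEADER_WORDS = 4  # header length stated in the text; the body length varies


def split_record(words):
    """Split one event record (a list of 36-bit integers) into header and body."""
    if len(words) < HEADER_WORDS:
        raise ValueError("record is shorter than the four-word header")
    return words[:HEADER_WORDS], words[HEADER_WORDS:]


# Made-up sample words, for illustration only.
header, body = split_record([0o111, 0o2, 0o3, 0o4, 0o5, 0o6])
```

The first four words end up in `header` and the remaining words in `body`.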
|
|
@@tops_20_ef_q6
|
|
TOPS-20 System Event Files - Q6 of 10
|
|
|
|
True or False - Each record in a TOPS-20 system event file represents
|
|
one complete system event.
|
|
@@tops_20_ef_q6_at
|
|
The statement is true with one exception, KLERR. KLERR entries are built
|
|
by the console front-end whenever the KL10 crashes. When the system is
|
|
restarted the entry is transferred via the DTE to KL main memory and then
|
|
recorded in the system event file.
|
|
|
|
Because the buffer area set aside for communications between the console
|
|
and KL main memory is significantly smaller than a typical KLERR entry,
|
|
the entry is divided into segments. Each segment is given a unique sequence
|
|
number and recorded as a separate record in the event file. Technically,
|
|
therefore, the statement is FALSE.
|
|
@@tops_20_ef_q6_af
|
|
That's correct.
|
|
|
|
The KLERR entry consists of multiple records. Each record has a separate
|
|
sequence number. When a KLERR entry is translated, however, only the
|
|
first sequence number is used to identify the entry. The other sequence
|
|
numbers are masked-out to avoid confusion.
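As a rough illustration of how a multi-record KLERR entry might be put back together, the Python sketch below orders the segments by sequence number and labels the whole entry with the first one. The `(sequence_number, data_words)` tuple format and the function name `identify_klerr_entry` are assumptions made for this example; they are not taken from Spear.

```python
def identify_klerr_entry(segments):
    """Reassemble a multi-record entry and label it with its first sequence number.

    `segments` is a list of (sequence_number, data_words) tuples, one per record.
    """
    ordered = sorted(segments, key=lambda seg: seg[0])
    entry_id = ordered[0][0]  # only the first sequence number identifies the entry
    data = [word for _, words in ordered for word in words]
    return entry_id, data


# Three made-up segments, deliberately out of order.
entry_id, data = identify_klerr_entry([(42, [3, 4]), (41, [1, 2]), (43, [5])])
```

Here `entry_id` is the lowest sequence number in the group, and the other sequence numbers are simply not reported.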
|
|
@@tops_20_ef_q7
|
|
TOPS-20 System Event Files - Q7 of 10
|
|
|
|
True or False - The synchronization word is used to recover from hard
|
|
read errors that occur while reading the system event file.
|
|
@@tops_20_ef_q7_at
|
|
That's correct.
|
|
|
|
Whenever Spear uses the synchronization word to recover from a hard
|
|
read error it will print the message "Bad header found - RESYNCing".
|
|
@@tops_20_ef_q7_af
|
|
The statement is TRUE. The first word in each system event file data
|
|
block is a synchronization pointer. The pointer points to the starting
|
|
location of the next record in the file. Thus, if a hard read error
|
|
occurs while reading a record Spear skips to the next data block, reads
|
|
the sync word, finds the starting location of the next record, and
|
|
continues reading the file.
|
|
|
|
The idea of adding a synchronization word to each data block in a system
|
|
event file was incorporated in the mid 1970's. Prior to that time, if a
|
|
hard read error occurred while reading the event file, the remaining records
|
|
in the file were lost. Now only the record affected by the read error is
|
|
lost.
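The recovery idea can be sketched in Python as follows. This is a simplified model, not Spear's code: each readable data block is a list of words whose first word is the sync pointer, an unreadable block is represented by `None`, and the pointer is assumed to be the offset within its own block at which the next record starts. The name `resync_position` is hypothetical.

```python
def resync_position(blocks, failed_block):
    """Return (block_index, word_offset) at which reading can resume.

    `blocks` is a list of data blocks. Each readable block is a list of words
    whose first word is the sync pointer; unreadable blocks are None.
    """
    for index in range(failed_block + 1, len(blocks)):
        block = blocks[index]
        if block is None:
            continue            # this block is unreadable too, keep skipping
        return index, block[0]  # word 0 holds the sync pointer
    return None                 # no readable block remains


# Block 1 cannot be read, so reading resumes in block 2 at its sync pointer.
blocks = [[2, 0o111, 0o222], None, [3, 0o333, 0o444, 0o555]]
resume_at = resync_position(blocks, 0)
```

Only the record that was being read when the error occurred is skipped; everything from the resume position onward is still available.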
|
|
@@tops_20_ef_q8
|
|
TOPS-20 System Event Files - Q8 of 10
|
|
|
|
True or False - When the TOPS-20 operating system detects a device error
|
|
the following occurs:
|
|
|
|
1. Normal operation is suspended and applicable hardware and/or software
|
|
status is captured (at error) and saved in a buffer.
|
|
|
|
2. If applicable, an error recovery algorithm is applied.
|
|
|
|
3. Regardless of whether the recovery algorithm is successful or not,
|
|
the applicable hardware and/or software status is captured again
|
|
(at end) and appended to the buffer.
|
|
|
|
4. The contents of the buffer are formatted, assigned a sequence number,
|
|
and appended to the system event file.
|
|
|
|
5. If the system was able to recover from the error normal operation
|
|
continues. If, however, the system was unable to recover from
|
|
the error, then the job affected by the error is notified and it
|
|
handles the error.
|
|
@@tops_20_ef_q8_at
|
|
That's correct.
|
|
|
|
The action outlined in the question is typical of the way TOPS-20
|
|
handles most device errors. Non-device errors (e.g., CPU errors) and
|
|
errors that affect the operating system itself are also handled in a
|
|
similar manner. If, however, there is no recovery algorithm or if the
|
|
recovery algorithm is unsuccessful, then those errors may result in a
|
|
user job or system crash.
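The five steps outlined in the question can be sketched in Python. This is only a model of the sequence, not TOPS-20 code. Every name is a hypothetical stand-in: `read_status` returns the current hardware and/or software status, `recover` runs a recovery algorithm and returns True on success (or is None when no algorithm applies), `event_file` is a list acting as the system event file, and `sequence_number` is the number assigned to the new entry.

```python
def handle_device_error(read_status, recover, event_file, sequence_number):
    """Follow the five steps from the question for one device error."""
    buffer = {"at_error": read_status()}                         # step 1
    recovered = recover() if recover is not None else False      # step 2
    buffer["at_end"] = read_status()                             # step 3
    event_file.append({"sequence": sequence_number, **buffer})   # step 4
    return recovered  # step 5: True means normal operation continues


statuses = iter(["status at error", "status at end"])
event_file = []
recovered = handle_device_error(lambda: next(statuses), lambda: True,
                                event_file, 17)
```

When the function returns False, the caller would notify the affected job, which then handles the error itself.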
|
|
@@tops_20_ef_q8_af
|
|
The statement is TRUE. Most TOPS-20 device errors are handled this way.
|
|
@@tops_20_ef_q9
|
|
TOPS-20 System Event Files - Q9 of 10
|
|
|
|
True or False - The exact content and format of each TOPS-20 event
|
|
record is described in the Spear Manual.
|
|
@@tops_20_ef_q9_at
|
|
The statement is FALSE. The Spear Manual does describe the report
|
|
formats generated by Retrieve. But it does not describe the content
|
|
and format of the actual event records.
|
|
@@tops_20_ef_q9_af
|
|
That's correct.
|
|
|
|
The event records are described in a file called DEFINE.LIS. To obtain a
|
|
copy of DEFINE.LIS refer to Appendix A on the Event File Menu.
|
|
|
|
P.S. You will need a copy of DEFINE.LIS to answer the next question.
|
|
@@tops_20_ef_q10
|
|
TOPS-20 System Event Files - Q10 of 10
|
|
|
|
True or False - The thirty second word in a 111 type record is used to
|
|
save the first channel control word.
|
|
|
|
Note: Refer to DEFINE.LIS. If you do not have a copy of DEFINE.LIS and you
|
|
want one, refer to Appendix A on the Event File Menu.
|
|
@@tops_20_ef_q10_at
|
|
The statement is FALSE. Open the DEFINE.LIS to the 111 entry. It starts
|
|
some place around line number 01320. The line numbers are listed at the
|
|
left of the page.
|
|
|
|
Now skipping over the word, byte, and bit definitions, go down the
|
|
center, or word number column, until you get to word number 32. To the
|
|
left you will see that word number 32 is defined as RETRY_CNT. To the
|
|
right you will see that the RETRY_CNT is saved in bits 18 through 35 of
|
|
the word and it is described as "final retry error count".
|
|
|
|
Now find word number 28. You will see that it is defined as CCW1, it
|
|
consists of 36 bits, and it is described as "first chan control word".
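To make the bit positions concrete, here is a small Python sketch that pulls a field out of a 36-bit word. It uses the PDP-10 numbering convention, in which bit 0 is the most significant bit and bit 35 the least significant. The helper name `field` and the sample word are made up for illustration; only the field position (bits 18 through 35 of word 32) comes from the text.

```python
WORD_BITS = 36


def field(word, first_bit, last_bit):
    """Extract bits first_bit..last_bit (inclusive) from a 36-bit word.

    Bits are numbered as on the PDP-10: bit 0 is the most significant bit
    and bit 35 the least significant.
    """
    width = last_bit - first_bit + 1
    shift = WORD_BITS - 1 - last_bit
    return (word >> shift) & ((1 << width) - 1)


word_32 = 0o000123_000007           # made-up contents for word number 32
retry_cnt = field(word_32, 18, 35)  # right half of the word
```

Because CCW1 occupies all 36 bits of word 28, `field(word, 0, 35)` would return that word unchanged.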
|
|
@@tops_20_ef_q10_af
|
|
That's correct.
|
|
|
|
Word 32 is used to save the error retry count. The first channel control
|
|
word is saved in word 28. Anytime you want to know exactly what hardware
|
|
and software status is saved in an entry type you can consult DEFINE.LIS.
|
|
|
|
Now, if you haven't already done so, take a few minutes to look over the
|
|
contents of the file. The introduction explains the overall organization
|
|
and format of an event file record. Following the introduction, each of
|
|
the event types is described in detail.
|
|
|
|
When you are finished, take a few more minutes and compare the reports
|
|
listed in the Spear Manual with the corresponding record descriptions
|
|
listed in DEFINE.LIS. As a result, you will have a better understanding
|
|
of the system event file and the reports that are generated from it.
|
|
@@tops_20_ef_lq
|
|
That's it. There are only 10 questions about TOPS-20 System Event Files.
|
|
Press the RETURN key to return to the System Event File Menu.
|
|
@@vax_vms_ef
|
|
|
|
You do not need to understand file structures to use System Event Files
|
|
to isolate system failures. However, in order to be effective you should
|
|
understand something about their format and content. Chapter 5 of the
|
|
VAX11 Spear Manual describes the overall format and content. Appendix B
|
|
of the VAX11 Spear Manual describes, in detail, the content of each
|
|
record type that you will find in the system event file.
|
|
|
|
This section of Instruct consists of a series of general and specific
|
|
questions about the VAX/VMS System Event File (ERRLOG.SYS). Before you
|
|
attempt to answer the questions you should review Chapter 5 and Appendix
|
|
B in the Spear Manual. (Don't forget, you can use the /BREAK feature and
|
|
return via your student ID.)
|
|
@@
|
|
@@vax_vms_ef_a
|
|
Press the RETURN key when you are ready.
|
|
@@
|
|
@@vax_vms_ef_q1
|
|
Q1 of 10 (VAX/VMS System Event Files)
|
|
|
|
True or False - Several of the questions that pertain to the VAX/VMS
|
|
system event file also pertain to the TOPS-20 system event file.
|
|
@@
|
|
@@vax_vms_ef_q1_at
|
|
That's correct.
|
|
|
|
@@
|
|
@@vax_vms_ef_q1_af
|
|
The statement is TRUE. The VAX/VMS system event file (ERRLOG.SYS) and the
|
|
TOPS-20 system event file (ERROR.SYS) are very similar in concept. Therefore,
|
|
it stands to reason that many of the questions that pertain to one event
|
|
file will also pertain to the other event file.
|
|
@@
|
|
|
|
@@vax_vms_ef_q2
|
|
Q2 of 10 (VAX/VMS System Event Files)
|
|
|
|
True or False - In addition to the Spear library, VAX/VMS uses a program
|
|
called SYE to record entries in the system event files.
|
|
@@
|
|
@@vax_vms_ef_q2_at
|
|
VAX/VMS Q2
|
|
|
|
The statement is FALSE. Neither SYE nor the Spear library has
anything to do with the recording of entries in the system event file.
That is strictly a function of the operating system.
|
|
|
|
Both SYE and the Spear library are designed to process the contents
|
|
of the system event file. SYE is a report generator. Basically, it
|
|
allows the user to select and translate specific entries in the event
|
|
file. The SPEAR library is more sophisticated. In
|
|
addition to translating event file entries it also attempts to localize
|
|
the cause of intermittent disk and tape subsystem failures. Note however
|
|
that neither SYE nor Spear has anything to do with recording the
|
|
system event file.
|
|
@@
|
|
@@vax_vms_ef_q2_af
|
|
VAX/VMS Q2
|
|
|
|
That is correct.
|
|
|
|
Both SYE and the SPEAR library are designed to
|
|
process the contents of the system event file. They have nothing to do
|
|
with recording the entries. That is strictly a function of the operating
|
|
system.
|
|
@@
|
|
@@vax_vms_ef_q3
|
|
Q3 of 10 (VAX/VMS System Event Files)
|
|
|
|
True or False - The VAX/VMS System Event File is called ERRLOG.SYS.
|
|
@@
|
|
@@vax_vms_ef_q3_at
|
|
VAX/VMS Q3
|
|
|
|
That's correct.
|
|
The VAX/VMS system event file is called ERRLOG.SYS.
|
|
Both the TOPS-10 and the TOPS-20 system event files are called ERROR.SYS.
|
|
@@
|
|
@@vax_vms_ef_q3_af
|
|
VAX/VMS Q3
|
|
The statement is TRUE. The idea of a system event file (ERROR.SYS) was
|
|
first implemented in the early 1970's for TOPS-10. Initially, the file
|
|
was used only to record main memory, channel, and disk errors. The idea
|
|
proved to be a good one and new entries were added to the file until now
|
|
ERROR.SYS is the main source of information for solving intermittent
|
|
system failures.
|
|
|
|
In the mid 1970's the idea of a system event file along with the file
|
|
name ERROR.SYS was carried over to TOPS-20. Thus, both the TOPS-10 and
|
|
the TOPS-20 system event files are called ERROR.SYS.
|
|
|
|
@@
|
|
@@vax_vms_ef_q4
|
|
Q4 of 10 (VAX/VMS System Event Files)
|
|
|
|
True or False -
|
|
More than one process may do read access on the error file at the same
|
|
time.
|
|
|
|
@@
|
|
@@vax_vms_ef_q4_at
|
|
VAX/VMS Q4
|
|
|
|
That's correct.
|
|
More than one process may read the file at the same time.
|
|
@@
|
|
@@vax_vms_ef_q4_af
|
|
VAX/VMS Q4
|
|
|
|
The statement is TRUE.
|
|
The problem arises when the operating system tries to write to the file and
|
|
finds some other process reading the file. In this case, the operating system
|
|
creates a new file.
|
|
@@
|
|
@@vax_vms_ef_q5
|
|
Q5 of 10 (VAX/VMS System Event Files)
|
|
|
|
True or False -
|
|
All I/O device errors are logged under the device error record format
|
|
regardless of the type of device.
|
|
@@
|
|
@@vax_vms_ef_q5_at
|
|
VAX/VMS Q5
|
|
|
|
That's correct.
|
|
CPU and memory errors are recorded in different record formats, but
I/O device errors are not.
|
|
@@
|
|
@@vax_vms_ef_q5_af
|
|
VAX/VMS Q5
|
|
|
|
The statement is TRUE.
|
|
Only TOPS-10 and TOPS-20 use different record formats for different types
|
|
of I/O devices.
|
|
@@
|
|
@@vax_vms_ef_q6
|
|
Q6 of 10 (VAX/VMS System Event Files)
|
|
|
|
True or False -
|
|
Only device errors and other hardware detected errors are recorded in the
|
|
VMS system error file.
|
|
@@
|
|
@@vax_vms_ef_q6_at
|
|
VAX/VMS Q6
|
|
|
|
The statement is FALSE.
|
|
Many other types of information are also recorded in the error file
|
|
such as volume mounts and dismounts. Software detected errors are also
|
|
recorded in this file as well as text messages from the operator.
|
|
@@
|
|
@@vax_vms_ef_q6_af
|
|
VAX/VMS Q6
|
|
|
|
That's correct.
|
|
There are many other sources of the information found in the error file.
|
|
@@
|
|
@@vax_vms_ef_q7
|
|
Q7 of 10 (VAX/VMS System Event Files)
|
|
|
|
True or False -
|
|
The format of the device error entry is the same regardless of the type
|
|
of VAX CPU used in the system.
|
|
@@
|
|
@@vax_vms_ef_q7_at
|
|
VAX/VMS Q7
|
|
|
|
That's correct.
|
|
Only the CPU specific entries are different.
|
|
@@
|
|
@@vax_vms_ef_q7_af
|
|
VAX/VMS Q7
|
|
|
|
The statement is TRUE.
|
|
Only the CPU specific entries are different.
|
|
@@
|
|
@@vax_vms_ef_q8
|
|
Q8 of 10 (VAX/VMS System Event Files)
|
|
|
|
True or False -
|
|
If the operating system must create a new version of the error file,
|
|
ERRLOG.SYS, it renames the current version to ERRLOG.OLD and then
|
|
creates the new file.
|
|
@@
|
|
@@vax_vms_ef_q8_at
|
|
VAX/VMS Q8
|
|
|
|
The statement is FALSE.
|
|
The operating system will create a new file using the same name and the next
|
|
higher version number.
|
|
@@
|
|
@@vax_vms_ef_q8_af
|
|
VAX/VMS Q8
|
|
|
|
That's correct.
|
|
The convention of renaming the error file to ERRLOG.OLD has nothing to do
|
|
with the operating system.
|
|
@@
|
|
@@vax_vms_ef_q9
|
|
Q9 of 10 (VAX/VMS System Event Files)
|
|
|
|
True or False -
|
|
|
|
The media identification is not included as part of the information recorded
|
|
in a device error.
|
|
@@
|
|
@@vax_vms_ef_q9_at
|
|
VAX/VMS Q9
|
|
|
|
That's correct.
|
|
The media information is recorded in the system event file when the media is
|
|
mounted or dismounted.
|
|
@@
|
|
@@vax_vms_ef_q9_af
|
|
VAX/VMS Q9
|
|
|
|
The statement is TRUE.
|
|
The media information is recorded in the system event file when the media is
|
|
mounted or dismounted.
|
|
@@
|
|
@@vax_vms_ef_q10
|
|
Q10 of 10 (VAX/VMS System Event Files)
|
|
|
|
True or False -
|
|
|
|
Some device error records in the event file may have no apparent indication
|
|
of any error occurring.
|
|
@@
|
|
@@vax_vms_ef_q10_at
|
|
VAX/VMS Q10
|
|
|
|
That's correct.
|
|
Media off line is a good example. In this case the "on-line" bit would
|
|
be off indicating the error.
|
|
@@
|
|
@@vax_vms_ef_q10_af
|
|
VAX/VMS Q10
|
|
|
|
The statement is TRUE.
|
|
Media off line is a good example. In this case the "on-line" bit would
|
|
be off indicating the error.
|
|
@@
|
|
|
|
@@vax_vms_ef_lq
|
|
That's it. There are only 10 questions about VAX/VMS System Event Files.
|
|
Press the RETURN key to get back to the System Event File Menu.
|
|
|
|
@@3.0.
|
|
Spear Library
|
|
|
|
Introduction
|
|
|
|
Spear is an on-line maintenance software library that runs under three
|
|
operating systems: TOPS-10, TOPS-20, and VAX/VMS. Currently, the library
|
|
contains three functions: Summarize, Retrieve, and Compute.
|
|
|
|
The first two functions, Summarize and Retrieve, are designed
|
|
to help you sort and evaluate 32- and 36-bit system event files. The
|
|
third function, Compute, calculates system availability. Its purpose is
|
|
to help you prepare crash and up time reports and determine overall
|
|
system performance.
|
|
|
|
@@3.0.A.
|
|
Each Spear Library function supports a dialog style user interface. The
|
|
dialog prompts for information and waits for a response. If the prompt
|
|
accepts a default, the default will be (parenthetically) included as
|
|
part of the prompt.
|
|
|
|
@@R.T.3.0.
|
|
STOP - You are moving in a reverse direction through the menu. You are
|
|
about to back into the Spear Library Introduction.
|
|
|
|
Your response please:
|
|
@@R.T.3.1.0.B.
|
|
STOP - You are moving in a reverse direction through the menu. You are
|
|
about to back into the Introduction.
|
|
|
|
@@R.T.3.1.1.F.
|
|
STOP - You are moving in a reverse direction through the menu. You are
|
|
about to back into
|
|
@@M.
|
|
Spear Course Menu
|
|
|
|
|
|
1. Course Administrator/Student Guide
|
|
|
|
2. Troubleshooting
|
|
|
|
3. System Event Files
|
|
|
|
4. Using The Spear Library
|
|
|
|
5. Guaranteed Uptime Program/NOTIFY
|
|
|
|
6. Feedback
|
|
|
|
7. Random Questions
|
|
|
|
8. Dialog Changes
|
|
|
|
@@R.T.3.1.2.M.
|
|
STOP - You are moving in a reverse direction thru the menu. You are
|
|
about to back into the Menu.
|
|
|
|
Your response please:
|
|
@@3.2.0.
|
|
  The Big Picture                            Input File
                                                  :
  Retrieve accepts event                ..........:.........
  files and packet files.               :                  :
                                   Event File         Packet File
  The Selected information         .....:.....             :
  in an event file can be:         :         :             :
  Included in, or Excluded      Include   Exclude       Packet
  from, the output file.           :.........:          Numbers
                                        :                  :
  One or more Packets can         Selection and            :
  be selected from a Packet       Time Criteria            :
  file.                                 :..................:
                                                  :
                                             Output Mode
                                             .....:.....
  Retrieve can translate the                 :         :
  selected entries or it can               ASCII    Binary
  save the selected entries                  :.........:
  in a binary history file.                       :
                                             Output File
|
|
@@R.T.3.2.0.
|
|
STOP - You are moving in a reverse direction thru the menu. You are
|
|
about to back into the Retrieve Overview.
|
|
@@3.2.M.
|
|
Spear Library - Retrieve
|
|
|
|
Topic menu:
|
|
|
|
|
|
1. Overview
|
|
|
|
2. Retrieve Dialog
|
|
|
|
3. Retrieve Questions & Answers
|
|
@@3.2.1.
|
|
The basic Retrieve dialog consists of eight selection prompts and
|
|
one confirmation prompt.
|
|
|
|
RETRIEVE mode
|
|
-------------
|
|
Event or packet file (default):
|
|
|
|
Selection to be (INCLUDED):
|
|
|
|
Selection type (ALL):
|
|
|
|
Time from (EARLIEST):
|
|
|
|
Time to (LATEST):
|
|
|
|
Output mode (ASCII):
|
|
|
|
Report format (SHORT):
|
|
|
|
Output to ([DSK]:RETRIE.RPT):
|
|
|
|
Type [cr] to confirm (/GO):
|
|
@@3.2.1.A.
|
|
The first selection prompt:
|
|
|
|
Event or packet file (default):
|
|
|
|
allows you to specify the name of the input file. The default response
|
|
(SYS:ERROR.SYS for TOPS-10, SERR:ERROR.SYS for TOPS-20, and
|
|
SYS$ERRORLOG:ERRLOG.SYS for VAX/VMS) is enclosed in parentheses and can
|
|
be selected by pressing the RETURN key.
|
|
Retrieve accepts two types of files: standard system event files (such
|
|
as those generated by TOPS-10, TOPS-20, or VAX/VMS systems), and Packet
|
|
files.
|
|
|
|
If you specify a system event file Retrieve will continue with the basic
|
|
dialog. If you specify a Packet file, however, Retrieve will switch to
|
|
the Packet selection dialog. Since the Packet dialog is short (1 prompt)
|
|
it will be explained next. Then we will continue with the basic dialog.
|
|
|
|
This prompt also supports standard Help and question mark (?) responses.
|
|
@@RETRIEVE INPUT
|
|
|
|
 Selection Criteria ___________.             .___ Short Report
                      .________!_________.   !___ Full Report
   Event File  ___.   ! Event Retrieval  !   !___ Raw Data Report
                  !___! Translation      !___!
   Packet File ___!   ! and/or Storage   !   !
                      !__________________!   !___ Device History
Merge File (binary) ___________!                  Files (binary)
|
|
|
|
Retrieve can be used to generate reports, or it can be used to establish
|
|
and maintain device history files. If you choose to generate a report,
|
|
you can select one of three formats: Short, Full, or Octal (Hexadecimal
|
|
on VAX/VMS systems). If you choose to generate a device history file you
|
|
will be asked if you want to merge it with an existing (history) file.
|
|
@@3.2.1.B.
|
|
If you specify a Packet file at the input file prompt, Retrieve will
|
|
prompt you for the packet numbers that you want to select.
|
|
|
|
Event or packet file (SERR:ERROR.SYS): DSK:A1225.PAK<CR>
|
|
|
|
Packet numbers:
|
|
|
|
Each numbered packet contains a list of sequence numbers. The sequence
|
|
numbers identify the individual records that were used by Analyze as
|
|
evidence to support the theories listed in the corresponding Analyze
|
|
Report file. There is one packet for each theory listed in the report.
|
|
|
|
You can use Retrieve to translate (or save in a separate binary file)
|
|
the records listed in the packet files. Typically, you would translate
|
|
a packet if you wanted to examine the records that were used as evidence
|
|
to support a particular theory. You would save the records if you were
|
|
building or maintaining a history file for a particular device or a
|
|
specific type of error.
|
|
|
|
This prompt also supports standard Help and Question mark (?) responses.
|
|
@@3.2.1.C.
|
|
If you specify multiple packet numbers, each number should be separated
|
|
by a comma. You should realize, however, that if you specify more than
|
|
one packet number the records listed in the packets will be grouped and
|
|
translated (or saved) according to sequence numbers. In other words, the
|
|
records will not be grouped according to packet number.
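The grouping behavior can be pictured with a short Python sketch: the sequence numbers from every chosen packet are pooled and then ordered, so the packet boundaries disappear. The dictionary of packets and the function name `merge_packets` are made up for illustration, and dropping duplicate sequence numbers is an assumption rather than documented Retrieve behavior.

```python
def merge_packets(packets, chosen):
    """Combine the sequence numbers of the chosen packets into one ordered list.

    `packets` maps a packet number to the list of sequence numbers it holds.
    """
    merged = set()
    for number in chosen:
        merged.update(packets[number])
    return sorted(merged)


packets = {3: [40, 12], 7: [15, 41], 14: [13]}  # made-up packet contents
order = merge_packets(packets, [3, 7, 14])
```

In `order` the records from packets 3, 7, and 14 are interleaved by sequence number rather than listed packet by packet.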
|
|
|
|
After prompting for packet numbers, Retrieve will skip the "Time from"
|
|
and "Time to" prompts and pick up the basic dialog at the "Output mode"
|
|
prompt. From that point on, there is no difference between the Event
|
|
File dialog and the Packet File dialog.
|
|
|
|
Event or packet file (SERR:ERROR.SYS): DSK:A1225.PAK<cr>
|
|
|
|
Packet numbers: 3,7,14<cr>
|
|
|
|
Output mode (ASCII):
|
|
|
|
Report format (SHORT):
|
|
|
|
Output to ([DSK]:RETRIE.RPT):
|
|
|
|
Type [cr] to confirm (/GO):
|
|
@@3.2.1.D.
|
|
Back to the basic Retrieve dialog. The second selection prompt:
|
|
|
|
Event or packet file (default):
|
|
Selection to be (INCLUDED):?
|
|
INCLUDED
|
|
EXCLUDED
|
|
|
|
allows you to specify whether the selected entries will be included in,
|
|
or excluded from, the output file. Included is the normal response. If,
|
|
however, you specify Excluded, then ALL the entries in the input file
|
|
(except those that you select later in the dialog) will be extracted and
|
|
translated or saved in the output file.
|
|
|
|
The Exclude feature is used to purge entries from a system event file
|
|
before the file is translated or saved. For example, suppose
|
|
a communications node developed a problem that caused the event file to
|
|
fill up with Network entries; since you know what caused the problem you
|
|
might want to remove the entries before you process or save the file.
|
|
Note: the original (or input) file will not be altered in any way.
|
|
|
|
This prompt also supports standard Help and Question mark (?) responses.
|
|
@@RETRIEVE TYPE
|
|
The following example illustrates the difference between Include
|
|
and Exclude.
|
|
|
|
              Include(event type C)         Exclude(event type C)

Time:               From     To                   From     To
                    :        :                    :        :
Input file:  CABBACBCCAABCABBCAACCBCA      CABBACBCCAABCABBCAACCBCA
Output file:        CC   C   C             CABBACB  AAB ABB AACCBCA
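The example above can be reproduced with a few lines of Python. This is a sketch of the selection rule only, not of Retrieve itself. Each letter stands for one entry, positions in the string stand in for time, and the From/To window (0-based positions 7 through 16) is inferred from the example's output. The function name `select` and its parameters are hypothetical.

```python
def select(entries, event_type, time_from, time_to, mode="INCLUDED"):
    """Apply an Include or Exclude selection to a sequence of entries.

    `entries` is a string with one letter per entry; positions stand in for
    time, and `time_from`/`time_to` are inclusive 0-based positions.
    """
    kept = []
    for position, entry in enumerate(entries):
        selected = entry == event_type and time_from <= position <= time_to
        if mode == "INCLUDED":
            keep = selected
        else:  # "EXCLUDED": keep everything except the selected entries
            keep = not selected
        if keep:
            kept.append(entry)
    return "".join(kept)


INPUT_FILE = "CABBACBCCAABCABBCAACCBCA"
included = select(INPUT_FILE, "C", 7, 16, "INCLUDED")
excluded = select(INPUT_FILE, "C", 7, 16, "EXCLUDED")
```

Note that with Exclude, type C entries outside the time window are still kept, which matches the leading and trailing C entries in the example's output.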
|
|
@@3.2.1.E.
|
|
The third selection prompt asks you to choose from two separate lists.
|
|
|
|
Selection type (ALL):
|
|
|
|
Type one or more of the following from the first group:
|
|
|
|
ERROR
|
|
STATISTICS
|
|
DIAGNOSTICS
|
|
CONFIGURATION
|
|
OTHER
|
|
|
|
If you choose more than one of these types, separate each with
|
|
a comma.
|
|
|
|
Or, type one of the following from the second group:
|
|
|
|
the RETURN key, or ALL
|
|
SEQUENCE
|
|
CODE
|
|
@@3.2.1.EA.
|
|
|
|
|
|
ERROR - indicates that you want to select entries that contain
|
|
actual failure data. If you select ERROR you can also specify
|
|
the particular error types for which you are looking in relation
|
|
to the specific device.
|
|
|
|
STATISTICS - indicates that you want to select statistic
|
|
entries.
|
|
|
|
DIAGNOSTICS - indicates that you want to select entries
|
|
created by a diagnostic.
|
|
|
|
CONFIGURATION - indicates that you want to select configuration
|
|
entries.
|
|
|
|
OTHER - indicates that you want to select entries that do not fit
|
|
into the other types.
|
|
|
|
These responses will be explained later, after the frames relating to
|
|
SEQUENCE and CODE.
|
|
@@3.2.1.EB.
|
|
|
|
ALL (or the RETURN key) - indicates that you want to select
|
|
all the entries in the file. (This is the default). You can
|
|
further qualify the selection at the Time prompts.
|
|
|
|
SEQUENCE - indicates that you want to select entries according
|
|
to sequence numbers. This response will be explained next.
|
|
|
|
CODE - indicates that you want to select entries based on the
|
|
event codes assigned to each type of entry by the operating
|
|
system. This response will be explained after the SEQUENCE
|
|
response.
|
|
|
|
@@3.2.1.EC.
|
|
|
|
When you specify SEQUENCE in response to the "Selection type"
|
|
prompt, Retrieve will prompt you for the sequence numbers that
|
|
you want to select.
|
|
|
|
Selection type (ALL): SEQUENCE<cr>
|
|
|
|
Sequence numbers: 22,24,35-67,12<cr>
|
|
|
|
You can select as many sequence numbers as you want. Individual
|
|
sequence numbers must be separated by commas; groups of sequence
|
|
numbers must be specified by entering the first and last sequence
|
|
numbers in the group. The sequence numbers must be separated by a
|
|
dash (-). For example, 35-67 indicates that you want to select
|
|
sequence numbers 35 through 67.
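The list syntax can be illustrated with a small Python parser. It is a sketch of the notation only. The function name `parse_number_list` is hypothetical, and sorting the result, dropping duplicates, and rejecting a range whose first number exceeds its last are assumptions rather than documented Retrieve behavior.

```python
def parse_number_list(text):
    """Expand a response such as "22,24,35-67,12" into a sorted list of integers."""
    numbers = set()
    for item in text.split(","):
        item = item.strip()
        if "-" in item:
            first, last = (int(part) for part in item.split("-", 1))
            if first > last:
                raise ValueError(f"range {item!r} runs backwards")
            numbers.update(range(first, last + 1))
        else:
            numbers.add(int(item))
    return sorted(numbers)


selected = parse_number_list("22,24,35-67,12")
```

For the response shown in the dialog, `selected` holds 12, 22, 24, and every number from 35 through 67.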
|
|
@@3.2.1.ED.
|
|
If you specify CODE in response to the "Selection type" prompt,
|
|
Retrieve will prompt you for the event codes that you want to
|
|
select.
|
|
|
|
Selection type (ALL): CODE<cr>
|
|
|
|
Event codes: 133,161-163<cr>
|
|
|
|
You can select as many event codes as you want. Each event code
|
|
must be separated by a comma. You can also select groups of
|
|
event codes. The first and last event codes in the group must be
|
|
separated by a dash (-). For example, 161-163 indicates that you
|
|
want to select event codes 161 through 163.
|
|
|
|
This prompt also supports standard Help and Question mark (?)
|
|
responses.
|
|
@@3.2.1.EE.
|
|
If you specify ERROR, STATISTICS, DIAGNOSTICS, or OTHER, or a
|
|
combination of these responses to the "Selection" prompt,
|
|
Retrieve will enter the "Error class" dialog.
|
|
|
|
Selection type (ALL): ERROR
|
|
|
|
Category(ALL):
|
|
|
|
ALL
|
|
MAINFRAME
|
|
DISK
|
|
TAPE
|
|
CI
|
|
NI
|
|
UNITRECORD
|
|
NETWORK
|
|
OPERATING-SYSTEM
|
|
COMM
|
|
PACKID
|
|
REELID
|
|
HELP
|
|
@@3.2.1.EF.
|
|
|
|
ALL (or the RETURN key) - indicates that you want to select
|
|
all errors. (This is the default).
|
|
|
|
MAINFRAME - indicates that you want to select errors occurring in
|
|
specific mainframe components.
|
|
|
|
DISK - indicates that you want to select errors occurring on disk
|
|
units. After selecting DISK, you can specify ALL, or the specific
disks by name (DPA3, RPB7) or by disk type (RP06, RM05).
|
|
|
|
TAPE - indicates that you want to select errors occurring on tape
|
|
units. After selecting TAPE, you can specify ALL, or specify the
|
|
tape names or types in question.
|
|
|
|
CI - indicates that you want to select CI-related errors. After
|
|
selecting CI, you can specify ALL, or the specific component of
|
|
interest.
|
|
|
|
NI - indicates that you want to select NI-related errors.
|
|
@@3.2.1.EG.
|
|
|
|
UDA - indicates that you want to select UDA-related errors.
|
|
After selecting UDA, you can specify ALL, or the specific
|
|
component of interest.
|
|
|
|
UNITRECORD - indicates that you want to select errors occurring
|
|
on unit-record devices such as card readers and line printers.
|
|
After selecting UNITRECORD, you can specify ALL, or type the
|
|
specific device names or types in question.
|
|
|
|
OPERATING-SYSTEM - indicates that you want to select operating
|
|
system codes. After selecting OPERATING-SYSTEM, you can specify
|
|
ALL, or type the name of a specific STOPCODE or BUG type.
|
|
|
|
COMM - indicates that you want to select errors occurring on
|
|
communication devices.
|
|
@@3.2.1.EH.
|
|
|
|
PACKID - indicates that you want to select specific disk packs.
|
|
After typing PACKID, you can type ALL, or type the specific pack
|
|
identifiers.
|
|
|
|
REELID - indicates that you want to select specific tape reels.
|
|
After typing REELID, you can type ALL, or the specific tape
|
|
identifiers.
|
|
|
|
HELP - indicates that you want to get detailed information
|
|
on the above categories.
|
|
|
|
All categories except for COMM and NI prompt further for specific
|
|
device types. Type ? at the subprompt level to get a list of
|
|
acceptable responses.
|
|
|
|
If you choose the DISK drive, TAPE drive, or CI controller subprompt,
|
|
Retrieve then prompts you further for an error type. Type ? at the
|
|
subprompt level to get a list of acceptable responses.
|
|
@@3.2.1.EI.
|
|
|
|
|
|
RETRIEVE keeps prompting you for categories until you either type
|
|
FINISHED, or press the RETURN key.
|
|
|
|
Next Category (FINISHED):
|
|
|
|
Type one of the following:
|
|
|
|
The RETURN key, or FINISHED to take the default,
|
|
|
|
or,
|
|
|
|
another category.
|
|
@@3.2.1.V.
|
|
Back to the basic dialog. The fourth selection prompt:
|
|
|
|
Time from (EARLIEST):
|
|
|
|
allows you to specify the time at which you want the selection process
|
|
to begin. The default response (EARLIEST) is enclosed in parentheses
|
|
and can be selected by pressing the RETURN key. You can also specify
|
|
real and relative time.
|
|
|
|
The prompt also supports standard Help and question mark (?) responses.
|
|
@@3.2.1.W.
|
|
The fifth selection prompt:
|
|
|
|
Time to (LATEST):
|
|
|
|
allows you to specify the time at which you want the selection process
|
|
to end. The default response (LATEST) is enclosed in parentheses
|
|
and can be selected by pressing the RETURN key. Again, you can also
|
|
specify real and relative time.
|
|
|
|
|
|
The prompt also supports standard Help and question mark (?) responses.
|
|
@@3.2.1.X.
|
|
The sixth selection prompt allows you to specify the type of file that
|
|
you want Retrieve to generate.
|
|
|
|
Output mode (ASCII): ?
|
|
|
|
ASCII - indicates that you want the selected entries extracted
|
|
and translated in a report.
|
|
|
|
BINARY - indicates that you want the selected entries extracted
|
|
and saved in a binary file.
|
|
|
|
We will discuss the ASCII response first.
|
|
|
|
This prompt also supports standard Help and question mark (?) responses.
|
|
@@3.2.1.Y.
|
|
If you specify ASCII in response to the "Output mode" prompt, Retrieve
|
|
will prompt you for the report format that you want.
|
|
|
|
Output mode (ASCII):
|
|
|
|
Report format (SHORT): ?
|
|
|
|
SHORT - indicates that you want a brief translation of each
|
|
selected entry.
|
|
|
|
FULL - indicates that you want a detailed translation of each
|
|
selected entry.
|
|
|
|
OCTAL - indicates that you want an octal translation of each
|
|
selected entry. Normally, octal translations are used
|
|
to debug errors in Spear or the software routines that
|
|
record the entries.
|
|
|
|
The prompt also supports standard Help and question mark (?) responses.
|
|
@@3.2.1.Z.
|
|
If you specify BINARY in response to the "Output mode" prompt, Retrieve
|
|
will ask you if you want to merge the selected entries with an existing
|
|
binary file.
|
|
|
|
Output mode (ASCII): BINARY<cr>
|
|
|
|
Merge with (NONE):
|
|
|
|
Normally, merging is done only if you are maintaining a device history
|
|
file. For example, suppose the processor was experiencing a highly
|
|
intermittent failure. Let's say that on the average, the failure occurred
|
|
once a week. Given that situation, you might need several weeks or
|
|
even a month's worth of error information to isolate the cause of the
|
|
problem.
|
|
|
|
Since an event file can get quite large over a period of several weeks
|
|
or a month, you might consider establishing a history file to keep track
|
|
of the failure. The merge feature is designed to help you do this. It
|
|
allows you to combine the currently selected entries with previously
|
|
selected entries and merge them in the output file.
|
|
@@3.2.1.A1.
|
|
The Merge prompt also supports standard Help and question mark (?)
|
|
responses.
|
|
|
|
@@3.2.1.B1.
|
|
The eighth and last selection prompt:
|
|
|
|
Output to ([DSK]:RETRIE.RPT):
|
|
|
|
allows you to specify the name of the output file. The default file
|
|
name is DSK:RETRIE.RPT for TOPS-10/TOPS-20, and RETRIE.RPT for VAX/VMS
|
|
(if you are generating a report). The default becomes DSK:RETRIE.SYS
|
|
for TOPS-10/TOPS-20, and RETRIE.SYS for VAX/VMS (if you are building or
|
|
maintaining a binary history file).
|
|
|
|
You can override the entire default by specifying a new file name. You
|
|
can also override any field in the default response by specifying only
|
|
the field that you want to override.
|
|
|
|
For example, if you were to type:
|
|
|
|
    Output to (DSK:RETRIE.SYS): CPU<cr>
|
|
|
|
the output file specification would become DSK:CPU.SYS
|
|
|
|
The prompt also supports standard Help and question mark (?) responses.
|
|
@@3.2.1.C1.
|
|
Finally, the confirmation prompt:
|
|
|
|
Type <cr> to confirm (/GO):
|
|
|
|
provides an opportunity for you to review and change any responses
|
|
entered up to that point. If you want to review the response list
|
|
type /SHOW. If you are satisfied with the response list press the
|
|
RETURN key or type /GO.
|
|
|
|
If you want to change a response, press the backspace key until you
|
|
arrive at the corresponding prompt, make the change, and then type /GO.
|
|
@@3.2.1.D1.
|
|
That concludes the explanation of the Retrieve dialog. Next on the menu
|
|
is a set of questions about the Retrieve dialog.
|
|
|
|
@@RETRIEVE CODES
|
|
Generally speaking, the TOPS-10, TOPS-20 and VAX/VMS operating systems
|
|
handle errors in a similar manner. That is, when an error occurs they
|
|
snapshot pertinent hardware and software status (at error). Then, if
|
|
applicable, an error retry algorithm is applied. Next, regardless of
|
|
whether or not the retry algorithm was successful, a second snapshot
|
|
is taken (at end). Finally, the captured status is put into a record,
|
|
assigned a code, and appended to the system event file.
|
|
|
|
The operating systems differ, however, in the way that they snapshot
|
|
the status, implement the retry algorithms, and assign codes to the
|
|
error or event record.
|
|
@@3.M.
|
|
Spear Library
|
|
|
|
Topic Menu:
|
|
|
|
1. Introduction
|
|
|
|
2. Retrieve
|
|
|
|
3. Compute
|
|
|
|
4. Summarize
|
|
|
|
5. Applications
|
|
|
|
6. Klerr
|
|
|
|
@@R.T.3.2.1.D1.
|
|
STOP - You are moving in a reverse direction through the menu. You are
|
|
about to back into the Retrieve Dialog explanation.
|
|
@@ret_dia_q1
|
|
Retrieve Dialog - Q1 of 10
|
|
|
|
True or False - Retrieve can be used to translate and/or save the
|
|
records listed in the packets that are generated by Analyze?
|
|
@@ret_dia_q1_at
|
|
That's Correct.
|
|
|
|
This feature allows you to translate and/or save the individual records
|
|
that were used as evidence to support specific theories.
|
|
@@ret_dia_q1_af
|
|
The statement is TRUE. Retrieve can translate the Packets generated by
|
|
Analyze.
|
|
|
|
Remember, there is a packet associated with each theory listed in the
|
|
Analyze report. The Packet contains pointers that identify the records
|
|
that were used as evidence to support the theory.
|
|
|
|
Thus, anytime you question the validity of a theory and want to examine
|
|
the evidence yourself, you can do so by specifying the Packet file as
|
|
input to Retrieve. When Retrieve prompts for the packet number, enter
|
|
the number that corresponds to the theory that you are investigating and
|
|
then, specify the desired output mode (Short, Full, or Octal).
|
|
@@ret_dia_q2
|
|
Retrieve Dialog - Q2 of 10
|
|
|
|
True or False - Retrieve can be used to generate and maintain device
|
|
history files?
|
|
@@ret_dia_q2_at
|
|
That's correct.
|
|
|
|
You can use Retrieve to build and maintain history files for:
|
|
|
|
a) entire subsystems (disks, tapes, networks, etc.),
|
|
|
|
b) logical devices (DP220, MT300, CPU0, etc.),
|
|
|
|
c) physical option types (RP06s, TU45s etc.) or,
|
|
|
|
d) disk and tape storage media (Pack or Reel IDs).
|
|
|
|
@@ret_dia_q2_af
|
|
The statement is TRUE. Retrieve can be used to build and maintain device
|
|
history files. The procedure is relatively simple. Here's what to do:
|
|
|
|
First, select the device via the "Error class" prompt. Next, specify the
|
|
time frame. Then, when Retrieve prompts for Output mode, specify BINARY.
|
|
|
|
Retrieve will ask you if you want to merge the selected entries with an
|
|
existing binary history file. If you are building a new history file
|
|
press the RETURN key or type: NONE. If, however, a history already exists
|
|
for the selected device and you just want to combine the entries, then
|
|
specify the name of the history file in response to the "Merge" prompt.
|
|
|
|
Finally, Retrieve will prompt for the output file name. Again, if you
|
|
are building a new history file, then specify a unique file name. If,
|
|
however, you are updating an existing history file, then specify the
|
|
name of the history file you are updating. In most cases it would be
|
|
the same file that you specified in response to the "Merge with" prompt.
|
|
@@ret_dia_q3
|
|
Retrieve Dialog - Q3 of 10
|
|
|
|
True or False - If, in response to the "Type <cr> to confirm (/GO):"
|
|
prompt, you type "/DISPLAY" - Retrieve will display the current list
|
|
of responses?
|
|
@@ret_dia_q3_at
|
|
The statement is FALSE. The switch is called "/SHOW". If after making
|
|
your selections, you type "/SHOW", Retrieve will display each prompt
|
|
and the corresponding response as illustrated in the following example.
|
|
|
|
Type [cr] to confirm (/GO): /SHOW
|
|
|
|
RETRIEVE mode
|
|
-------------
|
|
Event or packet file: SYSTEM:ERROR.SYS
|
|
Output to: DSK:RETRIE.TXT
|
|
Merge with: NONE
|
|
Time from: EARLIEST
|
|
Time to: LATEST
|
|
Selection to be: INCLUDED
|
|
Output mode: ASCII
|
|
Report format: SHORT
|
|
Selection type: ERROR
|
|
Error class: DISK, TAPE,
|
|
Disk drives: DP120, DP230,
|
|
Tape drives: MT300,
|
|
|
|
Type [cr] to confirm (/GO):
|
|
@@ret_dia_q3_af
|
|
That's correct.
|
|
|
|
It's the "/SHOW" switch that will cause the current list of responses
|
|
to be displayed.
|
|
|
|
If, after reviewing the list, you decide that you want to change a
|
|
response, you can press the BACKSPACE key (or type /REVERSE) until you
|
|
get back to the response that you want to change. At that point you can
|
|
add to the response, or you can type /CLEAR and enter a new response.
|
|
@@ret_dia_q4
|
|
Retrieve Dialog - Q4 of 10
|
|
|
|
True or False - Retrieve can be used to select entries that pertain
|
|
to specific Disk Packs or Magtape Reel ID's?
|
|
@@ret_dia_q4_at
|
|
That's correct.
|
|
|
|
Pack and Reel ID's were added to the selection criteria so that you could
|
|
use the EXCLUDE mode to remove entries from the event file that pertain
|
|
to known bad media. Thus, you can clean up the file a bit, resubmit it
|
|
to Analyze, and see if media problems were covering up other more subtle
|
|
hardware problems.
|
|
@@ret_dia_q4_af
|
|
The statement is TRUE. If you type "?" at the "Error class" prompt you
|
|
will see that PACKID and REELID are among the selection criteria
|
|
available.
|
|
@@ret_dia_q5
|
|
Retrieve Dialog - Q5 of 10
|
|
|
|
True or False - If you specify a file name in response to the
|
|
"Merge with (NONE):" prompt, Retrieve will automatically append
|
|
the selected entries to that file?
|
|
@@ret_dia_q5_at
|
|
The statement is FALSE. Retrieve will NOT automatically append the
|
|
selected entries to the file that you specify in response to the
|
|
"Merge with" prompt.
|
|
|
|
Instead what happens is: the selected entries and the entries in the
|
|
"merge file" are combined and written out to the file that you specify
|
|
in response to the "Output to" prompt.
|
|
|
|
Selected Entries Merge with "file name"
|
|
| |
|
|
|_____________________|
|
|
|
|
|
Output "file name"
|
|
|
|
If at the "Output to" prompt, however, you specify the same file name
|
|
that you specified at the "Merge with" prompt, then the entries will be
|
|
combined and written in that file. Incidentally, that is the recommended
|
|
method for maintaining device history files.
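
The merge behavior described above can be sketched in Python. This is only an
illustration, not Retrieve's actual code: the dictionary standing in for the
file system, the function name retrieve_binary, and the ordering of the
combined entries (merge-file entries first, then the new selection) are all
assumptions made for the example.

```python
def retrieve_binary(files, selected, merge_with, output_to):
    """Model of the BINARY output path (hypothetical helper).

    files      -- dict mapping a file name to its list of entries
    selected   -- entries chosen by the current selection
    merge_with -- name of an existing history file, or "NONE"
    output_to  -- name of the file that receives the combined entries
    """
    # Read the merge file; it is never written to at this step.
    existing = [] if merge_with == "NONE" else list(files.get(merge_with, []))
    # Assumed ordering: existing entries first, then the new selection.
    combined = existing + list(selected)
    # Only the "Output to" file is written.
    files[output_to] = combined
    return combined


files = {"CPU.SYS": ["entry-1", "entry-2"]}

# Different output name: the merge file is left untouched.
retrieve_binary(files, ["entry-3"], merge_with="CPU.SYS", output_to="NEW.SYS")

# Same name at both prompts: the history file itself is updated.
retrieve_binary(files, ["entry-3"], merge_with="CPU.SYS", output_to="CPU.SYS")
```

Giving the same name at both prompts is what turns the merge into an in-place
update of the history file.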
|
|
@@ret_dia_q5_af
|
|
That's correct.
|
|
|
|
Retrieve will NOT change the "merge" file in any way unless you direct
|
|
it to do so by specifying the same file name at the "Output to" prompt.
|
|
@@ret_dia_q6
|
|
Retrieve Dialog - Q6 of 10
|
|
|
|
True or False - Sequence numbers are used to identify the relative
|
|
position of the records in a system event file?
|
|
@@ret_dia_q6_at
|
|
That is correct.
|
|
|
|
Record sequence numbers are included as part of the header in all Short,
|
|
Full, and Octal reports translated by Retrieve. The sequence number is
|
|
the simplest way to refer to a specific record.
|
|
|
|
As long as the order of the records in the file is not disturbed, the
|
|
sequence numbers will remain valid. Thus, if you request a Short ASCII
|
|
translation of several records and then decide that you want a Full
|
|
translation of one or two of those records, you can do so by specifying
|
|
the sequence numbers to Retrieve.
|
|
@@ret_dia_q6_af
|
|
The statement is TRUE. Sequence numbers reflect each record's relative
|
|
position in a file.
|
|
|
|
Remember, sequence numbers are dynamically assigned to each record as a
|
|
file is read. For example, if a file contains 623 records, then the
|
|
first record in the file will be assigned sequence number 1, the second
|
|
record will be assigned sequence number 2, etc. Finally, the last record
|
|
in the file will be assigned sequence number 623.
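
The numbering scheme above can be sketched in Python. The record contents and
the helper names number_records and select_by_sequence are hypothetical; the
only point illustrated is that numbering starts at 1 and follows file order.

```python
def number_records(records):
    """Pair each record with its sequence number, counting from 1."""
    return list(enumerate(records, start=1))


def select_by_sequence(records, wanted):
    """Return (sequence number, record) pairs whose number is in wanted."""
    wanted = set(wanted)
    return [(seq, rec) for seq, rec in number_records(records) if seq in wanted]


# A stand-in for a 623-record event file (contents are placeholders).
records = ["record %d" % n for n in range(623)]
numbered = number_records(records)
first_seq, last_seq = numbered[0][0], numbered[-1][0]

# Ask for two specific records by sequence number.
picked = select_by_sequence(records, [2, 623])
```

Because the numbers are derived from position, they stay valid only as long
as the order of the records in the file is unchanged.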
|
|
@@ret_dia_q7
|
|
Retrieve Dialog - Q7 of 10
|
|
|
|
True or False - Retrieve can be used to select entries based on the
|
|
event codes assigned to the entries by the operating system.
|
|
@@ret_dia_q7_at
|
|
That's correct.
|
|
|
|
If, for example, you wanted to select all KS10 Halt Status Block entries
|
|
you could: reference the Spear Manual or look on the back of the Spear
|
|
Reference Card to get the code number, specify "CODE" at the "Selection
|
|
type" prompt, and, when Retrieve prompted for "Event code:" enter 033
|
|
for TOPS-10 or 133 for TOPS-20.
|
|
@@ret_dia_q7_af
|
|
The statement is TRUE.
|
|
|
|
If you type "?" in response to the "Selection type" prompt, you will see
|
|
"CODE" listed as one of the acceptable responses. If you select "Code"
|
|
Retrieve will prompt you for the "Event codes".
|
|
|
|
The event types and the corresponding event codes are listed on the back
|
|
panel of the Spear Reference Card. In addition, the detailed information
|
|
contained in each entry type is described in the Spear Manual.
|
|
@@ret_dia_q8
|
|
Retrieve Dialog - Q8 of 10
|
|
|
|
True or False - Typing /C in response to the "Next Category (FINISHED):"
|
|
prompt will clear all entries selected up to that point.
|
|
@@ret_dia_q8_at
|
|
That's correct.
|
|
|
|
Keep in mind, however, that in addition to clearing selected entries,
|
|
the /Clear switch will also reset the prompt response to the default.
|
|
|
|
In other words, suppose you type /SHOW before starting Retrieve. Then,
|
|
let's say that you decide that you don't want the selected magtape
|
|
entries after all, so you press the BACKSPACE key until you get back to
|
|
the "Error class" prompt. At that point you specify "Tape", Retrieve
|
|
prompts for tape drives and you type /CLEAR.
|
|
|
|
You might think that you are no longer selecting any Magtape entries.
|
|
But that is not the case. Instead, what you did was clear the selected
|
|
list and thus reinstate the default (ALL).
|
|
@@ret_dia_q8_af
|
|
The statement is TRUE. The /CLEAR switch provides a mechanism for
|
|
changing selected entry types.
|
|
|
|
For example, suppose you had just selected some event codes for
|
|
translation and you're about to press the RETURN key to start Retrieve
|
|
but, before doing so, you typed "/SHOW" just to double check yourself.
|
|
|
|
Now, suppose you discover that, for some reason, you entered the wrong
|
|
list of event codes. Here's what to do:
|
|
|
|
1. Press the BACKSPACE key until you get back to the "Selection type"
|
|
prompt.
|
|
|
|
2. Then, in response to the "Selection type" prompt specify "CODE".
|
|
|
|
3. When Retrieve prompts for the Event codes, type "/CLEAR" to clear the
|
|
existing list of event codes and then enter the correct list.
|
|
|
|
4. Finally, type "/SHOW" as a last check and then, if everything is OK
|
|
type "/GO" to start Retrieve.
|
|
@@ret_dia_q9
|
|
Retrieve Dialog - Q9 of 10
|
|
|
|
True or False - Entries can be retrieved by logical names (e.g., CPU0)
|
|
as well as by physical names (e.g., RP06)?
|
|
@@ret_dia_q9_at
|
|
Technically, the statement is FALSE. Retrieve can recognize some, but
|
|
not all, logical and physical names.
|
|
|
|
@@ret_dia_q9_af
|
|
That is correct.
|
|
|
|
Retrieve recognizes some, but not all, physical and logical names. Just
|
|
as a double check before running, Retrieve will list all selected names
|
|
that it considers to be logical. Thus, if you made a typing error or
|
|
entered a physical name that it does NOT recognize, you'll know because
|
|
Retrieve will list it as a logical name.
|
|
@@ret_dia_q10
|
|
Retrieve Dialog - Q10 of 10
|
|
|
|
True or False - Retrieve can be used to extract entries based on
|
|
STOPCODES or BUGxxx code names?
|
|
@@ret_dia_q10_at
|
|
That's correct.
|
|
|
|
The "Mainframe Error and Crash Summary" section of the Analyze report
|
|
breaks down STOPCODES (for TOPS-10) and BUGxxx (for TOPS-20 and VAX/VMS)
|
|
by: type, name, and number of occurrences.
|
|
|
|
Thus, given the Analyze report, you can then use Retrieve to translate
|
|
or save the STOPCODE or BUGxxx entries for further investigation. This
|
|
feature is particularly helpful when it comes to saving and investigating
|
|
very intermittent system crashes.
|
|
@@ret_dia_q10_af
|
|
The statement is TRUE. Retrieve can be used to extract entries based
|
|
on STOPCODES and BUGxxx code names.
|
|
|
|
If you specify "CODE", Retrieve will prompt for "Event codes".
|
|
At that point you can enter the names of one or more STOPCODES or
|
|
BUGxxx that you want retrieved.
|
|
|
|
For example, if you typed:
|
|
|
|
Selection type (ALL): CODE<cr>
|
|
Event codes: DX2FUS,P2RAE<cr>
|
|
|
|
|
|
Retrieve will translate (or save) all entries that are related to
|
|
either of the Event codes (DX2FUS and P2RAE).
|
|
@@3.2.1.1.
|
|
That's it. There are only ten questions about the Retrieve dialog. If
|
|
you have gotten this far, then chances are you have a pretty good idea
|
|
of how to use Retrieve. Therefore, it is with great honor, that Instruct
|
|
pronounces you a "Retrieve-Dialog Subject Matter Expert".
|
|
@@3.3.0.
|
|
Compute calculates the following system performance factors:
|
|
|
|
System Availability (SA) - System Availability is the percentage of time
|
|
that the system was available for use. (It includes Standalone time.)
|
|
|
|
User Availability (UA) - User Availability is the percentage of time
|
|
that the system was available for use by the user community.
|
|
|
|
System Effectiveness (SE) - System Effectiveness (SE) is the percentage
|
|
of probability that the system remained available for a given period of
|
|
time (t).
|
|
|
|
|
|
The remainder of this introduction briefly explains the formulas used
|
|
by Compute to calculate these factors. For a more detailed explanation
|
|
of the formulas refer to the Spear Manual.
|
|
@@3.3.0.A.
|
|
The following formula is used to calculate System Availability (SA):
|
|
|
|
SA = (1.0) - CDT/(TDT + TRT)
|
|
|
|
|
|
where:
|
|
|
|
CDT = Chargeable Down Time
|
|
TDT = Total Down Time
|
|
TRT = Total Run Time
|
|
|
|
|
|
Remember - System Availability is the percentage of time that the system
|
|
was available for use. (It includes Standalone time.)
|
|
@@3.3.0.B.
|
|
The following formula is used to calculate User Availability (UA):
|
|
|
|
UA = (1.0) - CDT/(CDT + TRT)
|
|
|
|
|
|
where:
|
|
|
|
CDT = Chargeable Down Time
|
|
TRT = Total Run Time
|
|
|
|
|
|
Remember - User Availability is the percentage of time that the system
|
|
was available for use by the user community.
|
|
@@3.3.0.C.
|
|
The following formula is used to calculate System Effectiveness (SE):
|
|
|
|
SE = (SA) * (e** (-t/MTBF))
|
|
|
|
|
|
where:
|
|
|
|
SA = System Availability
|
|
e = the Napierian or natural base of logarithms (2.71828+)
|
|
t = an arbitrary period of time for which the SE factor is calculated.
|
|
Typically, Compute calculates the SE factor for four time periods:
|
|
6 minutes, 30 minutes, 1 hour, and 4 hours.
|
|
MTBF= The mean, or average time between failures (chargeable Downtimes).
|
|
e** means "e" raised to the power of (-t/MTBF).
|
|
|
|
|
|
Remember - System Effectiveness (SE) is the percentage of probability
|
|
that the system remained available for a given period of time (t).
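
The three formulas from this introduction can be written out as a short
Python sketch. The function names are hypothetical, the results are fractions
(multiply by 100 for a percentage), all times are assumed to be in hours, and
the sample figures below are made up for illustration. How Compute derives
CDT, TDT, TRT, and MTBF from the event file is not shown here.

```python
import math


def system_availability(cdt, tdt, trt):
    """SA = 1.0 - CDT / (TDT + TRT)."""
    return 1.0 - cdt / (tdt + trt)


def user_availability(cdt, trt):
    """UA = 1.0 - CDT / (CDT + TRT)."""
    return 1.0 - cdt / (cdt + trt)


def system_effectiveness(sa, t, mtbf):
    """SE = SA * e ** (-t / MTBF)."""
    return sa * math.exp(-t / mtbf)


# Made-up week: 2 chargeable down hours, 3 total down hours, 165 run hours.
sa = system_availability(cdt=2.0, tdt=3.0, trt=165.0)
ua = user_availability(cdt=2.0, trt=165.0)

# The four periods named above, in hours, with an assumed MTBF of 24 hours.
periods = {"6 minutes": 0.1, "30 minutes": 0.5, "1 hour": 1.0, "4 hours": 4.0}
se = {label: system_effectiveness(sa, t, mtbf=24.0) for label, t in periods.items()}
```

Note that SE can never exceed SA, and that it falls as the period t grows
relative to the MTBF.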
|
|
@@R.T.3.3.0.
|
|
STOP - You are moving in a reverse direction through the menu. You are
|
|
about to back into the Introduction to Compute.
|
|
@@3.3.M.
|
|
Spear Library - Compute
|
|
|
|
Topic menu:
|
|
|
|
|
|
1. Overview
|
|
|
|
2. Compute Dialog
|
|
|
|
3. Questions & Answers
|
|
@@3.3.1.
|
|
Compute Dialog - The Compute dialog consists of seven selection prompts
|
|
and one confirmation prompt.
|
|
|
|
COMPUTE mode
|
|
------------
|
|
Event file (default):
|
|
|
|
Report period (LAST-WEEK):
|
|
|
|
Time from (EARLIEST):
|
|
|
|
Time to (LATEST):
|
|
|
|
Report type (SINGLE-REPORT):
|
|
|
|
Availability Report to ([DSK]:COMPUT.RPT):
|
|
|
|
Reload report to ([DSK]:RELOAD.RPT):
|
|
|
|
Type <cr> to confirm (/GO):
|
|
|
|
@@3.3.1.A.
|
|
The first selection prompt:
|
|
|
|
Event file (default):
|
|
|
|
allows you to specify the name of the file that contains the system
|
|
performance entries that you want Compute to use in its calculations.
|
|
|
|
The default response (SYS:AVAIL.SYS for TOPS-10, SERR:ERROR.SYS for
|
|
TOPS-20, and SYS$SYSDISK:[SYSERR]:ERRLOG.SYS for VAX/VMS) is enclosed
|
|
in parentheses, and can be selected by pressing the RETURN key. You can
|
|
override the entire default response by specifying a new file name, or
|
|
you can override any field in the default response by specifying
|
|
only the field that you want to override.
|
|
|
|
For example, if you were to type:
|
|
|
|
Event file (SERR:ERROR.SYS): .LWK<cr>
|
|
|
|
the input file specification would become SERR:ERROR.LWK
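
The field-override rule can be sketched in Python. This is a simplified model,
not Spear's parser: it assumes a specification of the form dev:name.type with
no directory or version fields, and the helper names split_spec and
override_spec are hypothetical.

```python
def split_spec(spec):
    """Split 'dev:name.type' into (device, name, type); missing parts are ''."""
    device, colon, rest = spec.rpartition(":")
    if not colon:
        device = ""
    name, dot, ftype = rest.partition(".")
    if not dot:
        ftype = ""
    return device, name, ftype


def override_spec(default, response):
    """Replace only the fields that the response actually supplies."""
    merged = [new or old
              for old, new in zip(split_spec(default), split_spec(response))]
    return "%s:%s.%s" % tuple(merged)


# Typing only a file type replaces only the file type.
event_file = override_spec("SERR:ERROR.SYS", ".LWK")

# Typing only a device replaces only the device.
report_file = override_spec("DSK:COMPUT.RPT", "FS:")

# Pressing RETURN (an empty response) keeps the whole default.
unchanged = override_spec("DSK:RELOAD.RPT", "")
```

The same rule applies at every file prompt in the dialog: whatever you leave
out is taken from the default shown in parentheses.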
|
|
|
|
The prompt also supports standard Help and question mark (?) responses.
|
|
|
|
@@COMPUTE INPUT
|
|
|
|
|
|
.--------------. .___ Summary Report
|
|
| Calculate | |
|
|
System Event File ___| System |___|___ Availability Report
|
|
(or AVAIL.Ann) | Availability | |
|
|
|______________| |___ Reload Report
|
|
|
|
|
|
TOPS-10, TOPS-20, and VAX/VMS record entries that are used by Compute
|
|
to calculate overall system performance. Under TOPS-10 the entries are
|
|
recorded in a file called AVAIL.SYS. Under TOPS-20 and VAX/VMS the
|
|
entries are recorded in ERROR.SYS and ERRLOG.SYS respectively.
|
|
|
|
@@3.3.1.B.
|
|
The second selection prompt
|
|
|
|
Report period (LAST-WEEK):
|
|
|
|
allows you to specify the time period for which you want system
|
|
performance calculated. Compute is designed to calculate system
|
|
performance for the previous week. That is, from a week ago Sunday at
|
|
00:00:01 to last Sunday at 00:00:01. Thus, by running Compute weekly
|
|
you can monitor overall system performance and note any trends in
|
|
availability or effectiveness.
|
|
|
|
You can also direct Compute to calculate system performance for this
|
|
week or any other period of time. If you specify THIS-WEEK, then
|
|
Compute calculates system performance from last Sunday at 00:00:01 to
|
|
the present. If you specify OTHER Compute will prompt for the specific
|
|
time period.
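
The LAST-WEEK and THIS-WEEK boundaries can be sketched in Python. The function
names are hypothetical, and the treatment of a run that takes place on a
Sunday (that Sunday counts as "last Sunday" once 00:00:01 has passed) is an
assumption, since the text above does not spell it out.

```python
from datetime import datetime, timedelta


def last_sunday(now):
    """Most recent Sunday at 00:00:01 that is not later than now."""
    days_back = (now.weekday() + 1) % 7  # weekday(): Monday=0 ... Sunday=6
    start = (now - timedelta(days=days_back)).replace(
        hour=0, minute=0, second=1, microsecond=0)
    if start > now:  # only during the first second of a Sunday
        start -= timedelta(days=7)
    return start


def report_period(choice, now):
    """Return (time from, time to) for LAST-WEEK or THIS-WEEK."""
    sunday = last_sunday(now)
    if choice == "LAST-WEEK":
        return sunday - timedelta(days=7), sunday
    if choice == "THIS-WEEK":
        return sunday, now
    raise ValueError("OTHER requires explicit Time from / Time to responses")


# Running on Wednesday, 17-Jun-81 at noon:
now = datetime(1981, 6, 17, 12, 0, 0)
last_week = report_period("LAST-WEEK", now)  # 7-Jun-81 to 14-Jun-81
this_week = report_period("THIS-WEEK", now)  # 14-Jun-81 to now
```

OTHER is not computed here because it is the case in which Compute asks for
the time period explicitly.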
|
|
|
|
The prompt also supports standard Help and question mark (?) responses.
|
|
|
|
@@3.3.1.C.
|
|
The third and fourth selection prompts
|
|
|
|
Time from (EARLIEST):
|
|
|
|
Time to (LATEST):
|
|
|
|
are displayed only if you specify OTHER in response to the Report Period
|
|
Prompt. The time prompts allow you to specify the specific time period
|
|
for which you want system performance calculated. You can specify the
|
|
default times (Earliest and Latest respectively), or you can specify
|
|
either real or relative time.
|
|
|
|
Both of these prompts also support standard Help and question mark (?)
|
|
responses.
|
|
|
|
@@3.3.1.D.
|
|
The fifth selection prompt
|
|
|
|
Report type (SINGLE-REPORT):
|
|
|
|
is also displayed only if you specify OTHER in response to the Report
|
|
Period Prompt. The Report Type prompt allows you to specify the type of
|
|
report that you want. You can specify the default, SINGLE-REPORT, in
|
|
which case Compute will generate a single report that reflects system
|
|
performance for the selected time period.
|
|
|
|
You can also specify MULTIPLE-REPORTS, in which case Compute will
|
|
generate (in addition to the single report) a set of weekly reports that
|
|
reflect system performance for the selected time period.
|
|
|
|
The prompt also supports standard Help and question mark (?) responses.
|
|
|
|
@@3.3.1.E.
|
|
The sixth selection prompt
|
|
|
|
Availability Report to ([DSK]:COMPUT.RPT):
|
|
|
|
allows you to specify the destination of the 132 column Availability
|
|
Report. The default destination (DSK:COMPUT.RPT for TOPS-10/TOPS-20,
|
|
and COMPUT.RPT for VAX/VMS) is enclosed in parentheses and can be
|
|
selected by pressing the RETURN key. Compute automatically outputs
|
|
a 72 column Summary Report to your terminal.
|
|
|
|
You can replace the entire default destination by specifying a new file
|
|
name, or you can replace any field in the default by specifying only the
|
|
field that you want to override. For example, if you were to type:
|
|
|
|
Availability Report to (DSK:COMPUT.RPT): FS:
|
|
|
|
the output file specification would become FS:COMPUT.RPT
|
|
|
|
The prompt also supports standard Help and question mark (?) responses.
|
|
|
|
@@compute output
|
|
|
|
Compute generates two reports; a 72 column Summary Report, and a 132
|
|
column Availability Report. The Summary Report is automatically output
|
|
to your terminal. At this prompt Compute is waiting for you to specify
|
|
a destination for the Availability Report. You can:
|
|
|
|
1. Press the RETURN key to select the default file specification:
|
|
DSK:COMPUT.RPT.
|
|
|
|
2. Enter a unique file specification (e.g., DSK:WK21.RPT). The file
|
|
specification format is: dev:<user>filename.filetype.version.
|
|
|
|
|
|
If you specified multiple reports, then Compute will generate a set of
|
|
weekly reports in addition to COMPUT.RPT. The reports will be named
|
|
Cmmdd.RPT, where mmdd corresponds to the month and day of each week.
|
|
@@3.3.1.F.
|
|
The last selection Prompt
|
|
|
|
Reload report to ([DSK]:RELOAD.RPT):
|
|
|
|
allows you to specify the destination of the Reload Log Report.
|
|
The Reload Report uses 132 columns and lists the system name, the
|
|
operating system version, the number of times the system was reloaded,
|
|
and the operator's response to the question "Why Reload?"
|
|
You can select the default response (DSK:RELOAD.RPT for TOPS-10/TOPS-20,
|
|
and RELOAD.RPT for VAX/VMS) by pressing the RETURN key.
|
|
|
|
You can replace the entire default destination by specifying a new file
|
|
name, or you can replace any field in the default response by specifying
|
|
only the field that you want to replace. For example, if you typed:
|
|
|
|
Reload report to (DSK:RELOAD.RPT): .LWK<cr>
|
|
|
|
the output file specification would become DSK:RELOAD.LWK
|
|
|
|
The prompt also supports standard Help and question mark (?) responses.
|
|
|
|
@@3.3.1.G.
|
|
The confirmation prompt:
|
|
|
|
Type <cr> to confirm (/GO):
|
|
|
|
provides an opportunity for you to review and change any responses
|
|
entered up to that point. If you want to review the response list
|
|
type /SHOW. If you are satisfied with the response list press the
|
|
RETURN key or type /GO.
|
|
|
|
If you want to change a response, press the BACKSPACE key until you
|
|
arrive at the corresponding prompt, make the change, and then type /GO.
|
|
@@com_dia_q1
|
|
Compute Dialog Q1 of 5
|
|
|
|
True or False - The formulas used by Compute to calculate: System
|
|
Availability (SA), User Availability (UA), and System Effectiveness (SE)
|
|
are described in the Spear Manual?
|
|
@@com_dia_q1_at
|
|
That's correct.
|
|
|
|
The formulas:
|
|
|
|
SA = (1.0) - CDT/(TDT + TRT)
|
|
|
|
UA = (1.0) - CDT/(CDT + TRT)
|
|
|
|
SE = (SA) * (e** (-t/MTBF))
|
|
|
|
are also briefly explained in the Introduction section of this module.
|
|
@@com_dia_q1_af
|
|
The statement is TRUE. The formulas used by Compute to calculate
|
|
system availability, user availability and system effectiveness are
|
|
described in the Spear Manual.
|
|
|
|
|
|
You should become familiar with those formulas before you attempt to
|
|
interpret the reports generated by Compute.
|
|
@@com_dia_q2
|
|
Compute Dialog Q2 of 5
|
|
|
|
True or False - The entries used by Compute to calculate system
|
|
performance are recorded in the system event file: ERROR.SYS for
|
|
TOPS-10 and TOPS-20, and ERRLOG.SYS for VAX/VMS?
|
|
@@com_dia_q2_at
|
|
The statement is FALSE. Under TOPS-20 and VAX/VMS the entries are
|
|
recorded in the system event files. However, under TOPS-10 the entries
|
|
are recorded in a file called AVAIL.SYS.
|
|
@@com_dia_q2_af
|
|
That's correct.
|
|
|
|
TOPS-10 records the entries in a file called AVAIL.SYS. As a general
|
|
rule, most TOPS-10 sites rename the AVAIL.SYS file to AVAIL.Ann weekly.
|
|
(Where nn is a number in the range of 01 to 99.) Typically, the first
|
|
AVAIL.SYS file becomes AVAIL.A01, the second AVAIL.A02, etc.
|
|
|
|
Thus, if the latest AVAIL.Ann file was AVAIL.A25, and you wanted Compute
|
|
to calculate system performance for the last four weeks, then you would
|
|
specify AVAIL.A22 as the input file.
|
|
@@com_dia_q3
|
|
Compute Dialog Q3 of 5
|
|
|
|
True or False - Compute generates two types of reports; a 72 column
|
|
Summary Report that highlights overall system performance, and a 132
|
|
column Full Report that provides more detail?
|
|
@@com_dia_q3_at
|
|
That's correct.
|
|
|
|
The Summary report is automatically displayed on your terminal. It will
|
|
provide a picture of overall performance. The Full report backs up the
|
|
Summary report with specific details.
|
|
|
|
Note: The Full report requires 132 columns and is generally not suited
|
|
for display on most terminals.
|
|
@@com_dia_q3_af
|
|
The statement is TRUE. Compute generates two types of reports; a Summary
|
|
Report that highlights overall system performance, and a Full Report
|
|
that details system availability and effectiveness.
|
|
|
|
The Summary report is automatically output to your terminal when you run
|
|
Compute. The following example illustrates a typical Summary report:
|
|
|
|
|
|
Compute Summary Report From: 7-Jun-81 01:00 To: 14-Jun-81 01:00
|
|
period length (HRS): 168.000
|
|
|
|
SYSTEM Availability % : 100.000
|
|
USER Availability % : 100.000
|
|
|
|
|
|
Effectiveness Six minutes Thirty minutes One Hour Four Hours
|
|
factor 99.584 97.938 95.918 94.648
|
|
|
|
Report file name: DSK:COMPUT.RPT
|
|
|
|
Note: The Effectiveness Factor is the probability that a six minute, a
|
|
thirty minute, a one hour and a four hour job will run to completion.
|
|
@@com_dia_q4
|
|
Compute Dialog Q4 of 5
|
|
|
|
True or False - Compute uses the operator's response to the question:
|
|
"Why Reload" to determine User Availability?
|
|
@@com_dia_q4_at
|
|
The statement is FALSE. The operator's response to the question: "Why
|
|
Reload" is used by Compute to distinguish between Chargeable Downtime
|
|
and Non-chargeable Downtime.
|
|
@@com_dia_q4_af
|
|
That's correct.
|
|
|
|
The operator's response to the question: "Why Reload" is used to distinguish
|
|
between Chargeable Downtime and Non-chargeable Downtime. The following
|
|
operator responses constitute:
|
|
|
|
Chargeable Downtime - STOPCD, BUGHLT, HALT, PARITY, HARDWARE, NXM, HUNG,
|
|
                          LOOP, and CM (Corrective Maintenance).
|
|
|
|
Non-chargeable Downtime - PM (Preventive Maintenance), OPERATOR, POWER,
|
|
STATIC, NEW, SCHEDULED, STANDALONE, and OTHER.
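
The two lists above amount to a lookup table, sketched here in Python. The
function names and the (reason, down time) pairs are hypothetical; only the
reason keywords come from the lists above. Down time is assumed to be in
hours.

```python
CHARGEABLE = {"STOPCD", "BUGHLT", "HALT", "PARITY", "HARDWARE",
              "NXM", "HUNG", "LOOP", "CM"}
NON_CHARGEABLE = {"PM", "OPERATOR", "POWER", "STATIC", "NEW",
                  "SCHEDULED", "STANDALONE", "OTHER"}


def is_chargeable(reason):
    """Classify an operator's "Why Reload" response."""
    key = reason.strip().upper()
    if key in CHARGEABLE:
        return True
    if key in NON_CHARGEABLE:
        return False
    raise ValueError("unrecognized reload reason: %r" % reason)


def chargeable_down_time(reloads):
    """Sum the down time of the reloads whose reason is chargeable."""
    return sum(down for reason, down in reloads if is_chargeable(reason))


# Illustrative reload list as (reason, down time) pairs.
reloads = [("OTHER", 0.414), ("PARITY", 0.25), ("PM", 1.5)]
cdt = chargeable_down_time(reloads)  # only the PARITY reload counts
```

A sum of this kind is what the CDT term in the availability formulas
represents.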
|
|
@@com_dia_q5
|
|
Compute Dialog Q5 of 5
|
|
|
|
True or False - In addition to the Summary Report and the Full Report,
|
|
Compute also generates a Reload Report called COMPUT.RLD?
|
|
@@com_dia_q5_at
|
|
The statement is TRUE only in that Compute generates a Reload Report.
|
|
The report is actually called RELOAD.RPT, not COMPUT.RLD.
|
|
@@com_dia_q5_af
|
|
That's correct.
|
|
|
|
The name of the report is: RELOAD.RPT. The following example illustrates
|
|
the type of information it contains.
|
|
|
|
|
|
SYSTEM 2116 THE BIG ORANGE, TOPS-20 MONITOR 4(3530)
|
|
Built on: 28-May-81 11:41:11 Version: 400,,3530
|
|
Loaded on: 10-Jun-81 20:20:45 Crashed on: 14-Jun-81 07:00:16
|
|
Reloaded on: 14-Jun-81 07:25:08 Why reload: OTHER
|
|
Run time: 6.004 Down time: 0.414
|
|
|
|
SYSTEM 2116 THE BIG ORANGE, TOPS-20 MONITOR 4(3530)
|
|
Built on: 28-May-81 11:41:11 Version: 400,,3530
|
|
Loaded on: 14-Jun-81 07:25:10 Crashed on: 15-Jun-81 08:38:20
|
|
Reloaded on: 15-Jun-81 08:38:20 Why reload: OTHER
|
|
Run time: 25.219 Down time: 0.000
|
|
|
|
|
|
The Reload Report and the Full Report are intended to help you complete
|
|
system Crash and Uptime reports.
|
|
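If you want to cross-check figures like these by hand, the Down time of the first entry and the Run time of the second entry are consistent with elapsed hours between the timestamps shown. The Python sketch below works under that assumption; the function name is hypothetical and this is not Compute's own code.

```python
from datetime import datetime

# Timestamp layout used in the sample report, for example "14-Jun-81 07:25:10".
# Month names are assumed to be English abbreviations.
FMT = "%d-%b-%y %H:%M:%S"


def hours_between(start: str, end: str) -> float:
    """Elapsed hours between two report timestamps, to three decimals."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return round(delta.total_seconds() / 3600, 3)


# Second entry of the sample: Loaded on -> Crashed on.
run_time = hours_between("14-Jun-81 07:25:10", "15-Jun-81 08:38:20")

# First entry of the sample: Crashed on -> Reloaded on.
down_time = hours_between("14-Jun-81 07:00:16", "14-Jun-81 07:25:08")

print(run_time, down_time)
```

Under this assumption the two values come out as 25.219 and 0.414, matching the sample.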
@@3.3.1.1.
|
|
That's the last question about the Compute dialog. Press the RETURN key
|
|
to return to the menu.
|
|
@@3.3.2.
|
|
The Compute Report questions were not ready in time for this Field Test
|
|
Version of Spear. Press the BACKSPACE key or type MENU to return to the
|
|
Compute menu.
|
|
@@com_rpt_q1
|
|
Compute Report - Q1 of 5
|
|
|
|
True or False -
|
|
|
|
@@com_rpt_q1_at
|
|
That's correct.
|
|
|
|
@@com_rpt_q1_af
|
|
The statement is TRUE.
|
|
|
|
@@com_rpt_q2
|
|
Compute Report - Q2 of 5
|
|
|
|
True or False -
|
|
|
|
@@com_rpt_q2_at
|
|
The statement is FALSE.
|
|
|
|
@@com_rpt_q2_af
|
|
That's correct.
|
|
|
|
@@com_rpt_q3
|
|
Compute Report - Q3 of 5
|
|
|
|
True or False -
|
|
|
|
@@com_rpt_q3_at
|
|
The statement is FALSE.
|
|
|
|
@@com_rpt_q3_af
|
|
That's correct.
|
|
|
|
@@com_rpt_q4
|
|
Compute Report - Q4 of 5
|
|
|
|
True or False -
|
|
|
|
@@com_rpt_q4_at
|
|
That's correct.
|
|
|
|
@@com_rpt_q4_af
|
|
The statement is TRUE.
|
|
|
|
@@com_rpt_q5
|
|
Compute Report - Q5 of 5
|
|
|
|
True or False -
|
|
|
|
@@com_rpt_q5_at
|
|
The statement is FALSE.
|
|
|
|
@@com_rpt_q5_af
|
|
That's correct.
|
|
|
|
@@3.3.2.1.
|
|
That's it. There are only five questions about the Compute Report.
|
|
|
|
Press the RETURN key to return to the Compute menu.
|
|
@@3.4.0.
|
|
Summarize Overview - Summarize is designed to read and summarize the
|
|
contents of system event files.
|
|
|
|
The purpose of this Instruct module is to ensure that you understand
|
|
the dialog and the report associated with the Summarize function. The
|
|
module consists of two parts. Part one briefly explains the Summarize
|
|
dialog and then asks some questions to ensure that there are no
|
|
misunderstandings.
|
|
|
|
Part two of this module briefly explains the format and organization of
|
|
the Summarize Report. (You will be asked to generate or obtain a typical
|
|
Summarize Report.) The remainder of the module consists of a series of
|
|
questions about the report. Again, the purpose of the questions is to
|
|
ensure that there are no misunderstandings about the general format and
|
|
content of the report.
|
|
|
|
Objective - Upon completion of this module you should have no difficulty
|
|
using the Summarize dialog or understanding the format, organization and
|
|
content of a typical Summarize report.
|
|
@@R.T.3.4.0.
|
|
STOP - You are moving in a reverse direction through the menu. You are
|
|
about to back into the Summarize Overview.
|
|
@@3.4.M.
|
|
Spear Library - Summarize
|
|
|
|
Topic menu:
|
|
|
|
|
|
1. Overview
|
|
|
|
2. Summarize Dialog
|
|
|
|
3. Summarize Dialog Questions & Answers
|
|
|
|
4. Summarize Report
|
|
|
|
5. Summarize Report Questions & Answers
|
|
@@3.4.1.
|
|
Summarize Dialog - The Summarize dialog consists of six selection prompts
|
|
and one confirmation prompt.
|
|
|
|
SUMMARIZE mode
|
|
--------------
|
|
|
|
Event file (default):
|
|
|
|
Category (ALL):
|
|
|
|
Time from (EARLIEST):
|
|
|
|
Time to (LATEST):
|
|
|
|
Show Error Distribution (YES):
|
|
|
|
Report to ([DSK]:SUMMAR.RPT):
|
|
|
|
Type <cr> to confirm (/GO):
|
|
|
|
@@3.4.1.A.
|
|
The first selection prompt:
|
|
|
|
Event file (default):
|
|
|
|
allows you to specify the name of the system event file that you want
|
|
summarized. The default response (SYS:ERROR.SYS for TOPS-10,
|
|
SERR:ERROR.SYS for TOPS-20, and SYS$ERRORLOG:ERRLOG.SYS for VAX/VMS)
|
|
is enclosed in parentheses and can be selected by pressing the RETURN key.
|
|
You can override the entire default response by specifying a new file
|
|
name. You can also override any field in the default response by
|
|
specifying only the field that you want to override.
|
|
|
|
For example, if you were to type:
|
|
|
|
Event file (SERR:ERROR.SYS): .LWK<cr>
|
|
|
|
the input file specification would become SERR:ERROR.LWK
|
|
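The field override rule can be pictured as follows. This Python sketch is an illustration only: it handles the simple DEV:NAME.EXT form used in the examples of this course, and the function names are made up for the example. Real file specifications (directories, VAX/VMS logical names) are more involved.

```python
def split_spec(spec: str):
    """Split 'DEV:NAME.EXT' into (device, name, extension).

    A field that is not present comes back as an empty string.
    """
    device, _, rest = spec.rpartition(":")
    name, _, ext = rest.partition(".")
    return device, name, ext


def apply_override(default: str, response: str) -> str:
    """Replace only the fields that the response actually supplies."""
    old_fields = split_spec(default)
    new_fields = split_spec(response.strip())
    device, name, ext = (new or old
                         for new, old in zip(new_fields, old_fields))
    return f"{device}:{name}.{ext}"


print(apply_override("SERR:ERROR.SYS", ".LWK"))  # extension only
print(apply_override("DSK:SUMMAR.RPT", "FS:"))   # device only
print(apply_override("DSK:SUMMAR.RPT", ""))      # RETURN key: keep default
```

An empty response leaves every field at its default, which corresponds to pressing the RETURN key.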
|
|
|
|
The prompt also supports standard Help and question mark (?) responses.
|
|
|
|
@@3.4.1.AA.
|
|
After you have specified the source of input, SUMMARIZE prompts you
|
|
for the category.
|
|
|
|
Category (ALL):
|
|
|
|
ALL
|
|
MAINFRAME
|
|
DISK
|
|
TAPE
|
|
CI
|
|
NI
|
|
UNITRECORD
|
|
NETWORK
|
|
OPERATING-SYSTEM
|
|
COMM
|
|
PACKID
|
|
REELID
|
|
HELP
|
|
|
|
@@3.4.1.AB.
|
|
|
|
ALL (or the RETURN key) - indicates that you want to select
|
|
all errors. (This is the default).
|
|
|
|
MAINFRAME - indicates that you want to select errors occurring in
|
|
specific mainframe components.
|
|
|
|
DISK - indicates that you want to select errors occurring on disk
|
|
units. After selecting DISK, you can specify ALL, or the specific
|
|
disks by name (DPA3, RPB7), or by disk type (RP06, RM05).
|
|
|
|
TAPE - indicates that you want to select errors occurring on tape
|
|
units. After selecting TAPE, you can specify ALL, or specify the
|
|
tape names or types in question.
|
|
|
|
CI - indicates that you want to select CI-related errors. After
|
|
selecting CI, you can specify ALL, or the specific component of
|
|
interest.
|
|
|
|
NI - indicates that you want to select NI-related errors.
|
|
|
|
@@3.4.1.AC.
|
|
|
|
UDA - indicates that you want to select UDA-related errors.
|
|
After selecting UDA, you can specify ALL, or the specific
|
|
component of interest.
|
|
|
|
UNITRECORD - indicates that you want to select errors occurring
|
|
on unit-record devices such as card readers and line printers.
|
|
After selecting UNITRECORD, you can specify ALL, or type the
|
|
specific device names or types in question.
|
|
|
|
OPERATING-SYSTEM - indicates that you want to select operating
|
|
system codes. After selecting OPERATING-SYSTEM, you can specify
|
|
ALL, or type the name of a specific STOPCODE or BUG type.
|
|
|
|
COMM - indicates that you want to select errors occurring on
|
|
communication devices.
|
|
|
|
@@3.4.1.AD.
|
|
|
|
PACKID - indicates that you want to select specific disk packs.
|
|
After typing PACKID, you can type ALL, or type the specific pack
|
|
identifiers.
|
|
|
|
REELID - indicates that you want to select specific tape reels.
|
|
After typing REELID, you can type ALL, or the specific tape
|
|
identifiers.
|
|
|
|
HELP - indicates that you want to get detailed information
|
|
on the above categories.
|
|
|
|
All categories except for COMM and NI prompt further for specific
|
|
device types. Type ? at the subprompt level to get a list of
|
|
acceptable responses.
|
|
|
|
@@3.4.1.AE.
|
|
|
|
|
|
SUMMARIZE keeps prompting you for categories until you either type
|
|
FINISHED, or press the RETURN key.
|
|
|
|
Next Category (FINISHED):
|
|
|
|
Type one of the following:
|
|
|
|
The RETURN key, or FINISHED to take the default,
|
|
|
|
or,
|
|
|
|
another category.
|
|
|
|
@@3.4.1.B.
|
|
The third selection prompt:
|
|
|
|
Time from (EARLIEST):
|
|
|
|
allows you to specify the time at which to begin summarizing the system
|
|
event file. The default response (EARLIEST) is enclosed in parentheses
|
|
and can be selected by pressing the RETURN key. You can also specify
|
|
a real or a relative time.
|
|
|
|
|
|
The prompt also supports standard Help and question mark (?) responses.
|
|
|
|
@@3.4.1.C.
|
|
The fourth selection prompt:
|
|
|
|
Time to (LATEST):
|
|
|
|
allows you to specify the time at which to end summarizing the system
|
|
event file. The default response (LATEST) is enclosed in parentheses
|
|
and can be selected by pressing the RETURN key. Again, you can also
|
|
specify a real or a relative time.
|
|
|
|
|
|
The prompt also supports standard Help and question mark (?) responses.
|
|
|
|
@@3.4.1.DA.
|
|
|
|
The fifth selection prompt:
|
|
|
|
Show Error Distribution (YES):
|
|
|
|
allows you to specify whether or not you want to receive error
|
|
distribution tables. The default response (YES) is enclosed in
|
|
parentheses and can be selected by pressing the RETURN key.
|
|
If you type NO, you will suppress the error distribution tables
|
|
from the report.
|
|
|
|
@@3.4.1.D.
|
|
The sixth selection prompt:
|
|
|
|
Report to ([DSK]:SUMMAR.RPT):
|
|
|
|
allows you to specify the name of the output or Report file. The default
|
|
response (DSK:SUMMAR.RPT for TOPS-10/TOPS-20, and SUMMAR.RPT for VAX/VMS)
|
|
is enclosed in parentheses and can be selected by pressing the RETURN key.
|
|
You can override the entire default response by specifying a new file name.
|
|
You can also override any field in the default response by specifying only
|
|
the field that you want to override.
|
|
|
|
For example, if you were to type:
|
|
|
|
Report to (DSK:SUMMAR.RPT): FS:<cr>
|
|
|
|
the output file specification would become FS:SUMMAR.RPT
|
|
|
|
|
|
The prompt also supports standard Help and question mark (?) responses.
|
|
|
|
@@3.4.1.E.
|
|
Finally, the confirmation prompt:
|
|
|
|
Type <cr> to confirm (/GO):
|
|
|
|
provides an opportunity for you to review and change any responses
|
|
entered up to that point. If you want to review the response list
|
|
type /SHOW. If you are satisfied with the response list press the
|
|
RETURN key or type /GO.
|
|
|
|
If you want to change a response, press the BACKSPACE key until you
|
|
arrive at the corresponding prompt, make the change, and then type /GO.
|
|
|
|
@@3.4.1.F.
|
|
That concludes the explanation of the Summarize dialog. Next on the
|
|
menu is a set of questions about the Summarize dialog.
|
|
|
|
@@sum_dia_q1
|
|
Summarize Dialog - Q1 of 7
|
|
|
|
True or False - If you do NOT want to change any of the Summarize
|
|
default responses, you can type /GO at the Event file prompt?
|
|
@@sum_dia_q1_at
|
|
That is correct.
|
|
|
|
All Spear Library functions begin by setting the response list to the
|
|
default values. You can change the responses or type /GO at any time.
|
|
The function will use the responses that you have specified up to
|
|
that point and default the rest. If you make no changes the default
|
|
response list is used.
|
|
@@sum_dia_q1_af
|
|
The statement is TRUE. When you first enter a Spear library dialog, the
|
|
response list is set to the default values. Thus, if you type /GO at the
|
|
Event file prompt, Summarize will begin execution using the defaults. The
|
|
result will be a report that summarizes the contents of the entire event
|
|
file.
|
|
@@sum_dia_q2
|
|
Summarize Dialog - Q2 of 7
|
|
|
|
True or False - If you type HELP in response to any Summarize prompt,
|
|
a ONE page message explaining the prompt and the acceptable response
|
|
to that prompt will be displayed?
|
|
@@sum_dia_q2_at
|
|
That is correct.
|
|
|
|
All Spear Library prompts support the HELP and (?) commands. The Help
|
|
messages are limited to one page, and the prompt is repeated immediately
|
|
following the message. Typing (?) will result in a list of acceptable
|
|
responses without explanation.
|
|
@@sum_dia_q2_af
|
|
The statement is TRUE. You can type HELP<cr> any time you are not sure
|
|
how you should respond to a particular prompt. You will receive a one
|
|
page HELP message that explains the prompt and the acceptable responses
|
|
to that prompt.
|
|
@@sum_dia_q3
|
|
Summarize Dialog - Q3 of 7
|
|
|
|
True or False - Summarize will accept and summarize the contents of any
|
|
binary event file, including a binary event file generated by Retrieve?
|
|
@@sum_dia_q3_at
|
|
That is correct.
|
|
|
|
Summarize will accept (as input) any file that conforms to the standard
|
|
binary event file format. Currently, that includes event files generated
|
|
by: TOPS-10, TOPS-20, VAX/VMS, or Retrieve.
|
|
|
|
There is one restriction, however: the event file must have been
|
|
generated by the same type of system that you are using to summarize
|
|
the file. In other words, the TOPS-10 version of Spear can NOT be used
|
|
to process event files generated by TOPS-20, and so on.
|
|
@@sum_dia_q3_af
|
|
The statement is TRUE. Retrieve does not change the file format when it
|
|
generates a binary (or History) file. Therefore, since Summarize is
|
|
designed to handle standard binary event files, it will accept binary
|
|
event files generated by Retrieve.
|
|
@@sum_dia_q4
|
|
Summarize Dialog - Q4 of 7
|
|
|
|
True or False - In order to take the default response at a Summarize
|
|
prompt you must press the ESCAPE key before pressing the RETURN key?
|
|
@@sum_dia_q4_at
|
|
The statement is FALSE. You don't have to press ESCAPE/RETURN to take
|
|
the default response. You need only press the RETURN key.
|
|
|
|
Originally, the purpose of the ESCAPE key was to display the default
|
|
response. However, as a result of feedback during product Field Test,
|
|
the prompts were changed. They now display the default responses in
|
|
parentheses. Thus, the original purpose of the ESCAPE key was nullified.
|
|
@@sum_dia_q4_af
|
|
That is correct.
|
|
|
|
Since the default response is enclosed in parentheses, there is no need
|
|
to use the ESCAPE key.
|
|
@@sum_dia_q5
|
|
Summarize Dialog - Q5 of 7
|
|
|
|
True or False - Summarize will accept and summarize Packet Files
|
|
generated by Analyze?
|
|
@@sum_dia_q5_at
|
|
The statement is FALSE. A Packet file is not a standard binary event
|
|
file. It is a special file produced by Analyze that contains pointers
|
|
that identify the records that were used as evidence to support the
|
|
theories listed in the corresponding Analyze Report file.
|
|
@@sum_dia_q5_af
|
|
That is correct.
|
|
|
|
Summarize only accepts standard binary event files. Since a Packet file
|
|
is not a standard binary event file, Summarize will not accept it.
|
|
@@sum_dia_q6
|
|
Summarize Dialog - Q6 of 7
|
|
|
|
True or False - If you want to change the name of the report file from
|
|
DSK:SUMMAR.RPT to DSK:TEST.RPT, you need only type TEST at the Report
|
|
prompt?
|
|
@@sum_dia_q6_at
|
|
That is correct.
|
|
|
|
You can substitute fields at any Spear file specification prompt. For
|
|
example, if you wanted the report to go to FS: and you wanted to call
|
|
it SUMMAR.LWK, you could type:
|
|
|
|
Report to (DSK:SUMMAR.RPT): FS:.LWK<cr>
|
|
@@sum_dia_q6_af
|
|
The statement is TRUE. All Spear Library file-name prompts accept field
|
|
substitution. You can substitute the output device, the file name, the
|
|
file extension, or any combination thereof.
|
|
@@sum_dia_q7
|
|
Summarize Dialog - Q7 of 7
|
|
|
|
True or False - Both the "Time from" and the "Time to" prompts accept
|
|
real and relative time?
|
|
@@sum_dia_q7_at
|
|
|
|
That is correct.
|
|
|
|
All Spear Library "Time" prompts accept both real and relative time
|
|
specifications.
|
|
@@sum_dia_q7_af
|
|
The statement is TRUE. All Spear Library "Time" prompts accept both real
|
|
and relative time specifications. The format for real time is:
|
|
|
|
dd-mmm-yy hh:mm:ss where dd is the numerical day, mmm is the first
|
|
three letters of the month, yy is the last two digits of the year,
|
|
and hh:mm:ss represent the hour, minute, and second respectively.
|
|
|
|
The format for relative time is:
|
|
|
|
-dd where dd represents some number of past days. The time defaults
|
|
to 00:00:01.
|
|
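The two formats described above can be told apart by the leading minus sign. The Python sketch below illustrates that idea and is not Spear's own parser. It assumes that -dd counts back from the current date and that month names are English abbreviations; the EARLIEST and LATEST keywords are not handled, and the function name is hypothetical.

```python
from datetime import datetime, timedelta


def parse_time(text: str, now: datetime) -> datetime:
    """Interpret a real (dd-mmm-yy hh:mm:ss) or relative (-dd) time."""
    text = text.strip()
    if text.startswith("-"):
        # Relative time: some number of past days, time of day 00:00:01.
        days = int(text[1:])
        day = (now - timedelta(days=days)).date()
        return datetime(day.year, day.month, day.day, 0, 0, 1)
    # Real time, for example "7-Jun-81 01:00:00".
    return datetime.strptime(text, "%d-%b-%y %H:%M:%S")


now = datetime(1981, 6, 14, 9, 30, 0)
print(parse_time("-7", now))
print(parse_time("7-Jun-81 01:00:00", now))
```

Passing the current time as an argument keeps the sketch easy to check with a fixed date.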
@@3.4.1.1.
|
|
That's it. If you have gotten this far, then chances are you have a good
|
|
handle on the Summarize dialog. Next on the menu is a brief explanation
|
|
of the Summarize Report format.
|
|
@@3.4.2.
|
|
Summarize Report - The Summarize Report consists of four major sections:
|
|
|
|
1. A File Environment and Entry Occurrence Count section.
|
|
|
|
2. A Monitor Detected Error and Reload section.
|
|
|
|
3. A Front-end, Channel and Device Summary section.
|
|
|
|
4. A Channel and Device Breakdown section.
|
|
|
|
|
|
This part of Instruct involves a series of questions. The questions are
|
|
designed to ensure that you understand the format and general content of
|
|
a typical Summarize Report.
|
|
@@3.4.2.A.
|
|
Before proceeding further, you should have a copy of a Summarize Report.
|
|
You can type /BREAK and generate one using the Spear Library, or you can
|
|
use the one in the Spear Manual.
|
|
|
|
When you are ready to proceed press the RETURN key.
|
|
@@sum_rpt_q1
|
|
Summarize Report - Q1 of 8
|
|
|
|
True or False - If you are running on a TOPS-20 System, the "Monitor
|
|
Detected Errors and Reloads" section of the Summarize Report identifies
|
|
the number of BUGHLTs, BUGCHKs, and BUGINFs that occurred during the summary
|
|
period?
|
|
@@sum_rpt_q1_at
|
|
That is correct.
|
|
|
|
The BUGHLTs, BUGCHKs, and BUGINFs are described in the TOPS-20 Software
|
|
Notebooks (Volume 16).
|
|
@@sum_rpt_q1_af
|
|
The statement is TRUE. You would have no way of knowing this, however,
|
|
if, during the summary period that you selected, there were no BUGHLTs,
|
|
BUGCHKs, or BUGINFs recorded. Summarize does not print this section of
|
|
the report unless there were BUGxxx events recorded during the summary
|
|
period.
|
|
@@sum_rpt_q2
|
|
Summarize Report - Q2 of 8
|
|
|
|
True or False - The "File Environment" section of the Summarize Report
|
|
always lists the total number and type of entries recorded in the system
|
|
event file that was submitted as input?
|
|
@@sum_rpt_q2_at
|
|
The statement is FALSE. The Summarize Report only lists the entries that
|
|
were recorded during the period of time being summarized. That period
|
|
can span the entire event file, but it does not always do so.
|
|
@@sum_rpt_q2_af
|
|
That is correct.
|
|
|
|
Only the events that occurred on or between the times that the user
|
|
specified at the "Time from" and "Time to" prompts are summarized.
|
|
@@sum_rpt_q3
|
|
Summarize Report - Q3 of 8
|
|
|
|
True or False - Under the "File Environment" section of the Summarize
|
|
Report, the term "inconsistencies" refers to the number of unknown event
|
|
types that were found in the summarized period of the event file?
|
|
@@sum_rpt_q3_at
|
|
The statement is FALSE.
|
|
|
|
The term "inconsistencies" means that Spear encountered a nonrecoverable
|
|
read error while reading the event file. In such cases it loses sync and
|
|
must use the resynchronization word in the next data block to recover.
|
|
For further information about the resync process refer to the DEFINE.LIS
|
|
file and the Spear Manual.
|
|
@@sum_rpt_q3_af
|
|
That is correct.
|
|
|
|
The term "inconsistencies" refers to the number of times Summarize lost
|
|
sync reading the event file and had to use the resynchronization word
|
|
in the next data block to recover.
|
|
@@sum_rpt_q4
|
|
Summarize Report - Q4 of 8
|
|
|
|
True or False - The "Entry Occurrence Counts" section of the Summarize
|
|
Report lists the event code and the number of times each event type
|
|
appeared in the summarized period of the system event file?
|
|
@@sum_rpt_q4_at
|
|
That is correct.
|
|
|
|
The entry types are catalogued by entry code and described in Appendix B
|
|
of the Spear Manual. Sometime, when you get a chance, you should take a
|
|
look at Appendix B. It lists, in detail, the information recorded for
|
|
each entry type in the system event file.
|
|
@@sum_rpt_q4_af
|
|
The statement is TRUE. If you take a look at the report you'll see a
|
|
decimal number, followed by a name, followed by a number in parentheses.
|
|
|
|
The decimal number indicates the number of times a particular entry type
|
|
appeared in the file; the name refers to the entry type; and the number
|
|
in parentheses refers to the code assigned to the entry type by the
|
|
system software developers.
|
|
@@sum_rpt_q5
|
|
Summarize Report - Q5 of 8
|
|
|
|
True or False - Under the "RP04/RP05/RP06 Breakdown" section of the
|
|
Summarize Report only the contents of Error Register 1 are listed?
|
|
@@sum_rpt_q5_at
|
|
The statement is FALSE. If there are any error bits set in Error
|
|
Register 2 they will be listed also. However, if none of the disk
|
|
errors summarized had a bit set in Error Register 2 then, of course,
|
|
the contents of Error Register 2 would not be listed. If that's the
|
|
case, then you're correct.
|
|
|
|
@@sum_rpt_q5_af
|
|
That is correct.
|
|
|
|
Summarize does not try to hide information. However, because the report
|
|
was designed so that it could be displayed on a terminal (i.e., 72
|
|
columns), the contents of Error Register 2 are listed below the contents
|
|
of Error Register 1. The purpose of the question was to point that out
|
|
because, at a glance, you might think that Error Register 2 was part of
|
|
a different summary.
|
|
@@sum_rpt_q6
|
|
Summarize Report - Q6 of 8
|
|
|
|
True or False - For the most part the Summarize Report is easy to read
|
|
and understand?
|
|
@@sum_rpt_q6_at
|
|
We're glad that you're satisfied. However, if you have any suggestions or
|
|
ideas that will improve the format or content of the report please use
|
|
the FEEDBACK feature on the Main Course Menu to let us know.
|
|
@@sum_rpt_q6_af
|
|
OK. Changing the report format is a relatively easy task. If you would
|
|
take the time to let us know how the report could be improved we'll do
|
|
our best to make the changes in the next release. You will find our
|
|
address listed under FEEDBACK on the Main Course Menu.
|
|
@@sum_rpt_q7
|
|
Summarize Report - Q7 of 8
|
|
|
|
True or False - In a Summarize Report, asterisks (***) will be printed
|
|
if a number exceeds the maximum digits for a field?
|
|
@@sum_rpt_q7_at
|
|
|
|
That is correct.
|
|
|
|
Each asterisk represents one position of the total space set aside for
|
|
a numeric value (that includes the decimal point, if the number is
|
|
decimal). In other words, if three spaces were set aside for a value
|
|
(say 99.), then three asterisks (***) will be printed should the value
|
|
exceed 99.
|
|
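The overflow rule can be sketched in a few lines of Python. This is an illustration of the rule as described above, not the code Summarize uses; the function name and the right-justified padding are assumptions made for the example.

```python
def format_count(value: int, width: int) -> str:
    """Format a count with its trailing decimal point in a fixed-width field.

    The decimal point occupies one position of the field. If the number
    does not fit, the whole field is filled with asterisks.
    """
    text = f"{value}."
    if len(text) > width:
        return "*" * width
    return text.rjust(width)


print(format_count(99, 3))
print(format_count(100, 3))
```

With three positions set aside, 99 still fits as "99." while 100 is printed as three asterisks.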
@@sum_rpt_q7_af
|
|
|
|
The statement is TRUE. The number of digits that can be printed is
|
|
limited to the space available in the report (i.e., 72 columns). Thus,
|
|
there is always a possibility that the number of digits necessary to
|
|
report a count will exceed the available space. When such a case occurs
|
|
a string of asterisks (***) will be printed.
|
|
|
|
@@sum_rpt_q8
|
|
Summarize Report - Q8 of 8
|
|
|
|
True or False - The following Summarize report indicates that DP160
|
|
experienced 5 errors: 2 Hard Errors and 3 Soft Errors?
|
|
|
|
RP04/RP05/RP06 Breakdown:
|
|
|
|
Error Register 1
|
|
|
|
D U O D W I A H H E W F P R I I
|
|
C N P T L A O C C C C E A M L L
|
|
K S I E E E E R E H F R R R R F
|
|
C
|
|
S/N 1957
|
|
DP160 H 1. 1.
|
|
S 3.
|
|
@@sum_rpt_q8_at
|
|
The statement is FALSE. You cannot determine how many Hard and Soft
|
|
errors a device experienced by looking at the Breakdown section because
|
|
(and this is important to remember) the Breakdown section indicates the
|
|
number of times the error bit was set when Hard errors occurred, and the
|
|
number of times the error bit was set when Soft errors occurred.
|
|
|
|
The following RP04/RP05/RP06 Summary taken from the same Summarize
|
|
report that the Breakdown was taken from bears this out. It indicates
|
|
that DP160 experienced a total of 4 errors: 1 Hard and 3 Soft.
|
|
|
|
RP04/5/6 Summary:
|
|
|
|
Hard Soft
|
|
S/N 1957
|
|
DP160 1. 3.
|
|
|
|
|
|
The point is: don't be tricked into thinking that the system had more
|
|
errors than it actually had. When you want to know the total number of
|
|
errors experienced by a Channel or a Device, go by the Summary, NOT the
|
|
Breakdown.
|
|
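The difference between the two sections is the difference between counting errors and counting error bits. The Python sketch below uses made-up records for DP160 to show how one Hard error with two bits set inflates the Breakdown total. The bit names chosen here are illustrative only; the records are hypothetical and merely arranged to mirror the counts in the example above.

```python
from collections import Counter

# Hypothetical records: (severity, bits set in Error Register 1).
# One Hard error with two bits set, three Soft errors with one bit each.
errors = [
    ("H", ["DCK", "HCE"]),
    ("S", ["DCK"]),
    ("S", ["DCK"]),
    ("S", ["DCK"]),
]

# Summary section: one count per error.
summary = Counter(severity for severity, _ in errors)

# Breakdown section: one count per bit that was set in each error.
breakdown = Counter((severity, bit)
                    for severity, bits in errors
                    for bit in bits)

summary_total = sum(summary.values())      # number of errors
breakdown_total = sum(breakdown.values())  # number of bit occurrences

print(summary["H"], summary["S"], summary_total)
print(breakdown_total)
```

Adding up the Breakdown gives 5, but only 4 errors occurred, which is why the Summary is the section to trust for totals.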
@@sum_rpt_q8_af
|
|
That is correct.
|
|
|
|
The Breakdown reflects the number of times each bit was set during Hard
|
|
and Soft errors. If you want to know the total number of Hard and Soft
|
|
errors for a given Channel or Device, refer to the Summaries.
|
|
@@3.4.2.1.
|
|
Well, that's it. You have just completed the Summarize Report section
|
|
of Instruct. Assuming that you have also completed the Dialog section,
|
|
you should feel that you are a qualified Summarize user.
|
|
|
|
If for some reason you do not agree, or again, if you have any ideas or
|
|
suggestions that will make either Instruct or Summarize a better product
|
|
please let us know. You will find our mailing address listed under
|
|
FEEDBACK on the Course Menu.
|
|
|
|
Press the RETURN key to return to the Spear Library Menu.
|
|
@@SUMMARIZE INPUT
|
|
|
|
System Event File ___.   .----------.
                     !   !  Event   !
                     !___!   File   !____ Summary Report
Retrieve             !   ! Summary  !
(binary) File     ___!   !__________!
|
|
|
|
INPUT PROCESS OUTPUT
|
|
|
|
Summarize reads the specified event file, summarizes its contents and
|
|
produces a report file. The contents are summarized by: event code,
|
|
STOPCODE or BUGxxx code types, front-end reloads, channel errors, disk
|
|
errors and magtape errors.
|
|
|
|
@@sum_dia_qx
|
|
Summarize Dialog - Qx of x
|
|
|
|
True or False -
|
|
|
|
@@sum_dia_qx_at
|
|
That is correct.
|
|
The statement is FALSE
|
|
|
|
@@sum_dia_qx_af
|
|
That is correct.
|
|
The statement is TRUE
|
|
|
|
@@sum_rpt_qx
|
|
Summarize Report - Qx of x
|
|
|
|
True or False -
|
|
|
|
@@sum_rpt_qx_at
|
|
That is correct.
|
|
The statement is FALSE
|
|
|
|
@@sum_rpt_qx_af
|
|
That is correct.
|
|
The statement is TRUE
|
|
|
|
@@3.5.1.
|
|
Spear Library Applications
|
|
|
|
The Spear Library can be used in conjunction with either the Systematic
|
|
Substitution Troubleshooting Approach, or the Formal Troubleshooting
|
|
Approach to isolate the cause of intermittent failures.
|
|
@@3.5.1.A.
|
|
The first thing you want to do is ensure that Summarize is run on a daily
|
|
basis. The best way to do this is to run it via a daily Batch job. If
|
|
you are not sure how to do that you can ask an experienced operator to give
|
|
you a hand, or, if you're on a TOPS-20 system, you can try using this
|
|
Batch Control File:
|
|
|
|
|
|
@SUBMIT SPEAR /TIME:30 /AFTER:TODAY ! Resubmit SPEAR again tomorrow.
|
|
@RENAME *.RPT *.RPO ! Rename yesterday's report file.
|
|
@SPEAR ! Run SPEAR.
|
|
*SUMMARIZE /GO ! Summarize yesterday's errors.
|
|
*EXIT ! Then leave.
|
|
@IF (ERROR) ! Continue even if there's an error.
|
|
@PRINT *.RPT /NOTE:"SPEAR - F-S" ! Print two copies of the report:
|
|
@PRINT *.RPT /NOTE:"SPEAR - OPER" ! one for FS and one for Operations.
|
|
@@3.5.1.B.
|
|
Or, if you're on a TOPS-10 system, you can try using this Control File:
|
|
|
|
|
|
.SUBMIT SPEAR /TIME:30 /AFTER:23:59 ! Resubmit Spear again tomorrow.
|
|
.R SPEAR ! Run Spear.
|
|
*SUMMARIZE /GO ! SUMMARIZE yesterday's errors.
|
|
*EXIT ! Then leave.
|
|
.IF (ERROR) ! Continue even if there's an error.
|
|
.PRINT *.RPT /NOTE:"SPEAR - SITE" ! Print two copies of the report:
|
|
.PRINT *.RPT /NOTE:"SPEAR - F-S" ! one for FS and one for the Site.
|
|
.RENAME *.RPD = *.RPT ! Rename today's report so that it
|
|
! won't be printed again tomorrow.
|
|
@@3.5.1.C.
|
|
Once you have the Batch File running you can use the daily reports to
|
|
monitor the overall performance of the system. If the error rate for
|
|
a particular device or subsystem starts to go up, you will see it
|
|
reflected in the various summaries and histograms.
|
|
@@3.5.1.D.
|
|
Next, a few hours before you get the system for routine maintenance,
|
|
submit the last seven days or so of the event file for summarization. Allow
|
|
yourself about an hour to look over the report and decide on a fault
|
|
isolation strategy. For example, suppose the report indicates
|
|
that, among other things:
|
|
|
|
DP140 reported 5 recoverable Index Errors while PS1: was mounted.
|
|
|
|
|
|
Since intermittent Index Errors are generally caused by either a faulty
|
|
Servo Track or a faulty Index Module, during the maintenance period you
|
|
could swap the Index module in DP140 with the Index module in another
|
|
drive (let's say DP220).
|
|
|
|
Then, when you return the system to operations you could ask that they
|
|
move PS1: to a different drive (perhaps DP110). The rest is a matter of
|
|
"wait and watch". You do the waiting and you use Summarize (on a daily or
|
|
weekly basis) to do the watching.
|
|
@@3.5.1.E.
|
|
1. If the problem moves to DP220, then you know that the Index module
|
|
was the cause of the failure.
|
|
|
|
2. If the problem moves to DP110, then you know that the medium (PS1:)
|
|
was the cause of the failure.
|
|
|
|
3. If the problem does not move, then you know that the cause was not the
|
|
Index module nor was it the medium. So, the next chance you get, put
|
|
everything back the way it was and try something else. Sooner or
|
|
later the report is bound to reflect the fact that you have
|
|
identified and either moved or eliminated the cause of the problem.
|
|
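The three outcomes above form a small decision table. The Python sketch below restates that table for the drives named in the example (Index module swapped from DP140 to DP220, PS1: moved to DP110). It is a memory aid only; the names in the code are made up for the illustration.

```python
# Where the errors show up after the swap, and what that tells you.
OUTCOMES = {
    "DP220": "the Index module",   # errors followed the swapped module
    "DP110": "the medium (PS1:)",  # errors followed the disk pack
    "DP140": "neither the Index module nor the medium",  # errors stayed put
}


def diagnose(drive_reporting_errors: str) -> str:
    """Map the drive now reporting Index Errors to the likely cause."""
    drive = drive_reporting_errors.strip().upper()
    if drive not in OUTCOMES:
        raise ValueError(f"{drive} was not part of the swap")
    return f"The cause is {OUTCOMES[drive]}"


print(diagnose("DP220"))
```

A drive that was not involved in the swap is rejected, because the experiment says nothing about it.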
@@3.5.1.F.
|
|
When used in this manner SPEAR becomes a very powerful troubleshooting
|
|
tool. The principle is simple. If you move a faulty component from one
|
|
piece of equipment to another the error symptoms will move with it. If
|
|
they don't, then at least you know what the problem is not.
|
|
|
|
This particular isolation technique was developed, during product load
|
|
test, by the South Massachusetts Field Service Office. If you can come
|
|
up with any neat ways of using the Spear Library to simplify system
|
|
maintenance please let us know. We'd be glad to try and include it in
|
|
this Application Section. You'll find our address listed under Feedback
|
|
on the main course menu.
|
|
@@3.6.0
|
|
|
|
The KLERR function provides expanded reporting of the KL10 function
|
|
reads supplied by the Front-End on a monitor crash.
|
|
|
|
SPEAR can be used to generate detailed reports and/or summaries of
|
|
KLERR data blocks. You can always get a summary, but you must select
|
|
one of three formats if you want a detailed report of each event.
|
|
@@3.6.1
|
|
|
|
The following summary options will be available:
|
|
|
|
o ALL -- This will result in a complete listing containing the
|
|
number of times each signal was true and false.
|
|
|
|
o ERRORS-ONLY -- This will result in a single-page list
|
|
containing the number of times an error signal was true and
|
|
the number of times it was false.
|
|
|
|
o NONE -- This will result in no summary at all.
|
|
|
|
@@3.6.2
|
|
|
|
The following report format options will be available:
|
|
|
|
o SUMMARY-ONLY -- This will result in no entry-by-entry output.
|
|
Only the final summary of signals will be printed.
|
|
|
|
o FULL -- The result will be a set of detailed reports that
|
|
list all of the registers and signals (true or false) as well
|
|
as fields.
|
|
|
|
o TRUE-SIGNALS -- The result will be a set of detailed reports
|
|
that list all of the registers but only the "true" signals
|
|
and not the fields.
|
|
|
|
o CRAM-BAD-WORD -- The result will be a set of reports,
|
|
consisting of one line for each record which included a CRAM
|
|
parity error. This line will report the CRAM location and
|
|
contents.
|
|
|
|
@@3.6.3
|
|
|
|
The following output formats will be available for the CRAM word:
|
|
|
|
o MICROCODE -- This format is used to compare the bad CRAM word
|
|
with the microcode listing.
|
|
|
|
o OCTAL -- This format matches the one shown in the KL10
|
|
Maintenance Handbook and can help isolate the failing CRAM
|
|
module.
|
|
|
|
o TRACON -- Used to compare with "TRACON" snapshots.
|
|
|
|
@@KLERR END
|
|
This concludes the KLERR section of the course. We hope you found it
|
|
useful. Also, if you have any comments about this section please get
|
|
in touch with us. Our address is found under FEEDBACK on the main
|
|
course menu.
|
|
@@4.0.
|
|
The Guaranteed Uptime Program is a service that allows you and DIGITAL to
|
|
work together to select and maintain the highest level of reliability for
|
|
your system.
|
|
|
|
Together you and DIGITAL determine the percentage of Uptime your site
|
|
requires, from 96% to 99%. Uptime is defined as any time the
|
|
system is NOT down - with downtime defined as:
|
|
|
|
(1) That time within the hours of contract coverage when the system is
|
|
turned over to DIGITAL for corrective maintenance due to operating
|
|
system malfunction resulting in a system crash and failure to restart.
|
|
|
|
(2) Failure of DIGITAL-supplied hardware which in your opinion makes the
|
|
system unavailable for use.
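
The definition above reduces to a simple ratio. The sketch below assumes
that uptime is expressed as a percentage of the contracted coverage
hours; that is an assumption of this example, since the calculation
COMPUTE actually performs is not described here. The function and
parameter names are illustrative only.

```python
def uptime_percent(coverage_hours, downtime_hours):
    """Percentage of contracted coverage hours during which the system was up."""
    if coverage_hours <= 0:
        raise ValueError("coverage_hours must be positive")
    if not 0 <= downtime_hours <= coverage_hours:
        raise ValueError("downtime_hours must lie within the coverage hours")
    return 100.0 * (coverage_hours - downtime_hours) / coverage_hours


def meets_target(coverage_hours, downtime_hours, target_percent):
    """True if measured uptime reaches the agreed target (96 to 99 in the text)."""
    return uptime_percent(coverage_hours, downtime_hours) >= target_percent


# Hypothetical contract: 12 hours a day, 5 days a week, measured over 13 weeks.
coverage = 13 * 5 * 12                          # 780 coverage hours
print(round(uptime_percent(coverage, 20), 2))   # 97.44
print(meets_target(coverage, 20, 97))           # True
```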
|
|
|
|
@@4.0.A.
|
|
The NOTIFY program and the SPEAR function COMPUTE are the two programs
|
|
that provide the tools to monitor the operation of the system and
|
|
calculate the statistics needed to measure uptime. NOTIFY is the program
|
|
that allows you to keep the current contract coverage in a file known as
|
|
the contract file.
|
|
|
|
The NOTIFY program also allows you to keep an outage log that contains the
|
|
date and time you report the system inoperable and the date and time you
|
|
accept the system back from DIGITAL as being fixed.
|
|
|
|
When you run NOTIFY, you input two types of information:
|
|
|
|
(1) The date and time you notified DIGITAL that the system was
|
|
down and the date and time DIGITAL returned the repaired
|
|
system to you
|
|
|
|
(2) The number of hours a day that you have DIGITAL maintenance
|
|
coverage. The NOTIFY program then creates a binary file in
|
|
your area called NOTIFY.SYS.
|
|
|
|
This is the file COMPUTE uses to produce the system uptime statistics.
|
|
|
|
@@4.0.B.
|
|
The NOTIFY program contains three modes:
|
|
|
|
DISPLAY
|
|
PURGE
|
|
UPDATE
|
|
|
|
The DISPLAY mode allows you to translate NOTIFY.SYS into ASCII so you can
|
|
display all or part of the outage log or contract file.
|
|
|
|
The PURGE mode allows you to delete a portion of the data base in NOTIFY.SYS,
|
|
either from the contract file or from the outage log.
|
|
|
|
The UPDATE mode allows you to write log entries or to insert or modify
|
|
contract coverage in NOTIFY.SYS.
|
|
@@4.0.C.
|
|
To collect the data needed to measure uptime, do the following:
|
|
|
|
1. Run NOTIFY to establish a contract file containing the number
|
|
of hours you have DIGITAL coverage for corrective maintenance.
|
|
|
|
2. When you determine that the system is inoperable, call DIGITAL
|
|
to report the system-down condition and turn your system over
|
|
to DIGITAL for service.
|
|
|
|
3. When the system is returned to you, run NOTIFY from the same
|
|
directory containing the contract file to log:
|
|
a) reported time (the date and time you notified DIGITAL).
|
|
b) accepted time (the date and time DIGITAL returned the
|
|
system to you).
|
|
|
|
    4.  After collecting 13 weeks of data, run COMPUTE from the same
|
|
        directory from which you have been running NOTIFY.
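
The bookkeeping in steps 3 and 4 amounts to summing the interval between
each reported time and its accepted time. The sketch below does only
that. It does not read NOTIFY.SYS, whose binary layout is not described
here, and as a simplifying assumption it does not clip outages to the
contracted coverage hours. All names and sample dates are illustrative.

```python
from datetime import datetime, timedelta


def total_downtime(outages):
    """Sum (accepted - reported) over a list of (reported, accepted) pairs."""
    total = timedelta()
    for reported, accepted in outages:
        if accepted < reported:
            raise ValueError("accepted time precedes reported time")
        total += accepted - reported
    return total


# Hypothetical outage log entries: (reported time, accepted time).
outage_log = [
    (datetime(1985, 3, 4, 9, 30), datetime(1985, 3, 4, 13, 0)),
    (datetime(1985, 4, 17, 14, 15), datetime(1985, 4, 17, 16, 45)),
]
downtime_hours = total_downtime(outage_log).total_seconds() / 3600
print(downtime_hours)  # 6.0
```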
|
|
|
|
@@4.0.D.
|
|
To run the NOTIFY program, type one of the following:
|
|
|
|
$ RUN SYS$SYSTEM:NOTIFY<cr> on VAX/VMS,
|
|
|
|
@NOTIFY<cr> on TOPS-20,
|
|
|
|
.R NOTIFY<cr> on TOPS-10.
|
|
|
|
NOTIFY responds with the following prompt:
|
|
|
|
NOTIFY>
|
|
|
|
At this point, as well as after any other prompt, you can type ? or HELP
|
|
to get detailed information on both the prompt and on acceptable responses.
|
|
|
|
Type DISPLAY if you want to check the outage log or if
|
|
you want to check the contract.
|
|
|
|
Type UPDATE if you want to enter or revise contract coverage, or if you want
|
|
to report an outage.
|
|
|
|
Type PURGE if you want to delete entries from either the contract file or
|
|
from the outage log.
|
|
@@4.0.E.
|
|
The NOTIFY program and the SPEAR function COMPUTE look for the NOTIFY.SYS
|
|
file in your default directory. If more than one person will be using
|
|
NOTIFY and COMPUTE, you may want to agree on where the NOTIFY.SYS file will
|
|
reside. Or you may want to change the location of NOTIFY.SYS.
|
|
|
|
To change the location of NOTIFY.SYS, use a text editor to modify the file
|
|
called NOTIFY.SPE. You can modify the file specification for NOTIFY.SYS to
|
|
specify a specific device and directory, or you can even change the name of
|
|
the file itself. Both NOTIFY and COMPUTE will use this file specification.
|
|
|
|
For a more detailed explanation of the NOTIFY program refer to the
|
|
GUIDE TO MEASURING UPTIME document.
|
|
@@GUP END
|
|
This concludes the Guaranteed Uptime Program/NOTIFY section of the course.
|
|
We hope you found it useful. Also, if you have any comments about this
|
|
section please get in touch with us. Our address is found under FEEDBACK on
|
|
the main course menu.
|
|
@@
|
|
@@rec_alg
|
|
Recovery Algorithms - Most operating systems have some sort of algorithm
|
|
or procedure for error recovery. This section of Instruct explains the
|
|
algorithms used by TOPS-10 and TOPS-20 to recover from disk read errors.
|
|
@@R.T.rec_alg
|
|
STOP - You are moving in a reverse direction through the course. You are
|
|
about to back into the Introduction to the Recovery Algorithms.
|
|
@@rec_menu
|
|
Disk Read Error Recovery Algorithms
|
|
|
|
Topic Menu
|
|
|
|
0. Introduction
|
|
|
|
1. TOPS-10 Disk Recovery Algorithm
|
|
|
|
2. TOPS-20 Disk Recovery Algorithm
|
|
|
|
@@t10_dsk_rec_alg
|
|
TOPS-10 RP04/05/06 Disk Read Error Recovery Algorithm
|
|
|
|
TOPS-10 and TOPS-20 use a similar algorithm to recover from disk read
|
|
data errors. The algorithm involves 31 retry attempts. Under TOPS-10,
|
|
if an ECC correctable error is detected during a read header or data
|
|
operation the following occurs:
|
|
|
|
1. The transfer is terminated.
|
|
|
|
2. The software reconstructs the data using the calculated ECC value.
|
|
|
|
3. The transfer is restarted beginning at the next sector (i.e., the
|
|
sector following the sector in error).
|
|
|
|
|
|
If the read data error is not ECC correctable, however, the following
|
|
recovery algorithm is invoked.
|
|
@@t10_dsk_rec_alg_a
|
|
1. Non (ECC) recoverable read error
|
|
2. Repeat read operation (attempt ECC correction)
|
|
3. Repeat read operation (attempt ECC correction)
|
|
4. Repeat read operation (attempt ECC correction)
|
|
5. Repeat read operation (attempt ECC correction)
|
|
6. Repeat read operation (attempt ECC correction)
|
|
7. Repeat read operation (attempt ECC correction)
|
|
8. Repeat read operation (attempt ECC correction)
|
|
9. Repeat read operation (attempt ECC correction)
|
|
10. Repeat read operation (attempt ECC correction)
|
|
11. Repeat read operation (attempt ECC correction)
|
|
12. Repeat read operation (attempt ECC correction)
|
|
13. Repeat read operation (attempt ECC correction)
|
|
14. Repeat read operation (attempt ECC correction)
|
|
15. Repeat read operation (attempt ECC correction)
|
|
16. Repeat read operation (attempt ECC correction)
|
|
17. Repeat read operation (attempt ECC correction)
|
|
|
|
Next Offset is tried.
|
|
@@t10_dsk_rec_alg_b
|
|
Offset heads (+400 microinches if RP04/05, +200 if RP06).
|
|
18. Repeat read operation (attempt ECC correction).
|
|
19. Repeat read operation (attempt ECC correction).
|
|
|
|
Offset heads (-400 microinches if RP04/05, -200 if RP06).
|
|
20. Repeat read operation (attempt ECC correction).
|
|
21. Repeat read operation (attempt ECC correction).
|
|
|
|
Offset heads (+800 microinches if RP04/05, +400 if RP06).
|
|
22. Repeat read operation (attempt ECC correction).
|
|
23. Repeat read operation (attempt ECC correction).
|
|
|
|
Offset heads (-800 microinches if RP04/05, -400 if RP06).
|
|
24. Repeat read operation (attempt ECC correction).
|
|
25. Repeat read operation (attempt ECC correction).
|
|
@@t10_dsk_rec_alg_c
|
|
Offset heads (+1200 microinches if RP04/05, +600 if RP06).
|
|
26. Repeat read operation (attempt ECC correction).
|
|
27. Repeat read operation (attempt ECC correction).
|
|
|
|
Offset heads (-1200 microinches if RP04/05, -600 if RP06).
|
|
28. Repeat read operation (attempt ECC correction).
|
|
29. Repeat read operation (attempt ECC correction).
|
|
|
|
Return to center line.
|
|
Set Error Correction Inhibit (ECC INHIBIT = 1)
|
|
30. Repeat read operation.
|
|
|
|
     Reset Error Correction Inhibit (ECC INHIBIT = 0)
|
|
31. Repeat read operation (attempt ECC correction).
|
|
|
|
|
|
If all 31 retries are unsuccessful, then the read error is defined as
|
|
non-recoverable (Hard) and an entry is made in the structure's BAT block.
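
The TOPS-10 listing can be restated as a table of retry steps. The sketch
below generates that table from the offsets given above; it is not
monitor code, and the function name and tuple layout are assumptions of
this example. Step 1 in the listing is the failing read itself, so the
generated table starts at step 2.

```python
def tops10_retry_schedule(drive="RP06"):
    """Return (step, offset_microinches, ecc_enabled) tuples for steps 2-31."""
    if drive not in ("RP04", "RP05", "RP06"):
        raise ValueError("unsupported drive type: " + drive)
    unit = 200 if drive == "RP06" else 400    # RP04/05 offsets are 400 per step
    # Steps 2-17: repeat the read on the center line with ECC correction.
    schedule = [(step, 0, True) for step in range(2, 18)]
    # Steps 18-29: two reads at each of +1, -1, +2, -2, +3, -3 offset units.
    step = 18
    for multiple in (1, 2, 3):
        for sign in (+1, -1):
            for _ in range(2):
                schedule.append((step, sign * multiple * unit, True))
                step += 1
    schedule.append((30, 0, False))           # center line, ECC inhibit set
    schedule.append((31, 0, True))            # center line, ECC inhibit reset
    return schedule


for entry in tops10_retry_schedule("RP04")[16:20]:
    print(entry)
# (18, 400, True)
# (19, 400, True)
# (20, -400, True)
# (21, -400, True)
```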
|
|
@@t20_dsk_rec_alg
|
|
                TOPS-20 RP04/05/06 Disk Read Error Retry Algorithm
|
|
|
|
|
|
TOPS-10 and TOPS-20 use a similar algorithm to recover from disk read
|
|
data errors. The algorithm involves 31 retry attempts.
|
|
|
|
If any of the retry attempts are successful, then the error is defined
|
|
as Soft (recoverable) and the system continues in a normal manner. If,
|
|
however, all 31 retries are unsuccessful, then the error is defined as
|
|
Hard (non-recoverable) and the system takes the appropriate action.
|
|
@@t20_dsk_rec_alg_a
|
|
The following details each of the 31 steps in the disk read error
|
|
retry algorithm. Assume that a read operation was initiated and
|
|
a read data error (DCK) was detected.
|
|
|
|
|
|
The first three retries do not attempt ECC correction.
|
|
|
|
1. Repeat read operation. Do not attempt ECC correction.
|
|
2. Repeat read operation. Do not attempt ECC correction.
|
|
3. Repeat read operation. Do not attempt ECC correction.
|
|
@@t20_dsk_rec_alg_b
|
|
The next 13 retries will attempt ECC correction if ECC Hard = zero (0).
|
|
|
|
4. Repeat read operation. Attempt ECC correction.
|
|
5. Repeat read operation. Attempt ECC correction.
|
|
6. Repeat read operation. Attempt ECC correction.
|
|
7. Repeat read operation. Attempt ECC correction.
|
|
8. Repeat read operation. Attempt ECC correction.
|
|
9. Repeat read operation. Attempt ECC correction.
|
|
10. Repeat read operation. Attempt ECC correction.
|
|
11. Repeat read operation. Attempt ECC correction.
|
|
12. Repeat read operation. Attempt ECC correction.
|
|
13. Repeat read operation. Attempt ECC correction.
|
|
14. Repeat read operation. Attempt ECC correction.
|
|
15. Repeat read operation. Attempt ECC correction.
|
|
16. Repeat read operation. Attempt ECC correction.
|
|
@@t20_dsk_rec_alg_c
|
|
The next 12 retries attempt offset and ECC correction.
|
|
The first offset value listed is used for RP04s and RP05s.
|
|
The second offset value listed is used for RP06s.
|
|
|
|
|
|
17. Offset (+400/+200). Repeat read operation. Attempt ECC correction.
|
|
18. Repeat read operation at this offset. Attempt ECC correction.
|
|
|
|
19. Offset (-400/-200). Repeat read operation. Attempt ECC correction.
|
|
20. Repeat read operation at this offset. Attempt ECC correction.
|
|
|
|
21. Offset (+800/+400). Repeat read operation. Attempt ECC correction.
|
|
22. Repeat read operation at this offset. Attempt ECC correction.
|
|
|
|
23. Offset (-800/-400). Repeat read operation. Attempt ECC correction.
|
|
24. Repeat read operation at this offset. Attempt ECC correction.
|
|
|
|
25. Offset (+1200/+600). Repeat read operation. Attempt ECC correction.
|
|
26. Repeat read operation at this offset. Attempt ECC correction.
|
|
|
|
27. Offset (-1200/-600). Repeat read operation. Attempt ECC correction.
|
|
28. Repeat read operation at this offset. Attempt ECC correction.
|
|
@@t20_dsk_rec_alg_d
|
|
The final three retries are a last-ditch effort to get the data.
|
|
|
|
29. Return to centerline. Repeat read operation. Attempt ECC correction.
|
|
30. Set Error Correction Inhibit. Repeat read operation.
|
|
31. Set Error Correction Inhibit. Repeat read operation.
|
|
|
|
If all 31 retries are unsuccessful, then the read error is defined as
|
|
non-recoverable (Hard) and an entry is made in the structure's BAT block.
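
The 31 TOPS-20 steps can likewise be restated as a table. The sketch
below builds that table from the listing above; it is not monitor code,
and the function name and tuple layout are assumptions of this example.
The ECC flag for steps 4 through 16 records only that correction is
attempted; the listing adds the condition that ECC Hard is zero.

```python
def tops20_retry_schedule(drive="RP06"):
    """Return (step, offset_microinches, ecc_attempted) tuples for steps 1-31."""
    if drive not in ("RP04", "RP05", "RP06"):
        raise ValueError("unsupported drive type: " + drive)
    unit = 200 if drive == "RP06" else 400    # RP04/05 offsets are 400 per step
    # Steps 1-3: plain rereads, no ECC correction.
    schedule = [(step, 0, False) for step in range(1, 4)]
    # Steps 4-16: rereads with ECC correction attempted.
    schedule += [(step, 0, True) for step in range(4, 17)]
    # Steps 17-28: two reads at each of +1, -1, +2, -2, +3, -3 offset units.
    step = 17
    for multiple in (1, 2, 3):
        for sign in (+1, -1):
            for _ in range(2):
                schedule.append((step, sign * multiple * unit, True))
                step += 1
    schedule.append((29, 0, True))            # back on centerline, ECC attempted
    schedule.append((30, 0, False))           # Error Correction Inhibit set
    schedule.append((31, 0, False))           # Error Correction Inhibit set
    return schedule


print(len(tops20_retry_schedule()))           # 31
print(tops20_retry_schedule("RP06")[16])      # (17, 200, True)
```

Comparing this table with the TOPS-10 one makes the differences easy to
see: TOPS-20 begins with three rereads that skip ECC correction and ends
with two reads under Error Correction Inhibit.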
|
|
|
|
@@dialog_change
|
|
|
|
The following dialog changes must be made to all SPEAR version 1.x
|
|
command and control files in order for them to operate under SPEAR
|
|
version 2.0. Although the examples use TOPS-20 style commands, the
|
|
changes apply to the TOPS-10 and VMS versions of SPEAR version 2.0
|
|
as well.
|
|
|
|
|
|
Retrieve:
|
|
|
|
The only changes in the Retrieve dialog from version 1.x to version
|
|
2.0 are in the "Selection type" "Error" and "NonError" areas.
|
|
However, there are no changes for a "Selection type" of "Error" "All".
|
|
|
|
@@dialog_change_a
|
|
The following example illustrates changes in Retrieve
|
|
"Selection type" "Error".
|
|
|
|
|
|
|
|
     SPEAR v1.x                SPEAR v2.0                   Comments
|
|
__________ __________ ________
|
|
|
|
*Error *Error Selection type
|
|
*Disk *Disk Device category
|
|
*RP06 *RP06 Specific device(s)
|
|
*All Device error type
|
|
*Finished *Finished End device selection
|
|
|
|
To retrieve the events for a specific device error type, replace
|
|
"*All" in the version 2.0 dialog above with one or more device
|
|
error types. For example, *Software, Bus, Channel-controller.
|
|
|
|
@@dialog_change_b
|
|
The following example illustrates changes in Retrieve
|
|
"Selection type" "NonError".
|
|
|
|
|
|
SPEAR v1.x SPEAR v2.0 Comments
|
|
__________ __________ ________
|
|
|
|
*NonError *Stat, Diag, Config, Other Device Category
|
|
*All Device Selection
|
|
|
|
|
|
|
|
To retrieve the events for a specific device or class of device,
|
|
replace the "*All" in the version 2.0 dialog above with one of
|
|
the following command sequences:
|
|
|
|
*Disk *Disk Device category
|
|
*All *RA60, RA80, RA81 Specific device(s)
|
|
*Finished *Finished End device selection
|
|
|
|
@@dialog_change_c
|
|
|
|
The same functionality in Summarize may be maintained by changing the
|
|
version 1.x dialog to the version 2.0 dialog below.
|
|
|
|
SPEAR v1.x SPEAR v2.0 Comments
|
|
__________ ___________ ________
|
|
|
|
@SPEAR @SPEAR Run SPEAR
|
|
*Summarize *Summarize Invoke Summarize
|
|
*SERR:ERROR.SYS *SERR:ERROR.SYS Event file
|
|
*All Device category
|
|
*Earliest *Earliest Time from
|
|
*Latest *Latest Time to
|
|
*Yes Error distribution
|
|
*DSK:SUMMAR.RPT *DSK:SUMMAR.RPT Report to
|
|
*/Go */Go Start processing
|
|
|
|
@@dialog_change_d
|
|
To get summaries for a specific device or class of device,
|
|
replace the "*All" in the version 2.0 dialog above with
|
|
one of the following command sequences:
|
|
|
|
*Disk *Disk Device category
|
|
*All *RA60, RA80, RA81 Specific device(s)
|
|
*Finished *Finished End device selection
|
|
|
|
|
|
To suppress the Error Distributions, change the "*Yes" to
|
|
"*No" in the version 2.0 dialog above.
|
|
|
|
@@dialog_change_e
|
|
|
|
|
|
There are no dialog changes in Compute.
|
|
|
|
@@
|
|
|