SYSTEM ADMINISTRATION Chapter 17 Using a Structured Troubleshooting Strategy

Understanding Troubleshooting
• Troubleshooting is the process of taking a large,
complex problem, and, through the use of various
techniques, excluding all potential elements until
only the actual cause remains.
• Once the actual cause is identified, it can then be corrected.
• Good troubleshooting skills require a commonsense,
structured approach.
Establish the Symptoms
• At this stage of the process you are primarily
working with symptoms of the actual problem.
• Most symptoms manifest themselves as a failure of
some type.
• Failures may or may not be accompanied by one or
more error messages.
Open-Ended Questions
• Open-ended questions are designed to elicit
additional information. The plan is to engage in a
dialog with the user.
• Examples of open-ended questions are:
– What did you observe when the problem occurred?
– What were you doing when the error occurred?
Closed-Ended Questions
• With closed-ended questions, you are looking for a
specific answer. Many times the answer will be yes
or no, or a selection of one or more options.
• Examples of closed-ended questions are:
– Is there an error message?
– What does the error message say?
– Is your computer plugged into the wall?
– Is anyone else having this problem?
– Have any recent changes been made to the computer?
The Basic Questions
• Always remember to answer the essential questions:
Who, what, when, where, why, and how. Here are a few examples:
• Who
– Who reported the problem?
– Who found the actual problem?
– Whom does the problem affect?
• What
– What are the symptoms of this particular problem?
– What type of device is having the problem?
– What was the user doing when he or she first noticed the problem?
• When
– When was the problem first noticed?
– When did the problem actually start?
• Where
– Where is the affected device?
– Where was the device when the problem occurred? (Used
primarily with laptop computers.)
• Why
– Why was this problem noticed?
– Why is this problem occurring?
• How
– How did this problem occur?
– Can you make this problem recur? If so, how?
Is There a Problem?
One of the questions you must always answer is,
“Does the problem really exist or is the system simply
operating the way it is intended?”
Identify the Affected Area
• This step is concerned with the scope of the problem.
• A common question to ask at this point is, “How
widespread is the problem?” That is to say, is the
problem limited to a single user, group, computer,
server, subnet, or an entire network?
• If one user calls the help desk to report a problem,
you can usually deduce that it affects only one user
or computer.
• On the other hand, if your phones start ringing off the
hook with multiple users reporting the same problem,
you can be fairly certain that the problem is more
widespread.
• The answer determines how you will focus your
troubleshooting efforts.
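To make the scope question concrete, the sketch below groups the IP addresses of the computers whose users reported the problem. It is only an illustration: `classify_scope` is a hypothetical helper, it assumes every subnet is a /24 (substitute your site's real masks), and it treats one address as one computer, so it cannot tell a single user apart from a single machine.

```python
import ipaddress

def classify_scope(affected_hosts, prefix_len=24):
    """Roughly classify how widespread a problem is.

    affected_hosts: IPv4 addresses (strings) of the computers whose users
    reported the problem. prefix_len=24 is an assumed subnet size.
    """
    addresses = {ipaddress.ip_address(h) for h in affected_hosts}
    if not addresses:
        return "no reports"
    if len(addresses) == 1:
        return "single computer"
    # strict=False allows building the network from a host address
    subnets = {
        ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        for addr in addresses
    }
    if len(subnets) == 1:
        return "single subnet"
    return f"{len(subnets)} subnets"

print(classify_scope(["192.168.1.10", "192.168.1.22", "192.168.2.5"]))  # prints "2 subnets"
```

A result of "single computer" points you at that machine; several subnets suggest looking at shared infrastructure instead.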
Establish What Has Changed
• A good question to ask is whether anything has changed
recently. For example, has any new software been
installed on the computer?
• A dialog with other groups within the IT
department will help you determine what software
upgrades or repairs have been made to users’
computers.
• Always keep in mind that the user may have made
changes on her or his own, but has decided not to
tell you about them.
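One way to answer the "what has changed?" question is to compare the computer's current software inventory against a baseline taken when it was known to be working. This is a minimal sketch, assuming both inventories are already available as name-to-version dictionaries; how you collect them (a package manager, an inventory tool, and so on) is outside its scope, and `diff_inventory` and the sample package names are hypothetical.

```python
def diff_inventory(before, after):
    """Compare two software inventories, each a {name: version} dict."""
    added = sorted(set(after) - set(before))      # newly installed
    removed = sorted(set(before) - set(after))    # uninstalled
    changed = sorted(                             # upgraded or downgraded
        (name, before[name], after[name])
        for name in set(before) & set(after)
        if before[name] != after[name]
    )
    return {"added": added, "removed": removed, "changed": changed}

# Hypothetical baseline vs. current snapshot
baseline = {"Office": "16.0", "PDF Reader": "23.1"}
current = {"Office": "16.0", "PDF Reader": "24.2", "Toolbar Helper": "1.0"}
print(diff_inventory(baseline, current))
# {'added': ['Toolbar Helper'], 'removed': [], 'changed': [('PDF Reader', '23.1', '24.2')]}
```

A diff like this also catches changes the user made on his or her own and did not mention.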
Could You Do This Before?
• It is important to determine if an employee has been
able to perform a task in the past.
• Many times, a user will have certain expectations of
a computer system based on home use or past job experience.
Recreate the Problem
It is always a good idea to see if you can recreate the problem.
Select the Most Probable Cause
• Once you have asked a sufficient number of
questions, you should have a fairly good idea where
the problem is, although you may not be able to
describe what the problem is.
• The difficulty here is that you rarely have only one
possible cause for the problem.
• At this point, you will start using additional
techniques. For example, you may need to make a
trip to the user’s computer, if possible, and run
some diagnostics or utilities to eliminate some of the
possibilities.
• Always keep an open mind and avoid jumping to
conclusions when it comes to troubleshooting.
What Can You Eliminate?
• By testing one thing at a time (TCP/IP, DHCP, the router,
and DNS), you have systematically eliminated one
possibility after another, until only the actual problem
remains.
• Through the use of the process of elimination and
basic troubleshooting utilities, you have been able to
quickly identify where the problem exists.
• Now you must determine what the problem is. You
know the problem rests with DNS; you just don’t yet
know what is wrong with DNS.
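The elimination order described here (TCP/IP, then DHCP, the router, and DNS) can be sketched as a list of checks run in sequence, where the first failing check shows where to focus. This is a rough illustration using only Python's standard `socket` and `ipaddress` modules. The check functions are hypothetical helpers; the DHCP check only confirms the machine has a routable, non-APIPA (not 169.254.x.x) address, and a router check is omitted because pinging requires raw sockets or an external command.

```python
import ipaddress
import socket

def check_tcpip_stack():
    """Loopback test: can this machine create and bind a TCP/IP socket?"""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    return True

def check_dhcp_address():
    """Rough DHCP test: is there a routable, non-APIPA (169.254.x.x) address?"""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        # connect() on a UDP socket sends no packets; it only selects a route.
        # 192.0.2.1 is a reserved documentation address used just for routing.
        s.connect(("192.0.2.1", 9))
        local_ip = ipaddress.ip_address(s.getsockname()[0])
    return not local_ip.is_link_local

def check_dns(name="example.com"):
    """DNS test: can we resolve a well-known name?"""
    socket.gethostbyname(name)  # raises socket.gaierror (an OSError) on failure
    return True

def first_failing_check(checks):
    """Run (name, check) pairs in order; return the first failing name, or None."""
    for name, check in checks:
        try:
            passed = check()
        except OSError:  # network errors count as a failed check
            passed = False
        if not passed:
            return name
    return None

CHECKS = [
    ("TCP/IP stack", check_tcpip_stack),
    ("DHCP address", check_dhcp_address),
    # A router/gateway check would go here (omitted; see the lead-in).
    ("DNS resolution", check_dns),
]
```

Calling `first_failing_check(CHECKS)` returns `None` when every check passes, or the name of the first failing layer, such as `"DNS resolution"`. That tells you where the problem lies, but not yet what is wrong there.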
Troubleshooting Tools
• Always use all of the troubleshooting tools that are
available to you.
• Most operating systems provide either a utility or log
files with which to view system problems. These logs
record significant events that occur on the computer
or server.
• Looking through these log files might direct you to
the problem, or at least identify for you the area
where the problem is occurring.
Event Viewer
• The Windows Event Viewer contains three main logs:
Application, Security, and System.
– Application
• The application log contains events that are
caused by applications or programs.
– Security
• The security log records events that relate directly
to system security, such as valid and invalid logon attempts.
– System
• The system log contains events that are reported
by system components.
• Entries in these logs fall into three main types:
Information, Warning, and Error.
– Information
• Describes a successful operation. For
example, when an application, driver, or
service loads successfully, an Information
event will be logged.
– Warning
• An event that may be an indicator of a possible
future problem. For example, if a system
begins to get low on disk space, it will log a Warning.
– Error
• Indicates a significant problem. This problem
may result in a loss of data or loss of a
function. For example, if a device driver fails to
load during system startup, an Error will be logged.
• In order to view additional information for a specific
event, double-click on the entry in the right pane.
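The same logs can also be queried from the command line with the built-in Windows `wevtutil` tool, which is handy when you want only the newest errors. The sketch below builds and runs such a query from Python. It is Windows-only, the function names are hypothetical, and reading the Security log normally requires administrative rights. The level numbers follow the Windows event schema, in which Event Viewer displays both Level 0 and Level 4 as Information.

```python
import subprocess

# Event levels as stored in Windows event records. Event Viewer shows both
# Level 0 and Level 4 as "Information".
LEVEL_FILTERS = {
    "Error": "Level=2",
    "Warning": "Level=3",
    "Information": "Level=0 or Level=4",
}

def build_wevtutil_args(log, entry_type, count=10):
    """Build a wevtutil command listing the newest `count` entries of one type."""
    query = f"*[System[({LEVEL_FILTERS[entry_type]})]]"  # XPath event filter
    return [
        "wevtutil", "qe", log,  # qe = query events from the named log
        f"/c:{count}",          # maximum number of events to return
        "/rd:true",             # reverse direction: newest first
        "/f:text",              # human-readable text instead of XML
        f"/q:{query}",
    ]

def show_events(log, entry_type, count=10):
    """Run the query (Windows only) and return wevtutil's text output."""
    result = subprocess.run(
        build_wevtutil_args(log, entry_type, count),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

For example, `print(show_events("System", "Error", 5))` would list the five most recent Error entries in the System log.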
Log Files
• Most log files are simple text files that can be viewed
through a text editor such as Notepad.
• These log files normally monitor one particular item
or process and report on the successes and failures
of that item or process.
• Each operating system contains a number of log
files. The easiest way to locate them is to use the
search utility for your particular computer.
Formulate a Solution
• Once you have a good idea where the problem is, you
can begin formulating a plan of attack to fix it.
• This plan should be based on your knowledge of the way
the system is supposed to operate, and any ancillary
factors that affect the operation of that particular object.
• Always consider consulting a technical reference.
• Most major manufacturers of computer hardware and
software have a Web site loaded with support
information for their products.
• Many times, the problem has already been identified by
the manufacturer, who can provide the information or
software necessary to fix the problem.
• Based on your knowledge of the systems and your
understanding of the problem, you will start to develop
one or more solutions to fix the problem.
Implement a Solution
Always try not to make the problem worse.
One Thing at a Time
• Implement only one solution at a time. There are two
reasons for this:
– First, if you do three things and the problem is
corrected, you really don’t know which of these
actions fixed the problem.
– Second, if your actions made the problem worse,
you really don’t know which one of the actions made
the problem worse; therefore, you have to undo all
three actions.
• Each step should build on the previous step. For
example, if restarting the server does not correct the
problem, what is the next most logical step to try that
is not destructive? Continue working through these
steps until you have resolved the problem.
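The one-change-at-a-time rule can be expressed as a loop: apply one candidate fix, test, and stop as soon as the problem is gone. This is a minimal Python sketch with hypothetical names (`Fix`, `try_fixes_one_at_a_time`). It assumes each fix comes with an undo step and that the fixes are listed from least to most disruptive.

```python
from typing import Callable, NamedTuple

class Fix(NamedTuple):
    name: str
    apply: Callable[[], None]  # make the change
    undo: Callable[[], None]   # reverse the change

def try_fixes_one_at_a_time(fixes, problem_solved):
    """Apply candidate fixes in order, testing after each one.

    Returns the name of the first fix after which `problem_solved()` is True,
    or None if none of them worked. Each unsuccessful fix is undone before
    the next is tried, so only one change is in effect at a time.
    """
    for fix in fixes:
        fix.apply()
        if problem_solved():
            return fix.name
        fix.undo()
    return None
```

Undoing each unsuccessful fix keeps exactly one change in effect, so a success can be attributed to a single action. If your steps are meant to build on one another, drop the `undo` call and keep a record of everything you applied instead.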
Consider the User
• While you are working to resolve the problem, it is
easy to forget about the person who is actually
experiencing the problem, the user.
• Try to accommodate the user as much as possible.
• In all cases, keep the user informed of your
progress. If you believe that it will take two hours to
fix the problem, be sure you communicate that
information to the user.
• Be realistic with your time estimates. An estimate
that you fail to meet will reflect badly on you as a
technician and on the IT department as a whole.
Test the Result
• Testing the result simply means asking: did it work?
• Keep in mind that you should only implement one
potential solution at a time.
• After you implement that one solution, try it to see if
it works. If not, try another possible solution and
then test it.
Recognize the Potential Effects of the Solution
• Will the “fix” that you developed “unfix” something
else?
• Occasionally, you will repair one problem only to
cause another.
• This is especially common when you work with
products from several competing companies.
• Some “side-effect” problems are not obvious.
• The best way to head off these problems is to
consider all of the factors involved and to conduct
extensive research prior to implementing a new
software solution.
• Independent discussion groups and good technical
references are a great place to start.
• In most cases, real users or administrators of a
product moderate these groups. Because these people
usually have no vested interest in the product, they tend
to be more candid about the problems that they have
experienced with different manufacturers’ software.
• A good technical reference will explain how
network components work together.
Document the Solution
• Once the problem is fixed, write down what you did.
Many problems recur at some point.
• Once a problem has been successfully resolved,
document the problem and the solution in a format
that is available to others.
• The documentation can be stored in a formal,
structured system or in something as simple as a
binder of loose-leaf notes.
• Many companies use a knowledge base to
document their troubleshooting efforts.
Knowledge Base
• A knowledge base is generally a computerized
system that allows you to log reported problems
along with the steps a technician took to repair the problem.
• A knowledge base can range from the very simple to
the elaborate.
• The simplest type of knowledge base might be a
series of folders or binders containing technical
notes that are compiled and maintained by the IT staff.
• A knowledge base can also be a sophisticated
database-driven system that can be queried by
keyword or by a question entered as a text
string.
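As a rough illustration of a keyword-searchable knowledge base, the sketch below stores problems and resolutions in SQLite using Python's built-in `sqlite3` module. The table layout and function names are assumptions for illustration; real knowledge-base products have their own schemas, and natural-language questions are beyond this simple `LIKE` search.

```python
import sqlite3

def open_kb(path=":memory:"):
    """Open (creating if needed) a tiny knowledge-base table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS kb (
               id INTEGER PRIMARY KEY,
               symptom TEXT NOT NULL,
               cause TEXT,
               resolution TEXT NOT NULL,
               logged TEXT DEFAULT CURRENT_DATE
           )"""
    )
    return conn

def add_entry(conn, symptom, cause, resolution):
    """Record a resolved problem; `with conn` commits the insert."""
    with conn:
        conn.execute(
            "INSERT INTO kb (symptom, cause, resolution) VALUES (?, ?, ?)",
            (symptom, cause, resolution),
        )

def search(conn, keyword):
    """Return (symptom, resolution) rows mentioning `keyword` (ASCII case-insensitive)."""
    like = f"%{keyword}%"
    return conn.execute(
        "SELECT symptom, resolution FROM kb "
        "WHERE symptom LIKE ? OR cause LIKE ? OR resolution LIKE ? ORDER BY id",
        (like, like, like),
    ).fetchall()
```

Passing a file name such as `open_kb("kb.sqlite3")` instead of the in-memory default keeps the entries between sessions.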
Frequently Asked Questions
• Another simple knowledge base might consist of
one or more Web pages made up largely of text that
users or technicians can browse for frequently
asked questions (FAQs).