Chapter 17: Using a Structured Troubleshooting Strategy
Chapter Outline
1. On the Test
4.9: Given a network problem scenario, select an appropriate course of action based on a general
troubleshooting strategy. This strategy includes the following steps:
1. Establish the symptoms
2. Identify the affected area
3. Establish what has changed
4. Select the most probable cause
5. Implement a solution
6. Test the result
7. Recognize the potential effects of the solution
8. Document the solution
2. Understanding Troubleshooting
a. Troubleshooting is the process of taking a large, complex problem and, through the use of various
techniques, excluding all potential elements until only the actual cause remains.
b. Once the actual cause is identified, it can then be fixed.
c. Good troubleshooting skills require a commonsense, structured approach.
3. Establish the Symptoms
a. At this stage of the process you are primarily working with symptoms of the actual problem.
b. Most symptoms manifest themselves as a failure of some type.
c. Failures may or may not be accompanied by one or more error messages.
4. Open-Ended Questions
a. Open-ended questions are designed to elicit additional information. The goal is to engage the user in a
dialog.
b. Examples of open-ended questions are:
i. What did you observe when the problem happened?
ii. What were you doing when the error occurred?
5. Closed-Ended Questions
a. With closed-ended questions, you are looking for a specific answer. Often the answer will be yes or
no, or a selection of one or more options.
b. Examples of closed-ended questions are:
i. Is there an error message?
ii. What does the error message say?
iii. Is your computer plugged into the wall?
iv. Is anyone else having this problem?
v. Have any recent changes been made to the computer?
6. The Basic Questions
Always remember to answer the essential questions: Who, what, when, where, why, and how. Here are a few
examples.
a. Who
i. Who reported the problem?
ii. Who found the actual problem?
iii. Whom does the problem affect?
b. What
i. What are the symptoms of this particular problem?
ii. What type of device is having the problem?
iii. What was the user doing when he or she first noticed the problem?
c. When
i. When was the problem first noticed?
ii. When did the problem actually start?
d. Where
i. Where is the affected device?
ii. Where was the device when the problem occurred? (This question applies primarily to laptop computers.)
e. Why
i. Why was this problem noticed?
ii. Why is this problem occurring?
f. How
i. How did this problem occur?
ii. Can you make this problem recur? If so, how?
7. Is There a Problem?
One of the questions you must always answer is, “Does the problem really exist or is the system simply
operating the way it is intended?”
8. Identify the Affected Area
a. This section is concerned with the scope of the problem.
b. A common question to ask at this point is, “How widespread is the problem?” That is to say, is the
problem limited to a single user, group, computer, server, subnet, or an entire network?
c. If one user calls the help desk to report a problem, you can usually assume, at least at first, that it
affects only one user or computer.
d. On the other hand, if your phones start ringing off the hook with multiple users reporting the same
problem, you can be fairly certain that the problem is more widespread.
e. The scope of the problem determines where you focus your troubleshooting efforts.
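The scoping question can be sketched as a rough first-pass classification of incoming reports. This is a minimal Python sketch, assuming a hypothetical Report record that captures the user, computer, and subnet for each call; the category names are illustrative, and the result is only a starting hypothesis, since a single report can still be the first sign of a wider outage.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Report:
    """One help-desk report (hypothetical record format)."""
    user: str
    computer: str
    subnet: str

def estimate_scope(reports: List[Report]) -> str:
    """Make a rough first guess at how widespread a problem is."""
    if not reports:
        return "no reports"
    users = {r.user for r in reports}
    subnets = {r.subnet for r in reports}
    if len(users) == 1:
        return "single user or computer"
    if len(subnets) == 1:
        return "single subnet"
    return "multiple subnets or entire network"

# Two different users on the same subnet report the same symptom.
reports = [
    Report("alice", "PC-12", "10.1.1.0/24"),
    Report("bob", "PC-40", "10.1.1.0/24"),
]
print(estimate_scope(reports))  # single subnet
```

As more calls arrive, rerunning the same check on the growing list of reports shows whether the scope is widening.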
9. Establish What Has Changed
a. A good question to ask is whether anything has changed recently. For example, has any new software
been installed on the computer?
b. Talking with other groups within the IT department will help you determine what software
upgrades or repairs have been made to user workstations.
c. Always keep in mind that users may have made changes on their own and not told you about
them.
10. Could You Do This Before?
a. It is important to determine if an employee has been able to perform a task in the past.
b. Many times, a user will have certain expectations of a computer system based on home use or past
job experience, even if this system was never set up to perform that task.
11. Recreate the Problem
It is always a good idea to see if you can recreate the problem.
12. Select the Most Probable Cause
a. Once you have asked a sufficient number of questions, you should have a fairly good idea of where
the problem is, although you may not be able to describe what the problem is.
b. The difficulty is that there is rarely only one possible cause.
c. At this point, you will start using additional techniques. For example, you may need to visit the
user’s computer, if possible, and run diagnostics or utilities to eliminate some of the
possibilities.
d. Always keep an open mind and avoid jumping to conclusions when it comes to troubleshooting.
13. What Can You Eliminate?
a. By testing one component at a time (for example, TCP/IP, then DHCP, then the router, then DNS), you
can systematically eliminate one possible cause after another until only the actual problem remains.
b. Using the process of elimination and basic troubleshooting utilities, you can quickly identify where
the problem exists.
c. Next, you must determine what the problem is. For example, you may know that the problem lies with
DNS without yet knowing what is wrong with DNS.
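The elimination sequence amounts to running a fixed series of checks in order and stopping at the first one that fails. The following Python sketch assumes each check is a function that returns True when its component works; the results here are simulated (hypothetical values), and on a real network each function might instead ping the loopback address, confirm a DHCP-assigned address, ping the default gateway, or resolve a host name.

```python
from typing import Callable, List, Optional, Tuple

# A check is a name plus a function that returns True if that component works.
Check = Tuple[str, Callable[[], bool]]

def first_failure(checks: List[Check]) -> Optional[str]:
    """Run checks one at a time, in order, and return the first that fails."""
    for name, check in checks:
        if not check():
            return name  # every earlier component passed and is eliminated
    return None          # every component tested passed

# Simulated results stand in for real tests (hypothetical values).
simulated = {"TCP/IP": True, "DHCP": True, "router": True, "DNS": False}
checks = [(name, lambda n=name: simulated[n]) for name in simulated]
print(first_failure(checks))  # DNS
```

Ordering the checks from the local machine outward means that each passing check rules out one layer before the next is examined.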
14. Troubleshooting Tools
a. Always use all of the troubleshooting tools that are available to you.
b. Most operating systems provide a utility, log files, or both for viewing system problems. These logs
record significant events that occur on the computer or server.
c. Looking through these log files might direct you to the problem, or at least identify for you the area
where the problem is occurring.
15. Event Viewer
The Windows Event Viewer contains three main logs: Application, Security, and System.
a. Application: The application log contains events that are caused by applications or programs.
b. Security: The security log records events that relate directly to system security, such as valid and invalid
logon attempts.
c. System: The system log contains events that are reported by system components.
Entries in the Application and System logs fall into three types: Information, Warning, and Error.
a. Information: Describes a successful operation. For example, when an application, driver, or service
loads successfully, an Information event will be logged.
b. Warning: An event that may be an indicator of a possible future problem. For example, if a system
begins to get low on disk space, it will issue a Warning.
c. Error: Indicates a significant problem. This problem may result in a loss of data or a loss of functionality.
For example, if a device driver fails to load during system startup, an Error will be logged.
In order to view additional information for a specific event, double-click on the entry in the right pane.
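When a log holds hundreds of entries, it helps to narrow the view to Warnings and Errors first. The sketch below models that filtering in Python under stated assumptions: the Event record, the numeric severity ranking, and the sample entries are all hypothetical, and on a real Windows system the entries would come from Event Viewer itself (which offers its own filtering options) rather than from a hand-built list.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional

class Level(IntEnum):
    # Illustrative severity ranking only; these are NOT the numeric level
    # codes Windows itself stores for events.
    INFORMATION = 1
    WARNING = 2
    ERROR = 3

@dataclass
class Event:
    log: str       # "Application", "Security", or "System"
    level: Level
    source: str
    message: str

def at_least(events: List[Event], minimum: Level,
             log: Optional[str] = None) -> List[Event]:
    """Keep events at or above a severity, optionally from a single log."""
    return [e for e in events
            if e.level >= minimum and (log is None or e.log == log)]

# Hypothetical sample entries modeled on the examples in the text.
events = [
    Event("System", Level.INFORMATION, "ExampleService", "Service loaded successfully"),
    Event("System", Level.WARNING, "ExampleDisk", "Disk space is running low"),
    Event("System", Level.ERROR, "ExampleDriver", "Driver failed to load at startup"),
    Event("Application", Level.ERROR, "ExampleApp", "Application stopped unexpectedly"),
]
for e in at_least(events, Level.WARNING, log="System"):
    print(e.level.name, e.source, e.message)
```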
16. Log Files
a. Most log files are simple text files that can be viewed through a text editor such as Notepad.
b. These log files normally monitor one particular item or process and report on the successes and failures
of that item or process.
c. Each operating system contains a number of log files. The easiest way to locate them is to use your
operating system’s search utility.
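Because most log files are plain text, a short script can pull out the lines that report failures. This is a minimal Python sketch; the keyword list is an assumption (real logs use many different formats and wording), and the file name in the usage comment is hypothetical.

```python
import re

# The keywords are an assumption; real log files use many different formats.
PATTERN = re.compile(r"\b(errors?|fail(s|ed|ure)?|warnings?)\b", re.IGNORECASE)

def scan_log(path, pattern=PATTERN):
    """Yield (line_number, text) for each line of a text log that matches."""
    with open(path, encoding="utf-8", errors="replace") as f:
        for number, line in enumerate(f, start=1):
            if pattern.search(line):
                yield number, line.rstrip("\n")

# Example (hypothetical file name):
#     for number, text in scan_log("setup.log"):
#         print(f"{number}: {text}")
```

Reporting the line number with each match makes it easy to open the file in a text editor and read the surrounding entries for context.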
17. Formulate a Solution
a. Once you have a good idea where the problem is, you can begin formulating a plan of attack to fix it.
b. This plan should be based on your knowledge of the way the system is supposed to operate, and any
ancillary factors that affect the operation of that particular object.
c. Always consider consulting a technical reference when troubleshooting.
d. Most major manufacturers of computer hardware and software have a Web site loaded with support
information for their products.
e. Many times, the manufacturer has already identified the problem and can provide the information or
software necessary to fix it correctly.
f. Based on your knowledge of the systems and your understanding of the problem, you will start to
develop one or more solutions to fix the problem.
18. Implement a Solution
Always try not to make the problem worse.
19. One Thing at a Time
a. Implement only one solution at a time. There are two reasons for this.
i. First, if you do three things and the problem is corrected, you really don’t know which of these
actions fixed the problem.
ii. Second, if your actions made the problem worse, you don’t know which one of them caused it;
therefore, you have to undo all three actions.
b. Each step should build on the previous step. For example, if restarting the server does not correct the
problem, what is the next most logical step to try that is not destructive? Continue working through these
steps until you have resolved the problem.
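The one-change-at-a-time rule, together with testing after each change, can be expressed as a simple loop. This Python sketch is illustrative only: the try_fixes function and the simulated fixes are hypothetical, and real fixes (editing a setting, updating a driver) would need real apply and undo steps. Rolling back a change that did not help is a design choice in this sketch rather than a rule from the text; it keeps only one untested change in place, so when the test finally passes you know which action fixed the problem.

```python
from typing import Callable, List, Optional, Tuple

# A candidate fix: a name, a function that applies it, and one that undoes it.
# For a non-destructive step such as a restart, the undo can be a no-op.
Fix = Tuple[str, Callable[[], None], Callable[[], None]]

def try_fixes(fixes: List[Fix], problem_solved: Callable[[], bool]) -> Optional[str]:
    """Apply candidate fixes one at a time, testing after each one.

    A fix that does not solve the problem is undone before the next is tried,
    so only one untested change is ever in place.
    """
    for name, apply, undo in fixes:
        apply()
        if problem_solved():
            return name  # this single change is known to be the fix
        undo()           # roll back before trying the next change
    return None          # none of the candidates worked

# Simulated system state (hypothetical): only the driver update helps.
state = {"dns_changed": False, "driver_updated": False}
fixes = [
    ("restart service", lambda: None, lambda: None),
    ("change DNS setting",
     lambda: state.update(dns_changed=True),
     lambda: state.update(dns_changed=False)),
    ("update driver",
     lambda: state.update(driver_updated=True),
     lambda: state.update(driver_updated=False)),
]
print(try_fixes(fixes, lambda: state["driver_updated"]))  # update driver
print(state)  # {'dns_changed': False, 'driver_updated': True}
```

The list order encodes the advice above: try the least destructive step first, then move to the next most logical one.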
20. Consider the User
a. While you are working to resolve the problem, it is easy to forget about the person who is actually
experiencing the problem, the user.
b. Whenever possible, accommodate the user as fully as you can.
c. In all cases, keep the user informed of your progress. If you believe that it will take two hours to fix the
problem, be sure to communicate that to the user.
d. Be realistic with your time estimates. If you try to fudge the time required, it will reflect badly on you as
a technician and the IT department as a whole.
21. Test the Result
a. Testing the result simply means answering one question: did it work?
b. Keep in mind that you should implement only one potential solution at a time.
c. After you implement that one solution, try it to see if it works. If not, try another possible solution and
then test it.
22. Recognize the Potential Effects of the Solution
a. Will the “fix” that you developed “unfix” something else?
b. Occasionally, you will repair one problem only to cause another.
c. This is especially common when working with products from several companies, particularly competing
products.
d. Some “side-effect” problems are not as obvious.
e. The best way to head off these problems is to consider all of the factors involved and to conduct
extensive research before implementing a new software solution.
f. Independent discussion groups and good technical references are a great place to start.
g. In most cases, discussion groups are moderated by real users or administrators of a product. Because
these people usually have no vested interest in the product, they tend to be more honest about the
problems that they have experienced with different manufacturers’ software.
h. A good technical reference will explain how network components work together.
23. Document the Solution
a. Once the problem is fixed, write it down. Many problems recur at some point.
b. Once a problem has been successfully resolved, document the problem and the solution in a format that
is available to others.
c. The documentation can be as formal as a structured database or as simple as a notebook of loose-leaf
pages.
d. Many companies use a knowledge base to document their troubleshooting efforts.
24. Knowledge Base
a. A knowledge base is generally a computerized system that allows you to log reported problems along
with the steps a technician took to repair them.
b. A knowledge base can range from the very simple to the elaborate.
c. The simplest type of knowledge base might be a series of folders or binders containing technical notes
that are compiled and maintained by technicians.
d. A knowledge base can also be a sophisticated database system that can be queried by keyword or by a
question typed as plain text.
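A knowledge base that accepts either keywords or a question typed as plain text can be sketched with simple word matching. The Python below is a minimal illustration, assuming a hypothetical Article record and made-up sample entries; production knowledge base systems typically use full-text indexing and better ranking than counting shared words.

```python
import re
from dataclasses import dataclass
from typing import List, Set

@dataclass
class Article:
    """One knowledge base entry: the problem, its symptoms, and the fix."""
    title: str
    symptoms: str
    solution: str

def words(text: str) -> Set[str]:
    """Lowercase the text and split it into simple alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def search(kb: List[Article], query: str) -> List[Article]:
    """Return articles sharing words with the query, best matches first."""
    q = words(query)
    scored = []
    for article in kb:
        overlap = len(q & words(article.title + " " + article.symptoms))
        if overlap:
            scored.append((overlap, article))
    # sort() is stable, so articles with equal scores keep their KB order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [article for _, article in scored]

# Hypothetical sample entries.
kb = [
    Article("Cannot reach Web sites by name",
            "Web pages fail to load by name, but pinging an IP address works",
            "Correct the DNS server address in the TCP/IP settings"),
    Article("Printer shows offline",
            "Print jobs stay in the queue",
            "Restart the print spooler service"),
]
for article in search(kb, "web pages will not load"):
    print(article.title, "->", article.solution)
```

Matching against both the title and the recorded symptoms lets a technician find an entry by describing what the user sees, even without knowing the cause.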
25. Frequently Asked Questions (FAQs)
Another simple knowledge base might consist of one or more Web pages made up largely of text that users
or technicians can browse for frequently asked questions (FAQs).