The Art Of Troubleshooting
Last Updated: 27 Mar 2004
Troubleshooting is clearly a lost art.
Whether for reasons of laziness or inexperience, a great
many people who fancy themselves technicians have never
mastered the art of troubleshooting.
A great troubleshooter is born, not made, but there are
skills that anyone can pick up which will ensure that they
ultimately resolve most of the problems they encounter.
Here are some Microsoft troubleshooting resources:
Here are some 3rd party troubleshooting resources:
I have a guide of my own. The following steps will help
guide you to the path of Troubleshooting Enlightenment.
01: ASSUME NOTHING
When you first encounter a problem, you should make
absolutely NO assumptions about what has happened and
what the likely cause of the problem is. If you make
up your mind too early, you will likely travel down
a limited path and spend far too much time unraveling
your own assumptions. Don't assume, for instance,
that your ability to ping a box indicates that the
box is functional. Also, don't get sidetracked by
the age of your equipment. A 3-month-old hard drive
that worked just fine yesterday can fail today.
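
To illustrate the ping example, here is a minimal Python sketch. A
reply to ping only shows that the network stack is answering; opening
a TCP connection to the service you actually care about tells you
more. The function name service_answers and the host and port in the
usage note are illustrative assumptions, not part of any particular
tool.

```python
import socket

def service_answers(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and completes the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or name lookup failed
        return False
```

For example, service_answers("fileserver", 445) would check whether a
hypothetical file server accepts connections on port 445. Even a
successful connection only shows that something is listening; it does
not prove the service is healthy.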
02: OBTAIN INFORMATION
Try to get your hands on as much information as you can
about when the problem first occurred. Don't settle for
generalities. Don't start concluding anything until
all the data is collected. Don't assume that any
information is useless until the next stage.
03: ELIMINATE THE IMPROBABLE
It will take you much too long to prove that something
is impossible. It is only necessary to prove that it
is highly unlikely. Sift through all of the *concrete*
evidence you have and weed out events that are not
likely suspects. If you run out of suspects, then it
is likely that you didn't have enough information to
begin with. Once you have ruled something out, leave
it ruled out until you are done. If you second-guess
yourself, you'll accomplish nothing.
04: USE YOUR BRAIN (BE LOGICAL)
This seems to be one of the trickiest parts of the
whole troubleshooting process. A software problem is
more likely to result in a consistent type of failure
than flaky hardware is. If a problem is reproducible, it
is more likely to be software related. Hardware issues
don't always cooperate with you as far as consistency
is concerned, and RAM is the most prominent suspect
when it comes to flaky hardware.
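
The reproducibility rule of thumb can be sketched in Python as
follows. This is only an illustration of the heuristic above, not a
diagnostic tool: the function name classify_failure, the default of
20 runs, and the wording of the results are all assumptions.

```python
def classify_failure(test, runs=20):
    """Run test() repeatedly and describe how consistently it fails.

    test is any zero-argument callable that returns True on success.
    """
    failures = sum(1 for _ in range(runs) if not test())
    if failures == 0:
        return "not reproduced"
    if failures == runs:
        # Fails every time: consistent, so suspect software first
        return "consistent (look at software or configuration first)"
    # Fails only some of the time: suspect flaky hardware
    return "intermittent (look at hardware, starting with RAM)"
```

The result is a hint about where to look first, not a verdict. A
consistent failure can still be hardware, and an intermittent one can
still be software.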
05: EXERCISE PROPER RISK MANAGEMENT
In testing a solution, you should start with the
approach which costs the least in terms of time and
effort, and which is the easiest to recover from.
For example, repairing the Registry is a better first
option than employing FDISK. Also, don't make too
many changes at once; otherwise you will never be
able to tell if a single change fixed your problem,
or if it took multiple changes (or, worse yet,
multiple changes in a specific order). Not to
mention, if you employ multiple "solutions", you
might just mask, but not cure, your problem.
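
One way to picture this step is the Python sketch below. It tries
candidate fixes one at a time, starting with the ones that are easiest
to recover from and cheapest to attempt. The function name try_fixes,
the dictionary keys, and the numeric 'risk' and 'cost' scores are
hypothetical. Reverting a change that did not help is also an
assumption of this sketch; it keeps each attempt a single change.

```python
def try_fixes(fixes, problem_is_gone):
    """Apply candidate fixes one at a time, safest and cheapest first.

    fixes: list of dicts with 'name', 'cost', 'risk', 'apply', 'undo'.
    problem_is_gone: zero-argument callable, True once the problem is fixed.
    Returns the name of the single fix that worked, or None.
    """
    # Lower risk (easier to recover from) first, then lower cost
    for fix in sorted(fixes, key=lambda f: (f["risk"], f["cost"])):
        fix["apply"]()
        if problem_is_gone():
            return fix["name"]
        fix["undo"]()  # revert so the next attempt is still a single change
    return None
```

Because only one change is in effect at any time, the name returned
is the one change that made the difference.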
06: VERIFY YOUR FINDINGS
Once you have established a culprit for your issues,
be sure to verify your findings before claiming
victory. Let's look at the following example:
A - You are having a problem with connectivity
B - So, you change the cables
C - Then you change the NIC
At this point, you could easily conclude that the
original NIC was bad, which may not be the least
bit true. If you continue with...
D - Test the original NIC in a second machine
...you might find out that there was nothing wrong
with the NIC, but perhaps with the way it was
seated in the slot. (You should always check the
seating of cards when encountering these sorts of
problems.)
07: DOCUMENT YOUR STEPS
You'd be amazed at the details you can forget about
a problem you worked on for hours if you don't take
some time to write things down. You'll be less than
amused if you find yourself encountering the same
problem a week later on a different machine, but you
can't remember how you resolved it the first time.
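
Writing things down does not need to be elaborate. As a minimal
sketch in Python, the helper below appends one timestamped line per
step to a plain-text file. The function name log_step, the file name
in the usage note, and the "time | action | result" layout are
assumptions chosen for illustration.

```python
from datetime import datetime

def log_step(path, action, result):
    """Append one timestamped troubleshooting step to a plain-text log."""
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    line = f"{stamp} | {action} | {result}\n"
    # Mode "a" appends, so earlier notes are never overwritten
    with open(path, "a", encoding="utf-8") as log:
        log.write(line)
    return line
```

For example, log_step("notes.txt", "Reseated NIC", "link restored")
adds one line to notes.txt that you can search a week later.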
• Troubleshooting is a lost art because all people want
today is Instant Gratification.
• You have to be thorough when troubleshooting, or you
will come to erroneous conclusions, and take longer
to solve issues than you should.
• He who FORMATs and runs away will be forced to
Troubleshoot again, on a different day.