The Art Of Troubleshooting

Last Updated: 27 Mar 2004
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

*** PLEASE NOTE: Link(s), If Provided, May Be Wrapped ***


Troubleshooting is clearly a lost art.

Whether for reasons of laziness or inexperience, a great
many people who fancy themselves technicians have never
mastered the art of troubleshooting.

A great troubleshooter is born, not made, but there are
skills that anyone can pick up which will ensure that they
ultimately resolve most of the problems they encounter.


Here are some Microsoft troubleshooting resources:

• http://www.microsoft.com/TechNet/winnt/trouble.asp#b
• http://support.microsoft.com/default.aspx?SCID=/faqs/
• http://dsg.rte.microsoft.com/
• http://www.microsoft.com/windows2000/library/resources/reskit/samplechapters/pref/pref_tts_ygqb.asp
• http://www.microsoft.com/TechNet/win2000/win2ksrv/reskit/sopch14.asp


Here are some 3rd party troubleshooting resources:

• http://eventid.net/
• http://labmice.techtarget.com/troubleshooting/
• http://labmice.techtarget.com/windowsxp/TroubleshootingXP/
• http://www.blizzard.u-net.com/winxphelp.html
• http://www.billssite.com/2000prompt.htm


I have a guide of my own. The following steps will help
guide you to the path of Troubleshooting Enlightenment.


01: ASSUME NOTHING

    When you first encounter a problem, you should make
    absolutely NO assumptions about what has happened and
    what the likely cause of the problem is. If you make
    up your mind too early, you will likely travel down
    a limited path and spend far too much time unraveling
    your own assumptions. Don't assume, for instance,
    that your ability to ping a box indicates that the
    box is functional. Also, don't get sidetracked by
    the age of your equipment. A 3-month-old hard drive
    that worked just fine yesterday can fail today.
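
    As a small illustration of the ping point, here is a
    minimal Python sketch that checks whether the service
    you actually care about accepts a TCP connection. The
    function name, host, and port are my own assumptions
    for the example, not part of any particular tool.

```python
import socket


def service_answers(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper: it tests the service port itself instead
    of relying on a ping reply from the box.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or name lookup failed.
        return False
```

    For instance, service_answers("fileserver01", 445) would
    test file sharing on a made-up host. Keep in mind that
    an open port only shows that something is listening; it
    does not prove the application behind it is healthy.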


02: OBTAIN INFORMATION

    Try to get your hands on as much information as you
    can about when the problem first occurred. Don't settle
    for generalities. Don't start concluding anything until
    all the data is collected. Don't assume that any
    information is useless until the next stage.


03: ELIMINATE THE IMPROBABLE

    It will take you much too long to prove that something
    is impossible. It is only necessary to prove that it
    is highly unlikely. Sift through all of the *concrete*
    evidence you have and weed out events that are not
    likely suspects. If you run out of suspects, then it
    is likely that you didn't have enough information to
    begin with. Once you have ruled something out, leave
    it ruled out until you are done. If you second-guess
    yourself, you'll accomplish nothing.


04: USE YOUR BRAIN (BE LOGICAL)

    This seems to be one of the trickiest parts of the
    whole troubleshooting process. A software problem is
    more likely than flaky hardware to produce a consistent
    type of failure. If a problem is reproducible, it is
    more likely to be software related. Hardware issues
    don't always cooperate with you as far as consistency
    is concerned, and RAM is the most prominent suspect
    when it comes to flaky hardware.
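
    The reproducibility rule of thumb can be sketched in
    Python as follows. This is only a rough aid, assuming
    you can wrap the failing operation in a function; the
    names classify_failures and check are hypothetical.

```python
from typing import Callable


def classify_failures(check: Callable[[], bool], runs: int = 20) -> str:
    """Run `check` repeatedly and report how consistently it fails.

    `check` is a hypothetical callable that returns True when the
    operation succeeds and False when it fails.
    """
    failures = sum(1 for _ in range(runs) if not check())
    if failures == 0:
        return "not reproduced"
    if failures == runs:
        # Fails every time: per the rule of thumb, look at software first.
        return "consistent"
    # Fails only some of the time: flaky hardware becomes a suspect.
    return "intermittent"
```

    Treat the result as a hint about where to look first,
    not as a verdict.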


05: EXERCISE PROPER RISK MANAGEMENT

    In testing a solution, you should start with the
    approach which costs the least in terms of time and
    effort, and which is the easiest to recover from.
    For example, repairing the Registry is a better first
    option than employing FDISK. Also, don't make too
    many changes at once; otherwise you will never be
    able to tell if a single change fixed your problem,
    or if it took multiple changes (or, worse yet,
    multiple changes in a specific order). Not to
    mention, if you employ multiple "solutions", you
    might just mask, but not cure, your problem.
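
    Here is a minimal Python sketch of that ordering:
    rank the candidate fixes by cost and by how hard they
    are to undo, then apply them one at a time and retest
    after each. The Fix fields, the 1-to-5 scores, and the
    simple sum used for ranking are assumptions I made up
    for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Fix:
    name: str
    effort: int    # rough cost in time and effort (1 = trivial)
    recovery: int  # how hard it is to undo (1 = trivial)
    apply: Callable[[], None]


def try_fixes(fixes: List[Fix], problem_gone: Callable[[], bool]) -> Optional[str]:
    """Apply fixes one at a time, cheapest and most recoverable first.

    Returns the name of the first fix after which `problem_gone()`
    reports success, or None if none of them helped.
    """
    # Assumed ranking: lower effort + recovery score goes first.
    for fix in sorted(fixes, key=lambda f: f.effort + f.recovery):
        fix.apply()
        if problem_gone():
            return fix.name
    return None
```

    Note that this sketch does not undo a fix that failed
    to help, so a later success might still depend on an
    earlier change. Rolling back each failed attempt before
    trying the next keeps the result easier to interpret.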


06: VERIFY YOUR FINDINGS

    Once you have established a culprit for your issues,
    be sure to verify your findings before claiming
    victory. Let's look at the following example:
    	A - You are having a problem with connectivity
    	B - So, you change the cables
    	C - Then you change the NIC

    At this point, you could easily conclude that the
    original NIC was bad, which may not be the least
    bit true. If you continue with...
    	D - Test the original NIC in a second machine

    ...you might find out that there was nothing wrong
    with the NIC, but perhaps with the way it was
    seated in the slot. (You should always check the
    seating of cards when encountering these sorts of
    problems.)


07: DOCUMENT YOUR STEPS

    You'd be amazed at the details you can forget about
    a problem you worked on for hours if you don't take
    some time to write things down. You'll be less than
    amused if you find yourself encountering the same
    problem a week later on a different machine, but you
    can't remember how you resolved it the first time.
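
    If a paper notebook isn't your style, even a tiny
    script will do. The following Python sketch appends
    timestamped notes to a plain-text log; the function
    name, file name, and line format are just assumptions
    for the example.

```python
from datetime import datetime
from pathlib import Path


def log_step(logfile: str, machine: str, note: str) -> str:
    """Append one timestamped line to a plain-text troubleshooting log."""
    line = f"{datetime.now():%Y-%m-%d %H:%M} [{machine}] {note}"
    with Path(logfile).open("a", encoding="utf-8") as fh:
        fh.write(line + "\n")
    return line
```

    A call such as log_step("notes.txt", "PC-07",
    "Reseated NIC; link restored") leaves you with a record
    you can search a week later.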


PERSONAL NOTES

• Troubleshooting is a lost art because all people want
  today is Instant Gratification.

• You have to be thorough when troubleshooting, or you
  will come to erroneous conclusions, and take longer
  to solve issues than you should.

• He who FORMATs and runs away will be forced to
  Troubleshoot again, on a different day.