Customer Service and Support - CTS LatAm
Problem Resolution Framework – four stage method for troubleshooting
Troubleshooting should always start with a problem definition.
The resolution process moves through the four phases in sequence.
Phases should not be skipped.
The process is a cycle: on each iteration the problem definition should
become narrower, until resolution is possible.
At any phase the engineer may need to collaborate or escalate to a higher level.
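The cycle described above can be sketched as a small state machine. This is only an illustration: the names Phase and next_phase are hypothetical, and the sketch models just the loop from an unresolved Fixing phase back to Defining, not the other ways a case can return to Defining or be escalated.

```python
from enum import Enum


class Phase(Enum):
    DEFINING = 1
    GATHERING = 2
    ANALYZING = 3
    FIXING = 4


def next_phase(current, resolved=False):
    """Return the phase that follows `current`, or None once the case is resolved.

    Phases are visited strictly in order (none is skipped). An unresolved
    Fixing phase cycles back to Defining for a narrower problem definition.
    """
    if current is Phase.FIXING:
        return None if resolved else Phase.DEFINING
    return Phase(current.value + 1)


# Walk one cycle that does not resolve the problem, then a second one that does.
phase, visited = Phase.DEFINING, []
outcomes = iter([False, True])  # hypothetical result of each Fixing attempt
while phase is not None:
    visited.append(phase.name)
    resolved = next(outcomes) if phase is Phase.FIXING else False
    phase = next_phase(phase, resolved)
print(visited)
```

The walk visits all four phases twice, which mirrors the rule that a new iteration starts again at Defining rather than jumping straight to a later phase.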
Terminology
Troubleshoot
To isolate the source of a problem and fix it
Typically through a process of elimination
Sources of problems are investigated and eliminated
Begins with the most obvious (least invasive) or easiest to fix
Debug
To attach a debugging program to a process in order to uncover the root cause of a
problem
The program runs under the control of the debugger
The most invasive route to a solution
Terminology
Crash
A serious computer failure
The computer itself stops working or a program aborts unexpectedly
Usually signifies a hardware malfunction or a serious software bug.
Hang
The computer, or program, has simply stopped responding to all attempts to interact
with it
Might be a precursor to a crash – but not always
Typical Scenarios
My server is not responding…
It’s taking too long to get search results back…
It takes too long to crawl my content…
I’ve got a memory leak!
I’m out of memory and here’s a dump!
What approach should we take?
Troubleshooting Methodologies
Scientific Method
A body of techniques for investigating a problem
Based upon observable, empirical and measurable evidence
Iterative process
Hierarchical Task Analysis (HTA)
Also referred to as Functional or Systematic Decomposition, Hierarchical Task
Decomposition (HTD)
Developed from the top down, from general to specific
Involves iterative decomposition of tasks into smaller subtasks
Best used where a clear goal can be determined and tasks/subtasks are required to
accomplish the goal
Scientific Method
Used to explore observations & answer questions
Search for cause & effect relationships
Steps, in order:
Ask Question
Do Background Research
Construct Hypothesis
Test with an experiment
Analyze Results, Draw Conclusions
Hypothesis is True, False, or Partially True
Report Results
Think! Try Again (return to Construct Hypothesis)
Hierarchical Task Analysis (HTA)
Breaking down the steps of a task (process)
performed by a user, viewed at different levels of
detail
Each step can be decomposed into lower-level
sub-steps
Forms a hierarchy of sub-tasks
HTA Example
Open a word processor, type your document and print
1. Open the word processor
1.1 Locate word processor icon
1.2 Click on the icon
1.3 Select New from file menu
2. Type your document
3. Print it
4. Quit
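A hierarchy like this can be held as a nested structure and printed as a numbered outline. The sketch below is illustrative only: the tuple layout and the outline helper are hypothetical, and placing the three sub-steps under "Open the word processor" is an assumption read from the slide layout.

```python
# Each task is (name, [subtasks]); a leaf task has an empty subtask list.
# Assumption: the three sub-steps belong to "Open the word processor".
hta = ("Open a word processor, type your document and print", [
    ("Open the word processor", [
        ("Locate word processor icon", []),
        ("Click on the icon", []),
        ("Select New from file menu", []),
    ]),
    ("Type your document", []),
    ("Print it", []),
    ("Quit", []),
])


def outline(task, number=""):
    """Return numbered outline lines for a task and all of its subtasks."""
    name, subtasks = task
    lines = [f"{number or '0'} {name}"]  # the overall goal is numbered 0
    for i, sub in enumerate(subtasks, start=1):
        child = f"{number}.{i}" if number else str(i)
        lines.extend(outline(sub, child))
    return lines


print("\n".join(outline(hta)))
```

Any step can be decomposed further by giving it its own subtask list; the numbering follows automatically.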
Troubleshooting Methodology
Set the problem’s priority
Problems rarely occur alone
Assess each problem and set a relative priority
A problem that halts work usually takes precedence over a problem that would enhance
work
Do not assume anything!
Let the data lead you, not the other way around
Defining phase
Define the Problem
Start by working on a good problem definition
The first step in successfully solving a problem is defining it in a way that
allows it to be solved.
How we define a problem usually determines how we analyze it.
How we analyze a problem absolutely determines:
Whether we find a solution
What quality the solution will have
Unless the problem is correctly defined, it is unlikely that a satisfactory
solution to it can be found.
Troubleshooting Methodology
Ask the right questions
Useful observations to look for
I can’t log in
The network is really slow
I’ve had this problem ever since they installed/moved…
Questions to isolate the problem
When did the problem begin?
Has it ever worked?
What has changed since it last worked properly?
Has anyone attempted to fix the problem already? If so, what did they do? What were the
results?
What version of the software are you currently running?
Troubleshooting Methodology
Divide the system
A large system or network is difficult (if not impossible) to troubleshoot
all at once
Find a way to segment the system into more manageable chunks
Hardware
Software
Our software
The operating system
3rd party software
Network
Process
Troubleshooting Methodology
List possible causes
After gathering all of this data, list and rank causes from least probable to most probable
You may have symptoms that fit into multiple problem scenarios
Share your thought process with the customer – partner with them in diagnosing the
problem
Test the possibilities
Rule various problem scenarios in/out
Go from minimally invasive to most invasive
How can you do it?
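One possible way to organize the list of causes is to record each with a rough probability and an invasiveness score, then test the least invasive candidates first. This is a sketch under stated assumptions: the Hypothesis class, the plan_tests helper, the 1 to 5 scale, and all example causes and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    cause: str
    probability: float  # rough estimate between 0.0 and 1.0 (hypothetical)
    invasiveness: int   # 1 = read-only check ... 5 = attach a debugger


def plan_tests(hypotheses):
    """Order tests from least to most invasive; ties go to the likelier cause."""
    return sorted(hypotheses, key=lambda h: (h.invasiveness, -h.probability))


# Hypothetical candidate causes for a "server is not responding" case.
candidates = [
    Hypothesis("Memory leak in third-party software", 0.5, 5),
    Hypothesis("Service account password expired", 0.2, 1),
    Hypothesis("Network latency to the content source", 0.3, 2),
]
for h in plan_tests(candidates):
    print(h.invasiveness, h.cause)
```

Sorting on invasiveness first reflects the "minimally invasive to most invasive" guidance; a team that prefers to chase the likeliest cause first could swap the two sort keys.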
Deliverables – Defining phase
If you are leaving this phase, you must have:
A clear, well-defined problem.
The criteria that define a good solution to the problem.
One or more hypotheses.
Opening letter.
Documented case following documentation guidelines.
Support Topic coded on MSSolve
Gathering phase
Gathering data
Based on the hypotheses created, start gathering the data you need.
Outline an effective action plan to collect data.
Make sure all the necessary data is gathered the first time.
Help customer to collect the data whenever possible.
Assess the quality of data collected before moving to next phase.
Troubleshooting Methodology
Collect information to identify the symptoms
Baseline information is crucial
Symptoms describe the problem for you
The fun part is finding the symptoms and making sense out of what they reveal ☺
Look for the obvious
Write an action plan to gather data
Never assume that the customer knows how to gather the data
Explain why you are asking for data
Make references to documents and articles
Check whether the customer is comfortable with the level
of detail in the action plan
Whenever possible use “Easy Assist” to guide the
customer through the data gathering process
A good action plan needs to be detailed to a level appropriate for the customer
Be specific about what you need
Provide detailed explanations on how to gather the data
Assessing data quality
Before the data can be analyzed, it must be validated
Are the logs and dump files readable?
Does the network monitor trace include the actual machines under
investigation?
Does the timeframe include the events we are looking for?
Best Practices
Try to get a small sample of the data you need in order to make sure it is being
collected correctly (e.g., Perfmon logs).
Maintain active communication with the customer to ensure the data is gathered
correctly and on time.
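Two of the validation questions above (does the trace include the machines under investigation, and does the timeframe include the events) can be checked mechanically before analysis starts. This is a minimal sketch: the function names, machine names, and timestamps are all hypothetical.

```python
from datetime import datetime


def covers_event(capture_start, capture_end, event_time):
    """True if the event falls inside the captured timeframe (inclusive)."""
    return capture_start <= event_time <= capture_end


def missing_machines(machines_in_trace, machines_under_investigation):
    """Return the investigated machines that do not appear in the trace."""
    return sorted(set(machines_under_investigation) - set(machines_in_trace))


# Hypothetical capture window, incident time, and machine names.
start = datetime(2008, 5, 1, 9, 0)
end = datetime(2008, 5, 1, 11, 0)
incident = datetime(2008, 5, 1, 11, 30)

print(covers_event(start, end, incident))  # the incident is after the capture ended
print(missing_machines({"WEB01", "SQL01"}, {"WEB01", "IDX01"}))
```

Running checks like these on a small first sample helps catch a bad collection before the customer spends time gathering the full data set.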
Deliverables – Gathering phase
If you are leaving this phase, you must have:
Action plan used for data gathering documented on the case
Active communication with the customer throughout the phase
Useful data collected
Analyzing phase
Analyzing the data
Tips for a good Data Analysis
Separate the data that is relevant to the analysis based on the problem definition
and your knowledge.
Look for obvious evidence first before spending time on deeper analysis:
Look for existing content (KB, Bugcheck, Internet, IMQA)
Do deeper analysis:
Investigate the data collected in detail.
Is it possible to reproduce the problem? Try to recreate it in a lab environment.
Compare the data collected against a working environment –
what are the differences?
Tips for a good Data Analysis
Communicate progress to customer:
This is important because it shows commitment to resolving the issue.
Document your analysis thoroughly.
Study the results
Evaluate the results of the tests that you’ve just run
You may have found the problem
You may need to refine your list of possible problems based upon data that you found
What if the data provides you absolutely no clue what to do next?
Ask for help!
No one knows everything
Know when to ask for help and who to ask
Leverage whatever resource you deem appropriate
Confirm or reject a hypothesis
Hypothesis confirmed
There is enough evidence to confirm a hypothesis and support a diagnosis
Hypothesis rejected
If all hypotheses were rejected:
Go into another loop in the cycle - start Defining again.
Consider escalation or collaboration at this point.
Deliverables – Analyzing phase
If you are leaving this phase, you must have:
Hypothesis confirmed by data analysis.
Possible problem (root-cause) identified.
Results of your analysis documented in the case.
Active communication with the customer, reporting progress throughout the phase.
Fixing phase
Write an action plan to fix the problem
Assess your knowledge - do not attempt actions
that are beyond what you can do with confidence.
Explain every action at a level the customer can understand.
Offer clarification on all steps in the action plan.
Identify all the risks
Mitigate risks:
Always consider testing the action plan first in a lab environment.
Always include precautionary measures (backups, contingency plan, etc.)
if the resolution action plan is to be applied straight to production.
Best practices when fixing problems
If there are several ways to resolve the problem or several steps in one
solution, apply one change at a time.
Recommend that the customer carry out the plan step by step, according
to the instructions.
Use Easy Assist to help the customer whenever appropriate.
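The "one change at a time" practice can be sketched as a loop that re-tests the symptom after every change and stops at the first one that resolves it. This is an illustration only: apply_one_at_a_time, the example changes, and the state dictionary are hypothetical stand-ins for real actions on a customer system.

```python
def apply_one_at_a_time(changes, is_resolved):
    """Apply changes in order, re-testing the symptom after each one.

    `changes` is a list of (description, action) pairs and `is_resolved`
    is a callable that re-tests the symptom. Returns the descriptions of
    the changes applied, stopping after the one that resolved the problem.
    """
    applied = []
    for description, action in changes:
        action()
        applied.append(description)
        if is_resolved():
            break
    return applied


# Hypothetical system state and candidate fixes.
state = {"cache_cleared": False, "service_restarted": False}
changes = [
    ("Clear the cache", lambda: state.update(cache_cleared=True)),
    ("Restart the service", lambda: state.update(service_restarted=True)),
    ("Reinstall the component", lambda: state.update(reinstalled=True)),
]
applied = apply_one_at_a_time(changes, lambda: state["service_restarted"])
print(applied)
```

Because the symptom is checked after each step, the last entry in the returned list identifies the change that made the difference, and later, riskier changes are never applied unnecessarily.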
Assessing the resolution
Symptoms disappeared, problem appears to be resolved:
Flag the case as “Solution Delivered” and monitor
Follow up on a regular basis until the case is closed
Original issue is resolved, but now we have a different issue:
Negotiate with the customer to open a new incident
Problem continues, no change in symptoms:
Cycle to the Defining phase
Collaboration should be considered at this point; get help from other SMEs
Problem is now worse:
Brainstorm with your peers, consult with SMEs, involve an EE if needed
Deliverables – Fixing phase
If you are leaving this phase, you must have:
Closure letter if the issue was resolved.
Decision to go through the Defining phase again.
Decision to escalate to a higher level.
© 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be
interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation.
MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.