Operations Manual Update (v1)

This document provides guidelines for the Service Management Center (SMC) team on monitoring the network using the SolarWinds tool, opening tickets in the ticketing system for any issues detected according to severity levels and service impact, and ensuring issues are addressed within their SLA targets. It describes the different types of alarms the monitoring system can trigger and how to investigate the root cause of issues for nodes, interfaces, or services that are down or degraded. Guidelines are provided for prioritizing incidents and selecting the proper problem codes in the ticketing system.

Ticketing System and SolarWinds procedure

Operation guidelines of the SMC Team: -

This document describes the purposes and procedures of the SMC operation guidelines. It is designed to help new employees complete their assigned tasks on time.

Proactive Management of SMC to Improve IT Services:


The Service Management Center (SMC) is a high-priority improvement target for IT service providers because it is critical to business operations and involves daily interaction with customers, so it directly affects customer satisfaction.
Keywords: Service Management Center, Continual Service Improvement, Proactive approach.
Proactive service management uses real-time data to organize jobs. The SMC sets predefined operational parameters and threshold values in the monitoring tool and configures workflows that trigger automated actions before things go wrong. Proactive strategies aim to prevent issues and offer benefits such as:
• Higher uptime guarantee
• Reduced risks
• Increased ROI
• Increased customer satisfaction
• Lower operating overheads
• Fewer failures
• No costly repairs
• Superior service delivery

Monitoring tool: -

SolarWinds is the tool used to monitor all network elements. These include the routers and switches (core, distribution, and access switches) and all devices connected to the access switches (such as access points, NVRs, CCTV cameras, and data points).
Link to the SolarWinds Orion platform: -
http://5.132.250.83/Orion/SummaryView.aspx?ViewID=76

• On Orion (SolarWinds), enter your user name and password as shown below.

• To access the monitoring dashboard, click (My Dashboard) and then click (NPM Summary).

• The system provides three types of alarms: Node Down/Up, Interface Down/Up, and Degradation of Service – Active.

• Using the last boot date in the Node Up section, you can determine from the event time whether the issue was caused by a power failure or by a sudden network outage.

• In the example screenshot, the node was down due to a power issue.
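
The last-boot check can be sketched in a few lines of Python. This is only an illustration of the reasoning, under the assumption that a boot recorded after the down event means the device restarted (power issue), while a boot recorded before it means the device stayed powered but unreachable (network outage). The function name and the timestamps are hypothetical and do not come from SolarWinds.

```python
from datetime import datetime


def likely_cause(node_down_at: datetime, last_boot_at: datetime) -> str:
    """Guess why a node went down from its last boot time (assumed rule)."""
    # A boot that happened after the down event means the device restarted,
    # which suggests that it lost power.
    if last_boot_at > node_down_at:
        return "power failure"
    # The device has been up since before the event, so it kept running
    # while unreachable: more likely a network outage.
    return "network outage"


# Illustrative timestamps only.
down_event = datetime(2024, 1, 10, 9, 30)
print(likely_cause(down_event, datetime(2024, 1, 10, 9, 42)))
print(likely_cause(down_event, datetime(2023, 12, 1, 8, 0)))
```

The result is a first hint only; the root cause still has to be confirmed during troubleshooting.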

• Types of Node Down: Router (RTR), Switch, Access Point (AP), NVR, and IP Camera (CAM)

• Types of Interface Down: Router and Switch

• Types of Degradation of Service – Active: -

I. High Received Utilization
II. High Transmit Utilization
III. BGP Neighbor Down
IV. High Temperature
V. High CPU Utilization
VI. High Memory Utilization

A node down or interface down alarm means there is a service impact; a Degradation of Service – Active alarm means there is no service impact. Act according to the type of alarm, because each is covered by an SLA. When opening tickets, follow the SLA targets in the SMC Operation Workbook; this is critical in order not to breach the committed SLA.
SLA target: 15 minutes for incidents with service impact and with non-service impact.
An SLA breach occurs if an incident is not created within 15 minutes of receiving the alarm.
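
As a rough illustration of the 15-minute rule, the following Python sketch computes the deadline for opening an incident and checks whether a ticket was created in time. The function names are hypothetical, and treating a ticket created at exactly 15 minutes as on time is an assumption; the SMC Operation Workbook remains the reference.

```python
from datetime import datetime, timedelta

# 15-minute target for opening an incident, as stated in this manual.
SLA_WINDOW = timedelta(minutes=15)


def sla_deadline(alarm_received_at: datetime) -> datetime:
    """Latest time at which the incident may be created."""
    return alarm_received_at + SLA_WINDOW


def is_breached(alarm_received_at: datetime, incident_created_at: datetime) -> bool:
    """True if the incident was created after the deadline.

    Assumption: creation at exactly 15 minutes still counts as on time.
    """
    return incident_created_at > sla_deadline(alarm_received_at)


# Illustrative timestamps only.
received = datetime(2024, 1, 10, 9, 0)
print(sla_deadline(received))
print(is_breached(received, datetime(2024, 1, 10, 9, 20)))
```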
• First, open an incident: go to the ticketing system and follow the steps in the Ticketing System section below.
• Accurate correlation between the event and the incident type is mandatory in order to take the right action; refer to the operation workbook for more information.
• Once the incidents are resolved and all the related work is done (opening tickets, updating tickets, etc.), clear the related events so that other events are easier to detect.
• Always make sure that you clear only the correct, related events.

• Node Details: Clicking the node name opens a new browser window. There you can find the customer information and link details, plus the phone number and address of the site's contact person.

MVPS platform – To monitor the customer servers
 URL: https://mvps.itc.sa/Orion/Login.aspx?ReturnUrl=%2fOrion%2fSummaryView.aspx%3fViewID%3d1&ViewID=1

Ticketing System: -

Use the tables below when selecting the required fields.

Problem Codes:

| Category | Type | Code | Priority | Impact | Severity Level | Devices | Comments |
| Incident with Service Impact | Loss of Service | Node Down | P1 | High | Critical | Router, Switch, NVR, Camera | CPE is completely down, one of the stacked switches is down |
|  |  | Interface Down |  |  |  | Router, Switch (Data Point, P2P Interface) | Router: *LAN interface is down. Switch: *DataPoint problem, P2P interface (ITC Switch & Customer Switch), Cameras Mgt. interface |
|  |  | Backup Issue |  |  |  | NVR, Camera | Any issue that affects the backup activity (inability to fetch backup) |
|  |  | PC Failure |  |  |  | PC | PC is totally not working |
|  |  | Low-Current System Failure |  |  |  | Anti-Theft, Fire Alarm | Anti-Theft or Fire Alarm systems are completely not working |
|  | Degradation of Service | Node Down | P2 | Medium | Major | Access Point | AP is down |
|  |  | High CPU Utilization |  |  |  | Router, Switch |  |
|  |  | High Memory Utilization |  |  |  | Router, Switch |  |
|  |  | High Bandwidth Utilization |  |  |  | Router, Switch |  |
|  |  | High Response Time |  |  |  | Router, Switch, Access Point, NVR, Camera |  |
|  |  | High Packet Loss |  |  |  | Router, Switch, Access Point, NVR, Camera |  |
|  |  | Non-Redundant Hardware Failure |  |  |  | Router, Switch |  |
|  |  | Software Issue |  |  |  | NVR | NVR slowness, unresponsive, playback problems |
|  |  | WIFI Access |  |  |  | Access Point | Unable to connect, limited access, low wireless signal, no sync with MOH's AD |
|  |  | PC Partial Failure |  |  |  | PC | All issues related to the PC (slowness, software, drivers, etc.) |
|  |  | Peripheral Device Issue |  |  |  | Mouse, Keyboard, Printer, Scanner | All issues related to any of the peripheral devices |
|  |  | Low-Current Partial System Failure |  |  |  | Anti-Theft, Fire Alarm | Low current systems single node (one room) is not working or not functioning properly (e.g. Siren issue) |
| Incident with Non-Service Impact | Loss of Redundancy | Redundant Link Down | P3 | Low | Minor | Router, Switch | Router: *Redundant link from ITC down *BGP (Provider) down, cable between ITC switch & Provider router is disconnected *Provider router is rebooted/powered off or misconfigured. Switch: *One of the ports in the PortChannel is down *OSPF (Provider) down, cable between ITC switch & Provider router is disconnected *Provider router is rebooted/powered off or misconfigured |
|  |  | Redundant Hardware Failure |  |  |  | Router, Switch | FAN, Power Supply |
|  | Environmental Alerts | High Temperature |  |  |  | Router, Switch |  |
| Request | Optimization Visit | Maintenance Activity |  |  |  |  | Any planned activity by ITC (IOS upgrade, optimization) |
|  | Technical Request | Configuration |  |  |  |  | Configuration changes or assistance (VLANs, DHCP, Routing, others) |
|  | Report Request | Report |  |  |  |  | Utilization Report, Inquiry |
|  | Maintenance Request | Site Visit |  |  |  |  | Low-Current systems testing, maintenance requests, camera alignment |
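
To show how the problem codes can drive the priority fields, here is a small Python lookup sketch. It assumes that the priority, impact, and severity printed once per group in the table apply to every code under the same Type; that reading should be checked against the operation workbook. The class and function names are hypothetical, and Types whose rating is not printed in the table are deliberately left out.

```python
from typing import NamedTuple


class Rating(NamedTuple):
    priority: str
    impact: str
    severity: str


# Assumption: the rating shown once per Type applies to all of its codes.
RATING_BY_TYPE = {
    "Loss of Service": Rating("P1", "High", "Critical"),
    "Degradation of Service": Rating("P2", "Medium", "Major"),
    "Loss of Redundancy": Rating("P3", "Low", "Minor"),
}


def rating_for(incident_type: str) -> Rating:
    """Return the rating for a Type, or fail loudly if none is recorded."""
    try:
        return RATING_BY_TYPE[incident_type]
    except KeyError:
        raise ValueError(f"No rating recorded for type: {incident_type!r}") from None


print(rating_for("Loss of Service"))
print(rating_for("Degradation of Service").priority)
```

Failing on an unknown Type, rather than guessing a default, keeps a wrong severity from being entered silently.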

Resolution Codes:

| Tier 2 | Tier 3 | Comments |
| Customer Related Problem | Power Outage-CPE | Power outage on the CPE |
|  | Power Outage-PoE | Power outage on the PoE |
|  | PoE-Indoor Cable Disconnected | Cable between Router & PoE was disconnected |
|  | PoE-Outdoor Cable Disconnected | Cable between PoE & MW Antenna was disconnected |
|  | Maintenance | Maintenance by customer |
|  | Site Closed | Site completely closed/relocation |
|  | P2P Device/Cable Disconnected | *Between ITC Router & ITC Switch, Customer's End Device *Between ITC Switch & Customer's End Device, Provider's Router, AP, NVR (NVR or Cameras Management interfaces), PC *Between ITC Switches (redundant links) |
|  | High Received Utilization |  |
|  | High Transmit Utilization |  |
|  | High CPU Utilization |  |
|  | High Memory Utilization |  |
|  | High Temperature |  |
|  | AD Synchronization Issue | MOH AD Sync account disabled from MOH side |
| Link Related Problem | Circuit Outage |  |
| Software/Hardware Problem | Configuration Issue | Configuration issue on Router, Switch, Camera, NVR, AP |
|  | Software/Application Issue | *IOS upgrade to fix a bug *NVR Software upgrade to fix slowness, playback, others *PC OS Damaged |
|  | Mounting Issue | Equipment that requires remounting (AP, Camera, Cabinet, others) due to bad installation |
|  | Faulty CPE | Router, Switch |
|  | Faulty FAN | Router, Switch |
|  | Faulty Power Supply |  |
|  | Faulty Interface/Module |  |
|  | Faulty Hardware | If the faulty hardware is not in the list |
|  | Faulty Cable | *Between ITC Router & ITC Switch *Between ITC Switches *Between ITC Switch & AP, NVR, Camera, PC |
|  | Faulty Connector | Change connector for the cables pulled by ITC |
|  | Faulty HDD | HDD on NVR, PC |
|  | Faulty DataPoint |  |
|  | Faulty PC |  |
| Others | Others |  |
| Customer Request | Configuration | Any configuration change requested by the customer |
|  | Utilization Report | Bandwidth utilization reports, NTA |
|  | General Inquiry | Any information inquiry |
|  | Site Visit | Site visit for maintenance |

Use the link below to access the ticketing system (BMC Remedy).


https://support-sdms.itc.sa/smartit/app/#/

• When you log in to Remedy, the dashboard is displayed.

• Go to the console and click the ticket console to see all the pending tickets.

• Incident Creation Steps:

The first step is to go to the Smart Recorder in the ticketing system, click it, and type @ followed by the CPE service ID; then click inside the area marked by the red line.
I. Add the name of the node and then click Create Incident.
II. On the Edit Incident page, set the fields as follows:
   Status: In Progress
   Operational Categorization Tier 1: SMC
   Operational Categorization Tier 2: Loss of Service
   Operational Categorization Tier 3: Node Down
   Reported Source: System Management
   Severity: Critical
   Service Impacted: Yes
   Actual Incident Start Date/Time: the event time of the node down
III. Scroll down, click Assign to Me, and then click Confirm + Save.
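
The field values from step II can be kept as a simple checklist. The Python sketch below is only a way to double-check a Node Down incident before saving it; it does not call Remedy, and the function names and the "Summary" key are hypothetical.

```python
from datetime import datetime

# Fields that step II asks you to fill in on the Edit Incident page.
REQUIRED_FIELDS = (
    "Status",
    "Operational Categorization Tier 1",
    "Operational Categorization Tier 2",
    "Operational Categorization Tier 3",
    "Reported Source",
    "Severity",
    "Service Impacted",
    "Actual Incident Start Date/Time",
)


def node_down_incident(node_name: str, event_time: datetime) -> dict:
    """Build the field values for a Node Down incident (values from step II)."""
    return {
        "Summary": node_name,  # hypothetical key for the node name
        "Status": "In Progress",
        "Operational Categorization Tier 1": "SMC",
        "Operational Categorization Tier 2": "Loss of Service",
        "Operational Categorization Tier 3": "Node Down",
        "Reported Source": "System Management",
        "Severity": "Critical",
        "Service Impacted": "Yes",
        "Actual Incident Start Date/Time": event_time.isoformat(),
    }


def missing_fields(incident: dict) -> list:
    """List the required fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not incident.get(name)]


incident = node_down_incident("MOH-CLI-RTR-Dammam28", datetime(2024, 1, 10, 9, 30))
print(missing_fields(incident))
```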

• After you open a ticket, make sure that its status is not left as In Progress; keep it as Pending if you are still following up with the customer. Always check the queue during the shift and at the end of each shift.
• If a node is down (router only), send an email to Bcare to check the link. Keep monitoring the queue in case the ticket is referred, or follow up with Bcare by email and act accordingly.

IV. On the Edit Incident page, keep Status as Pending, set Status Reason to (Waiting for customer approval / response), and then send the email to the customer.

V. To resolve the ticket, follow the steps below: -

• If you have any system issue, send an email to SDM-SUPPORT [email protected] and Abdul Haque M Farooque [email protected]

Communication with the customers:
• Before calling the customer, make sure that you are familiar with the design/setup of the affected site and that you have done your troubleshooting.
Example 1: - design/setup for MOH-CLI-RTR-Dammam28 (Health region: Eastern; Governorate: Dammam; Health center name: Al-Qadisiyah)
Example 2: - design/setup for MOH-DCLI-RTR-Riyadh32 (Specialized Dental Center, East Riyadh – Al-Rawdah)

 Network Overview

 Network topology
Add a high level topology diagram in this section and include a brief description of the various layers of the topology such as access layer,
distribution layer and core layer.

The content should be taken from the detailed design document.

 Device Hostnames
Use the table below to add a table with network devices hostnames and corresponding layers.

# Devices Name Network layers


1 User Laptop and PCS Wired 264
2 IP Phones

3 Wireless Devices 7
4 Medical Devices

5 Modalities
18
6 IP Cameras
7 Management VLAN
8 Loopback
9 Point to Point
10 Mobile devices for Guest
11 Medical Devices

 Devices Management addresses
Use the table below to add management addresses for network devices.

| Subnet Description | Device hostname | IP Address |
| Management Vlan Al-Rowadahi (255.255.255.192) | RYD-RWGFL-C01-AS01 | 10.110.126.139/26 |
|  | RYD-RWGFL-C01-AS02 | 10.110.126.140/26 |
|  | RYD-RWGFL-C01-AS03 | 10.110.126.141/26 |
|  | RYD-RWGFL-C01-AS04 | 10.110.126.137/26 |
|  | RYD-RWGFL-C01-AS05 | 10.110.126.138/26 |
|  | RYD-RWFFL-C01-AS06 | 10.110.126.142/26 |
|  | RYD-RWSFL-C01-AS07 | 10.110.126.143/26 |
|  | RYD-RWTFL-C01-AS08 | 10.110.126.144/26 |
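
A quick way to sanity-check a management address table like this one is Python's standard `ipaddress` module. The sketch below confirms that all listed addresses fall in the same /26 subnet, that the mask matches 255.255.255.192, and that no address is assigned twice. The dictionary simply repeats the table; the variable names are illustrative.

```python
import ipaddress

# Management addresses copied from the table above.
MGMT_ADDRESSES = {
    "RYD-RWGFL-C01-AS01": "10.110.126.139/26",
    "RYD-RWGFL-C01-AS02": "10.110.126.140/26",
    "RYD-RWGFL-C01-AS03": "10.110.126.141/26",
    "RYD-RWGFL-C01-AS04": "10.110.126.137/26",
    "RYD-RWGFL-C01-AS05": "10.110.126.138/26",
    "RYD-RWFFL-C01-AS06": "10.110.126.142/26",
    "RYD-RWSFL-C01-AS07": "10.110.126.143/26",
    "RYD-RWTFL-C01-AS08": "10.110.126.144/26",
}

interfaces = {host: ipaddress.ip_interface(addr) for host, addr in MGMT_ADDRESSES.items()}

# Every interface should resolve to the same network and mask.
networks = {iface.network for iface in interfaces.values()}
masks = {str(iface.netmask) for iface in interfaces.values()}

# No two devices may share an address.
unique_ips = {iface.ip for iface in interfaces.values()}

print(networks)
print(masks)
print(len(unique_ips) == len(interfaces))
```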

• Connectivity Diagrams & Tables
Add connectivity diagrams and tables below.
• Content to be taken from the detailed design documentation.

• MOH Network Architecture

• If the customer asks about something that you do not have or do not know the answer to, simply tell them that we will get back to them, and then request guidance.

Handover:
• Maintain the handover sheet and write clear comments so that the next shifter knows exactly which actions were taken and which are still to be taken.
• At the end of your shift, make sure that only the related events/ongoing incidents are kept on the NMS; all resolved ones should be cleared before the next shifter takes over.
• We will send you a template to use for following up.

Daily Report: -

• Your support is needed to create the daily operation report. The report fetches all pending, assigned, and in-progress TTs from the SMC inbox, and you have to update it with the latest email.
Follow the steps below to generate the report.
1- Open the Remedy system.
2- Go to the Report tab.
3- Right-click the CTT incident details report and choose Open.
4- Select Wait, then submit. (The system will take time to fetch the TTs.)
5- On the left-hand side, update the filter (case sensitive):
   Assigned group: SMC, B-care, FOPS
   Categorization (Tier 1): SMC
   Customer Type: Organization
   Status: Assigned, In Progress, Pending
6- Go to the last list in the filter, then click Go.
7- Export the report as an Excel file.
8- Remove unnecessary columns from the Excel file, such as:
   ITC_Phone Number
   Internet E-mail
   Assigned By Organization
   .... etc.
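
The clean-up in steps 5 and 8 can also be done programmatically once the export is loaded as rows. The Python sketch below works on a list of dictionaries (for example, rows read from the export after saving it as CSV, which is an assumption). The column names "Incident ID" and "Status" are hypothetical and may differ in the real export; the dropped columns are the ones named in step 8.

```python
# Columns named in step 8 as unnecessary.
DROP_COLUMNS = {"ITC_Phone Number", "Internet E-mail", "Assigned By Organization"}

# Statuses kept by the filter in step 5 (case sensitive).
OPEN_STATUSES = {"Assigned", "In Progress", "Pending"}


def tidy_report(rows: list) -> list:
    """Keep open tickets only and drop the unnecessary columns."""
    tidy = []
    for row in rows:
        if row.get("Status") not in OPEN_STATUSES:
            continue
        tidy.append({key: value for key, value in row.items() if key not in DROP_COLUMNS})
    return tidy


# Illustrative rows only; real exports contain more columns.
sample = [
    {"Incident ID": "INC-1", "Status": "Pending", "Internet E-mail": "x"},
    {"Incident ID": "INC-2", "Status": "Resolved", "Internet E-mail": "y"},
    {"Incident ID": "INC-3", "Status": "In Progress", "ITC_Phone Number": "z"},
]
for row in tidy_report(sample):
    print(row)
```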

Troubleshooting: -

• Root cause descriptions for most of the issues on the link (router)

| Root Cause | Description |
| Power issue on POE | The microwave device power injector was rebooted because of a power issue, which brought the link down. |
| Network fault/Network Outage | The link was affected by a network issue. |
| Issue in Last mile Media | There was an issue in the last mile media; it was fixed remotely or by field intervention. |
| Speed/Duplex Issue | There was a SPEED/DUPLEX mismatch. The same settings were reconfigured on both ends, and the link has been stable since then. |
| No issue from ITC side | ITC investigated the link and found no issue on the ITC path or the last mile media. It could instead be a cable disconnected by someone between the router and the last mile media power injector. |
| Cable issues | There was an issue in the indoor/outdoor cable or the cable connectors coming from the last mile media; it was fixed by replacing them. |
| Affected due to bad weather condition | There was heavy rain in the area, which affected the microwave signals of this link. |

• You need to investigate each problem further, especially since we have multiple CPE types (routers, switches, APs, NVRs, CAMs, WLCs).

• Do not refer tickets to THD when the problem is on any of the LAN devices (switches, APs, NVRs, CAMs) unless you are 100% sure that it requires a field visit.
• Learn the topology of each project well and base your investigation on it.
• Log in to the currently monitored devices to learn more about how they are configured.
• You can always ask for assistance, but always do your own troubleshooting first.

Tickets Handling: -
Customer Trouble Ticket Handling Guidelines.
• These are the trouble ticket handling guidelines for SMC staff, who manage faults proactively by identifying problems through the Orion monitoring tool.
• The SMC is a 24x7 technical help desk responsible for supporting customers based on their requests.

• SMC engineers take action immediately when a high, medium, or low service impact appears on the monitoring tool, or when a customer calls or emails, following the fault handling procedure in order to resolve the case. In brief, the flow is as follows: -

• A service-impact or non-service-impact alarm shows on the monitoring tool, or a call is received from the customer.
• Create a TT for the issue.
• Fix the fault and inform the customer by email.
• Escalate to the FOPS team if the problem persists.
• Follow up with the FOPS team until the problem is resolved.
• Close the fault.
• Provide a report to the customer on demand.
• Double-check the information recorded in the ticketing system and in emails before sending emails.

SLA Breaches: -
• Make sure that you open tickets in a timely manner (with respect to the SLA).
• If a breach occurs for a reason outside your control (e.g. a problem in the system), send an email explaining exactly what happened so that we can address it.

• As a follow-up, take action on all incidents during your shift before leaving the office. Always check the NMS and the ticketing system queue and cross-check both against the handover sheet.
• For example, if an incident happens at the end of your shift and you do not take action, the next shifter will have to open the ticket on your behalf, but this will be considered a breach, since the incident occurred during one shift and action was taken during the next.
• Please follow the above at all times to avoid breaches.
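
The end-of-shift cross-check between the NMS, the ticketing queue, and the handover sheet amounts to comparing three lists. The Python sketch below is one possible way to spot gaps; it assumes each list is just a collection of node names typed or exported by the engineer. The function name and the second node name are hypothetical.

```python
def cross_check(nms_nodes, ticketed_nodes, handover_nodes) -> dict:
    """Compare active NMS alarms, open tickets, and handover entries."""
    nms = set(nms_nodes)
    tickets = set(ticketed_nodes)
    handover = set(handover_nodes)
    return {
        # Alarms still on the NMS that nobody opened a ticket for.
        "alarm_without_ticket": sorted(nms - tickets),
        # Open tickets that the next shifter would not see in the handover.
        "ticket_missing_from_handover": sorted(tickets - handover),
    }


# Illustrative data only.
gaps = cross_check(
    nms_nodes=["MOH-CLI-RTR-Dammam28", "EXAMPLE-SW-01"],
    ticketed_nodes=["MOH-CLI-RTR-Dammam28"],
    handover_nodes=[],
)
print(gaps)
```

Both lists should be empty before you hand over the shift.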

You are requested to act on the above-mentioned points with the highest priority; failure to do so will have consequences.

Your job is not only to monitor alarms and act on them. There are also emails and calls, which you should act on by opening a ticket for the customer's request and following up on the case accordingly.

To enhance our operations and make sure that we are in line with the operation guidelines, we have compiled the above list of points that need to be addressed, based on observation of the 24x7 shift.

Thank you