7/11/19
ISP Essentials Workshop – Network
Monitoring
Manila, Philippines
8-12 July 2019
Agenda
• Intro to Network Management
• Configuration Management
• Device Monitoring
• Flow Monitoring
• Log Management
Module 1
INTRO TO NETWORK
MANAGEMENT
Hosts and Services
• Host
– Container for services
– Can be physical or virtual
– Both have CPU, Disk, Memory, Network interfaces
– Physical hosts also have
• Vendors, service contracts
• Power supplies, temperature
• Service
– Application software
– Runs on a host
– Has allocated resources
– Has vendors / suppliers
Managing Config Data
• Some Host Configuration Data to Track
– Physical Device Locations
– Installed CPU, Disk, Memory, Network Interfaces
– Serial Numbers, Licenses, OS Revision & Patch Details
• Some Service Configuration Data to Track
– Allocated Resources, Network Ports
– Service Permissions, Filters and ACLs, Logging
– Software Revision & Patch Details
Why Manage Config Data?
• Match Resource Allocation to Revenue Generation
• Ensure our Hosts and applications have Secure
configuration
• Correlate operational results with config changes
• Roll back or restore config when fault occurs
Operational Data
• Host
– CPU Utilisation
– Memory Utilisation
– Disk Utilisation
– Network Interface Utilisation
– Fan State
– Port Errors
• Service
– Time to Respond to Request
– Processes in Use
– Queue Length
– State of a BGP session
Operational Data
• Availability
– Applies to Hosts & Services
– Percent of time host or service is performing to specification
– Typically measured as a percent, for example 99.99%
– Excludes planned outages
• Reachability
– Applies to Hosts & Services
– Percent of time host or service is reachable
– Typically measured as a percent, for example 99.99%
– Unreachable hosts may not be unavailable to everyone
– Unreachable hosts may be available from another location
• Performance
– Time to respond to request or forward packet
– Megabits or Packets Per Second
– Discards, Errors, Loss
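The availability figure above can be worked through with a little arithmetic. The sketch below is illustrative only: the function name and the choice of minutes as the unit are assumptions, and it simply removes planned outages from the measurement window before computing the percentage.

```python
def availability_percent(total_minutes, unplanned_down_minutes, planned_down_minutes=0):
    """Availability as a percentage, with planned outages excluded from the window."""
    measured = total_minutes - planned_down_minutes
    if measured <= 0:
        raise ValueError("measurement window must be longer than the planned outages")
    return 100.0 * (measured - unplanned_down_minutes) / measured


# A 30-day month with a 60-minute planned window and 4 minutes of unplanned downtime
month_minutes = 30 * 24 * 60
print(round(availability_percent(month_minutes, 4, 60), 4))
```

With these inputs the result is just above the 99.99% example used on the slide; doubling the unplanned downtime would drop it below that target.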
Why Monitor Operational Data
• Know about Problems Before your Customers Call
• Prove Hosts & Services are Delivering on SLAs
• Continue to Meet SLAs as your Network Grows
Common NMM Tools
Common Back-end Tools
• Data storage
– Config files, formats and locations
– Databases: SQL, key-value, NoSQL
• RRDTool
– Explain the idea of a round-robin database
• Check_mk
– Explain the idea of service checking
• Nagios Plugins
– Explain what Nagios is and what plugins are
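The round-robin database idea can be illustrated with a fixed-size buffer that overwrites its oldest sample once it is full. This is a minimal sketch of the concept, not RRDTool's actual file format or API; the class and method names are made up for illustration.

```python
class RoundRobinArchive:
    """Fixed-size store: once full, each new sample overwrites the oldest one."""

    def __init__(self, slots):
        self.slots = [None] * slots
        self.next_index = 0  # position the next sample will be written to

    def update(self, value):
        self.slots[self.next_index] = value
        self.next_index = (self.next_index + 1) % len(self.slots)

    def values(self):
        """Return stored samples, oldest first, skipping slots never written."""
        ordered = self.slots[self.next_index:] + self.slots[:self.next_index]
        return [v for v in ordered if v is not None]


archive = RoundRobinArchive(slots=3)
for sample in [1, 2, 3, 4, 5]:
    archive.update(sample)
print(archive.values())
```

The storage size never grows, which is why this design suits long-running time series such as interface counters.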
Network Automation
• A continuous process of generation and deployment of
configuration changes, management, and operations of
network devices (from Network Automation at Scale)
Network Automation
• Automating config management
• Including config changes based on operational data
• Orchestrated with tools like Ansible, Chef, Puppet, and Salt
• This is the next step in network monitoring and
management
Module 2
ADDRESS MANAGEMENT
Address Management
• Planning and managing the assignment and use of IP
addresses and closely related resources of a computer
network.
• IP Address Management (IPAM) tools
– Racktables
– Netbox
– A lot of others (commercial and open source)
https://en.wikipedia.org/wiki/IP_address_management
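As a rough illustration of what an IPAM tool tracks, the sketch below splits a block into subnets and looks up which assignment an address belongs to, using Python's standard ipaddress module. The prefix is a documentation range and the assignment names are hypothetical; real tools such as Netbox store far more detail.

```python
import ipaddress

# Documentation prefix used purely as an example allocation
block = ipaddress.ip_network("192.0.2.0/24")

# Split the block into /26 subnets and assign the first three (names are made up)
subnets = list(block.subnets(new_prefix=26))
assignments = dict(zip(["core", "customers-a", "customers-b"], subnets))


def find_assignment(address):
    """Return the name of the assignment containing the address, or None if unassigned."""
    ip = ipaddress.ip_address(address)
    for name, network in assignments.items():
        if ip in network:
            return name
    return None


for name, network in assignments.items():
    print(name, network, network.num_addresses)
print(find_assignment("192.0.2.70"))
```

The fourth /26 is left unassigned here, so a lookup for an address in it returns None, which is the kind of free-space question address management is meant to answer.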
Tools - Racktables
• Asset management tool
https://www.racktables.org/demo.php
Tools - Netbox
• Open source web application designed to help manage and
document computer networks.
https://netbox.readthedocs.io/en/stable/
Module 3
CONFIG MANAGEMENT
Network Device Configuration
• How is the device configured?
– Using the command line (Cisco)
– From a special tool (Mikrotik)
– From a web interface (Procurve)
– JSON files (Arista)
– XML files (Juniper)
• Who configures the device?
• How often do changes happen?
Why do you need to manage config?
• Know when changes are done
• Restore config during failure
• Rollback changes with unexpected outcome
• Track config changes throughout time (history)
What is Version Control?
• Also known as revision control or source control
• Manages changes to files or documents with a revision
number
• Allows users to find and highlight changes
• Allows users to restore previous versions of a file or
document
What’s a Diff?
• A comparison of two versions of a single file or document
• Highlighting the changes between the two versions
• Allowing users to quickly see only what’s changed
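A diff between two saved versions of a config can be produced with Python's standard difflib module. The sketch below is only an illustration: the config lines are borrowed from the Rancid example in this deck, and the file labels are assumptions.

```python
import difflib

previous = [
    "remark permit eduroam to beta-login",
    "permit tcp any host 204.111.222.3 eq www 443",
    "remark permit eduroam to stats",
    "permit tcp any host 204.111.222.4 eq www 443",
    "remark temp deny access to all",
]
current = [
    "remark permit eduroam to beta-login",
    "permit tcp any host 204.111.222.3 eq www 443",
    "remark permit eduroam to stats",
    "permit tcp any host 204.111.222.4 eq www 443",
    "remark permit eduroam to net-api",
    "permit tcp any host 204.111.222.5 eq www 443",
    "remark temp deny access to all",
]

# unified_diff yields header lines, then context lines and +/- change lines
diff_lines = list(difflib.unified_diff(
    previous, current,
    fromfile="dc1-gw1 (previous)", tofile="dc1-gw1 (current)",
    lineterm="",
))
print("\n".join(diff_lines))
```

Lines starting with + were added and lines starting with - were removed; everything else is unchanged context.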
What’s a Diff?
Config Management Tools
• Retrieve configuration files
• Allow for their storage as files or in versioning system
• Solve many problems with network operations
Tools - Rancid
• Really Awesome New Cisco confIg Differ
• Monitors a router's (or more generally a device's)
configuration
• Uses CVS, Subversion, or Git to maintain history
• Supports Cisco, Foundry, HP, Juniper, and more
• Runs on BSD, Linux, Mac OS
• Pros:
– The de-facto industry standard for config management
https://www.shrubbery.net/rancid/
Rancid Example
Index: configs/dc1-gw1
===================================================================
retrieving revision 1.677
diff -U 4 -r1.677 dc1-gw1
@@ -713,8 +713,10 @@
remark permit eduroam to beta-login
permit tcp any host 204.111.222.3 eq www 443
remark permit eduroam to stats
permit tcp any host 204.111.222.4 eq www 443
+ remark permit eduroam to net-api
+ permit tcp any host 204.111.222.5 eq www 443
remark temp deny access to all
deny ip any 204.111.222.0 0.0.0.64
Rancid Example
Index: configs/dc1-gw
===================================================================
retrieving revision 1.2213
diff -U 4 -r1.2213 dc1-gw
@@ -32,9 +32,8 @@
!Flash: bootflash: Directory of bootflash:/
!Flash: bootflash: 11 drwx 16384 Jan 11 2017 12:13:18 +10:00 lost+found
!Flash: bootflash: 12 -rw- 371180156 Oct 5 2018 14:05:16 +10:00 asr1000rp1-adventerprisek9.03.13.10.S.154-3.S10-ext.bin
- !Flash: bootflash: 13 -rw- 4 Jul 9 2019 15:15:03 +10:00 .issu_loc_lock
!Flash: bootflash: 48769 drwx 4096 Jan 11 2017 12:16:08 +10:00 .installer
!Flash: bootflash: 438913 drwx 4096 Jan 11 2017 13:05:11 +10:00 core
!Flash: bootflash: 829057 drwx 4096 Oct 11 2018 07:24:32 +10:00 .prst_sync
!Flash: bootflash: 520193 drwx 4096 Jan 11 2017 12:19:19 +10:00 .rollback_timer
Tools - Oxidized
• Network device configuration backup tool (to replace
Rancid)
• Stores files in a version control system
• Supports a large number of manufacturers
– Cisco (CatOS, IOS, IOSXR, NXOS)
– Juniper (JunOS, ScreenOS)
– Huawei (VRP, SmartAX)
– Mikrotik (RouterOS)
• Pros:
– Integrates with LibreNMS
https://github.com/ytti/oxidized
Other Tools
• Fetchconfig
• Jazigo
Module 4
DEVICE MONITORING
Intro to SNMP
• Simple Network Management Protocol
• Used to communicate management information between
the network management stations and the agents in the
network elements.
• Even though SNMP is a protocol, we use the term SNMP to
describe the complete architecture of the management
system
Intro to SNMP
• Network management stations execute management
applications which monitor and control network elements.
• Network elements are devices such as hosts, gateways,
terminal servers
• The agent is a piece of software that runs on the network
devices you are managing. It can be a separate program, or it
can be incorporated into the operating system. Agents listen and
respond on UDP port 161.
SNMP Polling, Traps and MIB
• SNMP Polling is the act of querying an agent for some piece of
information. SNMP managers use UDP to poll agents
• A trap is a way for the agent to tell the NMS that something has
happened. Traps are sent asynchronously, not in response to queries
from the NMS. SNMP traps are sent using UDP port 162.
• MIB or Management Information Base is a database of managed
objects that the agent tracks. Any sort of status or statistical information
accessed by the NMS is defined in an MIB.
– OID or object identifier is the name of a management object. OIDs are globally
unique
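Polling usually means reading a counter twice and turning the difference into a rate. The sketch below assumes a 32-bit octet counter such as ifInOctets and handles a single wrap between polls; the function names are illustrative, and no SNMP library is involved, only the arithmetic an NMS performs after polling.

```python
COUNTER32_MODULUS = 2 ** 32  # a 32-bit counter wraps back to zero at this value


def bits_per_second(previous_octets, current_octets, interval_seconds,
                    modulus=COUNTER32_MODULUS):
    """Rate between two polls of an octet counter, tolerating one counter wrap."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    delta_octets = (current_octets - previous_octets) % modulus
    return delta_octets * 8 / interval_seconds


def utilisation_percent(rate_bps, link_speed_bps):
    """Interface utilisation as a percentage of link speed."""
    return 100.0 * rate_bps / link_speed_bps


# Two polls taken 300 seconds apart on a 100 Mbit/s link
rate = bits_per_second(1_000_000, 376_000_000, 300)
print(rate, utilisation_percent(rate, 100_000_000))
```

On fast links a 32-bit counter can wrap more than once between polls, which this sketch cannot detect; that is one reason 64-bit counters are preferred where the device offers them.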
SNMP Applications
• LibreNMS
• MRTG
• PRTG
• …
Beyond SNMP
• SNMP is a heavy-weight protocol with low information density
• SNMP was not designed for streaming high resolution data
• It’s seen as too slow, incomplete, network-specific, and hard to
operationalize
New protocols are being developed to stream telemetry data in real-time
• YANG data models
• XML, JSON and GPB (Google Protocol Buffers) encoding
• Data pushed from agents, not requested from Managers
• UDP, TCP or gRPC transport available
Tools - LibreNMS
• An open-source network monitoring system (NMS)
• Capable of managing small or big networks
• Most management functions are supported or can be
integrated
• Details under the hood:
– Written in PHP, derived from the Observium project
– Configuration in MySQL
– Operational data is stored in Round Robin Database files
https://www.librenms.org/
LibreNMS Dashboard
Tools – Sensu
• Sensu is a multi-cloud monitoring system that allows for
automating monitoring workflow
– Monitor containers, instances, applications, and on-premises
infrastructure
– Integrates with PagerDuty, Slack, Grafana, etc
• Sensu Go is the latest version
• Uchiwa is an open-source dashboard for the Sensu
monitoring framework
https://sensu.io/about/
Sensu / Uchiwa Dashboard
https://github.com/sensu/uchiwa
Tools - Grafana
• Open platform for monitoring and analytics
• Does time series analytics
• Plugins to integrate with other applications
Grafana Dashboard
https://grafana.com/
Module 5
FLOW MONITORING
What is a Flow?
• A flow is defined as a unidirectional sequence of packets
with some common properties that pass through a network
device. (RFC3954)
Why do we monitor IP flows?
• Where is our traffic coming from?
• What kind of application traffic is it?
• Are the correct QoS bits set?
• Have routing changes impacted the network?
What’s Netflow?
• Cisco protocol for flow monitoring released in 1996
• Described by RFC3954, but not an Internet Standard
• Netflow v5 is supported by nearly all router platforms
• Versions:
– Version 5: IPv4 only
– Version 9: IPv4/v6 and MPLS
What is IPFIX?
• IP Flow Information Export
• Vendor neutral protocol for flow monitoring
• Started through the IETF process in 2004 & released in
2011
• Based on Cisco’s Netflow version 9
• IPFIX is an Internet Standard replacement for version 9
How do Netflow and IPFIX work?
• Packets with matching tuples are grouped into a flow
• First occurrence of a flow is recorded in a flow cache
• Cache entries are timestamped
• Number of packets and bytes matching the flow are tallied
• Details like next hop IP, ASN, subnet masks, and TCP flags
can be recorded
• Cache can be queried interactively, or flows can be
exported
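The flow cache described above can be sketched as a dictionary keyed by the matching tuple. This is a simplified model, not any vendor's implementation: the five key fields, the class name and the example addresses are assumptions, and real caches also expire and export entries.

```python
from collections import namedtuple

# Hypothetical flow key; real exporters may match on more fields
FlowKey = namedtuple("FlowKey", "src_ip dst_ip protocol src_port dst_port")


class FlowCache:
    def __init__(self):
        self.flows = {}

    def observe(self, key, size_bytes, timestamp):
        """Record one packet: create the entry on first sight, then tally."""
        entry = self.flows.get(key)
        if entry is None:
            entry = {"first_seen": timestamp, "last_seen": timestamp,
                     "packets": 0, "bytes": 0}
            self.flows[key] = entry
        entry["last_seen"] = timestamp
        entry["packets"] += 1
        entry["bytes"] += size_bytes


cache = FlowCache()
request = FlowKey("192.0.2.10", "198.51.100.5", "tcp", 51000, 443)
reply = FlowKey("198.51.100.5", "192.0.2.10", "tcp", 443, 51000)
cache.observe(request, 120, timestamp=1.0)
cache.observe(reply, 1500, timestamp=1.1)
cache.observe(request, 80, timestamp=1.2)
for key, entry in cache.flows.items():
    print(key, entry)
```

Because flows are unidirectional, the request and the reply of one TCP connection appear as two separate cache entries.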
Setting up Netflow & IPFIX
• Cisco – Netflow Configuration
• Juniper – Monitoring, Sampling …
• Huawei – Netstream Configuration
• Mikrotik - IP Traffic Flow
Flow Sampling / Downsampling
• Tracking every flow can take a lot of device resources
• Some routers & switches can be crippled by turning on
Netflow
• Sampling helps by tracking one in n packets
• CPU load can be significantly reduced – but so can
resolution
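The trade-off can be seen in a small sketch of one-in-n sampling. It assumes deterministic sampling (every n-th packet) and scales the sampled counts back up by n to estimate the totals; devices may instead sample randomly, and the function names are illustrative.

```python
def sample_one_in_n(packet_sizes, n):
    """Keep every n-th packet (deterministic 1-in-n sampling)."""
    return packet_sizes[n - 1::n]


def estimate_totals(sampled_sizes, n):
    """Scale sampled packet and byte counts by n to estimate the real totals."""
    return len(sampled_sizes) * n, sum(sampled_sizes) * n


# 1000 small packets plus a short burst of 5 large ones
traffic = [100] * 1000 + [1500] * 5
sampled = sample_one_in_n(traffic, 100)
print(len(sampled), estimate_totals(sampled, 100))
```

Only 10 of the 1005 packets are examined, so the device does far less work, but the burst of large packets falls between samples and is missing from the estimate. That is the loss of resolution the slide refers to.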
Tools - Softflowd
• Software Flow Monitoring
• Passive Netflow probe (a flow exporter rather than a collector)
• Network traffic passing through a switch can be mirrored
• Attach a Unix computer to the mirrored port
• Softflowd tracks flows from the mirrored traffic
• Flows can be exported just as they are from routers &
switches
Ad-Hoc Flow Queries
• Cisco
show ip flow
• JunOS
show services accounting flow-detail
Tools – nfdump + nfsen
• Nfdump collects and processes netflow and sflow
– C application that receives flows & logs them to files
• Nfsen generates stats and displays graphs
– Web-based front-end to Nfdump
https://github.com/phaag/nfdump
http://nfsen.sourceforge.net/
Tools – nfdump + nfsen
Tools - ntopng
• Web-based traffic and security network monitoring tool
https://github.com/ntop/ntopng
Module 6
LOG MANAGEMENT
What generates logs?
• Operating Systems
– Linux, Mac, Windows
• System applications
– Cron, init, RDBMS
• Network applications
– BGP, DHCP, HTTP, iptables …
What do servers log?
• Backups
• Connections
• Database messages
• Hardware messages
• Software versions and updates
What do Network Apps log?
• Connections
• DHCP details
• Hardware messages
• Port events
• Protocol information
Where are logs stored?
• Linux/Mac : /var/log
• Windows: Event Viewer
• Network devices: Memory
Is it useful to have logs stored all over the place? What
happens to events written to memory when devices are
turned off?
Firewall Log
Syslog Message Levels
Level Description
0 Emerg
1 Alert
2 Critical
3 Error
4 Warning
5 Notice
6 Info
7 Debug
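On the wire, the level travels inside the priority value at the start of each syslog message, where priority = facility × 8 + level. The sketch below decodes that value; the level names follow the table above, while the function name and the sample message are illustrative assumptions.

```python
import re

LEVELS = ["Emerg", "Alert", "Critical", "Error",
          "Warning", "Notice", "Info", "Debug"]


def decode_priority(message):
    """Return (facility, level, level name) from a message starting with <PRI>."""
    match = re.match(r"<(\d{1,3})>", message)
    if match is None:
        raise ValueError("message does not start with a <PRI> field")
    priority = int(match.group(1))
    if priority > 191:  # 23 facilities * 8 + 7 is the largest valid value
        raise ValueError("priority out of range")
    facility, level = divmod(priority, 8)
    return facility, level, LEVELS[level]


# Hypothetical message: priority 34 is facility 4, level 2
print(decode_priority("<34>Oct 11 22:14:15 gw1 su: authentication failure"))
```

Lower numbers are more severe, so an alerting rule for anything at Error or worse would match levels 0 to 3.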
Syslog aggregation
How to aggregate syslog
• Set up a remote syslog facility on a server
– Graylog
– Elastic Stack
– Rsyslog
– Splunk
– Syslog-ng
• Configure devices to send their logs
Tools - Graylog
• Commercial + Open source software
• Collection, Storage, Analysis, & Visualisation
• Tightly coupled software stack including:
– ElasticSearch for Search
– MongoDB for log storage
• LibreNMS integration
Tools – Elastic Stack
• Open source with commercial support available
• Collection via Logstash
• ElasticSearch for Storage and Search
• Kibana for Search, Analytics, and Visualisation
• (ELK stack)
Tools - Rsyslog
• Open source with commercial support available
• Transports: TCP, SSL/TLS, RELP
• Outputs to MySQL, PostgreSQL, Oracle and more
• Can filter on any part of a syslog message
• Multi-threaded and suitable for relay chains
Tools - Splunk
• Commercial software
• Free for small users at < 500 MB/day
• Collection, Storage, Analysis & visualization
• Real-time alerting engine included
• Popular corporate solution with 13k customers
Tools – Syslog-ng
• Free and open source with commercial support available
• Collection and storage
• Adds TCP and TLS to basic UDP transport
• Can extract structured information from log messages
• Can log directly to a database
• Requires external tools for Analysis and visualization
Log Alerting & Analysis
• No systems administrator has time to read all logs
• Log messages are unimportant until they aren’t
– Post-incident security reports
– Billing inquiries
– Law Enforcement Agency request
• Some platforms include analysis or alerting
• Others need external tools like Tenshi or Swatch
Beyond Alerting: Analysis
• The volume of log entries is as important as their content
– What’s your baseline number of entries?
– Has it changed?
– Do more log entries mean an attack?
• Similar log entries across a network can be important
– Port scanning, intrusion attempts
• Similar log entries across time can be important
– Is someone attacking you very slowly?
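The baseline question can be approached by comparing the current count of log entries with past counts for the same interval length. The sketch below flags a count that lies more than a chosen number of standard deviations from the baseline mean; the threshold of 3 and the sample numbers are assumptions, and real traffic often needs a time-of-day-aware baseline.

```python
from statistics import mean, pstdev


def is_unusual(baseline_counts, current_count, threshold=3.0):
    """True if the current count deviates strongly from the baseline."""
    average = mean(baseline_counts)
    spread = pstdev(baseline_counts)
    if spread == 0:
        # A perfectly flat baseline: any change at all is worth a look
        return current_count != average
    return abs(current_count - average) / spread > threshold


# Hypothetical entries-per-hour counts from previous days
baseline = [100, 110, 95, 105, 90]
print(is_unusual(baseline, 104))
print(is_unusual(baseline, 400))
```

A sudden drop is flagged as well as a spike, which matters because a device that stops logging can be as significant as one that logs too much.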