Real-Time Network Monitoring Data Processing
Abstract—The proper operation and maintenance of a network requires a reliable and efficient monitoring mechanism. The mechanism should handle the large amounts of monitoring data generated by different protocols. In addition, the requirements (e.g. response time, accuracy) imposed by long-term planned queries and short-term ad-hoc queries should be satisfied for multi-tenant computing models.
This paper proposes a novel mechanism for scalable storage and real-time processing of monitoring data. This mechanism takes advantage of a data-intensive framework for collecting network flow information records, as well as data point indexes. The design is not limited to a particular monitoring protocol, since it employs a generic structure for data handling. Thus, it is applicable to a wide variety of monitoring solutions.

I. INTRODUCTION

Monitoring and measurement of the network is a crucial part of infrastructure operation and maintenance. A good understanding of the traffic passing through the network is required for both planned and ad-hoc tasks. Capacity planning and traffic matrix processing are planned tasks, whereas traffic engineering, load-balancing, and intrusion detection are ad-hoc tasks which often require real-time behaviour.
1) Storage Requirements: Quite often, ad-hoc tools are used for analysing network properties [1]. Traffic dumps and flow information are common data types for an ad-hoc analysis. The data volume for these types can be extremely large.
2) Analytic Requirements: The storage should be distributed, reliable, and efficient, to handle a high data input rate and volume. Processing this large data set for an ad-hoc query should be near real-time. It should be possible to divide and distribute the query over the cluster storing the data.
3) Privacy Policies: Storing the packet payload, which corresponds to the user data, is restricted according to European data laws and regulations [2]. The same policy applies to the flow information as well.

A. Related Work

Li et al. [3] surveyed the state of the art in flow information applications. They identified several challenges in fields such as: machine learning's feature selection for an effective analysis, real-time processing, and efficient storage of data sets. Lee et al. [4] proposed a mechanism for importing network dumps (i.e. libpcap files) and flow information to HDFS. They have implemented a set of statistical tools in MapReduce for processing libpcap files in HDFS. The tool set calculates statistical properties of the IP, TCP, and HTTP protocols. Their solution copies recently collected NetFlow data to Hive tables in fixed intervals, which doubles the storage capacity requirement. Andersen et al. [5] described the management of network monitoring datasets as a challenging task. They emphasized the demand for a data management framework with the eventual consistency property and the real-time processing capability. The framework should facilitate search and discovery by means of an effective query definition and execution process. Balakrishnan et al. [6] and Cranor et al. [1] proposed solutions for the real-time analysis of network data streams. However, they may not be efficient for the analysis of high-speed streams over a long period [5].

B. Contributions

A flexible and efficient mechanism is designed and implemented for real-time storage and analysis of network flow information. In contrast to other solutions, which have analysed binary files on distributed storage systems, a NoSQL type of data store provides real-time access to a flexible data model. The data model flexibility makes it compatible with different monitoring protocols. Moreover, the structure leads to fast scanning of a small part of a large dataset. This property provides low latency responses which facilitate exploratory and ad-hoc queries for researchers and administrators. The solution provides a processing mechanism which is about 4000 times faster than the traditional one.
The study concentrates on flow information records, due to regulatory and practical limitations such as privacy directives and payload encryption. However, one can leverage the same solution for handling wider and richer datasets which contain application layer fields. This study is a part of our tenant-aware network monitoring solution for the cloud model.
The rest of the paper is organized as follows: Section II explains the background information about data-intensive processing frameworks and network monitoring approaches. Section III describes the Norwegian NREN backbone network as a case study. Dataset characteristics and monitoring requirements of a production network are explained in this section. Section IV introduces our approach toward solving data processing challenges for network monitoring. Section V discusses technical details of the implementation as well as performance tunings for improving the efficiency. Section VI evaluates the solution by performing common queries, and Section VII concludes the paper and introduces future work.
II. BACKGROUND

A. Framework for Data-Intensive Distributed Applications

Using commodity hardware for storing and processing large sets of data is becoming very common [7]. There are multiple proprietary and open-source frameworks and commercial services providing similar functionality, such as: Apache's Hadoop1 [8] and related projects, Google's File System (GFS) [9], BigTable [10], Microsoft's Scope [11], and Dryad [12]. In the following, the required components for the analysis and storage of our dataset are explained.
1) File System (Hadoop Distributed FS): The first building block of our solution, for handling network monitoring data, is a proper file system. The chosen file system must be reliable, distributed, and efficient for large data sets. Several file systems can fulfil these requirements, such as the Hadoop Distributed File System (HDFS) [8], MooseFS2, GlusterFS3, Lustre [13], and the Parallel Virtual File System (PVFS) [14]. Despite the variety, most of these file systems are missing an integrated processing framework; HDFS is the exception. This capability makes HDFS a good choice as the underlying storage solution.
2) Data Store (HBase): Network monitoring data and packet header information are semi-structured data. In a short period after their generation, they are accessed frequently, and a variety of information may be extracted from them. Apache HBase4 [15] is the most suitable non-relational data store for this specific use-case. HBase is an open-source implementation of a column-oriented distributed data store inspired by Google's BigTable [10], which can leverage Apache's MapReduce processing framework. Data access in HBase is key-based. It means a specific key, or a part of it, can be used to retrieve a cell (i.e. a record) or a range of cells [15]. As a database system, HBase guarantees consistency and partition tolerance from the CAP theorem [16] (aka. Brewer's theorem).
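To make the key-based access model concrete, the following is a minimal sketch using the classic HBase Java client API (HTable, Get, and Scan); the table and key values are illustrative assumptions, not taken from our implementation.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.Get;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.ResultScanner;
  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.util.Bytes;

  public class KeyBasedAccess {
      public static void main(String[] args) throws Exception {
          Configuration conf = HBaseConfiguration.create();
          HTable table = new HTable(conf, "flows");        // hypothetical table name

          // Point lookup: retrieve the cells stored under one exact rowkey.
          Get get = new Get(Bytes.toBytes("example-rowkey"));
          Result row = table.get(get);

          // Range scan: retrieve all rows whose keys fall lexicographically
          // between a start key (inclusive) and a stop key (exclusive).
          Scan scan = new Scan(Bytes.toBytes("key-a"), Bytes.toBytes("key-b"));
          ResultScanner scanner = table.getScanner(scan);
          for (Result r : scanner) {
              // process r ...
          }
          scanner.close();
          table.close();
      }
  }

The point lookup and the range scan are the two access patterns the schema design in Section IV is built around.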
3) Processing Framework (Hadoop MapReduce): Processing large data sets has demanding requirements. The processing framework should be able to partition the data across a large number of machines, and expose computational facilities for these partitions. The framework should provide the abstraction for parallel processing of data partitions and tolerate machine failures. MapReduce [17] is a programming model with these specifications. Hadoop is an open source implementation by the Apache Software Foundation, which will be used in our study.
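As an illustration of the programming model, the sketch below counts flow records per source IP address with one map and one reduce phase. It is a self-contained toy example over plain text input (one record per line, source IP in the first whitespace-separated field), written against the Hadoop 2.x MapReduce API; it is not the collection pipeline described later in Section V.

  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.Mapper;
  import org.apache.hadoop.mapreduce.Reducer;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

  public class FlowsPerSourceIP {

      // Emits (sourceIP, 1) for every input line.
      public static class FlowMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
          private static final LongWritable ONE = new LongWritable(1);
          @Override
          protected void map(LongWritable key, Text line, Context ctx)
                  throws IOException, InterruptedException {
              String[] fields = line.toString().split("\\s+");
              if (fields.length > 0 && !fields[0].isEmpty()) {
                  ctx.write(new Text(fields[0]), ONE);
              }
          }
      }

      // Sums the counts for each source IP.
      public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
          @Override
          protected void reduce(Text ip, Iterable<LongWritable> counts, Context ctx)
                  throws IOException, InterruptedException {
              long sum = 0;
              for (LongWritable c : counts) sum += c.get();
              ctx.write(ip, new LongWritable(sum));
          }
      }

      public static void main(String[] args) throws Exception {
          Job job = Job.getInstance(new Configuration(), "flows-per-source-ip");
          job.setJarByClass(FlowsPerSourceIP.class);
          job.setMapperClass(FlowMapper.class);
          job.setReducerClass(SumReducer.class);
          job.setOutputKeyClass(Text.class);
          job.setOutputValueClass(LongWritable.class);
          FileInputFormat.addInputPath(job, new Path(args[0]));
          FileOutputFormat.setOutputPath(job, new Path(args[1]));
          System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
  }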
B. Network Monitoring

This study focuses on the monitoring of backbone networks. The observation can be instrumented using Simple Network Management Protocol (SNMP) metrics, flow information (i.e. packet headers), and packet payloads. SNMP does not deliver the granularity demanded by our use-case; also, storing packet payloads from a high capacity network is not feasible, because of both scalability issues [1] and privacy policies [2]. Thus, we are more interested in the packet header and IP flow information. An IP flow is a set of packets passing through a network between two endpoints, and matching a certain set of criteria, such as one or more identical header fields [18]. In our study, a flow is a canonical five-tuple: source IP, source port, destination IP, destination port, and protocol. Flow information is flushed out of the network device after 15 seconds of inactivity, 30 minutes of persistent activity, TCP session termination, or when the flow buffer in the device is full. This makes the start and end time of a flow imprecise [19]. IP flow information is an efficient data source for the real-time analysis of network traffic.
IP flow information can be exported using different protocols, in different formats. NetFlow [20], sFlow [21], and IP Flow Information Export (IPFIX) [18] are designed to handle network monitoring data. Collected data have a variety of use-cases. They can be used for security purposes, audit, accountability, billing, traffic engineering, capacity planning, etc.

C. Testing Environment

We have implemented, optimized, and tested our suggested solution. The testing environment consists of 19 nodes, which deliver the Hadoop, HDFS, HBase, ZooKeeper, and Hive services. The configuration for these nodes is as follows: 6-core AMD Opteron(tm) Processor 4180, 4x 8 GB DDR3 RAM, 2x 3 TB disks, 2x Gigabit NIC.

III. CASE STUDY: NORWEGIAN NATIONAL RESEARCH AND EDUCATION NETWORK (NREN)

This study focuses on the storage and processing of IP flow information data for the Norwegian NREN backbone network. Two core routers, TRD GW 1 (in Trondheim) and OSLO GW (in Oslo), are configured to export flow information. Flow information is collected using NetFlow [20] and sFlow [21].

A. Data Volume

Flow information is exported from networking devices at different intervals or events (e.g. 15 seconds of inactivity, 30 minutes of activity, TCP termination flag, cache exhaustion). The data are collected at observation points, and then the anonymized data are stored for experiments. Crypto-PAn [22] is used for the data anonymization. The mapping between the original and anonymized IP addresses is "one-to-one", "consistent across traces", and "preserves prefix".
Flow information is generated by processing a sampled set of packets. Although sampled data is not as accurate as non-sampled data, studies showed that it can be used efficiently for network operation and anomaly detection, by means of the right methods [23], [24].

1 http://hadoop.apache.org/
2 http://www.moosefs.org/
3 http://www.gluster.org/
4 http://hbase.apache.org/
There needs to be a basic understanding of the dataset for designing the proper data store. Data characteristics, common queries, and their acceptable response times are influential factors in the schema design. The identifier for accessing the data can be based on one or more fields from the flow information record (e.g. source or destination IP addresses, ports, Autonomous Systems (AS), MACs, VLANs, interfaces, etc.). Figure 1 depicts the number of unique source and destination IP addresses, unique source IP:source port and destination IP:destination port tuples, unique bidirectional flows (biflows), and flow information records per day for TRD GW 1 in a 5 month period. The summary of numeric values for TRD GW 1 and OSLO GW is presented in Table I.

TABLE I: Traffic Characteristics (statistics per day)

  Traffic Type                            Avg        Max         Min
  Distinct Source IPs                     987104     4740760     122266
  Distinct Source IPs and Source Ports    6083640    13188647    844898

Fig. 1: Number of distinct source IPs, source IP:source port pairs, destination IPs, destination IP:destination port pairs, bidirectional flows, and raw NetFlow records collected from Trondheim gateway 1 (November 2012 to April 2013).

The average number of flow information records for both routers is 22 million per day, which corresponds to 60 GB of data in binary form. However, this number can become much bigger if flow information is collected from more sources and the sampling rate is increased.

B. Data Access Methods

Monitoring data can be accessed for different purposes such as: billing information, traffic engineering, security monitoring, and forensics. These purposes correspond to a big set of possible queries. The schema can be designed such that it performs very well for one group of queries. That may lead to a longer execution time for the other query groups. Our main goal is reaching the shortest execution time for security monitoring and forensics queries. Three types of queries are studied: IP based, which requires fast IP address lookups (e.g. specific IPs or subnets); Port based, which requires fast port lookups (e.g. specific services); and Time based, which requires fast lookups on a time period.
Network monitoring data and packet header information are semi-structured data. They have arbitrary lengths and a various number of fields. Storing this type of data as binary files in a distributed file system is challenging. The next section discusses several storage schemas and their applicability to the desired access methods.

IV. SOLUTION

Two major stages in the life cycle of the monitoring data can be considered: short-term processing, and long-term archiving.
• Short-term processing: when collected monitoring data are imported into the data store, several jobs should be executed in real-time. These jobs generate real-time network statistics, check for anomaly patterns and routing issues, aggregate data based on desired criteria, etc.
• Long-term archiving: archived data can be accessed for security forensics, or on-demand statistical analysis.

A. Choice of Technologies

Apache HBase satisfies our requirements (Section I) such as consistency and partition tolerance. Moreover, data staging is affordable through proper configuration of the cache feature, in-memory storage size, in-filesystem storage size, region configuration and pre-splitting for each stage, etc. For instance, short-term data can be stored in regions with large memory storage and an enabled block cache. The block cache should be configured such that the Working Set Size (WSS) fits in memory [25], while long-term archives are more suitable for storage in the filesystem.
Hive5 is an alternative to HBase, but it is not suitable for our application. It does not support binary key-values, and all parameters are stored as strings. This approach demands more storage, and makes the implementation inefficient. While a composite key structure is an important factor for fast data access in the design, it is not supported by Hive. Although Hive provides enhanced query mechanisms for retrieving data, the aforementioned issues make it inapplicable to our purpose.

5 http://hive.apache.org/

B. Design Criteria

A table schema in HBase has three major components: the rowkey, column families, and column structures.
1) Row Key: A rowkey is used for accessing a specific part of data or a sequence of them. It is a byte array which can have a complex structure, such as a combination of several objects. The rowkey structure is one of the most important parts of our study because it has a great impact on the data access time and the storage volume demand. The following are our criteria for designing the rowkey:
• Rowkey Size: The rowkey is one of the fields stored in each cell, and is a part of the cell's coordinates. Thus, it should
be as small as possible, while still efficient for data access.
• Rowkey Length (Variable versus Fixed): Fixed length rowkeys and fields help us to leverage the lexicographically sorted rows in a deterministic way.
• Rowkey Fields' Order (with respect to region load): Records are distributed over regions based on the regions' key boundaries. Regions with high loads can be avoided by a uniform distribution of rowkeys. Thus, the position of each field in the rowkey structure is important. Statistical properties of a field's value domain are determining factors for the field position.
• Rowkey Fields' Order (with respect to query time): The lexicographic order of rowkeys makes queries on the leading field of a rowkey much faster than on the rest. This is the motivation for designing multiple tables with different field orders. Therefore, each table provides fast scanning functionality for a specific parameter.
• Rowkey Fields' Type: Fields of a rowkey are converted to byte arrays and then concatenated to create the rowkey. The fields' types have a significant effect on the byte array size. As an example, the number 32000 can be represented as a short data type or as a string. However, the string representation requires more than twice as many bytes.
• Rowkey Timestamps vs. Cell Versions: It is not recommended to set the maximum number of permitted versions too high [25]. Thus, there should be a timestamp for the monitoring record as a part of the rowkey.
• Timestamps vs. Reverse Timestamps: In the first stage of the data life cycle, recent records are frequently accessed. Therefore, reverse timestamps are used in the rowkey. A sketch of such a composite rowkey is shown below.
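The criteria above can be made concrete with a small sketch that builds a T1-style composite rowkey (source IP, source port, destination IP, destination port, reverse timestamp) and scans on its leading field. This is an illustrative reconstruction, not the code of our implementation: the field widths, table handling, and helper names are assumptions, and the widths shown do not necessarily reproduce the exact 23-byte layout used in the paper.

  import java.io.IOException;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.ResultScanner;
  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.util.Bytes;

  public class FlowRowKeys {

      /** Builds a fixed-length [sa][sp][da][dp][reverse-ts] key from a five-tuple. */
      public static byte[] buildT1Key(int srcIp, short srcPort,
                                      int dstIp, short dstPort, long millis) {
          long reverseTs = Long.MAX_VALUE - millis;        // newest records sort first
          return Bytes.add(
                  Bytes.add(Bytes.toBytes(srcIp), Bytes.toBytes(srcPort)),
                  Bytes.add(Bytes.toBytes(dstIp), Bytes.toBytes(dstPort)),
                  Bytes.toBytes(reverseTs));
      }

      /** Scans every record whose leading field (source IP) equals the given address. */
      public static void scanBySourceIp(HTable table, int srcIp) throws IOException {
          byte[] start = Bytes.toBytes(srcIp);
          byte[] stop  = Bytes.toBytes(srcIp + 1);   // exclusive bound on the 4-byte prefix
                                                     // (wrap-around at 255.255.255.255 ignored)
          ResultScanner scanner = table.getScanner(new Scan(start, stop));
          for (Result r : scanner) {
              // rows arrive sorted by [sp][da][dp][reverse-ts] within this source IP
          }
          scanner.close();
      }
  }

Because the key is a plain byte concatenation, the same helper can populate the other tables simply by permuting the field order before concatenation.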
2) Column Families: Column families are the fixed part of a table which must be defined while creating the schema. It is recommended to keep the number of families below three, and those in the same table should have similar access patterns and size characteristics (e.g. number of rows) [15]. A column family's name must be of string type, with a short length. The family's name is also stored in each cell, as a part of the cell's coordinates. A table must have at least one column family, but it can have a dummy column with an empty byte array. We have used the constant value D for our single column family across all tables.
3) Columns: Columns are the dynamic part of a table structure. Each row can have its own set of columns, which may not be identical to other rows' columns. Monitoring data can be generated by different protocols, and they may not have similar formats/fields. Columns make the solution flexible and applicable to a variety of monitoring protocols.
There are several tables with different field orders in their rowkeys, but not all of them have columns. The complete monitoring record is inserted only into the reference table, and the others are used for fast queries on different rowkey fields. A minimal example of this single-family, dynamic-column layout is sketched below.
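The sketch below, written against the classic HBase admin and client APIs, creates the reference table with the single column family D and inserts one record whose column qualifiers are derived from whatever fields the exporting protocol happens to provide. The table name, qualifier names, and values are illustrative assumptions.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.HColumnDescriptor;
  import org.apache.hadoop.hbase.HTableDescriptor;
  import org.apache.hadoop.hbase.client.HBaseAdmin;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  public class ReferenceTableExample {
      private static final byte[] FAMILY = Bytes.toBytes("D");   // single constant column family

      public static void main(String[] args) throws Exception {
          Configuration conf = HBaseConfiguration.create();

          // Create the reference table (T1) with its one column family.
          HBaseAdmin admin = new HBaseAdmin(conf);
          HTableDescriptor desc = new HTableDescriptor("t1");    // hypothetical table name
          desc.addFamily(new HColumnDescriptor(FAMILY));
          admin.createTable(desc);
          admin.close();

          // Insert one flow record: qualifiers mirror the protocol's field names,
          // so records from different protocols can carry different column sets.
          HTable t1 = new HTable(conf, "t1");
          Put put = new Put(rowkey());                            // composite rowkey, e.g. from buildT1Key()
          put.add(FAMILY, Bytes.toBytes("in_bytes"), Bytes.toBytes(123456L));
          put.add(FAMILY, Bytes.toBytes("in_pkts"), Bytes.toBytes(120L));
          put.add(FAMILY, Bytes.toBytes("tcp_flags"), new byte[] { 0x1b });
          t1.put(put);
          t1.close();
      }

      private static byte[] rowkey() {
          return Bytes.toBytes("placeholder-key");                // stands in for the real composite key
      }
  }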
C. Schemas

Section III-B explained the desired query types, and Section IV-B1 described the required properties of a rowkey for fast scanning. Here, three table types are introduced, each addressing one query category: IP-based, Port-based, and Time-based tables.
1) IP Based Tables:
a) T1 (reference table), T2: The rowkey of this table consists of: source IP address, source port, destination IP address, destination port, and reverse timestamp (Table II). Columns in this family are flexible, and any given set can be stored there. Column qualifier identifiers are derived from field names, and their values are the corresponding values from the flow information records. The other tables are designed as secondary indexes. They improve access time considerably for the corresponding query group. Table T1 is used for retrieving flow information parameters that are not in the rowkey (e.g. number of sent or received packets, bytes, flows).
Table T2 has the destination address and port in the lead position. It is used in combination with T1 for the analysis of bidirectional flows.
b) T3, T4: These tables are suitable when source and destination addresses are provided by the query (Table II), for instance when the two ends of a communication are known, and we want to analyse other parameters such as: communication ports, traffic volume, duration, etc.
2) Port Based Tables:
a) T5, T6: These are appropriate tables for service discovery (Table II). As an example, when we want to discover all nodes delivering the SSH service (on default port 22), we can specify the lead fields on T5 and T6 (source and destination ports), and let the data store return all service providers and their clients. If the client c1 is communicating on the port p1 with the server s1 on the port 22 at time ts, then there is a record with the rowkey [22][s1][c1][p1][1-ts] in the data store.
b) T7, T8: These tables fulfil the requirement for identifying clients who use a particular service (Table II). The same record from T5, T6 will have the rowkey [22][c1][s1][p1][1-ts].
3) Time Based Tables: OpenTSDB6 is used for storing time series data. This can be an efficient approach for accessing and processing flows of a specific time period. A rowkey in OpenTSDB consists of: a metric, a base timestamp, and a limited number of tags in the key-value format. Source and destination IP addresses and ports are represented as tags, and a set of metrics is defined. Five fields from the flow information record are chosen as metrics: number of input and output bytes, input and output packets, and flows.

D. Storage Requirement

The storage volume required for storing a single replication of a non-compressed record can be estimated using Equation (1), as depicted in Table III. However, this estimation may vary considerably if protocols other than NetFlow v5 and sFlow are used for collecting monitoring data (e.g. an IPFIX raw record can be 250 bytes, containing 127-300 fields).
Equation (2) is used for calculating the required capacity for tables T2-T8 (see Table III). These tables do not have columns and values, which makes them much smaller than table T1.

6 www.opentsdb.net
TABLE II: IP Based and Port Based Tables

  Table   Row Key                        Query Type
  T1      [sa] [sp] [da] [dp] [1 - ts]   Extended queries
  T2      [da] [dp] [sa] [sp] [1 - ts]
  T3      [sa] [da] [sp] [dp] [1 - ts]   Source-Destination address queries
  T4      [da] [sa] [dp] [sp] [1 - ts]   Source-Destination address queries
  T5      [sp] [sa] [da] [dp] [1 - ts]   Service server discovery queries
  T6      [dp] [da] [sa] [sp] [1 - ts]   Service server discovery queries
  T7      [sp] [da] [sa] [dp] [1 - ts]   Service client discovery queries
  T8      [dp] [sa] [da] [sp] [1 - ts]   Service client discovery queries

  |record_{T1}| = |cq| \cdot (|rk| + |cfn| + |cn|) + \sum_{i \in cq} |cv_i|    (1)

  |record_{T2-T8}| = |rk| + |cfn|    (2)

where:
  |x| = x's size in byte(s)
  rk  = row key (size = 23 B)
  cfn = column family name
  cq  = set of column qualifiers
  cn  = column qualifier name
  cv  = column value

TABLE III: Storage requirements (IPv4)

                   Est. # records                Storage for T1                 Storage for T2-T8           Storage for OpenTSDB        Total
  Single Record    1                             (37 * 23 B) + 133 B ~ 1 KB     7 tables * 23 B = 161 B     5 metrics * 2 B = 10 B      ~ 1 KB
  Daily Import     ~ 20 million                  1 KB * 20*10^6 = 20 GB         161 B * 20*10^6 ~ 3 GB      10 B * 20*10^6 = 200 MB     ~ 23 GB
  Initial Import   20 M * 150 days ~ 3*10^9      1 KB * 3*10^9 = 3 TB           161 B * 3*10^9 ~ 500 GB     10 B * 3*10^9 = 30 GB       ~ 3.5 TB
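Reading the single-record row of Table III back through Equations (1) and (2) makes the estimate explicit. Under the table's approximation, T1 stores 37 column qualifiers per record, roughly 23 B of key material per cell, and 133 B of column values in total, while each of the seven index tables stores only the key material; this interpretation of the 37 and 133 B figures is inferred from the table entry.

  % Single-record storage, following Table III:
  \[
    |record_{T1}| \approx 37 \times 23\,\mathrm{B} + 133\,\mathrm{B} = 984\,\mathrm{B} \approx 1\,\mathrm{KB}
  \]
  \[
    |record_{T2\text{-}T8}| \approx 7 \times 23\,\mathrm{B} = 161\,\mathrm{B}
  \]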
V. IMPLEMENTATION

A. Data Collection

A set of MapReduce jobs and scripts are developed for collecting, storing, and processing data in HBase and OpenTSDB7. In the MapReduce job, the map tasks read flow information files and prepare the rowkeys as well as the columns for all tables. In the next step they are written into the corresponding tables. After that, another task checks data integrity with a simple row counting job. This verification is not fully reliable, but it is a basic step for the integrity check without sacrificing performance.
Performance evaluation was performed by processing the records of a single day. The day is chosen randomly from the working days of 2013. The statistical characteristics of the chosen day represent the properties of any other working day. The performance of the implementation is not satisfactory at this stage. For HBase, the maximum number of operations per second is 50, with a maximum operation latency of 2.3 seconds. HDFS shows the same performance issue: the maximum number of written bytes per second is 81 MB/s. The task is finished after 45.46 minutes. Therefore, performance tuning is required.

7 Available at: https://github.com/aryantaheri/netflow-hbase
B. Performance Tuning

The initial implementation of the collection module was not optimized for storing large datasets. By investigating performance issues, seven steps are recognized as remedies [26], [25]. These improvements also enhance the query execution process, and are applied there as well.
a) Using LZO compression: Although compression demands more CPU time, the HDFS I/O and network utilization are reduced considerably. Compression is applied to store files (HFiles), and the algorithm must be specified in the table schema for each column family. The compression ratio is dependent on the algorithm and the data type; for our dataset with the LZO algorithm the ratio is about 4.
b) Disabling Swap: Swappiness is set to zero on data nodes, since there is enough free memory for the job to complete without moving memory pages to the swap [26].
c) Disabling the Write Ahead Log (WAL): All updates in a region server are logged in the WAL, to guarantee durable writes. However, the write operation performance is improved significantly by disabling it. This has the risk of data loss in case of a region server failure [25].
d) Enabling Deferred Log Flush (DLF): DLF is a table property for deferring WAL flushes. If the WAL is not disabled (due to the data loss risk), this property can specify the flushing interval to moderate the WAL's overhead [25].
e) Increasing the heap size: 20 TB of the disk storage is planned to be used for storing monitoring data. The formula for calculating the estimated ratio of disk space to heap size is: RegionSize / MemstoreSize * ReplicationFactor * HeapFractionForMemstores [27]. This leads to a heap size of 10 GB per region server.
f) Specifying Concurrent-Mark-Sweep Garbage Collection (CMS-GC): Full garbage collection has a tremendous overhead, and it can be avoided by starting the CMS process earlier. The initial occupancy fraction is explicitly specified to be 70 percent. Thus, CMS starts when the old generation allocates more than 70 percent of the heap size [26].
g) Enabling MemStore-Local Allocation Buffers (MSLAB): MSLAB relaxes the issue with old generation heap fragmentation for HBase, and makes garbage collection pauses shorter. Furthermore, it can improve cache locality by allocating memory for a region from a dedicated memory area [28].
h) Pre-Splitting Regions: The pre-splitting of regions has a major impact on the performance of bulk load operations. It can rectify the hotspot region issue and distribute the work load among all region servers. Each region has a start and an end rowkey, and only serves a consecutive subset of the dataset. The start and end rowkeys should be defined such that all regions will have a uniform load. Pre-splitting requires a good knowledge of the rowkey structure and its value domain.
Tables T1-T4 start with an IP address, and T5-T8 have a port number in the lead position. Thus, they demand different splitting criteria. The initial splitting uses a uniform distribution function, and it is later improved by an empirical study. The IPv4 space has 2^32 addresses, and the address space is split uniformly over 15 regions, as shown in Table IV. Furthermore, the port number is a 16 bit field with 65536 possible values, and the same splitting strategy is applied to it (Table V).

TABLE IV: Initial region splits for tables T1-T4 (store file size in MBytes - number of store files)

  Region   Starting IP address   T1      T2      T3      T4
  1                              30-1    0-0     0-0     5-1
  2        17.17.17.17           23-1    0-0     0-0     0-0
  3        34.34.34.34           32-1    6-1     5-1     0-0
  4        51.51.51.51           172-1   22-1    21-1    22-1
  5        68.68.68.68           325-1   57-1    57-1    57-1
  6        85.85.85.85           77-1    11-1    10-1    11-1
  7        102.102.102.102       85-1    9-1     13-1    0-0
  8        119.119.119.119       57-1    11-1    0-0     11-1
  9        136.136.136.136       102-1   11-1    10-1    11-1
  10       153.153.153.153       543-1   92-1    82-1    97-1
  11       170.170.170.170       21-1    0-0     0-0     0-0
  12       187.187.187.187       887-1   138-1   141-1   139-1
  13       204.204.204.204       73-1    11-1    10-1    11-1
  14       221.221.221.221       5-1     0-0     0-0     1-1
  15       238.238.238.238       0-1     0-0     0-0     0-0

TABLE V: Initial region splits for tables T5-T8 (store file size in MBytes - number of store files)

  Region   Starting port number   T5      T6      T7      T8
  1                               197-1   137-1   198-1   137-1
  2        4369                   7-1     0-0     0-0     0-0
  3        8738                   0-0     0-0     0-0     0-0
  4        13107                  0-0     0-0     0-0     0-0
  5        17476                  0-0     0-0     0-0     0-0
  6        21845                  0-0     9-1     8-1     0-0
  7        26214                  0-0     0-0     0-0     0-0
  8        30583                  0-0     0-0     0-0     10-1
  9        34952                  0-0     12-1    0-0     12-1
  10       39321                  0-0     13-1    10-1    12-1
  11       43690                  9-1     12-1    0-0     12-1
  12       48059                  37-1    49-1    38-1    60-1
  13       52428                  25-1    49-1    26-1    50-1
  14       56797                  25-1    37-1    25-1    38-1
  15       61166                  26-1    24-1    25-1    25-1

The performance gain for storing a single day of flow information is considerable. On average, 754 HBase operations are performed per second (30x more operations/s), the average operation latency is decreased to 27 ms (14x faster), and the job finishes in 15 minutes (3x sooner). Despite the high efficiency improvement, there are some hotspot regions which should be investigated further.
Tables IV and V show the regions' start keys, the number of store files, and their sizes. It can be observed that this uniform splitting did not lead to a uniform load across regions.
In tables T1-T4, regions R4, R5, R10, and R12 have big store files compared to the rest of the regions. The highly loaded regions serve entries within the following IP address spaces (anonymized): R4 → [51.51.51.51, 68.68.68.68), R5 → [68.68.68.68, 85.85.85.85), R10 → [153.153.153.153, 170.170.170.170), R12 → [187.187.187.187, 204.204.204.204). By investigating these IP address blocks, we identified that some of them contain Norwegian address blocks8 and some others are popular service providers. In addition, the empty regions contain special ranges such as private networks and link-local addresses.
In tables T5-T8, regions R1, R12, R13, R14, and R15 have high loads, and they serve the following port numbers: R1 → [0, 4369), R12 → [48059, 52428), R13 → [52428, 56797), R14 → [56797, 61166), R15 → [61166, 65536). For tables T5-T8, R1 covers well known ports (both system ports and user ports) suggested by the Internet Assigned Numbers Authority (IANA)9, and R12-R15 contain short-lived ephemeral ports (i.e. dynamic/private ports). In the empirical splitting, the difference between system ports, user ports, and private/dynamic (ephemeral) ports is taken into account.
A large fraction of records have port numbers of popular services (e.g. HTTP(S), SSH) or IP addresses of popular sources/destinations (e.g. Norwegian blocks, popular services). Therefore, regions should not be split using a uniform distribution over the port number range or the IP address space. The splitting is improved by taking these constraints into consideration, and the result is significant. The average number of operations per second is 1600 (64x more), the latency is 5 ms (80x less), and the job duration is reduced to 6.57 minutes (7.5x faster). The results are depicted in Figure 2.

8 http://drift.uninett.no/nett/ip-nett/ipv4-nett.html

VI. EVALUATION

This section analyses several query types and their response times.

A. Top-N Host Pairs

Finding the Top-N elements is a common query type for many datasets. In our dataset, elements can be IP addresses, host pairs, port numbers, etc. In the first evaluation, a query for finding the Top-N host pairs is studied over a 150 day period. These pairs are the hosts which have exchanged the most traffic on the network. The query requires processing all records in table T1, and aggregating the input and output bytes for all available host pairs, independent of the connection initiator and port numbers. Table T1 has 5 billion records.
Traditional tools (e.g. NFdump) are not capable of answering this query, because the long period corresponds to an extremely large dataset. For this purpose, two chained MapReduce jobs are written for the HBase tables. The first one identifies host pairs and aggregates their exchanged traffic. The second one sorts the pairs based on the exchanged traffic. A sketch of the first, aggregation stage is shown below.
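The following is a minimal sketch of what the aggregation stage could look like, written with HBase's TableMapper/TableMapReduceUtil MapReduce integration. It is an illustration under stated assumptions, not the published implementation: the table name, the rowkey offsets of the two addresses, and the column qualifiers holding the byte counters are all assumptions.

  import java.io.IOException;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
  import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
  import org.apache.hadoop.hbase.mapreduce.TableMapper;
  import org.apache.hadoop.hbase.util.Bytes;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.Reducer;

  public class TopNHostPairsStage1 {

      static final byte[] FAMILY = Bytes.toBytes("D");            // single column family
      static final byte[] IN_BYTES = Bytes.toBytes("in_bytes");   // assumed qualifier names
      static final byte[] OUT_BYTES = Bytes.toBytes("out_bytes");

      /** Emits (unordered host pair, bytes) for every T1 row the scan delivers. */
      public static class PairMapper extends TableMapper<Text, LongWritable> {
          @Override
          protected void map(ImmutableBytesWritable key, Result row, Context ctx)
                  throws IOException, InterruptedException {
              byte[] rk = key.get();
              // Assumed T1 layout: [srcIP:4][srcPort:2][dstIP:4][dstPort:2][revTs:8]
              int src = Bytes.toInt(rk, key.getOffset());
              int dst = Bytes.toInt(rk, key.getOffset() + 6);
              long bytes = cell(row, IN_BYTES) + cell(row, OUT_BYTES);
              // Normalize the pair so both directions aggregate under the same key.
              String pair = (src <= dst) ? src + ":" + dst : dst + ":" + src;
              ctx.write(new Text(pair), new LongWritable(bytes));
          }
          private static long cell(Result row, byte[] qualifier) {
              byte[] v = row.getValue(FAMILY, qualifier);
              return (v == null) ? 0L : Bytes.toLong(v);
          }
      }

      /** Sums the exchanged traffic for each host pair; the second job sorts these totals. */
      public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
          @Override
          protected void reduce(Text pair, Iterable<LongWritable> vals, Context ctx)
                  throws IOException, InterruptedException {
              long total = 0;
              for (LongWritable v : vals) total += v.get();
              ctx.write(pair, new LongWritable(total));
          }
      }

      public static void configure(Job job) throws IOException {
          Scan scan = new Scan();
          scan.setCaching(1000);        // larger scanner caching for a full-table scan
          scan.setCacheBlocks(false);   // do not pollute the block cache with a one-off scan
          TableMapReduceUtil.initTableMapperJob("t1", scan, PairMapper.class,
                  Text.class, LongWritable.class, job);
          job.setReducerClass(SumReducer.class);
          job.setOutputKeyClass(Text.class);
          job.setOutputValueClass(LongWritable.class);
      }
  }

The second, sorting stage can then read the (pair, total) output, swap key and value, and rely on the shuffle sort to order the pairs by exchanged traffic, keeping only the top N.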
Fig. 2: Storage performance under different implementations (SNS: single day processing without pre-splitting, SS: single day processing with a uniform splitting function, SSE: single day processing with an empirical pre-splitting function). Panels (c) and (d) show the number of operations per second and the operation latency in HBase, respectively.