Learning Based Hybrid Routing For Scalability in Software - 2021 - Computer Netw
Computer Networks
journal homepage: www.elsevier.com/locate/comnet
ARTICLE INFO

Keywords: Control plane; Reinforcement learning; Routing algorithm; Scalability; Software defined network

ABSTRACT

Software Defined Network is an emerging paradigm in computer networks. The separation of the control plane from the forwarding plane in this arrangement has different aspects. This splitting provides many advantages like easy manageability and configuration. Along with benefits, various issues specific to this paradigm also arise. Routing management in such a paradigm deals with diverse concerns, objectives, and parameters before selecting the best route. Reinforcement Learning has already proven its strength in distinct fields like business, industry automation, gaming, algorithms, etc. Routing in a network can also be made efficient using concepts defined in reinforcement learning. In this paper, routing within a controller's area is modeled, keeping scalability in mind, and an optimal solution is provided using learning. Both proactive and reactive approaches are used for flow installation, and the link load is utilized optimally. The area under a particular controller is efficiently routed, and it tweaks the network. The Q-learning model helps to learn the optimal path and provide the best route in case of a failure. Once the learning completes, the model works on it. Preliminary evaluation shows that an improvement of 78%, 58%, and 47% is achieved in the number of messages generated when compared with other existing solutions for routing in Software Defined Networks.
* Corresponding author.
E-mail address: nayyer.amit@gmail.com (A. Nayyer).
https://doi.org/10.1016/j.comnet.2021.108362
Received 13 December 2020; Received in revised form 23 June 2021; Accepted 23 July 2021
Available online 8 August 2021
1389-1286/© 2021 Elsevier B.V. All rights reserved.
A. Nayyer et al. Computer Networks 198 (2021) 108362

1. Introduction

Software defined network (SDN) is a widely adopted paradigm nowadays; it offers more flexible programming and better control for programmable networks. The main attraction is the separated control plane and data plane that provide a logically centralized control model. The separated data plane allows vendor-neutral and open network infrastructure. The basic idea behind developing such a network is to access network devices using an open and programmable interface. The 4D project [1] supported such a paradigm in 2004. SDN simplifies network management through the centralized programmability of the network and enhances the capability of network devices. In traditional networks, proprietary network devices and an insufficiently skilled workforce make it difficult to add or change the client's new service requirements. The control plane and the OpenFlow protocol are the two main components on which the SDN paradigm focuses.

The control plane in SDN consists of one or more controllers to handle the devices connected to them. A controller in SDN manages the global view and takes optimal decisions depending on the network's current scenario. Different control architectures, such as NOX, Maestro, and Beacon [2–4], are developed to take advantage of the paradigm. The behavior of the control plane is directly related to the performance of the network. Various issues like scalability, security, fault tolerance, congestion and flow management, etc. [5] are of concern to the control plane only. The control plane's different functions are routing, topology management, energy management, security management, etc. The controller communicates with the devices by using an API such as the OpenFlow [6] protocol. OpenFlow defines the implementation of OpenFlow enabled devices and the communication between the switching hardware and a network controller. It is one of the primarily used protocols, whereas protocols like ForCES, POF, Opflex [7], etc., are also developed to be used for SDN. Currently, the latest version of OpenFlow is 1.5.1, and the main features are the egress table and the packet type aware pipeline. A flow table entry in an OpenFlow enabled switch is constituted of distinct fields such as match field, priority, action, counter, and timeout.

Traffic engineering is a mechanism in which the traffic of a network is dynamically analyzed and predicted for a routing decision according to the network's traffic patterns. One of the core responsibilities of traffic engineering is to regulate the traffic in a network in an optimized way [8]. In diverse networks, the objectives of traffic control are distinct. It is used to solve congestion; in other cases, it is used to control and manage other traffic protocols. Route optimization, i.e., finding the efficient route to achieve the desired network performance, plays a crucial role in traffic engineering. Traffic engineering also focuses on minimizing the impact of a node's failure on network performance and resource utilization. For traffic engineering, one needs to have information about the network topology, capabilities, traffic, and errors in the network. Significant benefits of traffic engineering in SDN are better control, reliable delivery, better utilization of resources, advanced services, planned maintenance, network prediction, etc. Various issues concerning traffic engineering at the control plane are shown in Fig. 1.

Scalability in naive terms can be defined as a characteristic that allows the system or network to continuously function even when its size or traffic is increased manifold. The control plane is the core of any SDN architecture. It handles all the traffic coming from the Northbound API and the Southbound API, through which different applications and devices are connected. The number of devices connected to a controller constitutes its area, and area traffic balancing is one of the main issues in the control plane. Proper utilization of switches provides a better routing solution and less congestion in the area under a controller, hence a more scalable controller. If the links are not exploited to their capacity optimally, it may result in congestion and considerable delivery time for the packets, even though some links may be underutilized. Controlling the flow towards the controller and efficient utilization of links in the network is one of the best solutions for a scalable control plane.

1.1. Motivation

Separation of control and data plane permits innovative and better network management solutions. These solutions are not attainable on many platforms as they are built on OpenFlow. OpenFlow's design cannot meet the demands of high-performance networks [9], i.e., it is not scalable. The issue of scalability for the control plane in the OpenFlow network is one of the problems that need attention [10]. Research is already going on towards increasing the scalability of SDN. The solutions include designing scalable frameworks [11], parallelism approaches, and routing solutions [10]. Routing based optimization provides an effective way to increase the network's scalability; thus, SDN's paradigm influences researchers for new traffic engineering techniques. Merely using the methods already developed for other networks in SDN may not provide the best solution to the problem in this paradigm. Traffic engineering can be much more efficient and intelligently implemented than conventional approaches like ATM, IP, and MPLS [8] by using various machine learning techniques in SDN. Since SDN provides centralized visibility and uses the OpenFlow protocol, it can be dynamically programmed from a central location. Thus, for this new network architecture, more intelligent and efficient routing methods are urgently required.

1.2. Contributions

Traffic engineering solutions for efficient routing cannot work until traffic balancing is appropriately handled at the data plane. Different approaches to traffic forwarding have various merits that can be utilized for a better solution. Reinforcement learning makes the system capable of learning, and this feature helps to improve the solutions. This paper focuses extensively and exclusively on the routing and scalable traffic balancing technique for software defined networks. To the best of our knowledge, no such technique is available in the literature for routing in SDN. Significant contributions of the proposed work are:
• An efficient hybrid routing solution using reinforcement learning is
proposed to reduce the number of events towards the control plane.
• Initially, a proactive approach is used while reinforcement learning
plays its role in high traffic on the links.
• During the discovery of reactive flow rules, the scalability of the
control plane is kept in mind. Those switches and links are selected
for which the proactive rules are already there and have fewer
chances of failure.
• A comparison of the proposed hybrid protocol with state-of-the-art
protocols is provided.
1.3. Organization
2. Related work
This section provides insight into the research work that is brought
into consideration for the proposed system model. It starts with a review
of different routing protocols that focus on improving SDN scalability,
followed by approaches that use artificial intelligence-based routing.
2.1. Routing
contains the action to forward the packet towards the controller. OpenFlow is generally associated with a reactive approach for rule installation.

Source routed forwarding [12] emphasizes increasing the scalability and performance in SDN. The research work is focused on the reduction of state distribution to different switches. As the controller knows the network's exact situation, it sends a relay path to the source/ingress switch to put in the header. It allows the sender to specify the route of the packet. The header is inspected by each intermediate switch along the path to forward the packet accordingly. This state reduction is directly proportional to the number of links in the path. Due to the flow-based working of OpenFlow networks, source routing is not suitable for these networks as it needs to find a route for each packet.

Explicit routing in SDN (ERSDN) [13] reduces the number of network events to be processed by the control plane. In a reactive approach, the total number of events is double the number of hops traversed by any packet. In ERSDN, whenever the controller receives a new packet, the controller installs the flow rules on all the switches that are on the optimal path to the destination. For example, suppose a new packet needs to traverse 4 switches. In that case, ERSDN needs to generate only four control events for installing flow rules on the switches instead of eight control events in a reactive scenario. Explicit routing has high complexity, and it requires larger Ternary Content Addressable Memory (TCAM).

DevoFlow [9] allows switches to make local routing decisions as they do not require a controller to consult. It enables a wildcard rule to select an output port of the switch using a probability distribution. Difane [14] increases the control plane's scalability by splitting the pre-installation of wildcard rules among multiple switches called authority switches. A controller partitions and distributes the rules among different switches. Difane keeps the traffic in the data plane by selectively directing packets through authority switches. Mahout [15] allows the end host to detect elephant flows using TCP buffers of outgoing flows. It restricts the need to consult the controller only for mice flows. The techniques discussed in [9,14,15] provide scalability solutions to the control plane, but they need to modify the paradigm in one way or another.

Protocol Oblivious Source Routing (POSR) [16] uses protocol oblivious instruction set (POI-FIS) to propose a routing algorithm based on source routing. Similar to Programming Protocol Independent Packet Processors (P4), POF separates network protocol from forward processing. A new packet format is proposed, and the packet processing pipeline is provided with a target to provide unicast, multicast, and link failure recovery in the SDN network. The forwarding plane's programmability allows packet design and implementation of source routing as per need and current situation in the network. Experiments are performed on a POF-enabled SDN network, and results confirm the reduction in flow table utilization and small path setup latency in the network.

DPVisor [17] is a distributed Network Virtualization Hypervisor (NVH) system that provides superior programmability based on virtual SDN slicing. The centralized NVH is converted into a distributed one by creating instances of NVH. The workload is distributed among instances to achieve better scalability. Compared with the OpenFlow-based ONOS benchmark (ONVisor), the performance of DPVisor is comparable in terms of latency, throughput, and recovery time.

2.2. Reinforcement learning

Reinforcement learning is goal-based learning built on interaction with the environment, used for making decisions. It learns by trial and error to perform actions to maximize the rewards. It is also known as learning by doing to achieve the best outcome. In reinforcement learning, an actor tries different actions, learns from feedback about whether the action delivers a better result, and then reinforces the action that works. Reinforcement learning finds application in the setups where an agent's start and end states are known, but there are multiple paths to reach the end state. Different machine learning techniques are applied in SDN [18] to provide a better solution to routing optimization problems. Table 1 depicts such solutions.

Table 1
Approaches used by different studies for routing optimizations in software defined networks.
Source | Year | Learning | Complexity | Summary
K.K Budhraja et al. [19] | 2017 | k-means | low | Machine learning based routing protocol for SDN.
S. Sendra et al. [20] | 2017 | Reinforcement | average | Selects the path in such a way that minimizes network cost, delay, loss and bandwidth.
S. C Lin et al. [21] | 2016 | Reinforcement | average | Routing protocol to optimize the rewards related to QoS parameters.
G. Stampa et al. [22] | 2017 | Deep Reinforcement | high | Routing protocol that uses deep reinforcement learning with a target to minimize the network delay.
A. Azzouni et al. [23] | 2017 | Neural network | average | NeuRoute is proposed for dynamic routing based on machine learning.
C.Wang et al. [24] | 2018 | Reinforcement | average | SDCoR provides cognitive learning by using reinforcement learning for dynamic routing on the internet of vehicles.
R. Alvizu et al. [25] | 2017 | Neural network | average | Neural networks are used for predicting traffic load, which is further used to make routing decisions.

3. Proposed hybrid model

This section provides an overview of the learning-based hybrid routing algorithm proposed to increase scalability in SDN. Fig. 2 depicts the communication diagram between the controller and a switch. The events in the diagram are, namely 1) Proactive rules, 2) Bandwidth threshold value, 3) Data Packet in, 4) Search for the rule in the flow table, 5) Check bandwidth, 6) Compare bandwidth, and 7) Send flows. The description of these events is provided in the next paragraph.

Proactive rules are those that are computed during the initialization of the network. Firstly, the controller determines such routes by discovering the topology under its control. The routes from all sources to destinations are prepared, keeping the shortest path in mind. These routes are installed at the switches in the form of proactive rules. Proactive rules installation is the first phase of the proposed algorithm. A rule contains the actions to be performed on an incoming packet at the switch. It can direct the switch to drop the packet, forward it to the controller, or forward it to a specific port. During the second phase, the controller predicts a threshold value for various types of services in the network. The threshold values for multiple services are selected depending upon the type of services provided by the network. Transmission Control Protocol (TCP) packets are the ones mostly generated in the network; they can carry significant traffic, and therefore a considerable bandwidth value must be assigned. The purpose of the threshold value is to restrict the traffic on links based on services and priority. On the arrival of a new packet, the flow table is searched for a matching rule. Once the rule is found in the flow table, the action related to it is performed. In case the action is to forward the packet to a designated link, the link is accessed, and the traffic for that specific service is calculated. If the traffic value is less than the threshold bandwidth value set by the controller, the packet is forwarded to the port. Otherwise, the packet is not forwarded to the designated port.

The switch requests a new path from the controller at this stage, as depicted in Fig. 2(b). The controller needs to reorganize all paths from the switch to the destination. It generates these paths dynamically by monitoring the traffic on all links. A path with the maximum number of links in common with the proactive rule for the same source and destination is given priority. Maximizing the use of stored proactive rules lets the control plane generate fewer messages towards the data plane. A hybrid algorithm is designed to take advantage of proactive and reactive rules depending upon the network traffic.

Reinforcement learning concepts are implemented at the controller to learn the optimal routing path from the available paths. A separate module monitors the paths to understand their behavior. A specific reward value is assigned to the path that remains valid for a longer duration. Selection is based on the reward value wherever available for the path. The controller's selected optimal path is sent to the requesting switch and all other switches in the path, as implemented in [13]. The switch, on receiving the reactive flow rule, forwards the packet to the designated port and stores the rule in its flow table with a higher priority. If present in the flow table, the reactive rule is always executed first instead of the proactive rule. The reactive rules will time out after a short period of inactivity and be removed from the flow table. The switch again works on proactive rules as before in Fig. 2(a). A detailed description of each phase is provided in section IV.

4.1. Objectives

Reducing the count of forwarding from and towards the controller is an alternative way to increase the control plane's scalability in SDN. The aim is to design a routing protocol to reduce the flow forwarding related to the controller. Unlike other protocols that modify the switches and make routing decisions at the data plane itself [9,14,15], the proposed model supports the idea where the data plane is considered as a set of elements that have the functionality of forwarding and statistics collection only. Further, the bandwidth utilization state of different links must be considered before making a routing decision in the network. The objective is to develop an improved routing protocol taking advantage of learning in SDN's centralized paradigm.

4.2. Prerequisite

The availability of multiple routes among switches is the main requirement of the proposed model. The advantage of dynamic routing cannot be fully utilized if there is only a single path between two switches. The proactive flow entry should remain in a flow table for the longest duration. The controller's reactive flow rules are stored in the flow table with higher priorities so that during a search, the reactive flow rule will be available before any proactively stored rules. Moreover, switches should be allowed to access the statistics, such as the traffic on a link. The switch needs to read such statistics before taking appropriate action.

4.3. Controller design

The design of the controller includes:

During a network setup, route assessment is the first job that a controller needs to execute. The controller evaluates the routes for switches under its control. Most of the route generation for packet forwarding is based on shortest path algorithms such as Dijkstra and Bellman-Ford. The controller executes the shortest path algorithm for finding the best path between each pair of switches in the network. It stores this path information for future decisions. The generated paths are converted into rules and forwarded to the concerned switches. These proactive rules at the switches provide fast flow traversing, negligible setup delay, and a decline in the rate of contacting the controller.

A bandwidth threshold value for various services is required to be provided by the network administrator. The controller can also predict it based on the type of services supported by the network at the initial phase of the network setup. It can also be updated whenever required by the network administrator, depending upon the network's service request status. The threshold value of a service is used to fix the bandwidth consumption and to prioritize the service in the network. The target of introducing a bandwidth threshold value for the services is to prevent a service from occupying the full link capacity and jeopardizing other essential services. The bandwidth threshold value is compared with the link traffic before forwarding a packet to that link during real traffic. The threshold value is provided to the concerned switches by the controller.

The method for handling requests from the switches in hybrid mode and the Q-learning model of the controller are described separately in the next two subsections IV D and IV E. Table 2 offers various symbols used to model the problem.

4.4. Hybrid model

In this model, topology T = (S, L) with L = {L1, L2, L3, ..., Lm} and S = {S1, S2, S3, ..., Sn} is assumed. On receiving a path request message from the switch, the controller first retrieves the information regarding the source, destination, and the overloaded link from the message; let Lc be the link that makes the path Psd infeasible. The detailed steps of the proposed routing protocol on receiving a message from the switch are listed as follows:

Step 1: Controller generates a set of paths that includes all possible routes for packet forwarding excluding Lc, from the requesting switch to the destination, i.e. {ST1, ST2, ..., STn}. Another set of paths that contains routes between source and destination is generated only when the first set is empty, i.e., the path between the requesting switch and the destination does not exist, but an alternative path between source and destination exists.
Step 2: After monitoring the real-time traffic status for each link in ST, paths with greater traffic than the bandwidth threshold value for the required service at any of its links are excluded.
Step 3: The traffic is calculated for each path in the set as per Eq. (1).
Step 4: The path with the highest PV, i.e., path value as per Eq. (2) is selected for implementation when the path's Q-value is not available, as discussed in IV E.
Step 5: The requesting switch and others in the selected path route are updated for this packet forwarding.

Each path's PV function depends on the total number of hops, the number of used proactive hops, i.e., the hops/links that are also there in the proactive path for the same source and destination, and traffic on the links. More hops are not desirable, but more proactive hops used in the path provide a faster and less event producing route to the packet. In case two paths have the same value for the difference of total hops and proactive hops, then the path with less traffic on the links is selected. In Eq. (1), the traffic on each path is evaluated for further use in the PV calculation. Eq. (1) intends to find the average traffic on the path from the switch to the destination, which will help decide if the PV of the two paths becomes equal.

\[ \mathrm{Traffic}(ST_i) = \frac{\sum_{j=1}^{n} \mathrm{Traffic}(link_j)}{\mathrm{Thop}(ST_i)} \tag{1} \]

Eq. (2) calculates the path value and gives preference to the number of hops in the paths. The traffic of the path STi is added to the number of hops, and finally, the result is subtracted from a constant value 100 so that the highest PV will result for the smaller number of hops with low traffic. In the proposed model, path value can be formulated as:

\[ PV_i = 100 - \left( (\mathrm{Thop} - \mathrm{Phop}) + \frac{\mathrm{Traffic}(ST_i)}{100} \right) \tag{2} \]

We illustrate the problem and controller response with an example shown in Fig. 3. A denser setup similar to the figure is provided in [16]. The controller needs to calculate the best route for each PACKET_IN event that it receives from the switch. To check the capability of HRRL, TCP is implemented in the network. TCP tries to use as much bandwidth as possible by increasing its congestion window as long as its packets are acknowledged within the timeout. UDP traffic also exists in the network, but in real networks most (about 85%) of the internet traffic is TCP-based.

For a new TCP packet received at S3 with destination S9, the switch S3 searches for a rule to forward the packet. The proactive path assumed for source S3 and destination S9 is S3→S4→S6→S8→S9. The flow rule at S3 is to forward the packet to the port connected to S4. The current TCP bandwidth utilization value of the link S3→S4 is smaller than the TCP bandwidth threshold value, assumed to be 70% of the link's capacity. Therefore, the packet is forwarded to the link. Once the packet reaches S4, the switch searches for the port/link to forward the packet. As per the proactive rule, S4 will forward the packet to S6. The comparison is made between the bandwidth threshold value, i.e., 70%, and the link's current traffic value as per Table 3. The table is a randomly generated set of values for different connections. The traffic at the link S4→S6 is not less than the bandwidth threshold value provided for the service. S4 is restricted from forwarding the packet towards S6. At this moment, a request is generated towards the controller for a new path. On receiving a new path request, the controller gathers information such as source node, destination node, congested link (i.e., S3, S9, 4-6), and the requesting switch, i.e., S4. As per the flowchart, the controller generates paths from S4 to S9. It then discards from the path set those paths which contain overloaded links. The remaining paths are shown in Table 4.

The PV is calculated as per Eq. (2), and the path with the highest value, i.e., S4→S5→S10→S8→S9 is selected for the packet forwarding. Accordingly, the rules of the switches S4, S5, and S10 are updated to handle a new packet. S8 will not be updated regarding this new route as an already stored proactive rule will be utilized. The use of this route reduces the number of events generated in the network. If a reactive process is implemented, four requests go towards the control plane one by one, one each from S4, S5, S10, and S8. For each request, the controller will generate one reply message. In the proposed routing protocol, one request is generated by S4, and in reply to it, the controller generates three messages to S4, S5, and S10 only.

The controller's process is presented in a flowchart, representing Hybrid Routing with Reinforcement Learning (HRRL).

4.5. Q-learning model

Reinforcement learning is a paradigm of the learning processes. It is a

Table 2
List of symbols and their meaning.
Symbol | Meaning
T=(S,L) | The topology graph of SDN under a controller
S | Set of switches
L | Set of links
Lc | Congested link
Psd | Path from source to destination
ST | Set of paths
Thop | Total number of hops in a path
Phop | Proactive hops in a path
PV | Path value after applying the equations
SS | State space in Q-learning
A | Action space in Q-learning
R | Reward function in Q-learning
NQ(sst, at) | New state-action pair
Q(sst, at) | Old state-action pair
α | Learning rate
γ | Discount factor
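As a rough illustration of Steps 2 to 4 of the hybrid model in Section 4.4, the sketch below filters candidate paths by the bandwidth threshold, scores the survivors with Eqs. (1) and (2), and picks the highest PV. It is a minimal sketch under stated assumptions, not the authors' implementation: link traffic is assumed to be expressed as a percentage of link capacity, a link is treated as overloaded when its traffic is not less than the threshold, and the function names, the alternative paths, and all traffic values are hypothetical (they are not the values of Table 3 or Table 4).

```python
from typing import Dict, List, Optional, Set, Tuple

Link = Tuple[str, str]   # directed link (from_switch, to_switch)
Path = List[Link]        # a path is an ordered list of links


def average_traffic(path: Path, traffic: Dict[Link, float]) -> float:
    """Eq. (1): sum of link traffic divided by the total hops of the path."""
    return sum(traffic[link] for link in path) / len(path)


def path_value(path: Path, proactive: Set[Link], traffic: Dict[Link, float]) -> float:
    """Eq. (2): PV = 100 - ((Thop - Phop) + Traffic(ST_i) / 100)."""
    thop = len(path)                                      # total hops
    phop = sum(1 for link in path if link in proactive)   # hops reused from the proactive path
    return 100 - ((thop - phop) + average_traffic(path, traffic) / 100)


def select_path(candidates: List[Path], proactive: Set[Link],
                traffic: Dict[Link, float], threshold: float) -> Optional[Path]:
    """Step 2 (drop paths with an overloaded link), then Steps 3-4 (highest PV)."""
    feasible = [p for p in candidates if all(traffic[link] < threshold for link in p)]
    if not feasible:
        return None
    return max(feasible, key=lambda p: path_value(p, proactive, traffic))


def switches_to_update(path: Path, proactive: Set[Link]) -> List[str]:
    """Switches that need a new reactive rule: those whose hop is not already proactive."""
    return [src for (src, dst) in path if (src, dst) not in proactive]


# Hypothetical data loosely following the S3 -> S9 example (values are made up).
proactive = {("S3", "S4"), ("S4", "S6"), ("S6", "S8"), ("S8", "S9")}
traffic = {
    ("S4", "S5"): 20, ("S5", "S10"): 30, ("S10", "S8"): 40, ("S8", "S9"): 30,
    ("S5", "S7"): 60, ("S7", "S9"): 55, ("S10", "S9"): 85,
}
p1 = [("S4", "S5"), ("S5", "S10"), ("S10", "S8"), ("S8", "S9")]
p2 = [("S4", "S5"), ("S5", "S7"), ("S7", "S9")]
p3 = [("S4", "S5"), ("S5", "S10"), ("S10", "S9")]   # contains an overloaded link

best = select_path([p1, p2, p3], proactive, traffic, threshold=70)
print(best)
print(switches_to_update(best, proactive))   # ['S4', 'S5', 'S10']
```

With these illustrative numbers the four-hop path S4→S5→S10→S8→S9 scores 96.7 against 96.55 for the three-hop alternative, because one of its hops reuses a proactive rule and its average traffic is lower. Only S4, S5, and S10 need new rules, since S8 already holds the proactive rule towards S9, which matches the message saving described in the example.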
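For orientation, the Q-learning symbols listed in Table 2 (state space SS, action space A, reward function R, learning rate α, discount factor γ, and the old and new state-action values Q and NQ) fit the textbook Q-learning update written below. This is the standard form, given here as an assumption about how those symbols combine; the update used in this work, which also involves an s-value and the number of successful transmissions, may differ in detail.

```latex
\[
NQ(ss_t, a_t) = Q(ss_t, a_t)
  + \alpha \left[ R_{t+1} + \gamma \max_{a \in A} Q(ss_{t+1}, a) - Q(ss_t, a_t) \right]
\]
```

Here ss_t and a_t are the state and action at step t, R_{t+1} is the reward observed after taking a_t, and the maximum is taken over the actions available in the next state ss_{t+1}.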
3: Repeat
4: Observe current path Pi;
5: In case of any congested link in the path, reduce the s-value to 0.05
6: Pi=Pi+1;
7: Else increment s-value to 0.05;
8: Calculate Q-value based on s-value and number of successful transmissions;
9: Update Q-value for Q(state, path);

4.7. Architecture

The architecture in [24] includes a 3-layer design of the network. It divides the data plane into two different layers, where one contains the SDN wireless access infrastructure and the other consists of SDN wireless nodes. The lower layer senses the network environment and the learning pattern of dynamicity. It quantifies the environmental information. Another model, called the learning model, collects the strategies and feedback for making decisions. The optimal decision is taken from the learning module. It also accumulates correct and effective decisions for the future plan. This learning module is employed in the SDN controller of the network.

Similarly, a distributed hierarchical control plane architecture is proposed in [21] to minimize the signaling delay by implementing a three-layer design and examining action, policy, quality function, long term rewards, etc., by using reinforcement learning. Switches take charge of information collection and data forwarding, while the slave controller provides read-only access to switches and receives port status messages. A Markov decision process with a QoS aware reward function is used, and an efficient, adaptive, QoS provisioning routing is provided to the system. The super controller is responsible for regulating the entire network while the domain controller is in charge of signaling within its subnet. The hierarchical structure is exploited to minimize the signaling delay between controllers. In the proposed system model, the switch acts as a sensing device for the traffic patterns, analogous to that in [24], and a hierarchical setup is supported as in [21]. It either allows the packet to flow on the link, or senses the situation and contacts its controller for the new rule in the current scenario. The routing decisions are made at the controller using reinforcement learning. The architecture is explained with the help of Fig. 6.

The diagram depicts the area under different controllers. A controller is limited to routing within the area under its control, and three controllers are connected. These communication links between controllers can be used for different purposes like synchronization, load balancing, information exchange, etc. A switch is connected to two different controllers in the diagram, as in real scenarios there can be a switch with multiple controllers. Moreover, switch migration [29] is possible only in a situation such as the one depicted in Fig. 6. For getting the overview of traffic conditions at switches, statistics are collected by the controller from time to time. The approaches to collect statistics are broadly classified as push-based flow monitoring and pull-based flow monitoring [30]. Each strategy has different characteristics for measurement, overhead, and accuracy. The switches are only responsible for collecting and reporting network status and forwarding packets as per rules. Traffic matrix [31]

of controllers. This proposal's target network is a small office network, a company, or a data center network. It cannot replace WAN or B4 (OpenFlow in Google's data center WAN). It is challenging to treat many packet-ins and create a dynamic path in an extensive network. The proposal focuses on a subnet controller that is responsible for handling a small number of OpenFlow devices.

One argument may be that in the HRRL model, the controller's path assessment increases its response time. In the reactive model for the OpenFlow protocol, the controller also needs to assess different routes to provide a reactive response. Therefore, the path assessment is not exclusive to HRRL. Secondly, the proposed algorithm requires a relatively large memory space for proactive rules installation. The memory space can be efficiently utilized by replacing the proactive rule with the reactive one based on an LRU policy. Whenever the reactive rule times out, a request for a proactive rule can be generated to the controller. Alternatively, the controller sends the proactive rule again once the timeout period for the reactive rule expires. Another argument is that other artificial intelligence techniques like deep learning may provide better solutions than the proposed one. The implementation of such methodologies at each subnet controller will increase implementation cost, as extra processors and memory are required on each subnet controller. Undoubtedly, reinforcement learning implementation also requires additional memory and processing resources, but the requirement of such resources in the proposal is much lower. The proposal suits the needs and performance requirements of the subnet controller in the SDN network.

5. Experimental validations

In this section, the experimental setup is discussed. A network using Mininet [33] is deployed to evaluate the effectiveness of the proposed method. The simulation setup is presented, followed by an analysis of results.

Simulation Settings

Mininet is a popular and widely used emulator for academic research in the field of SDN. The switches used in experiments are Open vSwitch (OVS), and the system is implemented on an Intel Core i-5-7200 2.50 GHz processor with 8 GB of RAM and Ubuntu 20.04. Flow arrivals follow a Poisson distribution to generate dynamic traffic in the network at 100 to 1000 flows per second. Three Ryu [34] controllers are imple
is an important tool for monitoring the traffic between source and mented to present inter-controller communications and packet for
destination in the network. In [32], OpenTM is presented, a traffic warding. Controllers are configured to listen on different port numbers.
matrix tool for OpenFlow networks. Thirty hosts are attached to each switch. Each addition of a switch in
creases 1 switch and 30 hosts in the network. Only the number of
4.8. Application switches are added in a topology for performance evaluation of the
network. Topologies are implemented as a hybrid one, using custom
Reducing the number of packet generation increases the scalability topology feature of Mininet. For sparse topology, as shown in Fig. 3,
there are a total of 13 switches and, therefore, 390 hosts under one controller. Switches are connected so that a maximum of 4 switches can be connected to a single switch. For dense topology, there are a total of 26 switches and 780 hosts under one controller. Switches in dense topology are connected so that a maximum of 7 switches can be connected to a single switch. A confidence interval is useful for depicting variation around a point estimate. 15 samples are used, and the mean of the samples for the same input on an algorithm is calculated. For each simulation setup, i.e., sparse and dense, each result is generated 15 times. Due to the dynamicity of the model, the results may vary during each iteration, and therefore the point estimate over 15 iterations is taken as the final result. Table 5 summarizes the network setup.

Table 5
Network setup.
Parameter              Value
Mininet version        2.2.2
Ryu version            4.30
Open vSwitch version   2.13.0
OpenFlow version       1.3
Link capacity          5 Gbps
Ubuntu                 20.04
Topologies             Sparse, Dense
TCP threshold value    70
Samples for results    15
Learning rate          0.05
Discount factor        0
Rewards                +ive

The bandwidth threshold value is initialized by the controller at the network initialization state and provided to the switches and rules. Meter tables are implemented in the switches to check bandwidth on a per-flow and per-service basis. This feature is available in OpenFlow 1.3 and above. A switch will not forward packets to the duplex link if port statistics show link bandwidth utilization greater than the threshold value. Using the FlowManager application of Ryu, real-time manipulation of OVS switches is implemented. The controller selects the threshold value so that high-priority services and high-bandwidth-demanding services are provided a dedicated bandwidth. There are minimal chances of collision or link overloading during data transfer by the switch. Providing a bandwidth threshold value for services restricts a switch to forwarding the packet only as per proactively installed rules, and therefore control-plane events are generated.

Methodology

A routing-based simulation model is built for two different topologies, sparse and dense, to evaluate the proposed algorithm, named Hybrid Routing using Reinforcement Learning (HRRL). The performance is compared with three other routing protocols. The first is Explicit Routing in SDN (ERSDN) [13], developed to increase the control plane’s scalability. It proposes to install rules on all the intermediate switches on the path once a flow request message is received from the ingress switch. The intermediate switches forward the packet without contacting the controller. The second method is the OpenFlow-based reactive model [6], in which the switch is directed to consult the controller for every new packet. Finally, the third method, SDCoR [24], uses reinforcement learning for dynamic routing in the Internet of Vehicles.

The primary metrics for measuring the performance are messages generated, response time, and flow entries installed. Messages generated refers to the total number of messages generated at both the data plane and the control plane until the transmission of the packets completes. Response time is the time from packet generation to when the first response is received from the controller for its forwarding. It depends on the controller’s load during the traffic flow. Flow entries are the rules in the flow tables of the switches in the network.

Result 1: As HRRL is designed to reduce the number of messages towards the control plane, the experiment is performed to check the performance. The hosts generated different messages to check the total number of messages induced in the control plane and data plane. Simulation criteria are set for the sparse and dense topology setups. A TCP connection is established with the help of a three-way handshake. The protocol initiates and acknowledges a connection. Once the connection is established, data transfer begins, and the connection is terminated once the process finishes. There is an overhead of extra messages in TCP. On the other hand, UDP does not involve any handshaking dialogues. It does not treat error checking and correction as important and hence avoids overhead in terms of the number of messages.

Fig. 7. (a): Number of messages generated in sparse topology. (b): Number of messages in dense topology.

As shown in Fig. 7(a), reactive routing generates the most messages, whereas ERSDN generates fewer messages than the reactive protocol. Few messages are generated for the control plane in HRRL, and therefore it remains at the bottom. The number of messages generated by SDCoR is almost the same as for ERSDN. The results support the fact that HRRL generates a low number of messages. Due to the availability of proactive rules and low traffic in the network, the controller need not be consulted. Therefore, no control plane events are generated for a new packet in the network. A large number of messages is generated for a small number of requests because TCP is implemented in the network, and for each packet transferred, extra messages are generated for acknowledgements. The same setup in UDP generates a much smaller
number of messages in both planes. In the dense topology under high traffic loads, the reactive protocol’s performance remains similar; ERSDN also performs the same, because the controller installs the flow rules on the ingress switches only after checking the network’s traffic load. The performance of SDCoR is better in the dense topology due to its learning process. The number of messages generated by the proposed HRRL increases with the increase of traffic in the network. This is because the proactively installed rule will not work due to traffic on the link, and the switch needs to contact the controller. After checking the network conditions, the controller directs the switch to forward the packet from some other port while maximizing the use of proactive rules installed in the network. Fig. 7(b) depicts this behavior.
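The switch-side behavior described for HRRL can be illustrated with a minimal Python sketch. It assumes that the threshold value of 70 in Table 5 is a percentage of link capacity. The names PortStats and choose_action are hypothetical helpers introduced only for illustration; they are not part of Ryu, Open vSwitch, or the authors' implementation.

```python
from dataclasses import dataclass

# Assumption: the threshold from Table 5 (70) is a percentage of link capacity.
THRESHOLD_PERCENT = 70.0


@dataclass
class PortStats:
    """Hypothetical per-port view derived from port statistics."""
    capacity_bps: float
    used_bps: float

    @property
    def utilization_percent(self) -> float:
        return 100.0 * self.used_bps / self.capacity_bps


def choose_action(proactive_port, ports, threshold=THRESHOLD_PERCENT):
    """Use the proactive rule while the link is below the threshold.

    Returns ("forward", port) when the proactively installed output port
    can be used, otherwise ("ask_controller", None), which corresponds to
    a control-plane event.
    """
    stats = ports.get(proactive_port)
    if stats is not None and stats.utilization_percent <= threshold:
        return ("forward", proactive_port)
    return ("ask_controller", None)


# Two 5 Gbps links: port 1 is lightly loaded (20%), port 2 is congested (90%).
ports = {1: PortStats(5e9, 1e9), 2: PortStats(5e9, 4.5e9)}
low_load = choose_action(1, ports)
high_load = choose_action(2, ports)
```

Under low load the sketch never involves the controller, which matches the observation that HRRL generates few control-plane messages in the sparse setup; under high load every decision becomes a controller request, which is why the message count grows with traffic.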
Result 2: The response time comprises the time to search for the best route at the controller and the time to get a reply from the controller for a new path. In the case of low link traffic on the port, the flow latency of HRRL relates only to searching the flow table of the switch and getting port statistics. As HRRL deals with parameters inside the switch, the flow setup latency of the network during low traffic is negligible, as the controller’s response is not required. During high traffic and high link utilization, the flow setup latency of HRRL is of concern. The controller’s routing decision using already provided flow rules depends on the number of alternative paths between source and destination. Fig. 8 depicts the same. The average latency of SDCoR and HRRL remains nearly the same.
As depicted in Fig. 8, the controller’s average response time is similar across the compared methods. The reactive approach provides the fastest response, whereas the response time of ERSDN is marginally greater than that of the reactive one. HRRL and SDCoR have a slightly higher response time than the others, as the modeled topology is dense and the large number of paths between two switches forces the controller into a longer response time. Moreover, the learning consumes time.
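The learning step that contributes to this response time can be sketched as a tabular Q-learning update, using the learning rate (0.05) and discount factor (0) listed in Table 5. The state, action, and reward encodings below (switch identifiers as states, output ports as actions, a reward of +1) are illustrative assumptions. This is the textbook update rule, not the authors' code.

```python
from collections import defaultdict

ALPHA = 0.05  # learning rate, from Table 5
GAMMA = 0.0   # discount factor, from Table 5


def q_update(q, state, action, reward, next_state, next_actions,
             alpha=ALPHA, gamma=GAMMA):
    """Apply one Q-learning step and return the new Q(state, action).

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical encoding: states are switch names, actions are output ports.
q_table = defaultdict(float)
first = q_update(q_table, "s1", "port2", reward=1.0,
                 next_state="s2", next_actions=["port1", "port3"])
second = q_update(q_table, "s1", "port2", reward=1.0,
                  next_state="s2", next_actions=["port1", "port3"])
```

With a discount factor of 0 the target reduces to the immediate reward, so each update only moves the estimate a fraction alpha toward the reward observed for the chosen port.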
Result 3: Figs. 9(a) and 9(b) show the flow entries installed for all flows under different flow arrival rates. The average of 15 runs is taken for the same flow rate. As depicted in the figures, HRRL has the lowest number of flow entries in both the sparse and dense topologies. SDCoR is second in terms of flow entries generated. ERSDN generates fewer flow entries than the reactive approach. HRRL reduces the number of flow entries by 47% compared to SDCoR, 58% compared to ERSDN, and 78% compared to the reactive approach in the sparse topology, where most of the proactive rules are used and very few flow entries are generated. In the dense topology, HRRL reduces the number of flow entries by 19% compared to SDCoR, 41% compared to ERSDN, and 68% compared to the reactive approach. Since HRRL only requests a flow entry where traffic is high, it needs to install the minimum number of flow entries.

Fig. 9. (a): Total flow entries in sparse topology. (b): Total flow entries in dense topology.

The evaluation reveals the effectiveness of the proposed HRRL algorithm compared to Explicit Routing, which is also developed for scalable networks. SDCoR is another reinforcement-learning-based method, for routing in the Internet of Vehicles. A comparison with the pure reactive approach of OpenFlow is also made. As HRRL is specific to a controller and the switches under its control, results under a single controller with different switches and hosts are considered for evaluation. When there are many flows and the load on links is very high, HRRL is similar to SDCoR, and both are not better than Explicit Routing in terms of the controller’s response. HRRL provides better performance in terms of messages generated compared to the other three routing methods discussed.
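The arithmetic behind the reported comparisons (a point estimate over 15 runs, a confidence interval around it, and a percentage reduction against a baseline) can be sketched as follows. The run values below are hypothetical placeholders chosen only to make the arithmetic visible; they are not measurements from the paper. The interval uses the Student-t critical value for 14 degrees of freedom, which is an assumption about how the interval would be computed.

```python
import math
import statistics

# Two-sided 95% Student-t critical value for 14 degrees of freedom (15 runs).
T_CRIT_95_DF14 = 2.145


def mean_with_ci(samples, t_crit=T_CRIT_95_DF14):
    """Return (mean, half_width) of a confidence interval around the mean."""
    mean = statistics.mean(samples)
    half_width = t_crit * statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, half_width


def reduction_percent(baseline, proposed):
    """Relative reduction of `proposed` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline


# Hypothetical placeholder data: 15 runs per method, NOT values from the paper.
reactive_runs = [1000.0 + i for i in range(-7, 8)]
hrrl_runs = [220.0 + i for i in range(-7, 8)]

reactive_mean, reactive_hw = mean_with_ci(reactive_runs)
hrrl_mean, hrrl_hw = mean_with_ci(hrrl_runs)
saving = reduction_percent(reactive_mean, hrrl_mean)
```

With these placeholder inputs the means are 1000 and 220, so the computed reduction is 78%, which shows how a figure of that form follows from two averaged counts.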
6. Conclusion
reactive model, 58% compared to ERSDN and 47% when compared with SDCoR.

In the future, HRRL may be implemented in a scalable framework to explore its performance on different parameters such as varied link capacity in the network.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

[1] A. Greenberg, G. Hjalmtysson, D.A. Maltz, A. Myers, J. Rexford, G. Xie, H. Yan, J. Zhan, H. Zhang, A clean slate 4d approach to network control and management, ACM SIGCOMM Comput. Commun. Rev. 35 (5) (Oct. 2005) 41–54, 10.1145/1096536.1096541.
[2] N. Gude, T. Koponen, J. Pettit, B. Pfaff, M. Casado, N. McKeown, S. Shenker, Nox: towards an operating system for networks, SIGCOMM Comput. Commun. Rev. 38 (3) (Jul 2008) 105–110, 10.1145/1384609.1384625.
[3] Z. Cai, A.L. Cox, T.S.E. Ng, Maestro: A System for Scalable OpenFlow Control, Rice University, Dec. 2010. Tech. Rep. TR10-11.
[4] D. Erickson, The beacon openflow controller, in: Proceedings of the Second ACM SIGCOMM Workshop on Hot Topics in Software Defined Networking (HotSDN ’13), 2013, pp. 13–18, 10.1145/2491185.2491189.
[5] A. Nayyer, A.K. Sharma, L.K. Awasthi, Issues in software-defined networking, in: Proceedings of the Second International Conference on Communication, Computing and Networking, Lecture Notes in Networks and Systems, Springer 46 (Sep. 2018) 989–997, 10.1007/978-981-13-1217-5_97.
[6] Open Networking Foundation, OpenFlow Switch Specification, Version 1.5.1, https://www.opennetworking.org/images/stories/downloads/sdn-resources/onf-specifications/openflow/openflow-switch-v1.5.1.pdf, Mar. 2015.
[7] D. Kreutz, et al., Software-defined networking: A comprehensive survey, Proc. IEEE 103 (1) (Jan. 2015) 14–76, https://doi.org/10.1109/JPROC.2014.2371999.
[8] F. Akyildiz, A. Lee, P. Wang, M. Luo, W. Chou, A roadmap for traffic engineering in SDN-OpenFlow networks, Comput. Netw. 71 (Oct. 2014) 1–30, https://doi.org/10.1016/j.comnet.2014.06.002.
[9] A.R. Curtis, J.C. Mogul, J. Tourrilhes, P. Yalagandula, P. Sharma, S. Banerjee, DevoFlow: scaling flow management for high performance networks, Comput. Commun. Rev. 41 (4) (Aug. 2011) 254–265.
[10] M. Karakus, A. Durresi, A survey: control plane scalability issues and approaches in software-defined networking (SDN), Comput. Netw. 112 (Jan. 2017) 279–293, https://doi.org/10.1016/j.comnet.2016.11.017.
[11] A. Nayyer, A.K. Sharma, L.K. Awasthi, Laman: a supervisor controller based scalable framework for software defined networks, Comput. Netw. 159 (2019), https://doi.org/10.1016/j.comnet.2019.05.003012, 125134, Aug.43–44, 2012.
[12] H. Owens, A. Durresi, Explicit routing in software-defined networking (ERSDN): addressing controller scalability, in: 17th International Conference on Network-Based Information Systems (NBiS), 2014.
[13] M. Yu, J. Rexford, M.J. Freedman, J. Wang, Scalable flow-based networking with difane, in: SIGCOMM’10, Proceeding of ACM SIGCOMM 2010 Conference, Aug. 2010, pp. 351–362.
[14] A.R. Curtis, W. Kim, P. Yalagandula, Mahout: low-overhead datacenter traffic management using end-host-based elephant detection, in: Proc. IEEE INFOCOM, Apr. 2011, pp. 1629–1637.
[15] S. Li, K. Han, N. Ansari, Q. Bao, D. Hu, J. Liu, S. Yu, Z. Zhu, Improving SDN scalability with protocol-oblivious source routing: a system-level study, IEEE Trans. Netw. Serv. Manag. 15 (1) (Mar. 2018) 275–288, https://doi.org/10.1109/TNSM.2017.2766159.
[16] H. Huang, B. Niu, S. Tang, S. Li, S. Zhao, K. Han, Z. Zhu, Realizing highly-available, scalable, and protocol independent vSDN slicing with a distributed network hypervisor system, IEEE Access 6 (2018) 13513–13522, https://doi.org/10.1109/ACCESS.2018.2813405.
[17] J. Xie, F.R. Yu, T. Huang, R. Xie, J. Liu, C. Wang, Y. Liu, A survey of machine learning techniques applied to software defined networking (SDN): Research issues and challenges, IEEE Commun. Surveys Tuts. 21 (1) (2019) 393–430, 1st Quart.
[18] K.K. Budhraja, A. Malvankar, M. Bahrami, C. Kundu, A. Kundu, M. Singhal, Risk-based packet routing for privacy and compliance preserving SDN, in: Proc. IEEE CLOUD’17, Honolulu, CA, USA, 2017, pp. 761–765. June.
[19] S. Sendra, A. Rego, J. Lloret, J.M. Jimenez, O. Romero, Including artificial intelligence in a routing protocol using software defined networks, in: Proc. IEEE ICC Workshops’17, Paris, France, May. 2017, pp. 670–674.
[20] S.C. Lin, I.F. Akyildiz, P. Wang, M. Luo, QoS-aware adaptive routing in multi-layer hierarchical software defined networks: a reinforcement learning approach, in: Proc. IEEE SCC’16, San Francisco, CA, USA, 2016, pp. 25–33. June.
[21] G. Stampa, M. Arias, D. Sanchez-Charles, V. Muntes-Mulero, A. Cabellos, A deep-reinforcement learning approach for software-defined networking routing optimization, arXiv preprint arXiv:1709.07080, 2017.
[22] A. Azzouni, R. Boutaba, G. Pujolle, NeuRoute: Predictive dynamic routing for software-defined networks, arXiv preprint arXiv:1709.06002, 2017.
[23] C. Wang, L. Zhang, Z. Li, C. Jiang, SDCoR: Software defined cognitive routing for Internet of vehicles, IEEE Internet Things J. 5 (5) (Oct. 2018) 3513–3520.
[24] R. Alvizu, S. Troia, G. Maier, A. Pattavina, Matheuristic with machine-learning-based prediction for software-defined mobile metrocore networks, IEEE/OSA J. Opt. Commun. Netw. 9 (9) (Sept. 2017) D19–D30.
[25] M. Botvinick, S. Ritter, J.X. Wang, Z. Kurth-Nelson, C. Blundell, D. Hassabis, Reinforcement learning, fast and slow, Trends Cognit. Sci. 23 (Apr. 2019) 408–422.
[26] R. Nian, J. Liu, B. Huang, A review on reinforcement learning: introduction and applications in industrial process control, Comput. Chem. Eng. 139 (Aug 2020) 1–30.
[27] M. Kim, B. Jang, G. Harerimana, J.W. Kim, Q-learning algorithms: a comprehensive classification and applications, IEEE Access 7 (Sep. 2019) 133653–133667.
[28] A. Dixit, F. Hao, S. Mukherjee, T. Lakshman, R. Kompella, Elasticon: an elastic distributed SDN controller, in: Proc. 10th ACM/IEEE Symp. Architect. Netw. Commun. Syst., 2014, pp. 17–28.
[29] Y. Jarraya, T. Madi, M. Debbabi, A survey and a layered taxonomy of software-defined networking, IEEE Commun. Surv. Tut. 16 (4) (2014) 1955–1980. Fourth Quart.
[30] A. Medina, N. Taft, K. Salamatian, S. Bhattacharyya, C. Diot, Traffic matrix estimation: existing techniques and new directions, ACM SIGCOMM Comput. Commun. Rev. 32 (4) (Oct. 2002) 161–174.
[31] A. Tootoonchian, M. Ghobadi, Y. Ganjali, OpenTM: traffic matrix estimator for OpenFlow networks, in: Passive and Active Measurement, Springer-Verlag, Berlin, Germany, 2010, pp. 201–210.
[32] B. Lantz, B. Heller, N. McKeown, A network in a laptop: Rapid prototyping for software-defined networks, in: Proc. 9th ACM SIGCOMM Workshop HotNets-IX 19, 2010, pp. 1–19, 6.
[33] Ryu. [Online]. Available: https://ryu.readthedocs.io/en/latest/getting_started.html 2021.

Amit Nayyer received the B.Tech degree in computer science and engineering from Punjab Technical University, Jalandhar, India, in 2004 and the M.Tech. (Computer Science and Engineering degree) from National Institute of Technology (Deemed to be University), Hamirpur, India, in 2009. He is working as research scholar (Ph.D) with department of Computer Science at Himachal Pradesh University, Shimla, India. His main research interests include Software Defined Networking, Energy Efficiency, Adhoc network, and Sensor Networks. e-mail: nayyer.amit@gmail.com

Aman Kumar Sharma is a Professor of Computer Science at Himachal Pradesh University, Shimla, India. His research interest includes Cloud Computing, Engineering Quality. He has published over 70 papers in international journals and over 100 papers in conference proceedings. e-mail: sharmaas1@gmail.com

Lalit Kumar Awasthi is a Professor of Computer Science at National Institute of Technology, Hamirpur, India. Currently he is Director at Dr. B.R Ambedkar NIT, Jalandhar, India. His research interest includes Check pointing, Mobile computing, Sensor Networks and P2P Networks. He has published over eighty papers in journals and over 200 papers in conference proceedings. e-mail: lalitdec@gmail.com