
IEEE TRANSACTIONS ON CLOUD COMPUTING, VOL. 10, NO. 3, JULY-SEPTEMBER 2022

Resource Usage Cost Optimization in Cloud Computing Using Machine Learning
Patryk Osypanka and Piotr Nawrocki

Abstract—Cloud computing is gaining popularity among small and medium-sized enterprises. The cost of cloud resources plays a
significant role for these companies and this is why cloud resource optimization has become a very important issue. Numerous
methods have been proposed to optimize cloud computing resources according to actual demand and to reduce the cost of cloud
services. Such approaches mostly focus on a single factor (i.e., compute power) optimization, but this can yield unsatisfactory results in
real-world cloud workloads which are multi-factor, dynamic and irregular. This article presents a novel approach which uses anomaly
detection, machine learning and particle swarm optimization to achieve a cost-optimal cloud resource configuration. It is a complete
solution which works in a closed loop without the need for external supervision or initialization, builds knowledge about the usage
patterns of the system being optimized and filters out anomalous situations on the fly. Our solution can adapt to changes in both system
load and the cloud provider’s pricing plan. It was tested in Microsoft’s cloud environment Azure using data collected from a real-life
system. Experiments demonstrate that over a period of 10 months, a cost reduction of 85 percent was achieved.

Index Terms—Cloud resource usage prediction, anomaly detection, machine learning, particle swarm optimization, resource cost
optimization

1 INTRODUCTION

Computer systems are currently often located in computing clouds such as Amazon Web Services (operated by Amazon), Azure (operated by Microsoft), Google Cloud Platform (operated by Google) and many others. A computing cloud provides storage, network and computing resources to anyone who needs them. There are different cloud usage models, i.e., Infrastructure as a Service (IaaS), Platform as a Service (PaaS) or Software as a Service (SaaS), but all of them reduce management effort and downtime risk while providing high-scalability possibilities when compared to on-premise solutions. Scalability means that new instances of services (PaaS), virtual machines (IaaS) or databases (databases are partially SaaS and partially PaaS) can be added as required. In many systems, it is difficult to predict load beforehand, and thus, to meet accessibility and responsiveness requirements (especially where the system is too big for frequent, on-demand adjustments), the system must be scaled up with a margin for both unforeseen load spikes and long-term load changes. This results in considerable power and storage overprovisioning and thus unnecessary spending. In many cases, companies provision resources with a large safety margin just to avoid unexpected emergencies. Sometimes they add to these resources when a problem emerges and leave them at high levels even after the problem has been fixed. Moreover, Anders and Edler [1] estimate that in 2030, data centers will use around 3-13 percent of global electricity, and this is why reducing provisioned resources is also important in order to protect the environment.

A cloud provider offers different components (i.e., virtual machines (VM) or databases (DB)), and each component consists of different properties (i.e., compute power (CPU), random access memory size (RAM), disk capacity or input/output operations per second (IOPS)) (Fig. 1). Our idea is to automate the process of scaling system components while taking into account the predicted usage level. In the process, we take into consideration the usage of virtual machines, application services and databases. Our solution can optimize cloud resource usage costs by predicting the demand for different resources (i.e., CPU, IOPS, memory, storage) and then adjusting cloud components accordingly. Prediction is done with the use of machine learning interpolation combined with anomaly detection. Cost reductions are achieved by provisioning cloud components that meet the demand and at the same time are optimal from the financial point of view. The optimal configuration is arrived at using a particle swarm optimization (PSO) algorithm adjusted to solving discrete problems.

The classic approach to cloud resource optimization either focuses on a single resource (e.g., CPU) and scaling parameter (e.g., number of machines) or creates resource utilization models that ignore potential unexpected changes. The main contributions of this paper are briefly summarized as follows:

- using an anomaly detection filter to improve the quality of machine learning regression predictions;

Patryk Osypanka is with the Department of Computer Science, AGH University of Science and Technology, al. A. Mickiewicza 30, 30-059 Krakow, Poland, and also with ASEC S.A., ul. Wadowicka 6, 30-415 Krakow, Poland. E-mail: patryk.osypanka@agh.edu.pl.
Piotr Nawrocki is with the Department of Computer Science, AGH University of Science and Technology, al. A. Mickiewicza 30, 30-059 Krakow, Poland. E-mail: piotr.nawrocki@agh.edu.pl.
Manuscript received 17 September 2019; revised 14 July 2020; accepted 7 August 2020. Date of publication 11 August 2020; date of current version 6 September 2022. (Corresponding author: Piotr Nawrocki.) Recommended for acceptance by J. Wang.
Digital Object Identifier no. 10.1109/TCC.2020.3015769
2168-7161 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: GITAM University. Downloaded on August 07,2024 at 04:55:45 UTC from IEEE Xplore. Restrictions apply.
Fig. 1. Example of components and their properties.

- using an adapted PSO algorithm to solve the cloud resource reservation problem;
- using both vertical (quality) and horizontal (quantity) scaling at the same time to obtain optimal results;
- presenting experimental results with real-life data from the production system for different cloud usage models and verifying the effectiveness of the proposed solution along with actual cost reductions.

The rest of this paper is structured as follows: Section 2 contains a description of related work, Section 3 is concerned with defining in detail the cloud resource cost optimization process, Section 4 describes the implementation of the optimization solution, and Section 5 contains the conclusion and further work.

2 RELATED WORK

The literature describes various studies devoted to resource allocation optimization. For example, the authors of [2], [3] and [4] describe a solution which analyses incoming tasks and reserves virtual machine instances in a way that makes it possible to meet a deadline and is cost efficient. The solution assumes that the system is performing tasks with known CPU and memory demands. The authors of the review presented in [5] discuss different task scheduling methods which can be used in such cases. On the other hand, we optimize more generic systems which fulfil many functions and therefore cannot focus on task scheduling, as we are unable to determine the relevant parameters. We must make sure that just enough cloud resources are available when needed.

In a similar manner, cloud resource management with the use of deep reinforcement learning algorithms was described by Zhang et al. [6]. The authors propose a deep Q-network as a variant of a reinforcement learning algorithm, which is initially pre-trained by a stacked autoencoder (SAQN). To address stability issues, they introduced experience replay, Q-network freeze and network normalization. The described solution assumes that the client makes requests with a resource demand which is known beforehand, and it was tested using an artificial load generated by HiBench, a big data benchmark suite. Our approach is to optimize generic systems which generate requests that are variable in time and whose characteristics are unknown. The tests performed show that due to anomaly detection, our solution works without initial training and is able to operate properly not only with a simulated (artificial) load, but also with real-world, noisy data.

Hilman et al. [7] propose an online incremental learning approach to predict the run time of tasks, and the authors of [8] use machine learning (ML) for the same purpose. In addition, Yang et al. [9] propose ML along with heuristic algorithms to assign tasks to the optimal virtual machine. However, the important aspect of resource management and scaling is missing from those works, as opposed to our solution, which considers available component configurations.

A different approach is presented in [10], where the authors propose a system which scales resources across private and public clouds. The system's scaling engine is based on queuing theory and makes it possible to extend private cloud capabilities using public cloud resources. This solution uses threshold-based policies and time-series analysis. We use machine learning instead to predict demand, which allows us to bridge the delays caused by the process of provisioning new instances.

In [11], the authors use ML for resource usage prediction, and in [12], the authors propose a host of different ML algorithms as a way of improving prediction; however, in both cases no further steps beyond prediction are presented. Also, none of these solutions perform anomaly detection, which makes them prone to inaccurate predictions in case of temporary deviations.

The authors of [13] describe a system which develops virtual machine reservation plans based on CPU usage history. During evaluation, different ML algorithms are compared with OpenStack and Blazar. In addition, tests in the virtual environment (without cloud integration) present the system's performance over a year. Although the system uses ML, which makes it flexible, the authors focus on a single type of virtual machine only, as contrasted with our system, which uses all VM types available from a given cloud provider to minimize overall cost. We also account for more resources (i.e., RAM) along with anomaly detection, which makes our solution more complete and accurate.

Other works present different methods of virtual machine usage optimization: a time-aware residual network [14], autonomic computing and reinforcement learning [15], deep learning [16], a combination of PPSO and NN [17], an NN with a self-adaptive differential evolution algorithm [18] and standalone neural networks [19], [20]. The authors of [21] use Naïve Bayes, and in [22] and [23] the authors use learning automata. Kaur et al. [24] propose a set of various prediction methods working in parallel, and the authors of [25] use a progressive QoS prediction model and a genetic algorithm. All those works, along with surveys [26], [27], focus on virtual machines, mostly on CPU and RAM usage. We extend these approaches to other cloud component types (PaaS, SaaS) and make a step further by selecting real-life, provider-dependent sets of resources. Although the authors of [28] describe a general idea for a system which would cover IaaS, PaaS and SaaS, that study does not include any tests or broader analysis of the topic.

A lot of studies describe different ways of allocating resources optimally from a cloud provider's point of view. Dorian Minarolli and Bernd Freisleben [29] describe a system which optimizes virtual machine allocation using fuzzy control. Owing to the proposed multi-agent environment, their solution is able to operate on a considerable set of virtual machines. Similarly, Singh et al. [30] propose mobile agents which manage resource allocation in the cloud provider's physical infrastructure. The authors take into consideration not just the type of physical resources available, but also their
OSYPANKA AND NAWROCKI: RESOURCE USAGE COST OPTIMIZATION IN CLOUD COMPUTING USING MACHINE LEARNING 2081

location and network infrastructure, which allows a cloud provider to reduce costs. In comparison to the above solutions, our approach is focused on cost optimization from the end-user perspective; although a reduction in server operation costs could possibly lead to a provider offering a discount, the solution proposed by us provides direct cost savings. The solutions described in the aforementioned articles take into consideration only the provisioning of virtual machines as a cloud provider's building blocks, while we are focused on cost optimization from the end user's perspective, and thus not only IaaS, but also PaaS and SaaS are considered.

Usage prediction enables us to develop a resource usage plan. Many works describe different techniques of resource allocation. In [31], Wei et al. present a game-theoretic method, while the authors of [32] propose a coral-reef and game theory-based approach. Machine learning is proposed in [33], and a combinatorial auction algorithm and a combinatorial double auction algorithm are described in [34] and [35]. Zhang et al. [36] propose machine learning-based resource allocation, and in [37] the authors put forward greedy particle swarm optimization. Our solution uses the more lightweight, although accurate, Integer-PSO algorithm described in [38], which we adapt and use for resource allocation planning purposes.

In the survey [39], Gondhi et al. review different virtual machine scheduling algorithms. Besides particle swarm optimization, which is a base for Integer-PSO, the authors describe a genetic algorithm, simulated annealing, ant colony optimization, an artificial immune system and other meta-heuristic algorithms. Despite providing comparisons of the advantages and disadvantages of the methods presented, the survey does not describe complete solutions. For example, a continuous PSO algorithm has to be first adapted to the discrete resource allocation problem (Integer-PSO) and only then can it be used in the optimization process, while the aforementioned survey does not cover this adaptation. On the other hand, our work describes a complete solution which was tested on real-world data.

In addition to the research described above, there are some commercial solutions which enable cloud resource optimization. For example, scaling components as exemplified by Azure Autoscale,1 AWS Autoscale2 and Google Cloud Autoscale3 are part of the cloud environment. Unfortunately, only threshold-based scaling and simple time-based scaling are available. Both of those scaling techniques require an analysis of system usage patterns, which might be difficult when the system is complicated. There are also commercial cloud provider-independent systems4 that offer cloud resource optimization. These systems analyze spending and present it in an easy-to-understand form. Additionally, they provide hints about potential scale-downs of some cloud components or reorganizations which reduce cloud running costs. These systems are not automated, so administrators must approve the proposed changes every time they find them useful. Some of these systems advertise that they are using ML in their analysis,5,6 but in fact, they offer an overall view of spending sources and a simple scheduling of component scaling plus human support, which helps reduce costs but without automation.

1. Azure Autoscale - https://azure.microsoft.com/en-us/features/autoscale
2. AWS Autoscale - https://aws.amazon.com/autoscaling
3. Google Cloud Autoscale - https://cloud.google.com/compute/docs/autoscaler
4. Azure Cost Management - https://azure.microsoft.com/en-us/services/cost-management
5. Cloud Cost Management, Efficiency and Optimization - https://www.cloudability.com
6. Next-Generation Cloud Optimization for CloudOps - https://www.densify.com

Our analysis of existing solutions shows that currently none tackle the problem of optimizing different types of cloud resources (IaaS, PaaS, SaaS) with proactive usage prediction, anomaly detection and efficient, cloud-provider-specific, automatic resource allocation. The contribution of this study is to define such a fully automatic system along with simulations and tests of its behavior using real-life usage data. Our solution does not require initial reservation schedules or knowledge about the type of tasks performed by the system. It works with different combinations of cloud component types (IaaS, PaaS, SaaS) and accounts for various resource properties (CPU, IOPS, RAM, etc.). The cost optimization mechanism is resistant to anomalies (i.e., temporary usage spikes) and adapts to price changes (i.e., periodic discounts), as pricing policy is obtained directly from the cloud provider.

3 CLOUD RESOURCE COST OPTIMIZATION

Systems located in the cloud can be complicated and involve multiple different resource types. The demand for those resources varies over time, which is conditioned by:

1) usage patterns generated by users, which depend on the time of the day and the day of the week;
2) usage patterns which depend on end-point machine configuration (usage generated by automated devices, i.e., IoT);
3) changes in system configuration (new functionalities, new devices);
4) accidental changes caused by temporary conditions (a software bug, communication issues).

A system must meet availability demands. A change in demand for cloud resources necessitates changes in those resources' configurations, which means scaling them. Resources can be scaled up or out. For example, a virtual machine can be scaled up by increasing its CPU parameters, or it can be scaled out by provisioning another copy of the given VM. Depending on the cloud provider's pricing plan, either scaling up or scaling out can be more cost-effective while providing the same computing power. Scaling takes time, so it must be performed before it is needed, which requires resource usage prediction.

To meet the above requirements, we have developed a solution which performs prediction and monitoring. It consists of a Prediction module, a Monitoring module and a Database to store predicted data (Fig. 2). We designed it (Fig. 3) to periodically (every week) gather historical usage data from the last month for each resource which needs to be tailored. This task is done by the Prediction module. In the next step, the solution filters out anomalies to improve prediction quality.
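The smoothing part of this filtering step can be illustrated with a short sketch. Note that the paper's anomaly detector is martingale-based and more involved than what is shown here; the function name and the sample data below are ours, and only the median-smoothing idea is reproduced:

```python
def median_filter(series, window=3):
    """Smooth a usage series by replacing each sample with the median
    of a sliding window centred on it (edges are clamped)."""
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        piece = sorted(series[max(0, i - half):i + half + 1])
        smoothed.append(piece[len(piece) // 2])
    return smoothed

# A CPU-usage trace with a one-off anomalous spike at index 3;
# the filter suppresses the spike while keeping the overall level.
usage = [20, 22, 21, 95, 23, 22, 24]
print(median_filter(usage))
```

Because a single outlier never occupies the majority of a window, it cannot survive the median, which is exactly why this kind of filter prevents temporary, random usage changes from distorting the prediction.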
Fig. 2. Optimization setup overview.

Next, for each resource, it makes a prediction for the next 7 days and, using all these predictions combined, calculates a cost-optimal cloud resource configuration with hourly resolution and the desired maximum resource utilization level. The maximum utilization level depends on system type (i.e., it will be lower for high-availability systems). Such a long prediction timeframe reduces prediction frequency and provides an allocation plan for the entire week for the administrator's inspection if required. Only available scaling options are considered; if a cloud provider adds new possibilities, these will be automatically included in calculations. The calculated cloud resource configuration is stored in the Database. In a separate hourly loop, using the Monitoring module, the system checks if cloud resources need to be scaled according to predictions.

Fig. 3. The optimization loop.

Fig. 4. Prediction module algorithm.

The logic of the Prediction module is presented in the form of an algorithm (Fig. 4). For every component (component) in the set of monitored components (monitoredComponents) which we optimize, the module collects (GetHistoryData()) CPU, memory and storage usage data (usage level along with the time of day and the day of the week). These data are filtered and then stored in the database to be used for prediction later on. Filtering is done by the anomaly detection algorithm [40]: first using the exchangeability martingales function (AnomalyFilter()), and next, in order to smooth the data and improve prediction quality, using a median filter (MedianFilter()). Filtering prevents unnecessary prediction distortions when resource usage changes are temporary and random. If such a change exceeds the allocated resources, the system becomes less responsive and takes longer to process requests. Filtered data are stored in the database (WriteToDB()). In the next step, the module reads collected historical data from the database (ReadFromDB()) to predict usage for the next week. The historical data time window length affects prediction stability and adaptation rate and has to be configured according to optimized system properties (configuredWindow). The time window must be sufficiently long to observe usage patterns but sufficiently short to allow quick prediction adaptation. For every collected piece of usage data, the module develops usage predictions (PredictUsageML()) using machine learning interpolation and then stores them (predictions.Add()).

In the last stage of the algorithm, after all predictions have been done, the module obtains the current pricing plan from the cloud provider (GetPricingPlan()) and calculates the optimal resource configuration. The cloud provider defines possible scaling configurations for different cloud components; the same CPU, memory or disk storage resources can be provisioned with a different configuration and therefore at a different cost. This creates a matrix of possibilities. As the number of possible configurations is usually large, calculating all variants is not feasible.
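The Prediction module's weekly pass can be sketched as follows. The helper names mirror those of Fig. 4 (GetHistoryData(), AnomalyFilter(), MedianFilter(), WriteToDB(), ReadFromDB(), PredictUsageML(), CalculateConfiguration()), but the bodies below are simplified stand-ins of our own, not the paper's actual implementation:

```python
# A minimal sketch of one weekly Prediction-module pass. Every helper
# here is a deliberately naive stand-in: the real anomaly filter is
# martingale-based [40], the real predictor is an ML regressor, and the
# real configuration search is Integer-PSO.

def anomaly_filter(history):
    # Stand-in anomaly detector: clamp samples above 3x the series mean.
    mean = sum(history) / len(history)
    return [u if u <= 3 * mean else mean for u in history]

def median_filter(history, window=3):
    half = window // 2
    out = []
    for i in range(len(history)):
        piece = sorted(history[max(0, i - half):i + half + 1])
        out.append(piece[len(piece) // 2])
    return out

def predict_usage_ml(window):
    # Stand-in predictor: echo the most recent 7 samples ("next week").
    return window[-7:]

def calculate_configuration(predictions, pricing_plan):
    # Stand-in for the Integer-PSO search: reserve each component's peak.
    return {component: max(usage) for component, usage in predictions}

def prediction_pass(monitored_components, get_history_data, pricing_plan, db):
    predictions = []
    for component in monitored_components:
        history = get_history_data(component)           # GetHistoryData()
        filtered = median_filter(anomaly_filter(history))
        db[component] = filtered                        # WriteToDB()
        window = db[component]                          # ReadFromDB()
        predictions.append((component, predict_usage_ml(window)))
    db["configuration"] = calculate_configuration(predictions, pricing_plan)
    return db["configuration"]

# Hypothetical single-component run over a flat 30-sample CPU history.
db = {}
config = prediction_pass(["vm-cpu"], lambda c: [10] * 30, {}, db)
print(config)
```

The closed-loop character of the solution is visible even in this sketch: the pass needs no external initialization, and re-running it weekly with fresh history lets the stored configuration track both load changes and pricing changes.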
This is why the module chooses a cost-optimal configuration (CalculateConfiguration()) using a particle swarm optimization algorithm. Based on the solution described by A. S. Ajeena Beegom et al. [38], we defined our own version of the Integer-PSO algorithm which is suited to our needs. Given the predicted required level of resources $L = [L_1, \ldots, L_m]$ (i.e., CPU core count or RAM amount) and $n$ different component configuration types (i.e., compute-optimized, memory-optimized, general-purpose) from the cloud provider, $[T_1, \ldots, T_n]$, our problem is to find a set of configurations $Q$ which will meet the $L$ constraint and will be cost-efficient at the same time. $Q = [z_1, \ldots, z_n]$ defines how many instances of every configuration type should be used. As an example, we can take virtual machines with CPU core count ($L_1$) and RAM amount ($L_2$) as the resources examined, along with the predicted required level $L = [7, 16]$, which means 7 CPU cores and 16 GB of RAM. A sample cloud provider offers 3 different machine types:

- $T_1$: 4 CPU cores, 1 GB of RAM, €12.00/month;
- $T_2$: 2 CPU cores, 8 GB of RAM, €14.00/month;
- $T_3$: 2 CPU cores, 2 GB of RAM, €10.00/month.

In this case $Q$, which meets the $L$ constraint, can be defined as $[1, 2, 0]$. It means one virtual machine of type $T_1$ and 2 machines of type $T_2$. The maximum value $k$ for $z_i$ ($i \in (1, \ldots, n)$) which has to be taken into consideration while finding $Q$ can be defined as the number of the least powerful configurations needed to meet the $L$ level. Adding more resources will be more expensive and is not necessary, as $L$ is definitely already met. Following the above example, $k = 16$, as 16 virtual machines of type $T_1$ fulfill the $L$ requirement in terms of RAM amount. $Q$ is defined as

$$Q = [z_1, \ldots, z_n], \quad (1)$$

where $\forall i \in (1, \ldots, n)\; 0 \le z_i \le k$. The cost $C$ of such a set is defined as

$$C(Q, M) = Q \cdot M = Q \cdot \begin{bmatrix} m_1 \\ \vdots \\ m_n \end{bmatrix} = \sum_{i=1}^{n} (z_i \cdot m_i), \quad (2)$$

where $m_i$ is the price of the $T_i$ configuration type. The resource level $P$ provided by $Q$ is defined as

$$P = Q \cdot \begin{bmatrix} s_{11} & \ldots & s_{1m} \\ \vdots & \ddots & \vdots \\ s_{n1} & \ldots & s_{nm} \end{bmatrix} = [P_1, \ldots, P_m], \quad (3)$$

where $P_j = \sum_{i=1}^{n} (z_i \cdot s_{ij})$ and $s_{ij}$ is the $j$-th resource level provided by the $T_i$ configuration type. In the example defined before, cost is calculated as

$$C = [1, 2, 0] \cdot \begin{bmatrix} €12 \\ €14 \\ €10 \end{bmatrix} = €40.00, \quad (4)$$

and resource level as

$$P = [1, 2, 0] \cdot \begin{bmatrix} 4 & 1 \\ 2 & 8 \\ 2 & 2 \end{bmatrix} = [8, 17]. \quad (5)$$

The cost definition for the minimization algorithm, $D$, is as follows:

$$D(C, P, L) = \begin{cases} C & \text{if } P \succeq L \\ \infty & \text{otherwise} \end{cases}, \quad (6)$$

where $P \succeq L$ is defined as

$$P \succeq L \iff \forall i \in (1, \ldots, m)\; P_i \ge L_i. \quad (7)$$

In the example, $D = C = €40.00$, as $8 \ge 7$ and $17 \ge 16$. The $Q$ with the minimal cost can be found using the cost function $D$ from Equation (6) and the Integer-PSO algorithm. As cloud providers' pricing policies are usually complex, it is impossible to define how many minimums exist in the cost function, which is discrete, as a fractional component cannot be provisioned. The final stage of the original algorithm described in [38] was altered, as we are looking for multiples of available machines $[z_1, \ldots, z_n]$ rather than a task assignment configuration.

To reduce frequent configuration changes, the newly calculated configuration $Q'$ is compared to the previous configuration. If the old $Q$ still meets the $P \succeq L$ constraint and if $\forall i \in (1, \ldots, m)\; d_i < F$ (where $d_i = \frac{P_i - P_i'}{P_i}$ and $F$ is a stability factor), $Q'$ is discarded and $Q$ is used instead. $F$ determines how probable it is that the algorithm will keep the previous configuration set. Continuing the example defined previously, where $Q = [1, 2, 0]$ and $P = [8, 17]$, we can take as an example a new predicted required level $L' = [4, 15]$, a new set $Q' = [0, 2, 0]$ with $P' = [4, 16]$, and we can define the stability factor as $F = 0.4$. For the CPU count, $d_1 = \frac{8 - 4}{8} = 0.5$; for the RAM amount, $d_2 = \frac{17 - 16}{17} \approx 0.06$. In this case, $d_i < F$ is not met for the CPU count ($i = 1$) and the new value $Q'$ will be used. Each time the old configuration is used, $F$ is decremented; when $Q'$ is used, $F$ is reset to its initial value. The final results are stored in the database (WriteToDB()) and are later used by the Monitoring module.

In a separate loop, the Monitoring module runs every hour. It monitors if a given resource must be scaled according to the predicted configuration, and scales it if needed.

To estimate the quality of the predicted components' set, we use common prediction measurements: Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Relative Absolute Error (RAE) and Root Relative Squared Error (RRSE). To compare the predicted configuration with real usage history, we defined the $R$ metric, which is the mean of overusage errors. For the given predicted usage during hours $t_1$ to $t_m$, $R$ is defined as

$$R = \frac{\sum_{t=1}^{m} E_t}{m}, \quad (8)$$

where $E_t$ is the prediction error for hour $t$, defined as

$$E_t = (u_t - p_t) \cdot H(u_t - p_t), \quad (9)$$

where $H$ is a discrete Heaviside step function

$$H(n) = \begin{cases} 0, & n < 0 \\ 1, & n \ge 0 \end{cases}, \quad (10)$$

$p_t$ is the calculated level for hour $t$ and $u_t$ is the actual resource usage level for hour $t$.

In the end, we measure the average cost savings per hour, $V$.
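The worked example for Equations (1)-(7) and the stability check can be verified numerically. The sketch below encodes the three sample machine types and, because the toy search space is tiny, replaces the Integer-PSO search with a brute-force scan over all $Q$ with $z_i \le k$ to locate the minimum of $D$ (the real system uses Integer-PSO precisely because such a scan does not scale):

```python
import itertools
import math

# Sample machine types from the text: (CPU cores, RAM GB, price in EUR/month).
TYPES = [(4, 1, 12.0), (2, 8, 14.0), (2, 2, 10.0)]   # T1, T2, T3
L = [7, 16]                                          # required CPU cores, RAM

def level(Q):
    """Resource level P provided by configuration Q (Equation (3))."""
    return [sum(z * t[j] for z, t in zip(Q, TYPES)) for j in range(len(L))]

def cost(Q):
    """Cost C of configuration Q (Equation (2))."""
    return sum(z * t[2] for z, t in zip(Q, TYPES))

def D(Q):
    """Penalised cost for the minimisation (Equations (6)-(7))."""
    feasible = all(p >= l for p, l in zip(level(Q), L))
    return cost(Q) if feasible else math.inf

# k = 16: sixteen T1 machines already satisfy the RAM requirement, so no
# z_i ever needs to exceed 16 (Equation (1) bound).
k = 16
best = min(itertools.product(range(k + 1), repeat=len(TYPES)), key=D)
print(best, D(best))        # the paper's example: Q = [1, 2, 0], C = EUR 40

# Stability check: d_i = (P_i - P'_i) / P_i, reconfigure if any d_i >= F.
P, P_new, F = level([1, 2, 0]), level([0, 2, 0]), 0.4
d = [(p_old - p_new) / p_old for p_old, p_new in zip(P, P_new)]
print(d)                    # d_1 = 0.5 >= F, so the new set Q' is adopted
```

This reproduces the figures in the text: $P = [8, 17]$, $D = 40$, $d_1 = 0.5$ and $d_2 \approx 0.06$.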
2084 IEEE TRANSACTIONS ON CLOUD COMPUTING, VOL. 10, NO. 3, JULY-SEPTEMBER 2022

Fig. 6. Time-compressed environment test architecture.

Fig. 5. Architecture of TMS – a real-life system used as the test data pro- devices [42]. TMS enables credit card payments in vending
vider for our simulations. machines and kiosks. It consists of many endpoint devices
which connect to the central server. The central server pro-
resource during hours t1 to tm , V is defined as cesses payment transactions and allows operators to config-
ure and maintain end-point devices. The central server
Pm
t¼1 ðGt  Ct Þ consists of micro services (which deal with payment) and
V ¼ ; (11) virtual machines (which host management/reporting web-
m
pages). Both micro services and virtual machines connect to
the SQL database (Fig. 5).
where Gt is the cost of the configuration without optimiza-
Payment devices connect to the Payment service during
tion during the hour t and Ct is the cost of predicted config-
the credit card payment process. These devices are located in
uration during the hour t. Both are expressed in cloud
Asia, Europe and America and are used mostly in unat-
provider currency.
tended vending machines. This causes daily variations in
The system defined above, which uses machine learning
resource demand. Web browsers connect to the Management
combined with anomaly detection along with the PSO algo-
webpage when the operator changes configurations or gen-
rithm, calculates the optimal cloud resource configuration.
erates reports. Also, devices connect to the Management web
As a result, resource usage cost reduction is achieved.
4 EVALUATION

Based on the concept from the previous section, we have developed an optimization system which uses Azure cloud computing. The efficiency of our system was proved during tests with cloud simulators; using a simulator reduces testing time and improves testing elasticity, as described in [41]. Azure (Microsoft's cloud service) exposes an API which gives access to a component's historical usage and makes it possible to get and set a component's parameters. The Azure API also exposes the current pricing plan. Since this is convenient, we focus on the Azure cloud only, especially on virtual machines (IaaS), App Services (PaaS) and Azure SQL (SaaS). Virtual machines and App Services can be scaled in terms of Azure Compute Units (ACUs), which represent unified compute (CPU) performance. The available RAM can be scaled for an App Service and the maximum level of input/output operations per second (IOPS) can be scaled for a virtual machine. SQL databases can be scaled in terms of storage size and available Database Transaction Units (DTUs), which are a blend of used memory, CPU power and IOPS level. Nevertheless, our solution is suitable for any cloud provider and any cloud resources which can be scaled.

To take advantage of the Azure environment, we selected Microsoft Azure Machine Learning Studio as our main prediction engine. Machine Learning Studio offers ready-to-use data processing and ML components. It also allows custom functions written in the R and Python languages.

We tested our solution using real-life data from a working system called Terminal Management System (TMS), which is a cloud-based manager of Internet of Things (IoT) devices. The devices use the Management web page to report their status and check for configuration changes. The main load comes from the devices, which are configured to connect periodically. Therefore, there is no visible resource demand variation pattern. The database is used by both the Payment service and the webpage, and thus the resource demand variations visible in the payment module are also present in database usage to a certain extent. As TMS consists of components with different usage characteristics, we can test our idea in different test conditions.

We set up a test environment (Fig. 6) that allowed us to perform time-compressed tests. Instead of using the real production system (TMS), we created mock components: the Payment service, the Management web page and the Database, which were used as inputs for our solution. Data collected from the production TMS (10 months in total) were stored in a separate database for test purposes. The entire TMS system was monitored and all types of components (PaaS, IaaS and SaaS) were taken into account; ACU, RAM, IOPS, DTU and storage usage were used in tests. We implemented four different prediction types: Bayesian Linear (BL), Decision Forest Regression (DF), Boosted Decision Tree Regression (BDT) and Neural Network Regression (NN). The Bayesian approach uses linear regression enhanced by information in the form of a probability distribution. Statistical analysis is undertaken, prior knowledge about model parameters is merged with a likelihood function, and posterior estimates for the parameters are generated [43]. Decision trees are models which execute a sequence of data analyses until a decision is achieved. The Decision Forest Regression model consists of multiple decision trees. Each tree creates a prediction (a Gaussian distribution) which is compared to the combined distribution for all trees in the model [44]. Boosted Decision Tree Regression uses the MART gradient
boosting algorithm which gradually builds a series of decision trees. The optimal tree is selected using an arbitrary loss function [45]. Neural Network Regression uses a neural network as a model. This type of regression is suitable for difficult problems where other regression models cannot fit a solution [46].

Each prediction type operates in the "Tune Model Hyperparameters" self-tune mode (https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/tune-model-hyperparameters), which means that prediction algorithm parameters are picked automatically. Prediction was performed for every type of machine learning, thus we were able to compare the results. As initial values for self-tuning, the following default configurations were used:

1) BL
   - Regularization weight = 1
   - Tune Model Hyperparameters maximum number of runs = 15
2) DF
   - Re-sampling method = Bagging
   - Number of decision trees = 8
   - Maximum depth of the decision trees = 32
   - Number of random splits per node = 128
   - Minimum number of samples per leaf node = 1
   - Tune Model Hyperparameters maximum number of runs = 5
3) BDT
   - Maximum number of leaves per tree = 20
   - Minimum number of samples per leaf node = 10
   - Learning rate = 0.2
   - Total number of trees constructed = 100
   - Tune Model Hyperparameters maximum number of runs = 5
4) NN
   - Hidden layers = 1, fully connected
   - Number of hidden nodes = 100
   - Learning rate = 0.02
   - Number of iterations = 80
   - The initial learning weights diameter = 0.1
   - The momentum = 0
   - The type of normalizer = Do not normalize

Integer-PSO was used with 300 particles in 500 epochs. As in [38], we set the inertia weight to 0.6 and the acceleration coefficients to 0.2. The maximum velocity was set to 0.1 · n, where n is the number of available configuration options, and the minimum velocity was set accordingly with the minus sign. Accuracy was set to 3 digits.

For Payment service (PaaS) optimization, we selected ACU and RAM utilization levels as optimization factors. For the Management web page (IaaS), ACU and IOPS were selected, and for the Database (SaaS), DTU and disk space were selected. Because we are using the Database equally for reading and writing, it is hard to scale it out (multiply its instances), so in this case we set k = 1 in Equation (1) to limit Integer-PSO to one instance only. We performed our simulation using 10 months of data from TMS, and we compared the results with the production configuration. We also compared the results obtained from different ML algorithms.

In total, we made 24 predictions (6 optimization factors multiplied by 4 algorithms). Each prediction consisted of more than 6,500 points (10 months with hourly resolution). For purposes of clarity, we chose one optimization factor for every component and presented them separately for selected periods (Fig. 7). In fact, as defined in Equations (3), (2), and (7), all resources included in the component in question are calculated together. For every component, the chart presents a workload characteristic (Actual usage) and all prediction algorithm results. We selected periods so that they contained anomalies (Figs. 7a, 7b, 7c), visible patterns (Fig. 7b) and longer high-usage events (Fig. 7a). In addition, the negative impact of previous data on the prediction process can be observed (Fig. 7b) where, especially in the beginning, the usage level predicted is clearly lower than the actual one. Nevertheless, even this error has no negative impact on the real system as we aim to keep resource usage at 70 percent, which gives us a 30 percent safety margin.

Fig. 7. Comparison of predictions made with different ML algorithms for different resources.

To clearly visualize the optimization process, we choose 3 weeks (8th to 31st May 2019), one component (SaaS), one property (DTU) and one ML algorithm (DF). Fig. 8 presents the actual usage level along with the usage level after anomaly detection (with the anomalies removed). Prediction is based on the data after anomaly detection and thus it is not distorted by temporary usage spikes (Fig. 9).

Although we are using the Integer-PSO algorithm to find the optimum component configuration, due to cloud resource granulation (the cloud provider only offers preconfigured component variants, e.g., a VM with 210 ACUs and 4,000 IOPS), the values predicted are not used exactly in the calculated configuration. In the chart (Fig. 9) we present the actual resource usage, predicted usage and the calculated configuration based on DF prediction. Despite this granulation, we still observe a significant reduction in resource costs (Fig. 10). In the TMS system, the cost of SaaS in May 2019 equals € 392 and the optimization achieved by our system reduces the cost to € 23. For the entire period tested, PaaS costs were reduced by 88 percent, which results in savings of € 4,268.

Our tests demonstrate that in the case of anomalous behavior (sudden high resource usage), the calculated configuration does not cover 100 percent of resource demand and the cloud provider resorts to throttling. This slows down the processing of incoming requests or, in cases of prolonged high-level usage, results in a timeout response to the client (we did not observe such long-lasting anomalies). Nevertheless, the TMS system is designed to handle such situations, as timeouts are often caused by poor network conditions at the endpoint side (in this case – the credit card payment terminal) anyway.

During tests conducted for data between 8th and 31st May 2019, we observed a reduction in resource usage cost not only for SaaS (as presented here), but also for other component types: IaaS and PaaS. Although we did not find similar solutions or test data to compare them with our system, we used the Azure Autoscale mechanism described in Section 2 as the point of reference. Although Azure Autoscale performs only a horizontal (quantity) optimization for IaaS and PaaS, we chose it for its out-of-the-box availability. As vertical (quality) optimization is not available, we used the cheapest possible components (Autoscale is not enabled for low-price PaaS) to ensure the most detailed scaling. In Table 1, we present financial savings along with common prediction quality metrics described in Section 3, and the R (mean of overusage errors) and V (cost savings per hour) parameters calculated according to Equations (8) and (11). When compared to the original value, the high savings percentage figure is caused by the considerable resource over-provisioning in the TMS system due to the tendency described in Section 1. Our anomaly detection solution makes this over-provisioning unnecessary. Azure Autoscale reduced the cost to some degree, but it was still only half as efficient as our solution. Additionally, being a reactive system, it introduced a performance decline. For IaaS, where dynamic resource demand was observed, the mean response time was 2.5 times longer and the variance of response time was 7 times higher when compared to our solution. This led to longer periods of availability issues. On the other hand, for PaaS, where the resource demand was stable, both response time and variance were similar to our system. Azure Autoscale was not able to optimize SaaS resources, and PaaS and IaaS were optimized only in one dimension; the final result was more expensive and in the case of IaaS, performance was much lower.

Optimization introduces quality degradation when compared to the original system. During our test period (from 8th to 31st May 2019), in which the original system was highly over-provisioned, we observed a mean response time that was 4 times longer and a variance that was almost 100 times higher while the original system response time
was almost constant. However, when tested during high usage periods (from 12th to 19th September 2019), the original system's mean response time and variance were similar to the values observed after our optimization. Despite the fact that longer response times still allow the system to operate properly, optimization with quality of user experience as a parameter will be a topic of our further studies, as mentioned in Section 5.

Fig. 8. Comparison of the actual and anomaly-filtered DTU usage level (SaaS, May 2019).

Fig. 10. Comparison of the cost of the database with and without the optimization (SaaS, May 2019).

The optimization solution runs independently from the working system which is being optimized, and thus the optimization process does not introduce any performance overhead. The monitoring module only runs when a component change is required (usually once every couple of hours), and the prediction module runs once a week and uses Microsoft Azure Machine Learning Studio. All these operations fit in the free plans offered by Azure.

Fig. 9. Comparison of the actual DTU usage level, its DF prediction and the calculated configuration (SaaS, May 2019).

5 CONCLUSION AND FURTHER WORK

In this work, we present a solution for optimizing cloud resource costs. Our approach operates autonomously, in a closed-loop configuration, without any need for external tuning. We used real-world data from a production system. Tests show that the savings calculated are significant and that our system works properly, minimizing cloud resource usage and cost. A comparison between current system costs and those after optimization demonstrates that during the 10 months covered by tests, the solution, if implemented in the working system, would have resulted in savings of € 6,128, which translates to an 85 percent cost reduction.

Our solution aims to reduce the cost of using cloud resources by predicting future demand for resources and adjusting the provisioned resources accordingly. Therefore, any cloud-based system which uses scalable resources (i.e., IaaS, PaaS or SaaS) can be optimized using our solution. Optimization is performed at the resource allocation level and knowledge of the internal structure of the system being optimized is not required; however, any performance improvements in this system will be captured by our solution and fewer resources will be provisioned in the future. Since we are using prediction techniques, the greatest cost reduction will be observed for systems with usage patterns that are complicated, hard to define and varied over time; these patterns will be determined by machine learning algorithms. Scaling resources is simpler when client-server communication is stateless, as every call can be directed to the appropriate resource independently; nevertheless, cloud providers also offer scaling of stateful communications. Our solution is compatible with many cloud-based system types, i.e., IoT hubs or Enterprise Resources Planning services in the form of web services, payment gateways that process online transactions, e-commerce solutions as well as web information portals and social networks.
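The closed loop summarized above (monitor usage, filter anomalies, predict future demand, then pick the cheapest preconfigured variant that keeps predicted usage near the 70 percent target) can be sketched as follows. This is an illustrative outline under assumptions, not the system's Azure implementation: the variant catalog, the simple z-score spike filter and the mean-based predictor are placeholders for the real pricing plan, the paper's anomaly detection module and its ML models (BL/DF/BDT/NN).

```python
# Illustrative sketch of the closed optimization loop described above.
# The catalog, anomaly filter and predictor are placeholders, not the
# authors' Azure-based implementation.
from dataclasses import dataclass
from statistics import mean, stdev

@dataclass(frozen=True)
class Variant:           # a preconfigured cloud component variant
    name: str
    dtu: int             # capacity of the optimized factor (here: DTU)
    price_per_hour: float

CATALOG = [              # hypothetical pricing plan
    Variant("S0", 10, 0.02), Variant("S1", 20, 0.04),
    Variant("S2", 50, 0.10), Variant("S3", 100, 0.20),
]

def filter_anomalies(usage, z=1.5):
    """Drop isolated spikes so they do not distort the prediction."""
    if len(usage) < 2:
        return list(usage)
    m, s = mean(usage), stdev(usage)
    return [u for u in usage if s == 0 or abs(u - m) <= z * s]

def predict_next(usage):
    """Placeholder for the ML prediction step (BL/DF/BDT/NN in the paper)."""
    return mean(filter_anomalies(usage))

def choose_variant(predicted, target=0.70):
    """Cheapest variant that keeps predicted usage at ~70% of capacity,
    leaving the 30 percent safety margin mentioned in the text."""
    for v in sorted(CATALOG, key=lambda v: v.price_per_hour):
        if predicted <= target * v.dtu:
            return v
    return CATALOG[-1]   # fall back to the largest variant

history = [12, 14, 13, 15, 60, 14, 13]   # 60 is an anomalous spike
print(choose_variant(predict_next(history)).name)  # prints S1
```

For the example history, the spike of 60 is filtered out before prediction, so the loop settles on the small S1 variant instead of over-provisioning for the anomaly.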
TABLE 1
Savings and Quality Metrics for the Best Algorithms (May 2019)
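Table 1 reports the R (mean of overusage errors) and V (cost savings per hour) parameters defined by Equations (8) and (11) earlier in the paper. Since those equations are not reproduced in this section, the sketch below assumes straightforward readings of them (mean of the positive overusage errors, and total saving divided by the number of hours); the formulas are this sketch's assumptions, not a verbatim reproduction.

```python
# Hedged sketch of the savings/quality parameters reported in Table 1.
# The exact forms of Equations (8) and (11) are assumed, not quoted.

def overusage_mean(actual, predicted):
    """R: mean of the overusage errors, counting only the hours where
    actual usage exceeded the predicted (provisioned-for) level."""
    over = [a - p for a, p in zip(actual, predicted) if a > p]
    return sum(over) / len(over) if over else 0.0

def savings_per_hour(original_cost, optimized_cost, hours):
    """V: cost saved per hour over the test window."""
    return (original_cost - optimized_cost) / hours

# The SaaS example from the text: the May 2019 cost falls from
# EUR 392 to EUR 23 over one month.
hours_in_may = 31 * 24
print(round(savings_per_hour(392, 23, hours_in_may), 3))  # prints 0.496
```

With these assumed forms, the SaaS optimization of May 2019 corresponds to roughly half a euro saved per hour of operation.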
Time-compressed tests demonstrate that the efficiency of our solution improves over time. This is why, if historical data are available, the solution can be trained in advance to boost efficiency from the start. This topic may be the subject of our further studies. Currently, we are monitoring and storing over 100 parameters of the production system (TMS). In the future, we would like to incorporate quality of user experience criteria in our resource prediction process, which may result in better resource usage optimization and quicker system response times.

ACKNOWLEDGMENT

The research presented in this article was supported by funds from the Polish Ministry of Science and Higher Education assigned to the AGH University of Science and Technology.

REFERENCES

[1] A. S. Andrae and T. Edler, "On global electricity usage of communication technology: Trends to 2030," Challenges, vol. 6, no. 1, pp. 117–157, 2015.
[2] M. Mao and M. Humphrey, "Auto-scaling to minimize cost and meet application deadlines in cloud workflows," in Proc. Int. Conf. High Perform. Comput. Netw. Storage Anal., 2011, pp. 1–12.
[3] J. Yang, W. Xiao, C. Jiang, M. S. Hossain, G. Muhammad, and S. U. Amin, "AI-powered green cloud and data center," IEEE Access, vol. 7, pp. 4195–4203, 2019.
[4] S. Abrishami, "Deadline-constrained workflow scheduling algorithms for infrastructure as a service clouds," Future Gener. Comput. Syst., vol. 29, pp. 158–169, 2013.
[5] S. Memeti, S. Pllana, A. Binotto, J. Kolodziej, and I. Brandic, "A review of machine learning and meta-heuristic methods for scheduling parallel computing systems," in Proc. Int. Conf. Learn. Optim. Algorithms: Theory Appl., 2018, pp. 5:1–5:6.
[6] Y. Zhang, J. Yao, and H. Guan, "Intelligent cloud resource management with deep reinforcement learning," IEEE Cloud Comput., vol. 4, no. 6, pp. 60–69, Nov./Dec. 2017.
[7] M. H. Hilman, M. A. Rodriguez, and R. Buyya, "Task runtime prediction in scientific workflows using an online incremental learning approach," in Proc. IEEE/ACM 11th Int. Conf. Utility Cloud Comput., 2018, pp. 93–102.
[8] Y. Yu, V. Jindal, F. Bastani, F. Li, and I. Yen, "Improving the smartness of cloud management via machine learning based workload prediction," in Proc. IEEE 42nd Annu. Comput. Softw. Appl. Conf., 2018, pp. 38–44.
[9] R. Yang, X. Ouyang, Y. Chen, P. Townend, and J. Xu, "Intelligent resource scheduling at scale: A machine learning perspective," in Proc. IEEE Symp. Service-Oriented Syst. Eng., 2018, pp. 132–141.
[10] C.-C. Crecana and F. Pop, "Monitoring-based auto-scalability across hybrid clouds," in Proc. 33rd Annu. ACM Symp. Appl. Comput., 2018, pp. 1087–1094.
[11] T. Mehmood, S. Latif, and S. Malik, "Prediction of cloud computing resource utilization," in Proc. 15th Int. Conf. Smart Cities: Improving Qual. Life Using ICT IoT, 2018, pp. 38–42.
[12] I. K. Kim, W. Wang, Y. Qi, and M. Humphrey, "CloudInsight: Utilizing a council of experts to predict future cloud application workloads," in Proc. IEEE 11th Int. Conf. Cloud Comput., 2018, pp. 41–48.
[13] B. Sniezynski, P. Nawrocki, M. Wilk, M. Jarzab, and K. Zielinski, "VM reservation plan adaptation using machine learning in cloud computing," J. Grid Comput., vol. 17, pp. 797–812, Jul. 2019.
[14] S. Chen, Y. Shen, and Y. Zhu, "Modeling conceptual characteristics of virtual machines for CPU utilization prediction," in Proc. Int. Conf. Conceptual Model., 2018, pp. 319–333.
[15] M. Ghobaei-Arani, S. Jabbehdari, and M. A. Pourmina, "An autonomic resource provisioning approach for service-based cloud applications: A hybrid approach," Future Gener. Comput. Syst., vol. 78, pp. 191–210, 2018.
[16] Q. Zhang, L. T. Yang, Z. Yan, Z. Chen, and P. Li, "An efficient deep learning model to predict cloud workload for industry informatics," IEEE Trans. Ind. Informat., vol. 14, no. 7, pp. 3170–3178, Jul. 2018.
[17] A. Abdelaziz, M. Elhoseny, A. S. Salama, and A. Riad, "A machine learning model for improving healthcare services on cloud computing environment," Measurement, vol. 119, pp. 117–128, 2018.
[18] J. Kumar and A. K. Singh, "Workload prediction in cloud using artificial neural network and adaptive differential evolution," Future Gener. Comput. Syst., vol. 81, pp. 41–52, 2018.
[19] J. N. Witanto, H. Lim, and M. Atiquzzaman, "Adaptive selection of dynamic VM consolidation algorithm using neural network for cloud resource management," Future Gener. Comput. Syst., vol. 87, pp. 35–42, 2018.
[20] K. Mason, M. Duggan, E. Barrett, J. Duggan, and E. Howley, "Predicting host CPU utilization in the cloud using evolutionary neural networks," Future Gener. Comput. Syst., vol. 86, pp. 162–173, 2018.
[21] A. M. Al-Faifi, B. Song, M. M. Hassan, A. Alamri, and A. Gumaei, "Performance prediction model for cloud service selection from smart data," Future Gener. Comput. Syst., vol. 85, pp. 97–106, 2018.
[22] A. A. Rahmanian, M. Ghobaei-Arani, and S. Tofighy, "A learning automata-based ensemble resource usage prediction algorithm for cloud computing environment," Future Gener. Comput. Syst., vol. 79, pp. 54–71, 2018.
[23] M. Ranjbari and J. A. Torkestani, "A learning automata-based algorithm for energy and SLA efficient consolidation of virtual machines in cloud data centers," J. Parallel Distrib. Comput., vol. 113, pp. 55–62, 2018.
[24] G. Kaur, A. Bala, and I. Chana, "An intelligent regressive ensemble approach for predicting resource usage in cloud computing," J. Parallel Distrib. Comput., vol. 123, pp. 1–12, 2019.
[25] X. Chen, J. Lin, B. Lin, T. Xiang, Y. Zhang, and G. Huang, "Self-learning and self-adaptive resource allocation for cloud-based software services," Concurrency Comput., Practice Experience, vol. 31, 2018, Art. no. e4463.
[26] C. Qu, R. N. Calheiros, and R. Buyya, "Auto-scaling web applications in clouds: A taxonomy and survey," ACM Comput. Surv., vol. 51, no. 4, pp. 73:1–73:33, Jul. 2018.
[27] Y. Al-Dhuraibi, F. Paraiso, N. Djarallah, and P. Merle, "Elasticity in cloud computing: State-of-the-art and research challenges," IEEE Trans. Services Comput., vol. 11, no. 2, pp. 430–447, Mar./Apr. 2018.
[28] H. M. Makrani, H. Sayadi, D. Motwani, H. Wang, S. Rafatirad, and H. Homayoun, "Energy-aware and machine learning-based resource provisioning of in-memory analytics on cloud," in Proc. ACM Symp. Cloud Comput., 2018, pp. 517–517.
[29] D. Minarolli and B. Freisleben, "Virtual machine resource allocation in cloud computing via multi-agent fuzzy control," in Proc. Int. Conf. Cloud Green Comput., 2013, pp. 188–194.
[30] A. Singh, D. Juneja, and M. Malhotra, "A novel agent based autonomous and service composition framework for cost optimization of resource provisioning in cloud computing," J. King Saud Univ. Comput. Inf. Sci., vol. 29, no. 1, pp. 19–28, 2017.
[31] G. Wei, A. V. Vasilakos, Y. Zheng, and N. Xiong, "A game-theoretic method of fair resource allocation for cloud computing services," J. Supercomput., vol. 54, no. 2, pp. 252–269, Nov. 2010.
[32] M. Ficco, C. Esposito, F. Palmieri, and A. Castiglione, "A coral-reefs and game theory-based approach for optimizing elastic cloud resource allocation," Future Gener. Comput. Syst., vol. 78, pp. 343–352, 2018.
[33] S. Sotiriadis, N. Bessis, and R. Buyya, "Self managed virtual machine scheduling in cloud systems," Inf. Sci., vol. 433/434, pp. 381–400, 2018.
[34] D. Gudu, M. Hardt, and A. Streit, "Combinatorial auction algorithm selection for cloud resource allocation using machine learning," in Proc. Eur. Conf. Parallel Process., 2018, pp. 378–391.
[35] S. A. Tafsiri and S. Yousefi, "Combinatorial double auction-based resource allocation mechanism in cloud computing market," J. Syst. Softw., vol. 137, pp. 322–334, 2018.
[36] J. Zhang, N. Xie, K. Yue, W. Li, and D. Kumar, "Machine learning based resource allocation of cloud computing in auction," Comput. Mater. Continua, vol. 56, pp. 123–135, Jan. 2018.
[37] Z. Zhong, K. Chen, X. Zhai, and S. Zhou, "Virtual machine-based task scheduling algorithm in a cloud computing environment," Tsinghua Sci. Technol., vol. 21, no. 6, pp. 660–667, Dec. 2016.
[38] A. S. Ajeena Beegom and M. S. Rajasree, "Integer-PSO: A discrete PSO algorithm for task scheduling in cloud computing systems," Evol. Intell., vol. 12, pp. 227–239, Feb. 2019.
[39] N. K. Gondhi and A. Gupta, "Survey on machine learning based scheduling in cloud computing," in Proc. Int. Conf. Intell. Syst. Metaheuristics Swarm Intell., 2017, pp. 57–61.
[40] G. Cherubin, A. Baldwin, and J. Griffin, "Exchangeability martingales for selecting features in anomaly detection," in Proc. 7th Symp. Conformal Probabilistic Prediction Appl., 2018, pp. 157–170.
[41] T. Lorido-Botran, J. Miguel-Alonso, and J. A. Lozano, "A review of auto-scaling techniques for elastic applications in cloud environments," J. Grid Comput., vol. 12, pp. 559–592, 2014.
[42] A. Botta, W. de Donato, V. Persico, and A. Pescape, "On the integration of cloud computing and Internet of Things," in Proc. Int. Conf. Future Internet Things Cloud, 2014, pp. 23–30.
[43] C. Bishop and M. Tipping, "Bayesian regression and classification," in Advances in Learning Theory: Methods, Models and Applications, J. Suykens, I. Horvath, S. Basu, C. Micchelli, and J. Vandewalle, Eds. Amsterdam, The Netherlands: IOS Press, 2003, pp. 267–285.
[44] A. Criminisi, J. Shotton, and E. Konukoglu, "Decision forests: A unified framework for classification, regression, density estimation, manifold learning and semi-supervised learning," Found. Trends Comput. Graph. Vis., vol. 7, no. 2/3, pp. 81–227, Feb. 2012.
[45] C. J. Burges, "From RankNet to LambdaRank to LambdaMART: An overview," Microsoft, Redmond, WA, USA, Tech. Rep. MSR-TR-2010-82, Jun. 2010.
[46] C. M. Bishop, "Neural networks: A pattern recognition perspective," Aston Univ., Birmingham, U.K., Tech. Rep. NCRG/96/001, Jan. 1996.

Patryk Osypanka received the MSc degree, and is currently working toward the doctoral degree with the Department of Computer Science, AGH University of Science and Technology, Krakow, Poland. He works professionally with ASEC S.A. as software development team leader, mainly using Microsoft technologies (.Net, Azure). His research focuses on cloud computing.

Piotr Nawrocki received the PhD degree. He is an associate professor with the Department of Computer Science, AGH University of Science and Technology, Krakow, Poland. His research interests include distributed systems, computer networks, mobile systems, cloud computing, Internet of Things, and service-oriented architectures. He has participated in several EU research projects including MECCANO, 6WINIT, UniversAAL and national projects including IT-SOA, and ISMOP. He is a member of the Polish Information Processing Society (PTI).