Developing Accurate and Scalable Simulators of Production Workflow Management Systems With WRENCH
Article history: Received 1 July 2019; Received in revised form 28 April 2020; Accepted 20 May 2020; Available online 26 May 2020

Keywords: Scientific workflows; Workflow management systems; Simulation; Distributed computing

Abstract

Scientific workflows are used routinely in numerous scientific domains, and Workflow Management Systems (WMSs) have been developed to orchestrate and optimize workflow executions on distributed platforms. WMSs are complex software systems that interact with complex software infrastructures. Most WMS research and development activities rely on empirical experiments conducted with full-fledged software stacks on actual hardware platforms. These experiments, however, are limited to the hardware and software infrastructures at hand and can be labor- and/or time-intensive. As a result, relying solely on real-world experiments impedes WMS research and development. An alternative is to conduct experiments in simulation. In this work we present WRENCH, a WMS simulation framework whose objectives are (i) accurate and scalable simulations; and (ii) easy simulation software development. WRENCH achieves its first objective by building on the SimGrid framework. While SimGrid is recognized for the accuracy and scalability of its simulation models, it only provides low-level simulation abstractions, and thus large software development efforts are required when implementing simulators of complex systems. WRENCH thus achieves its second objective by providing high-level and directly re-usable simulation abstractions on top of SimGrid. After describing and giving rationales for WRENCH's software architecture and APIs, we present two case studies in which we apply WRENCH to simulate the Pegasus production WMS and the WorkQueue application execution framework. We report on ease of implementation, simulation accuracy, and simulation scalability so as to determine to which extent WRENCH achieves its objectives. We also draw both qualitative and quantitative comparisons with a previously proposed workflow simulator.
© 2020 Published by Elsevier B.V.
https://doi.org/10.1016/j.future.2020.05.030
0167-739X/© 2020 Published by Elsevier B.V.
H. Casanova, R. Ferreira da Silva, R. Tanaka et al. / Future Generation Computer Systems 112 (2020) 162–175 163
considered in available theoretical results. Consequently, current research that aims at improving and evolving the state of the art, although sometimes informed by theory, is mostly done via "real-world" experiments: designs and algorithms are implemented, evaluated, and selected based on experiments conducted for a particular WMS implementation with particular workflow configurations on particular platforms. As a corollary, from the WMS user's perspective, quantifying accurately how a WMS would perform for a particular workflow configuration on a particular platform entails actually executing that workflow on that platform.

Unfortunately, real-world experiments have limited scope, which impedes WMS research and development. This is because they are confined to the application and platform configurations available at hand, and thus cover only a small subset of the relevant scenarios that may be encountered in practice. Furthermore, exclusively relying on real-world experiments makes it difficult or even impossible to investigate hypothetical scenarios (e.g., "What if the network had a different topology?", "What if there were 10 times more compute nodes but they had half as many cores?"). Real-world experiments, especially when large-scale, are often not fully reproducible due to shared networks and compute resources, and due to transient or idiosyncratic behaviors (maintenance schedules, software upgrades, and particular software (mis)configurations). Running real-world experiments is also time-consuming, thus possibly making it difficult to obtain statistically significant numbers of experimental results. Real-world experiments are driven by WMS implementations that often impose constraints on workflow executions. Furthermore, WMSs are typically not monolithic but instead reuse CyberInfrastructure (CI) components that impose their own overheads and constraints on workflow execution. Exploring what lies beyond these constraints via real-world executions, e.g., for research and development purposes, typically entails unacceptable software (re-)engineering costs. Finally, running real-world experiments can also be labor-intensive. This is due to the need to install and execute many full-featured software stacks, including actual scientific workflow implementations, which is often not deemed worthwhile for "just testing out" ideas.

An alternative to conducting WMS research via real-world experiments is to use simulation, i.e., to implement a software artifact that models the functional and performance behaviors of the software and hardware stacks of interest. Simulation is used in many computer science domains and can address the limitations of real-world experiments outlined above. Several simulation frameworks have been developed that target the parallel and distributed computing domain [18–34]. Some simulation frameworks have also been developed specifically for the scientific workflow domain [11,35–40].

We claim that advances in simulation capabilities in the field have made it possible to simulate WMSs that execute large workflows using diverse CI services deployed on large-scale platforms in a way that is accurate (via validated simulation models), scalable (fast execution and low memory footprint), and expressive (the ability to describe arbitrary platforms, complex WMSs, and complex software infrastructures). In this work, we build on the existing open-source SimGrid simulation framework [33,41], which has been one of the drivers of the above advances and whose simulation models have been extensively validated [42–46], to develop a WMS simulation framework called WRENCH [47]. More specifically, this work makes the following contributions:

1. We justify the need for WRENCH and explain how it improves on the state of the art.
2. We describe the high-level simulation abstractions provided by WRENCH that (i) make it straightforward to implement full-fledged simulated versions of complex WMS systems; and (ii) make it possible to instantiate simulation scenarios with only a few lines of code.
3. Via two case studies with the Pegasus [2] production WMS and the WorkQueue [49] application execution framework, we evaluate the ease-of-use, accuracy, and scalability of WRENCH, and compare it with a previously proposed simulator, WorkflowSim [35].

(A preliminary shorter version of this paper appears in the proceedings of the 2018 Workshop on Workflows in Support of Large-Scale Science (WORKS) [48].)

This paper is organized as follows. Section 2 discusses related work. Section 3 outlines the design of WRENCH and describes how its APIs are used to implement simulators. Section 4 presents our case studies. Finally, Section 5 concludes with a brief summary of results and a discussion of future research directions.

2. Related work

Many simulation frameworks have been developed for parallel and distributed computing research and development. They span domains such as HPC [18–21], Grid [22–24], Cloud [25–27], Peer-to-peer [28,29], and Volunteer Computing [30–32]. Some frameworks have striven to be applicable across some or all of the above domains [33,34]. Two conflicting concerns are accuracy (the ability to capture the behavior of a real-world system with as little bias as possible) and scalability (the ability to simulate large systems with as few CPU cycles and bytes of RAM as possible). The aforementioned simulation frameworks achieve different compromises between these two concerns by using various simulation models. At one extreme are discrete event models that simulate the "microscopic" behavior of hardware/software systems (e.g., by relying on packet-level network simulation for communication [50], and on cycle-accurate CPU simulation [51] or emulation for computation). In this case, the scalability challenge can be handled by using Parallel Discrete Event Simulation [52], i.e., the simulation itself is a parallel application that requires a parallel platform whose scale is at least commensurate to that of the simulated platform. At the other extreme are analytical models that capture "macroscopic" behaviors (e.g., transfer times as data sizes divided by bottleneck bandwidths, compute times as numbers of operations divided by compute speeds). While these models are typically more scalable, they must be developed with care so that they are accurate. In previous work, it has been shown that several available simulation frameworks use macroscopic models that can exhibit high inaccuracy [43].

A number of simulators have been developed that target scientific workflows. Some of them are stand-alone simulators [11,35–37,53]. Others are integrated with a particular WMS to promote more faithful simulation and code re-use [38,39,54], or to execute simulations at runtime to guide on-line scheduling decisions made by the WMS [40].

The authors in [39] conduct a critical analysis of the state-of-the-art of workflow simulators. They observe that many of these simulators do not capture the details of underlying infrastructures and/or use naive simulation models. This is the case with custom simulators such as those in [36,37,40]. But it is also the case with workflow simulators built on top of generic simulation frameworks that provide convenient user-level abstractions but fail to model the details of the underlying infrastructure, e.g., the simulators in [11,35,38], which build on the CloudSim [25] or GroudSim [24] frameworks. These frameworks have been shown to be lacking in their network modeling capabilities [43]. As a result, some authors readily recognize that their simulators are likely
1. Users who implement simulated WMSs — These users are engaged in WMS research and development activities and need an "in simulation" version of their current or intended WMS. Their goals typically include evaluating how their WMS behaves over hypothetical experimental scenarios and comparing competing algorithm and system design options. For these users, WRENCH provides the WRENCH Developer API (described in Section 3.4), which eases WMS development by removing the typical difficulties involved when developing, either in real-world or in simulation mode, a system comprised of distributed components that interact both synchronously and asynchronously. To this end, WRENCH makes it possible to implement a WMS as a single thread of control that interacts with simulated CI services via high-level APIs and must react to only a small set of asynchronous events.
2. Users who execute simulated WMSs — These users simulate how given WMSs behave for particular workflows on particular platforms. Their goals include comparing different WMSs, determining how a given WMS would behave

Fig. 1 depicts WRENCH's software architecture. At the bottom layer is the Simulation Core, which simulates low-level software and hardware stacks using the simulation abstractions and models provided by SimGrid (see Section 3.3). The next layer implements simulated CI services that are commonly found in current distributed platforms and used by production WMSs. At the time of this writing, WRENCH provides services in four categories: compute services that provide access to compute resources to execute workflow tasks; storage services that provide access to storage resources for storing workflow data; network monitoring services that can be queried to determine network distances; and data registry services that can be used to track the location of (replicas of) workflow data. Each category includes multiple service implementations, so as to capture specifics of currently available CI services used in production. For instance, WRENCH includes a "batch-scheduled cluster" compute service, a "cloud" compute service, and a "bare-metal" compute service. The layer above consists of simulated WMSs, which interact with CI services using the WRENCH Developer API (see Section 3.4). These WMS implementations, which can simulate production WMSs or WMS research prototypes, are not included as part of the WRENCH distribution, but are implemented as stand-alone projects. Two such projects are the simulated Pegasus and WorkQueue implementations used for our case studies in Section 4. Finally, the top layer consists of simulators that configure and instantiate particular CI services and particular WMSs on a given simulated hardware platform, launch the simulation, and analyze the simulation outcome. These simulators use the WRENCH User API (see Section 3.5). Here again, these simulators are not part of WRENCH, but are implemented as stand-alone projects.

Algorithm 1 Blueprint for a WMS execution
1: procedure Main(workflow)
2:   Obtain list of available services
3:   Gather static information about the services
4:   while workflow execution has not completed/failed do
5:     Gather dynamic service/resource information
6:     Make data/computation scheduling decisions
7:     Interact with services to enact decisions
8:     Wait for and react to the next event
9:   end while
10:  return
11: end procedure
WRENCH's simulation core is implemented using SimGrid's S4U API, which provides all necessary abstractions and models to simulate computation, I/O, and communication activities on arbitrary hardware platform configurations. These platform configurations are defined by XML files that specify network topologies and endpoints, compute resources, and storage resources [56].

At its most fundamental level, SimGrid provides a Concurrent Sequential Processes (CSP) model: a simulation consists of sequential threads of control that consume hardware resources. These threads of control can implement arbitrary code, can exchange messages via a simulated network, can perform computation on simulated (multicore) hosts, and can perform I/O on simulated storage devices. In addition, SimGrid provides a virtual machine abstraction that includes a migration feature. Therefore, SimGrid provides all the base abstractions necessary to implement the classes of distributed systems that are relevant to scientific workflow executions. However, these abstractions are low-level, and a common criticism of SimGrid is that implementing a simulation of a complex system requires a large software engineering effort. A WMS executing a workflow using several CI services is a complex system, and WRENCH builds on top of SimGrid to provide high-level abstractions so that implementing this complex system is not labor-intensive.

We have selected SimGrid for WRENCH for the following reasons. SimGrid has been used successfully in many distributed computing domains (cluster, peer-to-peer, grid, cloud, volunteer computing, etc.), and thus can be used to simulate WMSs that execute over a wide range of platforms. SimGrid is open source and freely available, has been stable for many years, is actively developed, has a sizable user community, and has provided simulation results for over 350 research publications since its inception. SimGrid has also been the object of many invalidation and validation studies [42–46], and its simulation models have been shown to provide compelling advantages over other simulation frameworks in terms of both accuracy and scalability [33]. Finally, most SimGrid simulations can be executed in minutes on a standard laptop computer, making it possible to perform large numbers of simulations quickly with minimal compute resource expenses. To the best of our knowledge, among comparable available simulation frameworks (as reviewed in Section 2), SimGrid is the only one to offer all the above desirable characteristics.

3.4. WRENCH Developer API

With the Developer API, a WMS is implemented as a single thread of control that executes according to the pseudo-code blueprint shown in Algorithm 1. Given a workflow to execute, a WMS first gathers information about all the CI services it can use to execute the workflow (lines 2–3). Examples of such information include the number of compute nodes provided by a compute service, the number of cores per node and the speed of these cores, the amount of storage space available in a storage service, the list of hosts monitored by a network monitoring service, etc. Then, the WMS iterates until the workflow execution is complete or has failed (line 4). At each iteration, it gathers dynamic information about available services and resources if needed (line 5). Examples of such information include currently available capacities at compute or storage services, current network distances between pairs of hosts, etc. Based on resource information and on the current state of the workflow, the WMS can then make whatever scheduling decisions it sees fit (line 6). It then enacts these decisions by interacting with appropriate services (line 7). For instance, it could decide to submit a "job" to execute a ready task on some number of cores at some compute service and copy all produced files to some storage service, or it could decide to just copy a file between storage services and then update a data location service to keep track of the location of this new file replica. It could also submit one or more pilot jobs [57] to compute services if they support them. It is the responsibility of the developer to implement all decision-making algorithms employed by the WMS. At the end of the iteration, the WMS simply waits for a workflow execution event to which it can react if need be. The most common events are job completions/failures and data transfer completions/failures.

The WRENCH Developer API provides a rich set of methods to create and analyze a workflow and to interact with CI services to execute a workflow. These methods were designed based on current and envisioned capabilities of state-of-the-art WMSs. We refer the reader to the WRENCH Web site [47] for more information on how to use this API and for the full API documentation. The key objective of this API is to make it straightforward to implement a complex system, namely a full-fledged WMS that interacts with diverse CI services. We achieve this objective by providing simple solutions and abstractions to handle well-known challenges when implementing a complex distributed system (whether in the real world or in simulation), as explained hereafter.

SimGrid provides simple point-to-point communication between threads of control via a mailbox abstraction. One of the recognized strengths of SimGrid is that it employs highly accurate and yet scalable network simulation models. However, unlike some of its competitors, it does not provide any higher-level simulation abstractions, meaning that distributed systems must be implemented essentially from scratch, with message-based interactions between processes. All message-based interaction is abstracted away by WRENCH, and although the simulated CI services exchange many messages with the WMS and among themselves, the WRENCH Developer API only exposes higher-level interactions with services ("run this job", "move this data") and only requires that the WMS handle a few events. The WMS developer thus completely avoids the need to send and receive (and thus orchestrate) network messages.

Another challenge when developing a system like a WMS is the need to handle asynchronous interactions. While some
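The XML platform descriptions mentioned in Section 3.3 follow SimGrid's platform format. A minimal sketch is shown below; it assumes a SimGrid 4.x platform DTD, and all hostnames, speeds, bandwidths, and attribute values are illustrative and may differ across SimGrid releases:

```xml
<?xml version="1.0"?>
<!DOCTYPE platform SYSTEM "https://simgrid.org/simgrid.dtd">
<platform version="4.1">
  <zone id="AS0" routing="Full">
    <!-- a 4-core host on which a compute service could run -->
    <host id="compute_host" speed="1Gf" core="4"/>
    <!-- a host with an attached disk for a storage service -->
    <host id="storage_host" speed="1Gf">
      <disk id="disk0" read_bw="100MBps" write_bw="80MBps"/>
    </host>
    <link id="link1" bandwidth="125MBps" latency="100us"/>
    <route src="compute_host" dst="storage_host">
      <link_ctn id="link1"/>
    </route>
  </zone>
</platform>
```

Hostnames declared here (e.g., storage_host) are the ones a simulator passes to service constructors when instantiating CI services.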
is possible to specify all message payloads in bytes (e.g., for control messages). Other parameters encompass various overheads, either in seconds or in computation volumes (e.g., task startup overhead on a compute service). In WRENCH, service implementations come with default values for all these parameters, but it is possible to pick custom values upon service instantiation. The process of picking parameter values so as to match a specific real-world system is referred to as simulation calibration. We calibrated our simulator by measuring delays observed in event traces of real-world executions of workflows on hardware/software infrastructures (see Section 4.3).

The simulator code, details on the simulation calibration procedure, and the experimental scenarios used in the rest of this section are all publicly available online [61].

4.2. Implementing WorkQueue with WRENCH

We consider experimental scenarios defined by particular workflow instances to be executed on particular platforms. Due to the lack of publicly available detailed workflow execution traces (i.e., execution logs that include data sizes for all files, all execution delays, etc.), we have performed real workflow executions with Pegasus and WorkQueue, and collected raw, time-stamped event traces from these executions. These traces form the ground truth to which we can compare simulated executions. We consider these workflow applications:

• 1000Genome [63]: A data-intensive workflow that identifies mutational overlaps using data from the 1000 genomes project in order to provide a null distribution for rigorous statistical evaluation of potential disease-related mutations. We consider a 1000Genome instance that comprises 71 tasks.
• Montage [2]: A compute-intensive astronomy workflow for generating custom mosaics of the sky. For this experiment, we ran Montage to process 1.5 and 2.0 square degree 2MASS mosaics. We refer to these configurations as Montage-1.5 and Montage-2.0, respectively; they comprise 573 and 1240 tasks, respectively.
• SAND [64]: A compute-intensive bioinformatics workflow for accelerating genome assembly. For this experiment, we ran SAND on a full set of reads from the Anopheles gambiae Mopti form. We consider a SAND instance that comprises 606 tasks.

We use these platforms, deploying on each a submit node (which runs Pegasus and DAGMan or WorkQueue, and HTCondor's job submission and central manager services), four worker nodes (4 or 24 cores per node, with a shared file system), and a data node in the WAN:

The fourth column in Table 1 shows average relative differences between actual and simulated makespans. We see that simulated makespans are close to actual makespans for all three Pegasus scenarios (the average relative error is below 5%). One of the key advantages of building WRENCH on top of SimGrid is that WRENCH simulators benefit from the high-accuracy network models in SimGrid. In particular, these models capture many features of the TCP protocol (without resorting to packet-level simulation). Indeed, when comparing real-world and simulated executions we observe an average relative error below 3% for data movement operations. Furthermore, the many processes involved in a workflow execution interact by exchanging (typically small) control messages, and our simulators simulate these message exchanges. For instance, each time an output file is produced by a task, a data registry service is contacted so that a new entry can be added to its database of file replicas,
Table 1
Average simulated makespan error (%), and p-values and Kolmogorov–Smirnov (KS) distances for task submission and completion dates, computed for 5 runs of each of our 4 experimental scenarios.

Experimental scenario                      Avg. makespan   Task submissions           Task completions
Workflow     System     Platform           error (%)       p-value      Distance      p-value      Distance
1000Genome   Pegasus    ExoGENI            1.10 ± 0.28     0.06 ± 0.01  0.21 ± 0.04   0.72 ± 0.06  0.12 ± 0.01
Montage-1.5  Pegasus    AWS-t2.xlarge      4.25 ± 1.16     0.08 ± 0.01  0.16 ± 0.03   0.12 ± 0.05  0.21 ± 0.02
Montage-2.0  Pegasus    AWS-m5.xlarge      3.37 ± 0.46     0.11 ± 0.03  0.06 ± 0.02   0.10 ± 0.01  0.11 ± 0.01
SAND         WorkQueue  Chameleon          3.96 ± 1.04     0.06 ± 0.01  0.11 ± 0.02   0.09 ± 0.02  0.09 ± 0.03
Fig. 5. Task execution Gantt chart for sample real-world ("pegasus") and simulated ("wrench") executions of the Montage-2.0 workflow on the AWS-m5.xlarge platform.

Fig. 7. Task execution Gantt chart for sample real-world ("workqueue") and simulated ("wrench") executions of the SAND framework on the Chameleon Cloud platform.
Table 2
Simulated workflow makespans and simulation times averaged over 5 runs of each of our 4 experimental scenarios.

Experimental scenario                      Avg. workflow       Avg. simulation
Workflow     System     Platform           makespan (s)        time (s)
1000Genome   Pegasus    ExoGENI            761.0 ± 7.93        0.3 ± 0.01
Montage-1.5  Pegasus    AWS-t2.xlarge      1,784.0 ± 137.67    8.3 ± 0.09
Montage-2.0  Pegasus    AWS-m5.xlarge      2,911.8 ± 48.80     28.1 ± 0.52
SAND         WorkQueue  Chameleon          5,339.2 ± 133.56    16.3 ± 0.86
Fig. 9. Example fully functional WRENCH simulator. Try-catch clauses are omitted.
all distributed system simulators. In our case studies, we have calibrated these parameters manually by analyzing and comparing simulated and real-world execution event traces. While, to the best of our knowledge, this is the typical practice, what is truly needed is an automated calibration method. Ideally, this method would process a (small) number of (not too large) real-world execution traces for "training scenarios", and compute a valid and robust set of calibration parameter values. An important research question will then be to understand to which extent these automatically computed calibrations can be composed and extrapolated to scenarios beyond the training scenarios.

CRediT authorship contribution statement

Henri Casanova: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Data curation, Writing - original draft, Visualization, Funding acquisition. Rafael Ferreira da Silva: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Data curation, Writing - original draft, Visualization, Funding acquisition. Ryan Tanaka: Software, Writing - review & editing. Suraj Pandey: Software. Gautam Jethwani: Software. William Koch: Software. Spencer Albrecht: Software. James Oeth: Software. Frédéric Suter: Software, Writing - review & editing, Funding acquisition.
Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

For brevity, the example in Fig. 9 omits try/catch clauses. Also, note that although the simulator uses the new operator to instantiate WRENCH objects, the simulation object takes ownership of these objects (using unique or shared pointers), so that there is no memory deallocation onus placed on the user.
Acknowledgments

This work is funded by National Science Foundation (NSF), USA contracts #1642369 and #1642335, "SI2-SSE: WRENCH: A Simulation Workbench for Scientific Workflow Users, Developers, and Researchers"; by CNRS, France under grant #PICS07239; and partly funded by NSF, USA contracts #1923539 and #1923621, "CyberTraining: Implementation: Small: Integrating core CI literacy and skills into university curricula via simulation-driven activities". We thank Martin Quinson, Arnaud Legrand, and Pierre-François Dutot for their valuable help. We also thank the NSF Chameleon Cloud, USA for providing time grants to access their resources.

Appendix. Example WRENCH simulator

An example WRENCH simulator developed using the WRENCH User API (see Section 3.5) is shown in Fig. 9. This simulator uses a WMS implementation (called SomeWMS) that has already been developed using the WRENCH Developer API (see Section 3.4). After initializing the simulation (lines 7–8), the simulator instantiates a platform (line 11) and a workflow (lines 14–15). A workflow is defined as a set of computation tasks and data files, with control and data dependencies between tasks. Each task can also have a priority, which can then be taken into account by a WMS for scheduling purposes. Although the workflow can be defined purely programmatically, in this example the workflow is imported from a workflow description file in the DAX format [66]. At line 18 the simulator creates a storage service with 1 PiB capacity accessible on host storage_host. This and other hostnames are specified in the XML platform description file. At line 22 the simulator creates a compute service that corresponds to a 4-node batch-scheduled cluster. The physical characteristics of the compute nodes (node[1-4]) are specified in the platform description file. This compute service has a 1 TiB scratch storage space. Its behavior is customized by passing a couple of property-value pairs to its constructor. It will be subject to a background load as defined by a trace in the standard SWF format [67], and its batch queue will be managed using the EASY Backfilling scheduling algorithm [68]. The simulator then creates a second compute service (line 28), which is a 4-host cloud service with 4 TiB of scratch space, customized so that it does not support pilot jobs. Two helper services are instantiated: a data registry service so that the WMS can keep track of file locations (line 33) and a network monitoring service that uses

References

[1] I.J. Taylor, E. Deelman, D.B. Gannon, M. Shields, Workflows for e-Science: Scientific Workflows for Grids, Springer, 2007, http://dx.doi.org/10.1007/978-1-84628-757-2.
[2] E. Deelman, K. Vahi, G. Juve, M. Rynge, S. Callaghan, P.J. Maechling, R. Mayani, W. Chen, R. Ferreira da Silva, M. Livny, K. Wenger, Pegasus: a workflow management system for science automation, Future Gener. Comput. Syst. 46 (2015) 17–35, http://dx.doi.org/10.1016/j.future.2014.10.008.
[3] T. Fahringer, R. Prodan, R. Duan, J. Hofer, F. Nadeem, F. Nerieri, S. Podlipnig, J. Qin, M. Siddiqui, H.-L. Truong, et al., Askalon: A development and grid computing environment for scientific workflows, in: Workflows for e-Science, Springer, 2007, pp. 450–471, http://dx.doi.org/10.1007/978-1-84628-757-2_27.
[4] M. Wilde, M. Hategan, J.M. Wozniak, B. Clifford, D.S. Katz, I. Foster, Swift: A language for distributed parallel scripting, Parallel Comput. 37 (9) (2011) 633–652, http://dx.doi.org/10.1016/j.parco.2011.05.005.
[5] K. Wolstencroft, R. Haines, D. Fellows, A. Williams, D. Withers, S. Owen, S. Soiland-Reyes, I. Dunlop, A. Nenadic, P. Fisher, et al., The Taverna workflow suite: designing and executing workflows of web services on the desktop, web or in the cloud, Nucleic Acids Res. (2013) gkt328, http://dx.doi.org/10.1093/nar/gkt328.
[6] I. Altintas, C. Berkley, E. Jaeger, M. Jones, B. Ludascher, S. Mock, Kepler: an extensible system for design and execution of scientific workflows, in: 16th International Conference on Scientific and Statistical Database Management, IEEE, 2004, pp. 423–424, http://dx.doi.org/10.1109/SSDM.2004.1311241.
[7] M. Albrecht, P. Donnelly, P. Bui, D. Thain, Makeflow: A portable abstraction for data intensive computing on clusters, clouds, and grids, in: 1st ACM SIGMOD Workshop on Scalable Workflow Execution Engines and Technologies, ACM, 2012, p. 1, http://dx.doi.org/10.1145/2443416.2443417.
[8] N. Vydyanathan, U.V. Catalyurek, T.M. Kurc, P. Sadayappan, J.H. Saltz, Toward optimizing latency under throughput constraints for application workflows on clusters, in: Euro-Par 2007 Parallel Processing, Springer, 2007, pp. 173–183, http://dx.doi.org/10.1007/978-3-540-74466-5_20.
[9] A. Benoit, V. Rehn-Sonigo, Y. Robert, Optimizing latency and reliability of pipeline workflow applications, in: IEEE International Symposium on Parallel and Distributed Processing (IPDPS 2008), IEEE, 2008, pp. 1–10, http://dx.doi.org/10.1109/IPDPS.2008.4536160.
[10] Y. Gu, Q. Wu, Maximizing workflow throughput for streaming applications in distributed environments, in: 19th International Conference on Computer Communications and Networks (ICCCN), IEEE, 2010, pp. 1–6, http://dx.doi.org/10.1109/ICCCN.2010.5560146.
[11] M. Malawski, G. Juve, E. Deelman, J. Nabrzyski, Algorithms for cost- and deadline-constrained provisioning for scientific workflow ensembles in IaaS clouds, Future Gener. Comput. Syst. 48 (2015) 1–18, http://dx.doi.org/10.1016/j.future.2015.01.004.
[12] J. Chen, Y. Yang, Temporal dependency-based checkpoint selection for dynamic verification of temporal constraints in scientific workflow systems, ACM Trans. Softw. Eng. Methodol. 20 (3) (2011) 9, http://dx.doi.org/10.1145/2000791.2000793.
[13] G. Kandaswamy, A. Mandal, D. Reed, et al., Fault tolerance and recovery of scientific workflows on computational grids, in: 8th IEEE International Symposium on Cluster Computing and the Grid (CCGRID'08), IEEE, 2008,
the Vivaldi algorithm [69] to measure network distances between pp. 777–782, http://dx.doi.org/10.1109/CCGRID.2008.79.
the two hosts from which the compute services are accessed [14] R. Ferreira da Silva, T. Glatard, F. Desprez, Self-healing of workflow activity
incidents on distributed computing infrastructures, Future Gener. Comput.
(batch_login and cloud_gateway) and the my_host host, Syst. 29 (8) (2013) 2284–2294, http://dx.doi.org/10.1016/j.future.2013.06.
which is the host that runs these helper services and the WMS 012.
(line 36). At line 41, the simulator specifies that the workflow [15] W. Chen, R. Ferreira da Silva, E. Deelman, T. Fahringer, Dynamic and fault-
data file input_file is initially available at the storage service. tolerant clustering for scientific workflows, IEEE Trans. Cloud Comput. 4
(1) (2016) 49–62, http://dx.doi.org/10.1109/TCC.2015.2427200.
It then instantiates the WMS and passes to it all available services
[16] H.M. Fard, R. Prodan, J.J.D. Barrionuevo, T. Fahringer, A multi-objective
(line 44), and assigns the workflow to it (line 47). The crucial approach for workflow scheduling in heterogeneous environments, in:
call is at line 50, where the simulation is launched and the Proceedings of the 2012 12th IEEE/ACM International Symposium on
simulator hands off control to WRENCH. When this call returns Cluster, Cloud and Grid Computing (Ccgrid 2012), IEEE Computer Society,
the workflow has either completed or failed. Assuming it has 2012, pp. 300–309, http://dx.doi.org/10.1109/CCGrid.2012.114.
[17] I. Pietri, M. Malawski, G. Juve, E. Deelman, J. Nabrzyski, R. Sakellariou,
completed, the simulator then retrieves the ordered set of task Energy-constrained provisioning for scientific workflow ensembles, in:
completion events (line 53) and performs some (in this example, Cloud and Green Computing (CGC), 2013 Third International Conference
trivial) mining of these events (line 55). on, IEEE, 2013, pp. 34–41, http://dx.doi.org/10.1109/CGC.2013.14.
H. Casanova, R. Ferreira da Silva, R. Tanaka et al. / Future Generation Computer Systems 112 (2020) 162–175
[18] M. Tikir, M. Laurenzano, L. Carrington, A. Snavely, PSINS: An open source event tracer and execution simulator for MPI applications, in: Proc. of the 15th Intl. Euro-Par Conf. on Parallel Processing, in: LNCS, (5704) Springer, 2009, pp. 135–148, http://dx.doi.org/10.1007/978-3-642-03869-3_16.
[19] T. Hoefler, T. Schneider, A. Lumsdaine, LogGOPSim - simulating large-scale applications in the LogGOPS model, in: Proc. of the ACM Workshop on Large-Scale System and Application Performance, 2010, pp. 597–604, http://dx.doi.org/10.1145/1851476.1851564.
[20] G. Zheng, G. Kakulapati, L. Kalé, BigSim: A parallel simulator for performance prediction of extremely large parallel machines, in: Proc. of the 18th Intl. Parallel and Distributed Processing Symposium (IPDPS), 2004, http://dx.doi.org/10.1109/IPDPS.2004.1303013.
[21] R. Bagrodia, E. Deelman, T. Phan, Parallel simulation of large-scale parallel applications, Int. J. High Perform. Comput. Appl. 15 (1) (2001) 3–12, http://dx.doi.org/10.1177/109434200101500101.
[22] W.H. Bell, D.G. Cameron, A.P. Millar, L. Capozza, K. Stockinger, F. Zini, OptorSim - a grid simulator for studying dynamic data replication strategies, Int. J. High Perform. Comput. Appl. 17 (4) (2003) 403–416, http://dx.doi.org/10.1177/10943420030174005.
[23] R. Buyya, M. Murshed, GridSim: A toolkit for the modeling and simulation of distributed resource management and scheduling for grid computing, Concurr. Comput.: Pract. Exper. 14 (13–15) (2003) 1175–1220, http://dx.doi.org/10.1002/cpe.710.
[24] S. Ostermann, R. Prodan, T. Fahringer, Dynamic cloud provisioning for scientific grid workflows, in: Proc. of the 11th ACM/IEEE Intl. Conf. on Grid Computing (Grid), 2010, pp. 97–104, http://dx.doi.org/10.1109/GRID.2010.5697953.
[25] R.N. Calheiros, R. Ranjan, A. Beloglazov, C.A.F. De Rose, R. Buyya, CloudSim: A toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms, Softw. - Pract. Exp. 41 (1) (2011) 23–50, http://dx.doi.org/10.1002/spe.995.
[26] A. Núñez, J. Vázquez-Poletti, A. Caminero, J. Carretero, I.M. Llorente, Design of a new cloud computing simulation platform, in: Proc. of the 11th Intl. Conf. on Computational Science and Its Applications, 2011, pp. 582–593, http://dx.doi.org/10.1007/978-3-642-21931-3_45.
[27] G. Kecskemeti, DISSECT-CF: A simulator to foster energy-aware scheduling in infrastructure clouds, Simul. Model. Pract. Theory 58 (2) (2015) 188–218, http://dx.doi.org/10.1016/j.simpat.2015.05.009.
[28] A. Montresor, M. Jelasity, PeerSim: A scalable P2P simulator, in: Proc. of the 9th Intl. Conf. on Peer-To-Peer, 2009, pp. 99–100, http://dx.doi.org/10.1109/P2P.2009.5284506.
[29] I. Baumgart, B. Heep, S. Krause, OverSim: A flexible overlay network simulation framework, in: Proc. of the 10th IEEE Global Internet Symposium, IEEE, 2007, pp. 79–84, http://dx.doi.org/10.1109/GI.2007.4301435.
[30] M. Taufer, A. Kerstens, T. Estrada, D. Flores, P.J. Teller, SimBA: A discrete event simulator for performance prediction of volunteer computing projects, in: Proc. of the 21st Intl. Workshop on Principles of Advanced and Distributed Simulation, 2007, pp. 189–197, http://dx.doi.org/10.1109/PADS.2007.27.
[31] T. Estrada, M. Taufer, K. Reed, D.P. Anderson, EmBOINC: An emulator for performance analysis of BOINC projects, in: Proc. of the Workshop on Large-Scale and Volatile Desktop Grids (PCGrid), 2009, http://dx.doi.org/10.1109/IPDPS.2009.5161135.
[32] D. Kondo, SimBOINC: A simulator for desktop grids and volunteer computing systems, 2007, Available at http://simboinc.gforge.inria.fr/.
[33] H. Casanova, A. Giersch, A. Legrand, M. Quinson, F. Suter, Versatile, scalable, and accurate simulation of distributed applications and platforms, J. Parallel Distrib. Comput. 74 (10) (2014) 2899–2917, http://dx.doi.org/10.1016/j.jpdc.2014.06.008.
[34] C.D. Carothers, D. Bauer, S. Pearce, ROSS: A high-performance, low memory, modular time warp system, in: Proc. of the 14th ACM/IEEE/SCS Workshop of Parallel on Distributed Simulation, 2000, pp. 53–60, http://dx.doi.org/10.1109/PADS.2000.847144.
[35] W. Chen, E. Deelman, WorkflowSim: A toolkit for simulating scientific workflows in distributed environments, in: Proc. of the 8th IEEE Intl. Conf. on E-Science, 2012, pp. 1–8, http://dx.doi.org/10.1109/eScience.2012.6404430.
[36] A. Hirales-Carbajal, A. Tchernykh, T. Röblitz, R. Yahyapour, A grid simulation framework to study advance scheduling strategies for complex workflow applications, in: Proc. of IEEE Intl. Symp. on Parallel Distributed Processing Workshops (IPDPSW), 2010, http://dx.doi.org/10.1109/IPDPSW.2010.5470918.
[37] M.-H. Tsai, K.-C. Lai, H.-Y. Chang, K. Fu Chen, K.-C. Huang, Pewss: A platform of extensible workflow simulation service for workflow scheduling research, Softw. - Pract. Exp. 48 (4) (2017) 796–819, http://dx.doi.org/10.1002/spe.2555.
[38] S. Ostermann, K. Plankensteiner, D. Bodner, G. Kraler, R. Prodan, Integration of an event-based simulation framework into a scientific workflow execution environment for grids and clouds, in: Proc. of the 4th ServiceWave European Conference, 2011, pp. 1–13, http://dx.doi.org/10.1007/978-3-642-24755-2_1.
[39] G. Kecskemeti, S. Ostermann, R. Prodan, Fostering energy-awareness in simulations behind scientific workflow management systems, in: Proc. of the 7th IEEE/ACM Intl. Conf. on Utility and Cloud Computing, 2014, pp. 29–38, http://dx.doi.org/10.1109/UCC.2014.11.
[40] J. Cao, S. Jarvis, S. Saini, G. Nudd, GridFlow: Workflow management for grid computing, in: Proc. of the 3rd IEEE/ACM Intl. Symp. on Cluster Computing and the Grid (CCGrid), 2003, pp. 198–205.
[41] The SimGrid project, 2019, Available at http://simgrid.org/.
[42] P. Bedaride, A. Degomme, S. Genaud, A. Legrand, G. Markomanolis, M. Quinson, M. Stillwell, F. Suter, B. Videau, Toward better simulation of MPI applications on Ethernet/TCP networks, in: Proc. of the 4th Intl. Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems, 2013, http://dx.doi.org/10.1007/978-3-319-10214-6_8.
[43] P. Velho, L. Mello Schnorr, H. Casanova, A. Legrand, On the validity of flow-level TCP network models for grid and cloud simulations, ACM Trans. Model. Comput. Simul. 23 (4) (2013), http://dx.doi.org/10.1145/2517448.
[44] P. Velho, A. Legrand, Accuracy study and improvement of network simulation in the SimGrid framework, in: Proc. of the 2nd Intl. Conf. on Simulation Tools and Techniques, 2009, http://dx.doi.org/10.4108/ICST.SIMUTOOLS2009.5592.
[45] K. Fujiwara, H. Casanova, Speed and accuracy of network simulation in the SimGrid framework, in: Proc. of the 1st Intl. Workshop on Network Simulation Tools, 2007.
[46] A. Lèbre, A. Legrand, F. Suter, P. Veyre, Adding storage simulation capacities to the SimGrid toolkit: Concepts, models, and API, in: Proc. of the 8th IEEE Intl. Symp. on Cluster Computing and the Grid, 2015, http://dx.doi.org/10.1109/CCGrid.2015.134.
[47] The WRENCH project, 2020, https://wrench-project.org.
[48] H. Casanova, S. Pandey, J. Oeth, R. Tanaka, F. Suter, R. Ferreira da Silva, WRENCH: A framework for simulating workflow management systems, in: 13th Workshop on Workflows in Support of Large-Scale Science (WORKS'18), 2018, pp. 74–85, http://dx.doi.org/10.1109/WORKS.2018.00013.
[49] L. Yu, C. Moretti, A. Thrasher, S. Emrich, K. Judd, D. Thain, Harnessing parallelism in multicore clusters with the All-Pairs, Wavefront, and Makeflow abstractions, Cluster Comput. 13 (3) (2010) 243–256, http://dx.doi.org/10.1007/s10586-010-0134-7.
[50] The ns-3 network simulator, Available at http://www.nsnam.org.
[51] E. León, R. Riesen, A. Maccabe, P. Bridges, Instruction-level simulation of a cluster at scale, in: Proc. of the Intl. Conf. for High Performance Computing and Communications (SC), 2009, http://dx.doi.org/10.1145/1654059.1654063.
[52] R. Fujimoto, Parallel discrete event simulation, Commun. ACM 33 (10) (1990) 30–53, http://dx.doi.org/10.1145/84537.84545.
[53] V. Cima, J. Beránek, S. Böhm, ESTEE: A simulation toolkit for distributed workflow execution, in: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC), Research Poster, 2019.
[54] S. Ostermann, G. Kecskemeti, R. Prodan, Multi-layered simulations at the heart of workflow enactment on clouds, Concurr. Comput. Pract. Exp. 28 (2016) 3180–3201, http://dx.doi.org/10.1002/cpe.3733.
[55] R. Matha, S. Ristov, R. Prodan, Simulation of a workflow execution as a real cloud by adding noise, Simul. Model. Pract. Theory 79 (2017) 37–53, http://dx.doi.org/10.1016/j.simpat.2017.09.003.
[56] L. Bobelin, A. Legrand, D.A.G. Márquez, P. Navarro, M. Quinson, F. Suter, C. Thiery, Scalable multi-purpose network representation for large scale distributed system simulation, in: Proceedings of the 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), Ottawa, Canada, 2012, pp. 220–227, http://dx.doi.org/10.1109/CCGrid.2012.31.
[57] M. Turilli, M. Santcroos, S. Jha, A comprehensive perspective on pilot-job systems, ACM Comput. Surv. 51 (2) (2018) 43:1–43:32, http://dx.doi.org/10.1145/3177851.
[58] J. Frey, Condor DAGMan: Handling inter-job dependencies, Tech. Rep., University of Wisconsin, Dept. of Computer Science, 2002, URL http://www.bo.infn.it/calcolo/condor/dagman/.
[59] D. Thain, T. Tannenbaum, M. Livny, Distributed computing in practice: the Condor experience, Concurr. Comput.: Pract. Exp. 17 (2–4) (2005) 323–356, http://dx.doi.org/10.1002/cpe.938.
[60] B. Tovar, R. Ferreira da Silva, G. Juve, E. Deelman, W. Allcock, D. Thain, M. Livny, A job sizing strategy for high-throughput scientific workflows, IEEE Trans. Parallel Distrib. Syst. 29 (2) (2018) 240–253, http://dx.doi.org/10.1109/TPDS.2017.2762310.
[61] The WRENCH Pegasus simulator, 2019, https://github.com/wrench-project/pegasus.
[62] The WRENCH WorkQueue simulator, 2019, https://github.com/wrench-project/workqueue.
[63] R. Ferreira da Silva, R. Filgueira, E. Deelman, E. Pairo-Castineira, I.M. Overton, M. Atkinson, Using simple PID-inspired controllers for online resilient resource management of distributed scientific workflows, Future Gener. Comput. Syst. 95 (2019) 615–628, http://dx.doi.org/10.1016/j.future.2019.01.015.
[64] C. Moretti, A. Thrasher, L. Yu, M. Olson, S. Emrich, D. Thain, A framework for scalable genome assembly on clusters, clouds, and grids, IEEE Trans. Parallel Distrib. Syst. 23 (12) (2012) 2189–2197, http://dx.doi.org/10.1109/TPDS.2012.80.
[65] R. Ferreira da Silva, W. Chen, G. Juve, K. Vahi, E. Deelman, Community resources for enabling and evaluating research on scientific workflows, in: 10th IEEE International Conference on e-Science (eScience'14), 2014, pp. 177–184, http://dx.doi.org/10.1109/eScience.2014.44.
[66] Pegasus' DAX workflow description format, 2019, https://pegasus.isi.edu/documentation/creating_workflows.php.
[67] The standard workload format, 2019, http://www.cs.huji.ac.il/labs/parallel/workload/swf.html.
[68] D. Lifka, The ANL/IBM SP scheduling system, in: Proc. of the 1st Workshop on Job Scheduling Strategies for Parallel Processing, in: LNCS, vol. 949, 1995, pp. 295–303, http://dx.doi.org/10.1007/3-540-60153-8_35.
[69] F. Dabek, R. Cox, F. Kaashoek, R. Morris, Vivaldi: A decentralized network coordinate system, in: Proc. of SIGCOMM, 2004, http://dx.doi.org/10.1145/1015467.1015471.

Suraj Pandey obtained his Computer Science M.S. degree from the University of Hawaii at Manoa, and his undergraduate degree from the Institute of Engineering, Pulchowk Campus, Nepal.

Gautam Jethwani is a Computer Science undergraduate student at the University of Southern California.

William Kock is a Computer Science M.S. student at the University of Hawaii at Manoa.
Ryan Tanaka is a programmer analyst in the Science Automation Technologies group at ISI. He received his Master's degree in Computer Science from the University of Hawaii at Manoa. His research interests include distributed systems and data intensive applications. His current work has been focused on developing various tools used in the Pegasus Workflow Management System.

Frederic Suter is a CNRS researcher at the IN2P3 Computing Center in Lyon, France, since 2008. His research interests include scheduling, Grid computing, and platform and application simulation. He obtained his M.S. from the Université Jules Verne, Amiens, France, in 1999, his Ph.D. from the Ecole Normale Supérieure de Lyon, France, in 2002, and his Habilitation Thesis from the Ecole Normale Supérieure de Lyon, France, in 2014.