Machine Learning Operations (MLOps): Overview, Definition, and Architecture
ABSTRACT The final goal of all industrial machine learning (ML) projects is to develop ML products
and rapidly bring them into production. However, it is highly challenging to automate and operationalize
ML products and thus many ML endeavors fail to deliver on their expectations. The paradigm of Machine
Learning Operations (MLOps) addresses this issue. MLOps includes several aspects, such as best practices,
sets of concepts, and development culture. However, MLOps is still a vague term and its consequences
for researchers and professionals are ambiguous. To address this gap, we conduct mixed-method research,
including a literature review, a tool review, and expert interviews. As a result of these investigations,
we contribute to the body of knowledge by providing an aggregated overview of the necessary principles,
components, and roles, as well as the associated architecture and workflows. Furthermore, we provide a
comprehensive definition of MLOps and highlight open challenges in the field. Finally, this work provides
guidance for ML researchers and practitioners who want to automate and operate their ML products with a
designated set of technologies.
INDEX TERMS CI/CD, DevOps, machine learning, MLOps, operations, workflow orchestration.
27 articles were selected based on our inclusion and exclusion criteria (e.g., the term MLOps or DevOps and CI/CD in combination with ML was described in detail, the article was written in English, etc.). All 27 of these articles were peer-reviewed.

B. TOOL REVIEW
After going through 27 articles and eight interviews, various open-source tools, frameworks, and commercial cloud ML services were identified. These tools, frameworks, and ML services were reviewed to gain an understanding of the technical components of which they consist. An overview of the identified tools is depicted in Table 1.

C. INTERVIEW STUDY
To answer the research questions with insights from practice, we conduct semi-structured expert interviews according to Myers and Newman [18]. One major aspect in the research design of expert interviews is choosing an appropriate sample size [20]. We apply a theoretical sampling approach [21], which allows us to choose experienced interview partners to obtain high-quality data. Such data can provide meaningful insights with a limited number of interviews. To get an adequate sample group and reliable insights, we use LinkedIn—a social network for professionals—to identify experienced ML professionals with profound MLOps knowledge on a global level. To gain insights from various perspectives, we choose interview partners from different organizations and industries, different countries and nationalities, as well as different genders. Interviews are conducted until no new categories and concepts emerge in the analysis of the data. According to Glaser and Strauss [21], this stage is called ''theoretical saturation.'' In total, we conduct eight interviews with experts (pseudonymized as α–θ), whose details are depicted in Table 2. All interviews are conducted between June and August 2021.

With regard to the interview design, we prepare a semi-structured guide with several questions, documented as an interview script [18]. During the interviews, ''soft laddering'' is used with ''how'' and ''why'' questions to probe the interviewees' means-end chain [19]. This methodical approach allowed us to gain additional insight into the experiences of the interviewees when required. All interviews are recorded and then transcribed. To evaluate the interview transcripts, we use an open coding scheme [20]. The open coding process allows the data to be broken down in an analytical manner so that conceptually similar topics can be grouped into categories and subcategories. These categories are called ''codes.'' Concepts were identified when they appeared multiple times in different interviews [21].

IV. RESULTS
We apply the described methodology and structure our resulting insights into a presentation of important principles, their resulting instantiation as components, the description of necessary roles, as well as a suggestion for the architecture and workflow resulting from the combination of these aspects. Finally, we derive the conceptualization of the term and provide a definition of MLOps.
…code [23], [24], [36], [37], [38] [α, β, γ, ζ, θ]. Examples include Bitbucket [39], [ζ], GitLab [24], [39], [ζ], GitHub [37], [ζ, η], and Gitea [23].

C3 Workflow Orchestration Component (P2, P3, P6). The workflow orchestration component offers task orchestration of an ML workflow via directed acyclic graphs (DAGs). These graphs represent the execution order and artifact usage of the single steps of the workflow. A workflow uses, for example, packaged code artifacts in the respective process step, like extracting data, training, inference, or embedding a model binary into an application [6], [7], [23], [26], [31], [α, β, γ, δ, ε, ζ, η]. Examples include Apache Airflow [α, ζ], Kubeflow Pipelines [ζ], Watson Studio Pipelines [γ], Luigi [ζ], AWS SageMaker Pipelines [β], and Azure Pipelines [ε]. In theory, CI/CD tools could also be used to schedule the triggering of specific tasks sequentially; however, the complexity of data engineering and ML pipeline tasks has increased the need for tools specifically designed for workflow or task orchestration. These workflow orchestration tools make it easier to efficiently manage interrelated and interdependent tasks, because they are specifically designed to manage complex task chains [40].

C4 Feature Store System (P3, P4). A feature store system ensures central storage of commonly used features. It has two databases configured: one database as an offline feature store to serve features with normal latency for experimentation, and one database as an online store to serve features with low latency for predictions in production [25], [28], [α, β, ζ, ε, θ]. Examples include Google Feast [ζ], Amazon AWS Feature Store [β, ζ], Tecton.ai, and Hopsworks.ai [ζ]. This is where most of the data for training ML models will come from. Moreover, data can also come directly from any kind of data store. A feature store poses complex requirements, which are highly dependent on the use case. Its databases can be hosted on on-premises infrastructure or in the cloud. However, scalability is typically realized with cloud infrastructure. Most use cases have a read-heavy workload, combined with a batch- or streaming-based ingestion pattern on (very) large data sets. Such high scalability can be achieved with distributed file systems [41], [42] or distributed databases [43], [44], combined with parallel and distributed data processing algorithms (e.g., MapReduce or a higher-level API like Spark).

C5 Model Training Infrastructure (P6). The model training infrastructure provides the foundational computation resources, e.g., CPUs, RAM, and GPUs. The provided infrastructure can be either distributed or non-distributed. In general, a scalable and distributed infrastructure is recommended [7], [23], [27], [28], [32], [33], [37], [45], [46], [δ, ζ, η, θ]. Examples include local machines (not scalable) or cloud computation [33] [η, θ], as well as non-distributed or distributed computation (several worker nodes) [7], [37]. Frameworks supporting computation are Kubernetes [η, θ] and Red Hat OpenShift [γ]. Typically, deep learning workloads (training and inference) are matrix-multiplication-heavy and therefore computation-bound. GPUs are optimized towards this type of workload and should be the primary focus for compute node specification. In edge devices, where storage and computation power are limited, quantized neural networks with low-precision floating-point operations [47], in combination with pruning and Huffman coding, should be investigated [48].

C6 Model Registry (P3, P4). The model registry centrally stores the trained ML models together with their metadata. It has two main functionalities: storing the ML artifact and storing the ML metadata (see C7) [7], [24], [25], [34], [35], [α, β, γ, ε, ζ, η, θ]. Advanced storage examples include MLflow [α, η, ζ], AWS SageMaker Model Registry [ζ], Microsoft Azure ML Model Registry [ζ], and Neptune.ai [α]. Simple storage examples include Microsoft Azure Storage, Google Cloud Storage, and Amazon AWS S3 [23].

C7 ML Metadata Stores (P4, P7). ML metadata stores allow for the tracking of various kinds of metadata, e.g., for each orchestrated ML workflow pipeline task. Another metadata store can be configured within the model registry for tracking and logging the metadata of each training job (e.g., training date and time, duration, etc.), including the model-specific metadata—e.g., the used parameters and the resulting performance metrics, as well as the model lineage (the data and code used) [7], [25], [31], [37], [49], [α, β, δ, ζ, θ]. Examples include orchestrators with built-in metadata stores tracking each step of experiment pipelines [α], such as Kubeflow Pipelines [α, ζ], AWS SageMaker Pipelines [α, ζ], Azure ML, and IBM Watson Studio [γ]. MLflow provides an advanced metadata store in combination with the model registry [6], [31].

C8 Model Serving Component (P1). The model serving component can be configured for different purposes. Examples are online inference for real-time predictions or batch inference for predictions using large volumes of input data. The serving can be provided, e.g., via a REST API. As a foundational infrastructure layer, a scalable and distributed model serving infrastructure is recommended [23], [27], [33], [37], [39], [46], [α, β, δ, ζ, η, θ]. One example of a model serving component configuration is the use of Kubernetes and Docker technology to containerize the ML model, leveraging a Python web application framework like Flask [24] with an API for serving [α]. Other Kubernetes-supported frameworks are KServe (formerly KFServing) of Kubeflow [α], TensorFlow Serving, and Seldon.io serving [27]. Inferencing could also be realized with Apache Spark for batch predictions [θ]. Examples of cloud services include Microsoft Azure ML REST API [ε], AWS SageMaker Endpoints [α, β], IBM Watson Studio [γ], and Google Vertex AI prediction service [δ]. The actual deployment of the model depends on the use case and typically falls into one of these categories: real-time, batch, or serverless inference. Real-time inference can be achieved by hosting the model in a RESTful web service, batch inference can be an idempotent MapReduce workflow, and serverless inference is used when cost-efficient and scalable serving is required.
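To make the Flask-based serving configuration described for C8 more tangible, the following minimal sketch shows an endpoint that loads a serialized model and answers prediction requests via REST. It is an illustration only: the model file name, the /predict route, and the payload shape are assumptions of this sketch, not prescriptions from the reviewed tools or interviews.

```python
# Minimal sketch of a containerizable Flask serving endpoint (C8).
# Assumptions: a scikit-learn-style model serialized to model.pkl and
# a JSON payload {"features": [[...]]}; names and paths are illustrative.
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

with open("model.pkl", "rb") as f:  # e.g., pulled from the model registry (C6)
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(force=True)
    predictions = model.predict(payload["features"])
    return jsonify({"predictions": predictions.tolist()})

if __name__ == "__main__":
    # In production this would run behind a WSGI server inside a Docker
    # container orchestrated by Kubernetes, as described above.
    app.run(host="0.0.0.0", port=8080)
```

Packaged into a Docker image and deployed to Kubernetes, such a service corresponds to the containerized serving configuration outlined above.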
C. ROLES
After describing the principles and their resulting instantiation as components, we identify the roles necessary to realize MLOps. MLOps is an interdisciplinary group process, and the interplay of different roles is crucial to design, manage, automate, and operate an ML system in production. In the following, every role, its purpose, and its related tasks are briefly described:

R1 Business Stakeholder (similar roles: Product Owner, Project Manager). The business stakeholder defines the business goal to be achieved with ML and takes care of the communication side of the business, e.g., presenting the return on investment (ROI) generated with an ML product [7], [24], [45] [α, β, δ, θ].

R2 Solution Architect (similar role: IT Architect). The solution architect designs the architecture and defines the technologies to be used, following a thorough evaluation [7], [24], [α, ζ].

R3 Data Scientist (similar roles: ML Specialist, ML Developer). The data scientist translates the business problem into an ML problem and takes care of the model engineering, including the selection of the best-performing algorithm and hyperparameters [7], [25], [32], [33], [α, β, γ, δ, ε, ζ, η, θ].

R4 Data Engineer (similar role: DataOps Engineer). The data engineer builds up and manages data and feature engineering pipelines. Moreover, this role ensures proper data ingestion to the databases of the feature store system [25], [26], [32], [α, β, γ, δ, ε, ζ, η, θ].

R5 Software Engineer. The software engineer applies software design patterns, widely accepted coding guidelines, and best practices to turn the raw ML problem into a well-engineered product [32], [α, γ].

R6 DevOps Engineer. The DevOps engineer bridges the gap between development and operations and ensures proper CI/CD automation, ML workflow orchestration, model deployment to production, and monitoring [7], [22], [25], [51], [α, β, γ, ε, ζ, η, θ].

R7 ML Engineer/MLOps Engineer. The ML engineer or MLOps engineer combines aspects of several roles and thus has cross-domain knowledge. This role incorporates skills from data scientists, data engineers, software engineers, DevOps engineers, and backend engineers (see Figure 3). This cross-domain role builds up and operates the ML infrastructure, manages the automated ML workflow pipelines and model deployment to production, and monitors both the model and the ML infrastructure [7], [24], [25], [32], [α, β, γ, δ, ε, ζ, η, θ].

FIGURE 3. Roles and their intersections contributing to the MLOps paradigm.

V. ARCHITECTURE AND WORKFLOW
On the basis of the identified principles, components, and roles, we derive a generalized MLOps end-to-end architecture to give ML researchers and practitioners proper guidance. It is depicted in Figure 4. Additionally, we depict the workflows, i.e., the sequence in which the different tasks are executed in the different stages. The artifact was designed to be technology-agnostic, so ML researchers and practitioners can choose the best-fitting technologies and frameworks for their needs. This means the MLOps process and components can be built from ''best-of-breed'' open-source tools, from enterprise solutions, or from a mix-and-match combination of both. Enterprise software and cloud services often allow connections to open-source tools via their APIs, and vice versa. The latest developments should therefore be kept in view, as the open-source tool market is growing rapidly and new combination options appear frequently. However, there are certainly also constraints when it comes to API interfaces and combinations, and in general it is hard to say which technologies combine well and which do not. With the introduced examples of applications and the precise mentioning of tools, however, we demonstrate possible combinations.

As depicted in Figure 4, we illustrate an end-to-end process, from MLOps product initiation to the model serving. It includes (A) the MLOps product initiation steps; (B) the feature engineering pipeline, including the data ingestion to the feature store; (C) the experimentation; and (D) the automated ML workflow pipeline up to the model serving.
FIGURE 4. End-to-end MLOps architecture and workflow with functional components and roles.
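Before walking through the stages in detail, a brief illustration may help connect the workflow to the orchestration component (C3): the automated ML workflow pipeline of stage (D), described below, is typically declared as a DAG. The following sketch uses Apache Airflow, one of the orchestrators named earlier; the task names and the pipeline_tasks module are hypothetical placeholders, not part of the reviewed architecture.

```python
# Illustrative sketch of the automated ML workflow pipeline (D) as an
# Airflow DAG (C3). Task names and Python callables are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

from pipeline_tasks import (  # hypothetical module holding the task logic
    extract_features, prepare_data, train_model, evaluate_model, register_model,
)

with DAG(
    dag_id="automated_ml_workflow",
    start_date=datetime(2021, 6, 1),
    schedule_interval=None,  # triggered externally, e.g., by the feedback loop
) as dag:
    extract = PythonOperator(task_id="extract_features", python_callable=extract_features)
    prepare = PythonOperator(task_id="prepare_and_validate", python_callable=prepare_data)
    train = PythonOperator(task_id="train_model", python_callable=train_model)
    evaluate = PythonOperator(task_id="evaluate_model", python_callable=evaluate_model)
    register = PythonOperator(task_id="register_model", python_callable=register_model)

    # The execution order mirrors steps (18)-(23) of the workflow below.
    extract >> prepare >> train >> evaluate >> register
```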
(A) MLOps product initiation. (1) The business stakeholder (R1) analyzes the business and identifies a potential business problem that can be solved using ML. (2) The solution architect (R2) defines the architecture design for the overall ML system and decides on the technologies to be used after a thorough evaluation. (3) The data scientist (R3) derives an ML problem—such as whether regression or classification should be used—from the business goal. (4) The data engineer (R4) and the data scientist (R3) work together to understand which data is required to solve the problem. (5) Once the answers are clarified, the data engineer (R4) and data scientist (R3) collaborate to locate the raw data sources for the initial data analysis. They check the distribution and quality of the data, as well as performing validation checks. Furthermore, they ensure that the incoming data from the data sources is labeled, meaning that a target attribute is known, as this is a mandatory requirement for supervised ML. In this example, the data sources already had labeled data available, as the labeling step was covered during an upstream process.

(B1) Requirements for feature engineering pipeline. The features are the relevant attributes required for model training. After the initial understanding of the raw data and the initial data analysis, the fundamental requirements for the feature engineering pipeline are defined as follows: (6) The data engineer (R4) defines the data transformation rules (normalization, aggregations) and cleaning rules to bring the data into a usable format. (7) The data scientist (R3) and data engineer (R4) together define the feature engineering rules, such as the calculation of new and more advanced features based on other features. These initially defined rules must be iteratively adjusted by the data scientist (R3), either based on the feedback coming from the experimental model engineering stage or from the monitoring component observing the model performance.

(B2) Feature engineering pipeline. The initially defined requirements for the feature engineering pipeline are taken by the data engineer (R4) and software engineer (R5) as a starting point to build up the prototype of the feature engineering pipeline. The initially defined requirements and rules are updated according to the iterative feedback coming either from the experimental model engineering stage or from the monitoring component observing the model's performance in production.

As a foundational requirement, the data engineer (R4) defines the code required for the CI/CD (C1) and orchestration component (C3) to ensure the task orchestration of the feature engineering pipeline. This role also defines the underlying infrastructure resource configuration. (8) First, the feature engineering pipeline connects to the raw data, which can be (for instance) streaming data, static batch data, or data from any cloud storage. (9) The data is extracted from the data sources. (10) The data preprocessing begins with data transformation and cleaning tasks. The transformation rule artifact defined in the requirement gathering stage serves as input for this task, and the main aim of this task is to bring the data into a usable format. These transformation rules are continuously improved based on the feedback. (11) The feature engineering task calculates new and more advanced features based on other features. The predefined feature engineering rules serve as input for this task. These feature engineering rules are continuously improved based on the feedback. (12) Lastly, a data ingestion job loads batch or streaming data into the feature store system (C4). The target can be either the offline or online database (or any kind of data store).

An example of the implementation of an entire feature engineering pipeline can be found in Esmaeilzadeh et al. [52], who implemented an NLP pipeline with Apache Spark. As another example, Xu [53] demonstrates how a financial institution may use Spark to process and analyze large amounts of customer credit data, such as credit history, income, and demographics. The data is then transformed and cleaned using Spark's DataFrame and SQL functionality, and various feature engineering techniques are applied to create a set of relevant features for the credit risk model. These features can then be passed through an ML pipeline, also implemented in Spark, to train and evaluate a predictive model for assessing credit risk. In addition, Apache Kafka can be used for near real-time streaming data ingestion into the Spark-based feature engineering pipeline [54]. However, to some extent, a traditional ETL tool can also be used to build a feature engineering pipeline [55].

(C) Experimentation. Most tasks in the experimentation stage are led by the data scientist (R3), including the initial configuration of the hardware and runtime environment. The data scientist is supported by the software engineer (R5). (13) The data scientist (R3) connects to the feature store system (C4) for the data analysis. (Alternatively, the data scientist (R3) can also connect to the raw data for an initial analysis.) In case of any required data adjustments, the data scientist (R3) reports the required changes back to the data engineering zone (feedback loop). (14) Then the preparation and validation of the data coming from the feature store system is required. This task also includes the creation of the train and test split datasets. (15) The data scientist (R3) estimates the best-performing algorithm and hyperparameters, and the model training is then triggered with the training data (C5). The software engineer (R5) supports the data scientist (R3) in the creation of well-engineered model training code. (16) Different model parameters are tested and validated interactively during several rounds of model training. Once the performance metrics indicate good results, the iterative training stops. The best-performing model parameters are identified via parameter tuning. The model training task and model validation task are iteratively repeated; together, these tasks can be called ''model engineering.'' The model engineering aims to identify the best-performing algorithm and hyperparameters for the model. (17) The data scientist (R3) exports the model and commits the code to the repository.

As a foundational requirement, either the DevOps engineer (R6) or the ML engineer (R7) defines the code for the automated ML workflow pipeline and commits it to the repository (C2). Once either the data scientist (R3) commits a new ML model or the DevOps engineer (R6) or ML engineer (R7) commits new ML workflow pipeline code to the repository, the CI/CD component (C1) detects the updated code and automatically triggers the CI/CD pipeline carrying out the build, test, and delivery steps. The build step creates artifacts containing the ML model and the tasks of the ML workflow pipeline. The test step validates the ML model and the ML workflow pipeline code. The delivery step pushes the versioned artifact(s)—such as images—to the artifact store (e.g., image registry).

Typical technologies used for the experimentation step are notebook-based solutions like the ones from Jupyter. One example of an industry case where ML experiments are performed with a notebook-based environment is in the field of natural language processing (NLP) [56]. A company that provides NLP-based services such as sentiment analysis, text summarization, and named entity recognition may use Jupyter notebooks to perform ML experiments on large amounts of text data. The company's data scientists use Jupyter notebooks to prepare the data. Then, they can train, evaluate, and optimize different ML models, such as deep learning models, and test the results. To track the experiments with the textual data, i.e., to track the metadata and store the resulting models, commonly used solutions in combination with Jupyter are, among others, MLflow (e.g., Obeid et al. [57] for assessing the risk of COVID-19 based on health records) or Neptune.ai (e.g., Aljabri et al. [58] for NLP-based fake news detection).

(D) Automated ML workflow pipeline. The DevOps engineer (R6) and the ML engineer (R7) take care of the management of the automated ML workflow pipeline. They also manage the runtime environments and the underlying model training infrastructure in the form of hardware resources and frameworks supporting computation, such as Kubernetes (C5). The workflow orchestration component (C3) orchestrates the tasks of the automated ML workflow pipeline. For each task, the required artifacts (e.g., images) are pulled from the artifact store (e.g., image registry). Each task can be executed via an isolated environment (e.g., containers). Finally, the workflow orchestration component (C3) gathers metadata for each task in the form of logs, completion time, and so on.

Once the automated ML workflow pipeline is triggered, each of the following tasks is managed automatically: (18) automated pulling of the versioned features from the feature store systems (data extraction). Depending on the use case, features are extracted from either the offline or online database (or any kind of data store). (19) Automated data preparation and validation; in addition, the train and test split is defined automatically. (20) Automated final model training on new, unseen data (versioned features). The algorithm and hyperparameters are already predefined based on the settings of the previous experimentation stage. The model is retrained and refined. (21) Automated model evaluation and iterative adjustments of hyperparameters are executed, if required. Once the performance metrics indicate good results, the automated iterative training stops. The automated model training task and the automated model validation task can be iteratively repeated until a good result has been achieved. (22) The trained model is then exported and (23) pushed to the model registry (C6), where it is stored, e.g., as code or containerized together with its associated configuration and environment files.

For all training job iterations, the ML metadata store (C7) records metadata such as the parameters used to train the model and the resulting performance metrics. This also includes the tracking and logging of the training job ID, training date and time, duration, and sources of artifacts. Additionally, model-specific metadata called ''model lineage,'' combining the lineage of data and code, is tracked for each newly registered model. This includes the source and version of the feature data and model training code used to train the model. Also, the model version and status (e.g., staging or production-ready) are recorded.

Once the status of a well-performing model is switched from staging to production, it is automatically handed over to the DevOps engineer or ML engineer for model deployment. From there, (24) the CI/CD component (C1) triggers the continuous deployment pipeline. The production-ready ML model and the model serving code (initially prepared by the software engineer (R5)) are pulled. The continuous deployment pipeline carries out the build and test steps for the ML model and serving code and deploys the model for production serving. (25) The model serving component (C8) makes predictions on new, unseen data coming from the feature store system (C4). This component can be designed by the software engineer (R5) as online inference for real-time predictions or as batch inference for predictions concerning large volumes of input data. For real-time predictions, features must come from the online database (low latency), whereas for batch predictions, features can be served from the offline database (normal latency). Model-serving applications are often configured within a container, and prediction requests are handled via a REST API. When deploying an ML/AI application, it is good practice to use A/B testing to determine in a real-world scenario which model performs better compared to another, for example, by deploying a ''challenger model'' in addition to an existing ''champion model'' and collecting feedback to find out which one performs better, e.g., when predicting hotel booking cancellations [61].

As a foundational requirement, the ML engineer (R7) manages the model-serving computation infrastructure. (26) The monitoring component (C9) continuously observes the model-serving performance and infrastructure in real time. Once a certain threshold is reached, such as the detection of low prediction accuracy, the information is forwarded via the feedback loop. (27) The feedback loop is connected to the monitoring component (C9) and ensures fast and direct feedback, allowing for more robust and improved predictions. It enables continuous training, retraining, and improvement. With the support of the feedback loop, information is transferred from the model monitoring component to several upstream receiver points, such as the experimental stage, the data engineering zone, and the scheduler (trigger). The feedback to the experimental stage is taken forward by the data scientist for further model improvements. The feedback to the data engineering zone allows for the adjustment of the features prepared for the feature store system. Additionally, the detection of concept drift as a feedback mechanism can enable (28) continuous training. Concept drift occurs in real-world applications when the input data changes over time, e.g., when a sensor breaks. Decreased prediction accuracy due to such drift can then serve as a trigger for retraining the model.
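The monitoring and feedback-loop behavior just described can be made concrete with a small sketch. Assuming a simple accuracy-based criterion (the window size, threshold, and retraining hook below are illustrative assumptions, not elements prescribed by the architecture), a monitor could look as follows:

```python
# Hedged sketch of a feedback-loop trigger for steps (26)-(28): watch a
# rolling accuracy window and fire a retraining trigger when it degrades,
# e.g., under concept drift. All parameter values are illustrative.
from collections import deque

class AccuracyDriftMonitor:
    def __init__(self, window_size=500, threshold=0.85, trigger_retraining=None):
        self.outcomes = deque(maxlen=window_size)  # 1 = correct, 0 = wrong
        self.threshold = threshold
        self.trigger_retraining = trigger_retraining  # e.g., starts the pipeline (D)

    def record(self, prediction, label):
        self.outcomes.append(int(prediction == label))
        # Only act once the window is full, to avoid noisy early triggers.
        if len(self.outcomes) == self.outcomes.maxlen:
            accuracy = sum(self.outcomes) / len(self.outcomes)
            if accuracy < self.threshold and self.trigger_retraining:
                self.trigger_retraining(accuracy)  # feedback loop (27)

# Usage sketch:
# monitor = AccuracyDriftMonitor(trigger_retraining=lambda acc: print(f"retrain, acc={acc:.2f}"))
```

In practice, this naive threshold rule would typically be replaced by a dedicated concept drift detector (see, e.g., [59], [60]).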
…need to be convinced that an increased MLOps maturity and a product-focused mindset will yield clear business improvements [γ].

B. ML SYSTEM CHALLENGES
A major challenge with regard to MLOps systems is designing for fluctuating demand, especially in relation to the process of ML training [33]. This stems from potentially voluminous and varying data [28], which makes it difficult to precisely estimate the necessary infrastructure resources (CPU, RAM, and GPU) and requires a high level of flexibility in terms of the scalability of the infrastructure [7], [33], [δ].

C. OPERATIONAL CHALLENGES
In productive settings, it is challenging to operate ML manually due to the different stacks of software and hardware components and their interplay, as well as the selection of both [64], [65]. Therefore, robust automation is required [24], [33]. Also, a constant incoming stream of new data forces retraining capabilities. This is a repetitive task which, again, requires a high level of automation [66], [θ]. These repetitive tasks yield a large number of artifacts that require strong governance [27], [32], [45], as well as versioning of data, model, and code to ensure robustness and reproducibility [7], [32], [39]. Lastly, it is challenging to resolve a potential support request (e.g., by finding the root cause), as many parties and components are involved. Failures can be a combination of ML infrastructure and software within the MLOps stack [7].

VIII. CONCLUSION
With the increase of data availability and analytical capabilities, coupled with the constant pressure to innovate, more machine learning products than ever are being developed. However, only a small number of these proofs of concept progress into deployment and production. Furthermore, the academic space has focused intensively on machine learning model building and benchmarking, but too little on operating complex machine learning systems in real-world scenarios. In the real world, we observe data scientists still managing ML workflows manually to a great extent. The paradigm of Machine Learning Operations (MLOps) addresses these challenges. In this work, we shed more light on MLOps. By conducting a mixed-method study analyzing existing literature and tools, as well as interviewing eight experts from the field, we uncover four main aspects of MLOps: its principles, components, roles, and architecture. From these aspects, we infer a holistic definition. The results support a common understanding of the term MLOps and its associated concepts, and will hopefully assist researchers and professionals in setting up successful ML products in the future.

REFERENCES
[1] M. Aykol, P. Herring, and A. Anapolsky, ''Machine learning for continuous innovation in battery technologies,'' Nature Rev. Mater., vol. 5, no. 10, pp. 725–727, Jun. 2020.
[2] M. K. Gourisaria, R. Agrawal, G. M. Harshvardhan, M. Pandey, and S. S. Rautaray, ''Application of machine learning in industry 4.0,'' in Machine Learning: Theoretical Foundations and Practical Applications. Cham, Switzerland: Springer, 2021, pp. 57–87.
[3] A. D. L. Heras, A. Luque-Sendra, and F. Zamora-Polo, ''Machine learning technologies for sustainability in smart cities in the post-COVID era,'' Sustainability, vol. 12, no. 22, p. 9320, Nov. 2020.
[4] R. Kocielnik, S. Amershi, and P. N. Bennett, ''Will you accept an imperfect AI?: Exploring designs for adjusting end-user expectations of AI systems,'' in Proc. CHI Conf. Hum. Factors Comput. Syst., May 2019, pp. 1–14.
[5] R. van der Meulen and T. McCall. (2018). Gartner Says Nearly Half of CIOs Are Planning to Deploy Artificial Intelligence. Accessed: Dec. 4, 2021. [Online]. Available: https://www.gartner.com/en/newsroom/press-releases/2018-02-13-gartner-says-nearly-half-of-cios-are-planning-to-deploy-artificial-intelligence
[6] A. Posoldova, ''Machine learning pipelines: From research to production,'' IEEE Potentials, vol. 39, no. 6, pp. 38–42, Nov. 2020.
[7] L. E. Lwakatare, I. Crnkovic, E. Rånge, and J. Bosch, ''From a data science driven process to a continuous delivery process for machine learning systems,'' in Product-Focused Software Process Improvement (Lecture Notes in Computer Science), vol. 12562. Springer, 2020, pp. 185–201, doi: 10.1007/978-3-030-64148-1_12.
[8] W. W. Royce, ''Managing the development of large software systems,'' in Proc. IEEE WESCON, Aug. 1970, pp. 1–9.
[9] K. Beck et al., ''The agile manifesto,'' 2001. [Online]. Available: http://agilemanifesto.org/
[10] P. Debois. (2009). Patrick Debois Devopsdays Ghent. Accessed: Mar. 25, 2021. [Online]. Available: https://devopsdays.org/events/2019-ghent/speakers/patrick-debois/
[11] S. Mezak. (Jan. 25, 2018). The Origins of DevOps: What's in a Name? DevOps.com. Accessed: Mar. 25, 2021. [Online]. Available: https://devops.com/the-origins-of-devops-whats-in-a-name/
[12] L. Leite, C. Rocha, F. Kon, D. Milojicic, and P. Meirelles, ''A survey of DevOps concepts and challenges,'' ACM Comput. Surv., vol. 52, no. 6, pp. 1–35, Nov. 2020, doi: 10.1145/3359981.
[13] R. W. Macarthy and J. M. Bass, ''An empirical taxonomy of DevOps in practice,'' in Proc. 46th Euromicro Conf. Softw. Eng. Adv. Appl. (SEAA), Aug. 2020, pp. 221–228, doi: 10.1109/SEAA51224.2020.00046.
[14] M. Rütz, ''DEVOPS: A systematic literature review,'' FH Wedel, Aug. 2019. [Online]. Available: https://www.researchgate.net/publication/335243102_DEVOPS_A_SYSTEMATIC_LITERATURE_REVIEW
[15] P. Perera, R. Silva, and I. Perera, ''Improve software quality through practicing DevOps,'' in Proc. 17th Int. Conf. Adv. ICT Emerg. Regions (ICTer), Sep. 2017, pp. 1–6.
[16] J. Webster and R. Watson, ''Analyzing the past to prepare for the future: Writing a literature review,'' MIS Quart., vol. 26, no. 2, pp. 8–23, 2002. [Online]. Available: https://www.jstor.org/stable/4132319
[17] B. Kitchenham, O. P. Brereton, D. Budgen, M. Turner, J. Bailey, and S. Linkman, ''Systematic literature reviews in software engineering—A systematic literature review,'' Inf. Softw. Technol., vol. 51, no. 1, pp. 7–15, Jan. 2009, doi: 10.1016/j.infsof.2008.09.009.
[18] M. D. Myers and M. Newman, ''The qualitative interview in IS research: Examining the craft,'' Inf. Org., vol. 17, no. 1, pp. 2–26, Jan. 2007, doi: 10.1016/j.infoandorg.2006.11.001.
[19] U. Schultze and M. Avital, ''Designing interviews to generate rich data for information systems research,'' Inf. Org., vol. 21, no. 1, pp. 1–16, Jan. 2011, doi: 10.1016/j.infoandorg.2010.11.001.
[20] J. M. Corbin and A. Strauss, ''Grounded theory research: Procedures, canons, and evaluative criteria,'' Qualitative Sociol., vol. 13, no. 1, pp. 3–21, 1990, doi: 10.1007/BF00988593.
[21] B. Glaser and A. Strauss, The Discovery of Grounded Theory: Strategies for Qualitative Research. London, U.K.: Aldine, 1967, doi: 10.4324/9780203793206.
[22] T. Granlund, A. Kopponen, V. Stirbu, L. Myllyaho, and T. Mikkonen, ''MLOps challenges in multi-organization setup: Experiences from two real-world cases,'' 2021, arXiv:2103.08937.
[23] Y. Zhou, Y. Yu, and B. Ding, ''Towards MLOps: A case study of ML pipeline platform,'' in Proc. Int. Conf. Artif. Intell. Comput. Eng. (ICAICE), Oct. 2020, pp. 494–500, doi: 10.1109/ICAICE51518.2020.00102.
[24] I. Karamitsos, S. Albarhami, and C. Apostolopoulos, ''Applying DevOps practices of continuous automation for machine learning,'' Information, vol. 11, no. 7, pp. 1–15, 2020, doi: 10.3390/info11070363.
[25] A. Goyal, ''MLOps machine learning operations,'' Int. J. Inf. Technol. Insights Transformations, vol. 4, no. 2, 2020. Accessed: Apr. 15, 2021. [Online]. Available: http://technology.eurekajournals.com/index.php/IJITIT/article/view/655
[26] D. A. Tamburri, ''Sustainable MLOps: Trends and challenges,'' in Proc. 22nd Int. Symp. Symbolic Numeric Algorithms Sci. Comput. (SYNASC), Sep. 2020, pp. 17–23, doi: 10.1109/SYNASC51798.2020.00015.
[27] O. Spjuth, J. Frid, and A. Hellander, ''The machine learning life cycle and the cloud: Implications for drug discovery,'' Expert Opinion Drug Discovery, vol. 16, no. 9, pp. 1071–1079, 2021, doi: 10.1080/17460441.2021.1932812.
[28] B. Derakhshan, A. R. Mahdiraji, T. Rabl, and V. Markl, ''Continuous deployment of machine learning pipelines,'' in Proc. EDBT, Mar. 2019, pp. 397–408, doi: 10.5441/002/edbt.2019.35.
[29] R. R. Karn, P. Kudva, and I. A. M. Elfadel, ''Dynamic autoselection and autotuning of machine learning models for cloud network analytics,'' IEEE Trans. Parallel Distrib. Syst., vol. 30, no. 5, pp. 1052–1064, May 2019, doi: 10.1109/TPDS.2018.2876844.
[30] S. Shalev-Shwartz, ''Online learning and online convex optimization,'' Found. Trends Mach. Learn., vol. 4, no. 2, pp. 107–194, 2012.
[31] A. Molner Domenech and A. Guillén, ''Ml-experiment: A Python framework for reproducible data science,'' J. Phys., Conf. Ser., vol. 1603, no. 1, Sep. 2020, Art. no. 012025, doi: 10.1088/1742-6596/1603/1/012025.
[32] S. Makinen, H. Skogstrom, E. Laaksonen, and T. Mikkonen, ''Who needs MLOps: What data scientists seek to accomplish and how can MLOps help?'' in Proc. IEEE/ACM 1st Workshop AI Eng. Softw. Eng. AI (WAIN), May 2021, pp. 109–112.
[33] L. C. Silva, F. R. Zagatti, B. S. Sette, L. N. dos Santos Silva, D. Lucredio, D. F. Silva, and H. de Medeiros Caseli, ''Benchmarking machine learning solutions in production,'' in Proc. 19th IEEE Int. Conf. Mach. Learn. Appl. (ICMLA), Dec. 2020, pp. 626–633, doi: 10.1109/ICMLA51294.2020.00104.
[34] A. Banerjee, C. C. Chen, C. C. Hung, X. Huang, Y. Wang, and R. Chevesaran, ''Challenges and experiences with MLOps for performance diagnostics in hybrid-cloud enterprise software deployments,'' in Proc. OpML USENIX Conf. Oper. Mach. Learn., 2020, pp. 7–9.
[35] B. Benni, M. Blay-Fornarino, S. Mosser, F. Precioso, and G. Jungbluth, ''When DevOps meets meta-learning: A portfolio to rule them all,'' in Proc. ACM/IEEE 22nd Int. Conf. Model Driven Eng. Lang. Syst. Companion (MODELS-C), Sep. 2019, pp. 605–612, doi: 10.1109/MODELS-C.2019.00092.
[36] C. Vuppalapati, A. Ilapakurti, K. Chillara, S. Kedari, and V. Mamidi, ''Automating tiny ML intelligent sensors DevOPS using Microsoft Azure,'' in Proc. IEEE Int. Conf. Big Data (Big Data), Dec. 2020, pp. 2375–2384, doi: 10.1109/BigData50022.2020.9377755.
[37] Á. L. García, J. M. D. Lucas, M. Antonacci, W. Z. Castell, and M. David, ''A cloud-based framework for machine learning workloads and applications,'' IEEE Access, vol. 8, pp. 18681–18692, 2020, doi: 10.1109/ACCESS.2020.2964386.
[38] C. Wu, E. Haihong, and M. Song, ''An automatic artificial intelligence training platform based on Kubernetes,'' in Proc. 2nd Int. Conf. Big Data Eng. Technol., Jan. 2020, pp. 58–62, doi: 10.1145/3378904.3378921.
[39] G. Fursin, ''Collective knowledge: Organizing research projects as a database of reusable components and portable workflows with common interfaces,'' Phil. Trans. Roy. Soc. A, Math., Phys. Eng. Sci., vol. 379, no. 2197, May 2021, Art. no. 20200211, doi: 10.1098/rsta.2020.0211.
[40] M. Schmitt, ''Airflow vs. Luigi vs. Argo vs. MLFlow vs. KubeFlow,'' Tech. Rep., 2022. [Online]. Available: https://www.datarevenue.com/en-blog/airflow-vs-luigi-vs-argo-vs-mlflow-vs-kubeflow
[41] K. Shvachko, H. Kuang, S. Radia, and R. Chansler, ''The Hadoop distributed file system,'' in Proc. IEEE 26th Symp. Mass Storage Syst. Technol. (MSST), May 2010, pp. 1–10.
[42] S. Ghemawat, H. Gobioff, and S.-T. Leung, ''The Google file system,'' in Proc. 19th ACM Symp. Operating Syst. Princ., Oct. 2003, pp. 29–43.
[43] A. Lakshman and P. Malik, ''Cassandra: A decentralized structured storage system,'' ACM SIGOPS Oper. Syst. Rev., vol. 44, no. 2, pp. 35–40, Apr. 2010.
[44] J. C. Corbett, ''Spanner: Google's globally distributed database,'' ACM Trans. Comput. Syst. (TOCS), vol. 31, no. 3, pp. 1–22, 2013.
[45] Y. Liu, Z. Ling, B. Huo, B. Wang, T. Chen, and E. Mouine, ''Building a platform for machine learning operations from open source frameworks,'' IFAC-PapersOnLine, vol. 53, no. 5, pp. 704–709, 2020, doi: 10.1016/j.ifacol.2021.04.161.
[46] G. S. Yoon, J. Han, S. Lee, and J. W. Kim, DevOps Portal Design for SmartX AI Cluster Employing Cloud-Native Machine Learning Workflows, vol. 47. Cham, Switzerland: Springer, 2020, doi: 10.1007/978-3-030-39746-3_54.
[47] I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, ''Quantized neural networks: Training neural networks with low precision weights and activations,'' J. Mach. Learn. Res., vol. 18, no. 1, pp. 6869–6898, 2017.
[48] S. Han, H. Mao, and W. J. Dally, ''Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding,'' 2015, arXiv:1510.00149.
[49] L. E. Lwakatare, I. Crnkovic, and J. Bosch, ''DevOps for AI—Challenges in development of AI-enabled applications,'' in Proc. Int. Conf. Softw., Telecommun. Comput. Netw. (SoftCOM), Sep. 2020, pp. 1–6, doi: 10.23919/SoftCOM50211.2020.9238323.
[50] C. Renggli, L. Rimanic, N. M. Gürel, B. Karlaš, W. Wu, and C. Zhang, ''A data quality-driven view of MLOps,'' 2021, arXiv:2102.07750.
[51] W. J. van den Heuvel and D. A. Tamburri, Model-Driven ML-Ops for Intelligent Enterprise Applications: Vision, Approaches and Challenges, vol. 391. Cham, Switzerland: Springer, 2020, doi: 10.1007/978-3-030-52306-0_11.
[52] A. Esmaeilzadeh, M. Heidari, R. Abdolazimi, P. Hajibabaee, and M. Malekzadeh, ''Efficient large scale NLP feature engineering with Apache Spark,'' in Proc. IEEE 12th Annu. Comput. Commun. Workshop Conf. (CCWC), Jan. 2022, pp. 274–280.
[53] J. Xu, ''MLOps in the financial industry: Philosophy, practices, and tools,'' in The Future and Fintech: ABCDI and Beyond. Singapore: World Scientific, 2022, p. 451, doi: 10.1142/9789811250903_0014.
[54] F. Carcillo, A. D. Pozzolo, Y.-A. L. Borgne, O. Caelen, Y. Mazzer, and G. Bontempi. SCARFF: A Scalable Framework for Streaming Credit Card Fraud Detection With Spark. Accessed: Feb. 17, 2023. [Online]. Available: https://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html
[55] J. Dhanalakshmi and N. Ayyanathan, ''A dynamic web data extraction from SRLDC (Southern Regional Load Dispatch Centre) and feature engineering using ETL tool,'' in Proc. 2nd Int. Conf. Artif. Intell., Adv. Appl. Springer, 2022, pp. 443–449, doi: 10.1007/978-981-16-6332-1_38.
[56] J. Foster and J. Wagner, ''Naive Bayes versus BERT: Jupyter notebook assignments for an introductory NLP course,'' in Proc. 5th Workshop on Teaching NLP, 2021, pp. 112–114.
[57] J. S. Obeid, M. Davis, M. Turner, S. M. Meystre, P. M. Heider, E. C. O'Bryan, and L. A. Lenert, ''An artificial intelligence approach to COVID-19 infection risk assessment in virtual visits: A case report,'' J. Amer. Med. Inform. Assoc., vol. 27, no. 8, pp. 1321–1325, Aug. 2020.
[58] M. Aljabri, D. M. Alomari, and M. Aboulnour, ''Fake news detection using machine learning models,'' in Proc. 14th Int. Conf. Comput. Intell. Commun. Netw. (CICN), Dec. 2022, pp. 473–477.
[59] L. Baier, N. Kühl, and G. Satzger, ''How to cope with change?—Preserving validity of predictive services over time,'' in Proc. Annu. Hawaii Int. Conf. Syst. Sci., 2019, pp. 1–10.
[60] L. Baier, T. Schlör, J. Schöffer, and N. Kühl, ''Detecting concept drift with neural network model uncertainty,'' 2021, arXiv:2107.01873.
[61] N. Antonio, A. de Almeida, and L. Nunes, ''Predicting hotel bookings cancellation with a machine learning classification model,'' in Proc. 16th IEEE Int. Conf. Mach. Learn. Appl. (ICMLA), Dec. 2017, pp. 1049–1054, doi: 10.1109/ICMLA.2017.00-11.
[62] T. Cui, Y. Wang, and B. Namih, ''Build an intelligent online marketing system: An overview,'' IEEE Internet Comput., vol. 23, no. 4, pp. 53–60, Jul. 2019.
[63] L. Baier and S. Seebacher, ''Challenges in the deployment and operation of machine learning in practice,'' in Proc. 27th Eur. Conf. Inf. Syst. (ECIS), Stockholm, Sweden, Jun. 2019, pp. 1–15. [Online]. Available: https://aisel.aisnet.org/ecis2019_rp/163
[64] P. Ruf, M. Madan, C. Reich, and D. Ould-Abdeslam, ''Demystifying MLOps and presenting a recipe for the selection of open-source tools,'' Appl. Sci., vol. 11, no. 19, p. 8861, Sep. 2021.
[65] N. Hewage and D. Meedeniya, ''Machine learning operations: A survey on MLOps tool support,'' 2022, arXiv:2202.10169.
[66] B. Karlaš, M. Interlandi, C. Renggli, W. Wu, C. Zhang, D. M. I. Babu, J. Edwards, C. Lauren, A. Xu, and M. Weimer, ''Building continuous integration services for machine learning,'' in Proc. 26th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, Aug. 2020, pp. 2407–2415, doi: 10.1145/3394486.3403290.