Architecture and Framework For Machine Learning As A Service
Rammohan Vadavalasa
Technical University of Vienna
All content following this page was uploaded by Rammohan Vadavalasa on 20 April 2021.
Keywords: Social media, data creation, validation, training, testing, and serving.
© 2020 IJSRET
International Journal of Scientific Research & Engineering Trends
Volume 6, Issue 3, May-June-2020, ISSN (Online): 2395-566X
greater scale of performance because vendors focus on their infrastructure regularly, based on evolving hardware and infrastructure (e.g., CPU or GPU) availability. We need to focus on our business, IoT, and big data applications. But the public cloud provides backup, flexibility, scalability, and disaster recovery capabilities.

for handling data and application. Some of the major platforms that PaaS vendors make available in the cloud are Google Cloud ML Engine [35], IBM Watson Studio [36], Microsoft Azure ML Services [44], Amazon SageMaker [45], C3 – AI suite [46], DataRobot [47], Deep Cognition [48], and Dataiku [49].
is an open architecture for machine learning models, based on business processes and requirements. It is also possible to build our own architecture, as shown in figure 2, which will provide more security and privacy in the long run.

[Link] building blocks for machine learning as a service

This architecture concentrates on low-level and high-level solutions for machine learning as a service.

Some of the suitable processors for our application are the CPU (central processing unit), GPU (graphics processing unit), and TPU (tensor processing unit).

3.4 Storage:
Machine learning requires huge amounts of data storage to build and run a machine learning pipeline. In each new pipeline cycle, new data is added to the system, which requires enough storage capacity. A suitable infrastructure storage type should be chosen by considering factors such as machine learning model lifetime and business size.

3.5 Networking:
Networking is a blueprint for complete communication in the architecture because it provides a foundation for framework design, building, and deploying services. Standard network infrastructure is required for building a long-lasting machine learning as a service architecture.
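As a rough illustration of the storage sizing point in section 3.4, the sketch below estimates how much capacity a pipeline might need after a number of cycles. The function name, the linear growth model, the replication factor, and all figures are assumptions made for illustration; they do not come from the paper.

```python
def projected_storage_gb(initial_gb, added_per_cycle_gb, cycles, replication_factor=1):
    """Estimate total storage after a number of pipeline cycles.

    Assumes (hypothetically) that every cycle adds the same amount of data
    and that each byte is stored `replication_factor` times.
    """
    if min(initial_gb, added_per_cycle_gb, cycles) < 0 or replication_factor < 1:
        raise ValueError("sizes and cycles must be non-negative; replication_factor >= 1")
    logical_gb = initial_gb + added_per_cycle_gb * cycles
    return logical_gb * replication_factor


# Hypothetical planning figures: 500 GB today, 20 GB added per weekly cycle,
# one year of cycles, three replicas of every byte.
estimate = projected_storage_gb(initial_gb=500, added_per_cycle_gb=20,
                                cycles=52, replication_factor=3)
print(estimate)  # 4620
```

An estimate like this only bounds the raw capacity; the choice of storage type still depends on model lifetime and business size, as noted in section 3.4.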
models and later finding a suitable solution to a defined problem.

Collecting suitable data is a primary task in the machine learning life cycle, because data is not just the heart but also the oil of the machine learning process.

Data comes from different sources, for example general bills, medical records, electronic records, images, text, audio and video, log files, weblogs, app logs, sensor logs, and chat logs. Using all kinds of data and preparing it for the machine learning model is a complex task. Building the best machine learning model for any business requires spending almost 70% of the time on preparing data for machine learning training.

Here, machine learning as a service covers the end-to-end machine learning life cycle. This framework contains different modules: the central module, data processing unit, data serving unit, and data storage unit.

Our approach is to build a framework that handles user interactions with machine learning tools. This framework contains well-defined APIs, and using those APIs it is possible to access services from any remote location if we operate it on a cloud platform.

Fig.3. Framework for machine learning as a service.

In this framework, the data processing unit is responsible for handling data. It receives data from the data storage unit and sends it to the central module. The central module is responsible for training and testing on that data; it later deploys the prepared results to the data serving unit. The serving unit serves the prepared analytics from the previous phases. These modules are packed together to form a solid architecture for machine learning as a service, as shown in figure 3.

An ML workflow covers collecting the relevant data; preparing, validating, and cleaning it; and then training, testing, deploying, serving, and monitoring the model.

3.7.1 Data storage unit: The data storage unit stores user data, validation, training, and testing data, as well as other data assets.

3.7.2 Data processing unit: It receives historical raw data and currently produced data from different sources, merges them, and forwards the result to the central module. For example, we can use Apache Spark to distribute data processing tasks over very large data sets.

3.7.3 Central module: It is responsible for data processing. Here we build and validate raw data; after validation, the raw data is processed into training and testing sets. With those sets, we build predictive models.

3.7.4 Data serving unit: This unit is responsible for delivering predictive analysis results to the user. We can use Apache Kafka for model examination and monitoring, because Apache Kafka is a distributed streaming platform that can publish a stream of records.

This system contains a flexible interface between all the modules in the framework.

4. Implementing Machine learning as a service on the eCommerce platform:
ML is useful for e-commerce platforms in many ways. It finds wide applicability in predicting supply and demand for recommendation systems in e-commerce applications. The goal is to build an ML model for an eCommerce fashion clothing application using machine learning as a service.

A recommendation system plays a powerful role in offering a more personalized experience to users. Users like it when companies are able to recognize their preferences. Using ML, we can create different user recommendations based on user sessions on our application. For example, new customers can see only unique products first, while we can recommend best-offer products to existing customers based on their interests. Short-term user interest patterns drive the long-term success of recommendation systems. Sequential user logs are useful for deriving patterns in user behavior and for predicting a user's next interest. For attracting customers and promoting products, ML plays a major role in the current market scenario.

We can represent our recommendation system problem in a more formal way [8]. Let X be a set of application users and Y be a set of items recommendable to the user. In contrast to matrix completion problems, the task is not to predict a value for each x ∈ X and each y ∈ Y, but to determine an ordered list of items O of length L for each user, where each element o ∈ O corresponds to an element y ∈ Y.
Technically, each sequence O is an element of the set of all permutations up to length L of the powerset of Y, i.e., O ∈ S_L(P(Y)). We denote the final set of possible lists as O*. Let u be a function that returns a utility score of a given sequence O for a user x, i.e., u: X × O* → R.
The recommendation system problem then consists of determining the sequence l_x ∈ O* that maximizes the score for the user, i.e.,

$$\forall x \in X, \quad l_x = \arg\max_{o \in O^*} u(x, o)$$

For example, the recommendation system learns the utility function u from given user data. Suppose our user data is a data set T consisting of a sequence of user actions, where each user action A ∈ T has a number of characteristics. Here the data set T is the user log files: each user action is recorded in a log file, and each log file has its own identification number. The function u does not just score individual items; it scores a complete ordered list of items.

Making predictions for real-time recommendations using ML is computationally expensive and requires high-end computing resources, because a real-time recommendation system should take the shortest possible time to make useful predictions about user interest. For example, displaying suitable predictive advertisements to the user involves evaluating many predictive models within milliseconds.

Input features such as user gender and age, purchasing history, number of clicks per day, time spent on the page, and buying preferences are all required for model building and training. For better latency, we have to update historical features and input data at regular intervals, and each feature should be built and predicted separately in a model. For data processing we can choose Apache Spark; for asynchronous, real-time log data processing we can use Apache Storm. All these resources are managed with git or svn for regular code reviews and updates. All important attributes need to be stored, processed, and computed.

The most commonly preferred algorithms for recommendation systems are decision tree, Bayesian, matrix factorization-based, neighbor-based, neural network, rule learning, ensemble, gradient descent-based, kernel methods, clustering, associative classification, bandit, lazy learning, regularization methods, and the Topic Independent Scoring Algorithm.

The goal of this recommendation system is to recommend suitable items to the user based on user interest.

Out of all the above algorithms, the decision tree algorithm is chosen for the recommendation system because of its relative intelligibility and its ability to generate recommendation probabilities. Initially we have to collect raw data from our fashion clothing application.

In this section we explain machine learning as a service for recommendation systems using the decision tree algorithm.

4.1 Data storage unit:
Log data is automatically produced by systems to record the activities and actions that take place on them. This log data is in the form of files. Examples of different log files are server logs, audit logs, transaction logs, message logs, syslog, daemon logs, and Swift logs.
The log files generated by different users are collected using log collectors, stored in the data storage unit, and later sent to the data processing unit.

4.2 Data processing unit:
The collected log data should be accurate and informative, because the quality of the received data affects overall model performance. Each log record carries its own information about user activity. Handling log data requires considerable experience, because log data is complex and its representation directly affects hyperparameters and expected end results. Here knowledge discovery plays a huge role in deriving useful information from log files.

The data cleaning process removes unrelated and corrupted data that can affect the accuracy of the process. After cleaning, the data should be represented in vectorized form so that queries can be executed on it.

Cleaned data should then go through data integration, data validation, and data transformation (for example, normalization); later, pattern analysis extracts the relevant information for finding the behavior of variables in the data.

After cleaning and preparing data, we should select suitable variables such as user gender, age, marital status, profession, education, average time on the application, past buying data, brands of interest, and price range.

4.3 Central Module:
In this module various steps take place, such as finding hyperparameters for the model, finalizing the algorithm, and later training, testing, and model deployment.

Initially, we analyze the vectorized data and find patterns in it. The next step is to choose a suitable algorithm for data analysis.

The C5.0 decision tree classification model is used for the recommendation system because it offers better efficiency, higher accuracy, and fewer overfitting problems.
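To make the decision tree step of the central module more concrete, here is a minimal sketch of an entropy-based decision tree classifier over categorical user features. It is not C5.0: it is a simplified ID3-style tree that splits on plain information gain, the entropy-based idea that the C4.5/C5.0 family refines. The feature names (`age_group`, `price_range`), the class labels, and the training rows are hypothetical and only loosely follow the variables listed in section 4.2.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting on one categorical feature."""
    remainder = 0.0
    for value in {row[feature] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder


def build_tree(rows, labels, features):
    """Recursively build a tree; a leaf is simply a class label."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority
    best = max(features, key=lambda f: information_gain(rows, labels, f))
    if information_gain(rows, labels, best) <= 0:
        return majority
    node = {"feature": best, "default": majority, "children": {}}
    rest = [f for f in features if f != best]
    for value in sorted({row[best] for row in rows}):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        node["children"][value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx], rest)
    return node


def predict(tree, row):
    """Walk the tree; use the node's majority class for missing or unseen values."""
    while isinstance(tree, dict):
        tree = tree["children"].get(row.get(tree["feature"]), tree["default"])
    return tree


# Hypothetical training data: one row per user, label = clothing category of interest.
rows = [
    {"age_group": "18-25", "price_range": "low"},
    {"age_group": "18-25", "price_range": "high"},
    {"age_group": "26-40", "price_range": "low"},
    {"age_group": "26-40", "price_range": "high"},
    {"age_group": "41+", "price_range": "low"},
    {"age_group": "41+", "price_range": "high"},
    {"age_group": "18-25", "price_range": "low"},
]
labels = ["sportswear", "sportswear", "sportswear", "formal",
          "formal", "formal", "sportswear"]

tree = build_tree(rows, labels, ["age_group", "price_range"])
print(tree["feature"])                                                # age_group
print(predict(tree, {"age_group": "26-40", "price_range": "high"}))   # formal
print(predict(tree, {"price_range": "low"}))                          # sportswear
```

The last call shows one simple way to cope with a missing feature value: the prediction falls back to the majority class stored at the node where the value is missing. This fallback is a choice made for this sketch and is cruder than the missing-value handling of a real C5.0 implementation.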
C5.0 also makes it easy to handle missing values in the fashion clothing recommendation application data set.
We use C5.0 classification to train and test our model on the available data.
We apply the C5.0 decision tree classifier to find the user's interests, i.e., clothes similar to those the user ordered earlier from the application. Using collaborative filtering [21], we find users whose interests are similar to those of the targeted user, identify the clothes both parties are interested in, and shortlist clothes to recommend to the targeted user. Using the upper confidence limit, we recommend those clothes to the user in descending order.

4.4 The serving unit
Once machine learning models are trained, tested, and validated, they are made available to those who need them. Served models need to be monitored for their performance, because model degradation should be detected immediately.

IV. CONCLUSION

Machine learning has the ability to transform the global economy through technological innovation and scientific research. Preparing, developing, maintaining, and operating machine learning as a service is a challenging task. This paper presented an architecture and framework for machine learning as a service that helps to give a comprehensive overview of MLaaS. A lot of research is still required in this domain in order to reduce complexity and to increase flexibility and security for users.

RESOURCES/REFERENCES:

[1]. W. Fan, F. Geerts, X. Jia, and A. Kementsietsidis. Conditional Functional Dependencies for Capturing Data Inconsistencies. ACM Transactions on Database Systems; 2008.
[2]. A. Seering, P. Cudre-Mauroux, S. Madden, and M. Stonebraker. Efficient versioning for scientific array databases. In ICDE. IEEE, 2012.
[3]. A. Bhardwaj, S. Bhattacherjee, A. Chavan, A. Deshpande, A. J. Elmore, [Link], and A. G. Parameswaran. DataHub: Collaborative data science & dataset version management at scale. In CIDR, 2015.
[4]. S. Kandel, A. Paepcke, J. Hellerstein, and J. Heer. Wrangler: Interactive visual specification of data transformation scripts. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 3363–3372. ACM, 2011.
[5]. G. D. Simoni and R. Edjlali. Magic Quadrant for Metadata Management Solutions. Technical report, Gartner, Inc., 2016.
[6]. V. M. Megler and D. Maier. Are Data Sets Like Documents: Evaluating Similarity-Based Ranked Search over Scientific Data. In TKDE, 2015.
[7]. Devlin, K.; Rosenberg, D. Language at Work: Analyzing Communication Breakdown in the Workplace to Inform Systems Design; CSLI Publications: Stanford, CA, USA, 1996.
[8]. Massimo Quadrana (ContentWise), Paolo Cremonesi (Politecnico di Milano), Dietmar Jannach (AAU Klagenfurt); Sequence-Aware Recommender Systems; ACM Comput. Surv. 1, 1, Article 1 (February 2018), 35 pages.
[9]. Barroso, L. A., and Hoelzle, U. The Datacenter As a Computer: An Introduction to the Design of Warehouse-Scale Machines, 1st ed. Morgan and Claypool Publishers, 2009.
[10]. Chilimbi, T., Suzue, Y., Apacible, J., and Kalyanaraman, K. Project Adam: Building an efficient and scalable deep learning training system. In 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14) (Broomfield, CO, 2014), USENIX Association.
[11]. A. Halevy, P. Norvig, and F. Pereira. The unreasonable effectiveness of data. IEEE Intelligent Systems; 2009.
[12]. C. J. Burges. A tutorial on support vector machines for pattern recognition. Data mining and knowledge discovery; 1998.
[13]. I. T. Foster, J.-S. Vöckler, M. Wilde, and Y. Zhao. Chimera: A Virtual Data System for Representing, Querying, and Automating Data Derivation. In SSDBM, 2002.
[14]. Fionn Murtagh and Keith Devlin; The Development of Data Science: Implications for Education, Employment, Research, and the Data Revolution for Sustainable Development; Big data and cognitive computing; Published: 19 June 2018.
[15]. Pariwat Ongsulee, "Artificial Intelligence, Machine Learning and Deep Learning", 2017 Fifteenth International Conference on ICT and Knowledge Engineering.
[16]. Duchi, J., Hazan, E., and Singer, Y. Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12 (July 2011).
[17]. F. Bach, R. Jenatton, J. Mairal, and G. Obozinski, "Structured Sparsity through Convex Optimization," Statistical Science, 2012.
[18]. Escoufier, Y.; Fichet, B.; Lebart, L.; Hayashi, C.; Ohsumi, N.; Baba, Y (Ed.) Data Science and Its Applications; Academic Press: Tokyo, Japan, 1995.
[19]. Sandesh Gade, Abhimanyu Banerjee, Gautam Somappa; MLaaS: A Framework for Exposing Machine Learning as a Service on Cloud Platforms.
[20]. M. Yesilbudak, S. Sagiroglu, and I. Colak, "A new approach to very short term wind speed prediction using k-nearest neighbor classification," Energy Conversion and Management; 2013.
[21]. Sayali D. Jadhav, H. P. Channe; Efficient Recommendation System Using Decision Tree Classifier and Collaborative Filtering; International Research Journal of Engineering and Technology; Aug-2016.
[22]. J. Polowinski and M. Voigt. Viso: a shared, formal knowledge base as a foundation for semi-automatic infovis systems. In CHI'13 Extended Abstracts on Human Factors in Computing Systems; 2013.
[23]. M. Vartak, S. Madden, A. G. Parameswaran, and N. Polyzotis. SEEDB: automatically generating query visualizations; 2014.
[24]. H. Gonzalez et al. Google fusion tables: web-centered data management and collaboration. In SIGMOD Conference; 2010.
[25]. L. Kolb, A. Thor, and E. Rahm. Dedoop: Efficient Deduplication with Hadoop. PVLDB, 2012.
[26]. X. L. Dong and F. Naumann. Data fusion–resolving data conflicts for integration. PVLDB, 2009.
[27]. G. Beskales, I. F. Ilyas, and L. Golab. Sampling the repairs of functional dependency violations under hard constraints. Sept. 2010.
[28]. Álvaro López García, Jesús Marco de Lucas, Marica Antonacci; A cloud-based framework for machine learning workloads and applications. IEEE publications.
[29]. V. Nair et al. Learning a hierarchical monitoring system for detecting and diagnosing service issues. In KDD, 2015.
[30]. J. Han et al. Frequent pattern mining: current status and future directions. Data Mining and Knowledge Discovery; 2007.
[31]. Li Erran Li, Eric Chen, Jeremy Hermann, Pusheng Zhang, Luming Wang; Scaling machine learning as a service; JMLR Conference - 2016.
[32]. Y. Benjamini and D. Yekutieli. The control of the false discovery rate in multiple testing under dependency. Annals of statistics; 2001.
[33]. Felden, C., & Chamoni, P. (2007, January). Recommender systems based on an active data warehouse with text documents. In System Sciences, 2007. HICSS 2007. 40th Annual Hawaii International Conference on (pp. 168a-168a). IEEE.
[34]. Ivens Portugal, Paulo Alencar, Donald Cowan; The Use of Machine Learning Algorithms in Recommender Systems: A Systematic Review.
[35]. Google Cloud ML Engine; Available online link: [Link]
[36]. IBM Watson Studio; Available online link: [Link]/watson/studio
[37]. Google Compute Engine (GCE); Available online link: [Link] Initial release: June 28, 2012
[38]. Microsoft Azure; Available online link: [Link] Initial release date: February 1, 2010
[39]. Amazon Web Services (AWS); Available online link: [Link]
[40]. Linode; cassandra; Available online link: [Link]
[41]. Rackspace; Available online link: [Link]
[42]. DigitalOcean; Available online link: [Link]
[43]. CloudSigma; Available online link: [Link]
[44]. Microsoft Azure ML Services; Available online link: [Link]
[45]. Amazon SageMaker; Available online link: [Link]
[46]. C3 – AI suite; Available online link: [Link]
[47]. DataRobot; Available online link: [Link]
[48]. Deep Cognition; Available online link: [Link]
[49]. Dataiku; Available online link: [Link]
[50]. EnsoData; Available online link: [Link]
[51]. Excelion; Available online link: [Link]
[52]. RCM Brain; Available online link: [Link]
[53]. CognitiveScale; Available online link: [Link]
[54]. Neurala; Available online link: [Link]
[55]. LXD (Linux); Available online link: [Link]
[56]. FreeBSD (Unix); Available online link: [Link] Initial release: 1 November 1993
[57]. docker; License: Binaries: Freemium software as a service; Available online link: [Link] Initial release date: March 20, 2013
[58]. Kubernetes; Available online link: [Link] Stable release: 1.18 / March 25, 2020