CRISP Methodology for ERP System
acquires information from and supplies information to the fragmented applications operating on a universal
computing platform. Information in large business organizations is accumulated on various servers across many
functional units and sometimes separated by geographical boundaries [4]. Such information islands can possibly
service individual organizational units but fail to enhance enterprise wide performance, speed and competence.
In these cases, it is sometimes necessary to gather the scattered data into a single database called a Data
Warehouse (DW), before submitting it to data mining activity. The key objective of an ERP system is to
integrate information and processes from all functional divisions of an organization and merge it for effortless
access and structured workflow [5]. The integration is typically accomplished by constructing a single database
repository that communicates with multiple software applications providing different divisions of an
organization with various business statistics and information.
The principal aim of Data Mining in ERP systems is to process data in the Business Information
Warehouse to automate decision-making and forecasting. SAP offers a complete data warehousing
environment that simplifies the most challenging task in building a data warehouse – the data capture from ERP
applications and building closed-loop feedback mechanisms with business critical applications [7] [8]. For our
study, Cash Flow process data from the Finance Module is used to implement the CRISP-DM methodology and extract
useful information. The Cash flow statement consists of ERP Data from Operations, Investments, Loans,
Payments & Receipts from branches and Special projects domains.
III. CRISP-DM METHODOLOGY
CRISP-DM (Cross Industry Standard Process for Data Mining) is a data mining process model that describes
commonly used approaches that expert data miners use to tackle business problems [5]. It borrowed ideas from
the most important pre-2000 models and is the groundwork for many later proposals. The CRISP-DM 2.0 Special
Interest Group (SIG) was set up with the aim of upgrading the CRISP-DM model to a new version better suited to
the changes that have taken place in the business arena since the current version was formulated [6]. The CRISP-
DM methodology is described in terms of a hierarchical process model, consisting of sets of tasks described at
four levels of abstraction (from general to specific): phase, generic task, specialized task, and process instance.
CRISP-DM is divided into six phases to be carried out in a DM project, as shown in figure 2. Implementation
details for each phase are given in table I, and CRISP-DM objectives in table II.
IV. IMPLEMENTATION OF CRISP-DM METHODOLOGY
Creation of the model is generally not the end of the project. Even if the purpose of the model is to increase
knowledge of the data, the knowledge gained will need to be organized and presented in a way that the customer
can use it. The methodology we have followed for Cash Flow data mining is shown in figure 3 [5] [6] [7]:
1. Cash Flow Data Extraction from SAP ERP System
2. ETL Operation & InfoCube Updation
3. Selection of suitable Data Mining Technique
4. Data Mining Model Development in APD
5. Evaluation of DM Model Results
6. Deployment of DM Model
TABLE I. PHASES OF THE CRISP-DM METHODOLOGY
Business Understanding: Understanding the project objectives and requirements from a business perspective, and then converting this knowledge into a data mining problem definition. A preliminary plan is designed to achieve the objectives.
Data Understanding: Starts with an initial data collection and proceeds with activities to become familiar with the data, identify data quality problems, discover first insights into the data, and detect interesting subsets that may form hypotheses about hidden information.
Data Preparation: Covers all activities required to construct the final dataset (the data that will be fed into the modeling tool(s)) from the initial raw data. Data preparation tasks are likely to be performed multiple times, and not in any prescribed order; they include table, record, and attribute selection as well as transformation and cleaning of data for the modeling tools.
Modeling: Various modeling techniques are selected and applied, and their parameters are calibrated to optimal values. Typically there are several techniques for the same data mining problem type, and some have specific requirements on the form of the data, so stepping back to the data preparation phase is often needed.
Evaluation: At this stage the project has built a model (or models) that appears to have high quality from a data analysis perspective. Before proceeding to final deployment, it is important to evaluate the model thoroughly and review the steps executed to construct it, to be certain it properly achieves the business objectives. A key objective is to determine whether some important business issue has not been sufficiently considered. At the end of this phase, a decision on the use of the data mining results should be reached.
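As a rough illustration, the phases in Table I can be sketched as a plain Python pipeline. Everything below (function names, the three-row sample dataset, the trivial high/low model) is hypothetical and only stands in for the SAP BIW/APD tooling actually used in this study:

```python
# Hypothetical sketch of the CRISP-DM phases as a simple Python pipeline.
# Function names and sample data are illustrative, not part of SAP BIW/APD.

def business_understanding():
    # Define the data mining goal from the business objective.
    return {"goal": "segment cash flow records"}

def data_understanding(raw):
    # Initial collection and quality check: count missing amounts.
    missing = sum(1 for r in raw if r.get("amount") is None)
    return {"rows": len(raw), "missing_amount": missing}

def data_preparation(raw):
    # Build the final dataset: drop records with missing amounts.
    return [r for r in raw if r.get("amount") is not None]

def modeling(dataset):
    # Apply a (deliberately trivial) model: label above-mean amounts "high".
    mean = sum(r["amount"] for r in dataset) / len(dataset)
    return [dict(r, label="high" if r["amount"] > mean else "low")
            for r in dataset]

def evaluation(scored):
    # Review model output against the business objective before deployment.
    return sum(1 for r in scored if r["label"] == "high")

raw = [{"amount": 120.0}, {"amount": None}, {"amount": 40.0}]
plan = business_understanding()
profile = data_understanding(raw)
dataset = data_preparation(raw)
scored = modeling(dataset)
high_count = evaluation(scored)
print(profile, high_count)
```

Deployment, the sixth phase, would then publish the scored results back to business users.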
TABLE II. CRISP-DM OBJECTIVES
Sl. No. | Objective
We have used tools available in SAP Business Information Warehouse (BIW) to handle outliers, missing,
inconsistent and duplicate values in the source data [8]. The commonly sought supplementary value from this
DM Model includes preventing fraud, giving marketing advice, seeking profitable customers, predicting sales
and inventory and correcting data during bulk loading of the database, also known as the Extract-Transform-
Load operation (ETL) [9]. Motivation for using DM comes from the value it gives over competitors; if the
process is successful, it almost always reduces costs, i.e. saves money (Lukawiecki, 2008) [10].
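The cleaning steps attributed above to SAP BIW (handling outliers, missing and duplicate values during ETL) can be sketched generically. This is an illustrative stand-in, not SAP code; the field names, clipping range and mean-imputation choice are all assumptions:

```python
# Illustrative-only sketch of ETL cleaning: drop duplicate documents,
# impute missing amounts with the mean, and clip outliers into range.
# Field names and thresholds are hypothetical, not SAP BIW internals.

def clean(records, lo=-1e6, hi=1e6):
    seen, cleaned = set(), []
    amounts = [r["amount"] for r in records if r["amount"] is not None]
    default = sum(amounts) / len(amounts)          # mean imputation
    for r in records:
        key = r["doc_no"]
        if key in seen:                            # drop duplicate documents
            continue
        seen.add(key)
        amt = r["amount"] if r["amount"] is not None else default
        amt = max(lo, min(hi, amt))                # clip outliers into [lo, hi]
        cleaned.append({"doc_no": key, "amount": amt})
    return cleaned

rows = [{"doc_no": "001", "amount": 50.0},
        {"doc_no": "001", "amount": 50.0},         # duplicate record
        {"doc_no": "002", "amount": None},         # missing value
        {"doc_no": "003", "amount": 5e9}]          # outlier
print(clean(rows))
```

Note that the imputed value here is itself distorted by the outlier and then clipped; a production ETL flow would clip or remove outliers before imputing.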
Two high-level DM goals are prediction and description. The first one tries to find patterns to predict the value
or behavior of some entity in the future and the second one tries to find patterns describing the values or
behavior in a form understandable to humans. These high-level goals are pursued through several DM methods,
for example classification, regression, clustering, summarization, dependency modeling, and change and
deviation detection [11]. Each method has many algorithms that can be used to reach the goal, but some
algorithms suit some problem areas better than others (Fayyad et al., 1996c). The ETL map developed in SAP
BIW workbench for InfoCube Updation is displayed below. This InfoCube is the data source for DM Model, to
be developed later in APD (Analysis Process Designer) workbench of SAP.
APD is a workbench used to visualize, transform and deploy data from the Business Warehouse [12] [13]. The
APD tool supports the KDD process, allowing us to merge and manipulate data sources for complex data mining
requirements.
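As a generic illustration of the kind of clustering model built in APD, a minimal 1-D k-means over made-up cash flow amounts might look like the following. This is not the APD algorithm; the sample amounts and the naive initialisation are assumptions:

```python
# Minimal 1-D k-means sketch, illustrating the clustering technique used
# in the APD models. Generic code, not SAP APD; amounts are invented.

def kmeans_1d(values, k=2, iters=20):
    centroids = sorted(values)[:k]               # naive initialisation
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            # assign each value to its nearest centroid
            i = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[i].append(v)
        # recompute centroids; keep the old one if a cluster empties
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

amounts = [10.0, 12.0, 11.0, 200.0, 210.0, 205.0]
centroids, clusters = kmeans_1d(amounts)
print(sorted(centroids))
```

Real APD models cluster over many attributes at once (vendor, G/L account, amount, etc.) rather than a single numeric field.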
For the Regression Model, the field SC_Score002 represents the predicted score for the attribute Amount.
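A least-squares fit gives a feel for how such a predicted score could be computed. The training pairs below are invented and the real model is fitted inside SAP APD, so this is only a hedged sketch:

```python
# Hypothetical least-squares sketch of a regression score like the
# SC_Score002 field mentioned above. Training pairs are invented; the
# actual model is fitted inside SAP APD and is not reproduced here.

def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx                # (slope, intercept)

# e.g. period number vs. cash flow amount
xs, ys = [1, 2, 3, 4], [10.0, 20.0, 30.0, 40.0]
slope, intercept = fit_line(xs, ys)
predicted_score = slope * 5 + intercept          # score for period 5
print(predicted_score)
```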
TABLE III. SOURCE FIELD ATTRIBUTES IN SAP SYSTEM FOR CASH FLOW STATEMENT
Field Description                  | SAP Field   | Data Type | Length | Default Value
Accounting Document No (ZFI_CASH1) | 0AC_DOC_NO  | Char      | 10     |
Chart of Accounts (ZFI_CASH2)      | 0CHRT_ACCTS | Char      | 04     |
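Records extracted for these fields could be validated against the declared Char lengths as a sanity check during ETL. The validation code itself is illustrative, not part of SAP BIW:

```python
# Sketch: validate extracted records against the Table III field
# attributes (0AC_DOC_NO: Char 10, 0CHRT_ACCTS: Char 4). Illustrative
# code only; SAP BIW performs such checks through its own tooling.

SCHEMA = {"0AC_DOC_NO": 10, "0CHRT_ACCTS": 4}    # field -> max Char length

def validate(record):
    errors = []
    for field, max_len in SCHEMA.items():
        value = record.get(field, "")
        # Char fields must be strings no longer than the declared length
        if not isinstance(value, str) or len(value) > max_len:
            errors.append(field)
    return errors

ok = {"0AC_DOC_NO": "0000012345", "0CHRT_ACCTS": "INT1"}
bad = {"0AC_DOC_NO": "12345678901", "0CHRT_ACCTS": "INT1"}  # 11 chars
print(validate(ok), validate(bad))
```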
Figure 6. Clustering Model: Relative Dominance of each attribute in Overall Influence Chart
Figure 10. Clustering Model for Vendor No Attribute Value Distribution Chart
Figure 11. Attribute Value Chart for G/L Account No in each Cluster
VII. CONCLUSIONS
ERP systems have limited capabilities with regard to analytics that can answer key business questions,
Collective Intelligence and actionable insight to respond quickly to market demands with appropriate decisions
at each level. Business analysts rely heavily on query and reporting to provide them with the information they need to
connect the dots between revenues and losses, products and profitability, financial performance and market
trends and so on. They need a comprehensive query and reporting capability that can tap knowledge from huge
volumes of ERP systems and here CRISP-DM provides a methodology which can be adopted to support these
requirements. With Data Mining Analytics, ERP Systems can facilitate exploration of all types of information
from all angles to assess the current business situation, analyze facts and anticipate tactical and strategic
implications with more advanced, predictive or what-if analysis. The result of a Data Mining project under
CRISP-DM methodology is not just models but also findings that are important in meeting the objectives of the
business or important in leading to new questions, lines of approach, or side effects. There are diverse subjects
for future work and research, like mapping out more problem regions in ERP systems and descriptive attributes
with CRISP-DM by exploring more data sets.
REFERENCES
[1] Óscar Marbán, Gonzalo Mariscal and Javier Segovia, “A Data Mining & Knowledge Discovery Process Model”. In Data Mining and
Knowledge Discovery in Real Life Applications, edited by Julio Ponce and Adem Karahoca, pp. 438-453, February 2009.
[2] Lukasz Kurgan and Petr Musilek (2006), “A survey of Knowledge Discovery and Data Mining process models”. The Knowledge
Engineering Review. Volume 21 Issue 1, March 2006, pp 1 - 24, Cambridge University Press, New York, NY, USA
[3] Azevedo, A. and Santos, M. F. (2008),”KDD, SEMMA and CRISP-DM: a parallel overview”. In Proceedings of the IADIS European
Conference on Data Mining 2008, pp 182-185
[4] Sistla Hanumanth Sastry, Prof. M. S. Prasada Babu. ” ERP implementation for Manufacturing Enterprises.” International Journal of
Advanced Research in Computer Science and Software Engineering (IJARCSSE), Vol. 3, Issue 4, pp 18-24, April 2013.
[5] Pete Chapman, Julian Clinton , Randy Kerber et al, “The CRISP-DM User Guide”, 1999
[6] Pete Chapman, Julian Clinton, Randy Kerber, Thomas Khabaza, Thomas Reinartz, Colin Shearer, and Rüdiger Wirth,
“CRISP-DM 1.0 Step-by-step data mining guide”, 2000.
[7] Oscar Marban, Gonzalo Mariscal et al., “A Data Mining & Knowledge Discovery Model”, I-Tech, Vienna, Austria, pp. 438,
February 2009.
[8] Sistla Hanumanth Sastry, Prof. M. S. Prasada Babu. "Implementing a successful Business Intelligence framework for Enterprises."
Journal of Global Research in Computer Science (JGRCS), Vol. 4, No. 3 (2013): pp. 55-59. April 2013.
[9] J. Han and M. Kamber, “Data Mining: Concepts and Techniques”, Morgan Kaufmann Publishers, 2006, pp. 429-462.
[10] Lior Rokach, Oded Maimon, “Data Mining and Knowledge Discovery Handbook”, 2010, Springer USA, pp. 322-350.
[11] Margaret H. Dunham, “Data Mining: Introductory and Advanced Topics”, 2003, Prentice Hall of India, pp. 135-162.
[12] Johannes Grabmeier, Andreas Rudolph, “Techniques of Clustering Algorithms in Data Mining”, Data Mining and
Knowledge Discovery, Volume 6, Springer, pp. 303-360.
[13] Sistla Hanumanth Sastry, Prof. M. S.Prasada Babu, “Cluster Analysis of Material Stock Data of Enterprises”, International Journal
of Computer Information Systems (IJCIS), Vol. 6, Issue 6, pp. 8-19, June 2013.
[14] Pang-Ning Tan, Michael Steinbach, Vipin Kumar, “Introduction to Data Mining”, Addison-Wesley, March 2006,
pp. 330-340.
[15] Hand D., Mannila, H. and Smyth, P., “Principles of Data Mining”, 2001,Prentice Hall of India; pp: 292-305
[16] Sistla Hanumanth Sastry, Prof. M. S. Prasada Babu, ” Analysis of Enterprise Material Procurement Leadtime using Techniques of
Data Mining.”, International Journal of Advanced Research in Computer Science (IJARCS), Vol. 4, Issue 4, pp. 288-301, April
2013.
[17] Mark Hall ,Ian Witten , Eibe Frank, “Data Mining Practical Machine Learning Tools & Techniques”, January 2011 ; Morgan
Kaufmann Publishers; pp: 278-315
[18] Galit Shmueli, Nitin R. Patel, Peter C. Bruce, “Data Mining for Business Intelligence”, 2007, John Wiley & Sons, pp. 220-237.
[19] Dorian Pyle, “Data Preparation for Data Mining”, Morgan Kaufmann Publishers,1999, San Francisco, USA; pp: 100-132
[20] E. Boudaillier et al., “Interactive Interpretation of Hierarchical Clustering”, Principles of Data Mining and Knowledge
Discovery: Proceedings of the First European Symposium, PKDD'97, Trondheim, Norway, June 24-27, 1997, pp. 280-288.
[21] Ester, M., Kriegel, H.-P., Sander, J. & Xiaowei, X., 1996. “A Density-Based Algorithm for Discovering Clusters in Large Spatial
Databases with Noise”. In Proc. of 2nd International Conference on Knowledge Discovery and Data Mining (KDD '96)., 1996. AAAI
Press.
[22] Fayyad, U., Piatetsky-Shapiro, G. & Smyth, P., 1996b. “From Data Mining to Knowledge Discovery in Databases”. AI Magazine, 17,
pp.37-54.
[23] Kurgan, L.A. & Musilek, P., 2006. A survey of Knowledge Discovery and Data Mining process models. The Knowledge Engineering
Review, 21(1), pp.1-24.
[24] Sistla Hanumanth Sastry, Prof. M. S.Prasada Babu, “Performance evaluation of clustering Algorithms”, International Journal of
Computational Science and Information Technology, Unpublished.
[25] Shtub, A., 2002. “Enterprise Resource Planning (ERP) : the dynamics of operations management”. Boston: Kluwer Academic
AUTHORS PROFILE
S. Hanumanth Sastry (Corresponding Author), Senior Manager (ERP), has implemented SAP-BI Solutions for the
Steel Industry. He holds an M.Tech (Computer Science) from NIELIT, New Delhi and an MBA (Operations
Management) from IGNOU, New Delhi. His research interests include ERP systems, Data Mining, Business
Intelligence and Corporate Performance Management. He is pursuing PhD (Computer Science) from Andhra
University, Visakhapatnam (INDIA).
Prof. M.S. Prasad Babu obtained his Ph.D. degree from Andhra University in 1986. He was the Head of the
Department of Computer Science & Systems Engineering, Andhra University from 2006-09.
Presently he is the Chairman, Board of Studies of Computer Science & Systems Engineering. He received the
ISCA Young Scientist Award at the 73rd Indian Science Congress in 1986.