Data Mining
Data mining is the process of sorting through large data sets to identify patterns
and relationships that can help solve business problems through data analysis.
Data mining techniques and tools help enterprises to predict future trends and
make more informed business decisions.
The process of data mining relies on the effective implementation of data
collection, warehousing and processing. Data mining can be used to describe a
target data set, predict outcomes, detect fraud or security issues, learn more
about a user base, or detect bottlenecks and dependencies. It can also be
performed automatically or semiautomatically.
Constructing features for data mining
Focus attribute
Properties that depend only on a single focus component (for example, store or day) are the simplest,
because their values are expressions over values that are already contained in the original database
tables.
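A minimal Python sketch of such a single-focus property follows. It assumes a hypothetical table with one row per (store, day) pair, where `day` is an ISO date string. The derived property `is_weekend` is also hypothetical. It is an expression over the `day` value already stored in each row, so it needs no joins or aggregation.

```python
from datetime import date

# Hypothetical rows: one row per (store, day) focus pair.
rows = [
    {"store": "S1", "day": "2024-03-08"},  # a Friday
    {"store": "S1", "day": "2024-03-09"},  # a Saturday
    {"store": "S2", "day": "2024-03-10"},  # a Sunday
]

def is_weekend(day_str: str) -> bool:
    """Derive a property from the 'day' column alone (weekday(): 5 = Sat, 6 = Sun)."""
    return date.fromisoformat(day_str).weekday() >= 5

# The new property is computed row by row from values already in the table.
for row in rows:
    row["is_weekend"] = is_weekend(row["day"])
```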
Aggregation
Typically, many properties are the result of an aggregation. The level of individual purchases is too fine-grained
for prediction, so the properties of many purchases must be aggregated to a meaningful focus
level.
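As a rough illustration, the Python sketch below aggregates hypothetical purchase-level records up to a (store, day) focus level. For each store and day it computes total revenue and the number of purchases. The record layout and the chosen aggregates are assumptions made for this example. In a database, the same step is commonly written as a SQL `GROUP BY`.

```python
from collections import defaultdict

# Hypothetical purchase-level records (store, day, amount): too fine-grained to predict from directly.
purchases = [
    ("S1", "2024-03-08", 12.50),
    ("S1", "2024-03-08", 7.50),
    ("S1", "2024-03-09", 20.00),
    ("S2", "2024-03-08", 5.00),
]

# Aggregate to the (store, day) focus level: total revenue and number of purchases.
totals = defaultdict(float)
counts = defaultdict(int)
for store, day, amount in purchases:
    totals[(store, day)] += amount
    counts[(store, day)] += 1

store_day = {
    key: {"revenue": totals[key], "n_purchases": counts[key]}
    for key in totals
}
```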
Discretization
Some data mining algorithms require categorical input instead of numeric input. In this case, the data
must be preprocessed so that values in certain numeric ranges are mapped to discrete values.
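One simple way to discretize is sketched below in Python using assumed cut points. `bisect.bisect_right` finds the range that contains a value, and the function returns the label for that range. The boundaries (20 and 50) and the labels are hypothetical. Real bins would come from domain knowledge or from methods such as equal-width or equal-frequency binning.

```python
import bisect

# Hypothetical cut points: below 20 -> "low", 20 up to (not incl.) 50 -> "medium", 50 and above -> "high".
BOUNDARIES = [20, 50]
LABELS = ["low", "medium", "high"]

def discretize(value: float) -> str:
    """Map a numeric value to the label of the range it falls into."""
    # bisect_right returns how many boundaries are <= value, which is the bin index.
    return LABELS[bisect.bisect_right(BOUNDARIES, value)]

revenues = [5.0, 20.0, 49.99, 120.0]
categories = [discretize(v) for v in revenues]
```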
Value mapping
Similar to the discretization of numeric features, you can map discrete feature values to new values (for example, grouping individual products into product categories).
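Value mapping can be as simple as a lookup table. This Python sketch assumes a hypothetical mapping from individual products to broader categories. Any value the table does not cover receives a default label.

```python
# Hypothetical mapping from individual product values to broader categories.
CATEGORY_OF = {
    "apple": "fruit",
    "banana": "fruit",
    "carrot": "vegetable",
}

def map_value(value: str, mapping: dict, default: str = "other") -> str:
    """Replace a discrete value by its mapped value; unmapped values get the default."""
    return mapping.get(value, default)

items = ["apple", "carrot", "bread"]
mapped = [map_value(v, CATEGORY_OF) for v in items]
```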
Advantages and Disadvantages of Data Mining
• Profitability and efficiency:
Data mining ensures a company is collecting and analyzing reliable data. It is often a more rigid, structured process
that formally identifies a problem, gathers data related to the problem, and strives to formulate a solution. Therefore,
data mining helps a business become more profitable, more efficient, or operationally stronger.
• Wide applications:
Data mining can look very different across applications, but the overall process can be used with almost any new or
legacy application. Essentially any type of data can be gathered and analyzed, and almost every business problem
that relies on quantifiable evidence can be tackled using data mining.
• No guarantees:
Data mining doesn't always mean guaranteed results. A company may perform statistical analysis, draw conclusions
based on strong data, implement changes, and not reap any benefits. This may be due to inaccurate findings, market
changes, model errors, or inappropriate data populations. Data mining can only guide decisions, not ensure outcomes.
• High cost:
There is also a cost component to data mining. Data tools may require costly subscriptions, and some data may be
expensive to obtain. Security and privacy concerns can be addressed, though the additional IT infrastructure this requires may
also be costly. Data mining may also be most effective when using huge data sets; however, these data sets must be stored and
require heavy computational power to analyze.
Stages of Data Mining
1. Cleaning of Incomplete Data:
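The first step in data mining is cleaning incomplete, noisy or inconsistent ("dirty") data, for example by filling in or removing missing values and eliminating duplicate records, so that the later steps work with reliable data.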
2. Integration of Data:
In the second step, the specialists perform data integration: data from multiple sources (for example, different databases or files) is
combined into one coherent data set.
3. Reduction of Data:
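Once the data is clean and integrated, it is reduced. Specialists obtain a smaller representation of the data set that still captures its
main information, for example by selecting only the relevant attributes or by sampling or aggregating records. This makes mining more efficient and removes irrelevant detail.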
4. Transformation of Data:
Every data mining task has its own mining goals, which are clarified in the fourth step. In this phase, the specialists transform and
consolidate the prepared data into forms suitable for mining, using methods such as data mapping, normalization and aggregation.
5. Data Mining:
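Though the entire process is known as data mining, this step is where the actual mining tasks take place: algorithms such as classification, clustering or association-rule mining are applied to the prepared data to extract patterns.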
6. Pattern Analysis:
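The patterns found in the previous step, which describe relationships among multiple data items, are analyzed and evaluated so that only the genuinely interesting and useful ones are kept and presented.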
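The toy end-to-end sketch below walks through the six stages in Python using only the standard library. Every part of it is an assumption made for illustration. This includes the two data sources, the attributes kept in the reduction step, and the mining step itself. That step is a simplified version of the support counting used in frequent-itemset (association-rule) mining, and the pattern analysis step is reduced to a minimum-support filter. Real projects would normally use dedicated tools for each stage.

```python
from collections import Counter
from itertools import combinations

# Hypothetical source A: transactions (one repeated id, one record with missing items).
transactions = [
    {"tid": 1, "customer": "c1", "items": ["bread", "milk"]},
    {"tid": 2, "customer": "c2", "items": ["Bread", "butter", "milk"]},
    {"tid": 2, "customer": "c2", "items": ["Bread", "butter", "milk"]},  # duplicate
    {"tid": 3, "customer": "c3", "items": None},                          # incomplete
    {"tid": 4, "customer": "c1", "items": ["bread", "butter"]},
    {"tid": 5, "customer": "c4", "items": ["milk", "eggs", "milk"]},
]
# Hypothetical source B: customer master data from a separate system.
customers = {
    "c1": {"region": "north", "age": 34},
    "c2": {"region": "south", "age": 51},
    "c4": {"region": "north", "age": 27},
}

# 1. Cleaning: skip records with missing items and repeated transaction ids.
seen_ids = set()
clean = []
for t in transactions:
    if t["items"] is None or t["tid"] in seen_ids:
        continue
    seen_ids.add(t["tid"])
    clean.append(t)

# 2. Integration: merge customer attributes into each transaction via the shared key.
integrated = [{**t, **customers.get(t["customer"], {})} for t in clean]

# 3. Reduction: keep only the attributes needed here (drops customer id and age).
reduced = [{"tid": r["tid"], "region": r.get("region"), "items": r["items"]} for r in integrated]

# 4. Transformation: lowercase item names and make each basket a sorted tuple of unique items.
baskets = [tuple(sorted({item.lower() for item in r["items"]})) for r in reduced]

# 5. Data mining: count in how many baskets each pair of items occurs together.
pair_counts = Counter(pair for basket in baskets for pair in combinations(basket, 2))

# 6. Pattern analysis: keep pairs whose support (fraction of baskets) reaches MIN_SUPPORT.
MIN_SUPPORT = 0.5  # hypothetical threshold
n_baskets = len(baskets)
patterns = {
    pair: count / n_baskets
    for pair, count in pair_counts.items()
    if count / n_baskets >= MIN_SUPPORT
}
```

With this toy data, cleaning leaves four baskets. Only the pairs (bread, butter) and (bread, milk) reach the 50% support threshold.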