Data Mining

By: ANUKARSH CHAUDHARY
Topics covered:
• What is data mining?
• Features of data mining
• Advantages of data mining
• Disadvantages of data mining
• Stages of data mining
• Careers in data mining
• Steps of data mining
What is data mining
Data mining is the process of sorting through large data sets to identify patterns
and relationships that can help solve business problems through data analysis.
Data mining techniques and tools help enterprises to predict future trends and
make more informed business decisions.
The process of data mining relies on the effective implementation of data
collection, warehousing and processing. Data mining can be used to describe a
target data set, predict outcomes, detect fraud or security issues, learn more
about a user base, or detect bottlenecks and dependencies. It can also be
performed automatically or semiautomatically.
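As a rough illustration of "identifying patterns and relationships", the sketch below counts which items are bought together most often. The transactions, the function name `frequent_pairs`, and the `min_support` threshold are all hypothetical and chosen only for this example; real data mining tools use far more scalable algorithms.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each set holds the items bought together.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "eggs"},
]

def frequent_pairs(baskets, min_support):
    """Return item pairs that occur together in at least min_support baskets."""
    counts = Counter()
    for basket in baskets:
        # sorted() gives every pair one canonical order, so (a, b) and (b, a) match.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

pairs = frequent_pairs(transactions, min_support=2)
for pair, n in sorted(pairs.items(), key=lambda item: -item[1]):
    print(pair, n)
```

A pair that clears the threshold is only a candidate pattern; whether it helps solve a business problem still has to be judged by an analyst.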
Features of data mining
• Focus attribute:
Properties that depend only on a single focus component, for example store or day, are the simplest
because their values are expressions over values that are already contained in the original database
tables.
• Aggregation:
Typically, many properties are the result of an aggregation. The level of individual purchases is too
fine-grained for prediction, so the properties of many purchases must be aggregated to a meaningful
focus level.
• Discretization:
Some data mining algorithms require categorical input instead of numeric input. In this case, the data
must be preprocessed so that values in certain numeric ranges are mapped to discrete values.
• Value mapping:
Similar to the discretization of numeric features, you can assign new values to discrete feature values.
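The features above can be sketched in a few lines of plain Python. The purchase records, the cut points in `discretize`, and the `day_type` mapping are assumptions made up for this example, not values from any real system.

```python
from collections import defaultdict

# Hypothetical purchase records: (store, day, amount).
purchases = [
    ("store_a", "Mon", 12.0),
    ("store_a", "Mon", 30.0),
    ("store_a", "Tue", 8.0),
    ("store_b", "Mon", 75.0),
]

# Aggregation: roll individual purchases up to the "store" focus level.
revenue_by_store = defaultdict(float)
for store, _day, amount in purchases:
    revenue_by_store[store] += amount

def discretize(value, low=20.0, high=60.0):
    """Map a numeric value to a category; the cut points are assumed."""
    if value < low:
        return "low"
    if value < high:
        return "medium"
    return "high"

# Discretization: numeric revenue becomes a categorical band.
revenue_band = {store: discretize(total) for store, total in revenue_by_store.items()}

# Value mapping: assign new values to existing discrete values.
day_type = {"Mon": "weekday", "Tue": "weekday", "Sat": "weekend", "Sun": "weekend"}
purchase_day_types = [day_type[day] for _store, day, _amount in purchases]

print(dict(revenue_by_store))
print(revenue_band)
print(purchase_day_types)
```

In practice the cut points and mappings would come from domain knowledge or from the data itself rather than being fixed by hand.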
Advantages of Data Mining
• Profitability and efficiency:
Data mining ensures a company is collecting and analyzing reliable data. It is often a more rigid, structured process
that formally identifies a problem, gathers data related to the problem, and strives to formulate a solution. Therefore,
data mining helps a business become more profitable, more efficient, or operationally stronger.
• Wide applications:
Data mining can look very different across applications, but the overall process can be used with almost any new or
legacy application. Essentially any type of data can be gathered and analyzed, and almost every business problem
that relies on qualifiable evidence can be tackled using data mining.
• Hidden information and trends:
The end goal of data mining is to take raw bits of information and determine if there is cohesion or correlation
among the data. This benefit of data mining allows a company to create value with the information they have on
hand that would otherwise not be overly apparent. Though data models can be complex, they can also yield
fascinating results, unearth hidden trends, and suggest unique strategies.
Disadvantages of Data Mining
• Complexity:
The complexity of data mining is one of its greatest disadvantages. Data analytics often requires technical skill sets and
certain software tools. Smaller companies may find this to be a barrier to entry that is too difficult to overcome.
• No guarantees:
Data mining doesn't always mean guaranteed results. A company may perform statistical analysis, draw conclusions
based on strong data, implement changes, and not reap any benefits. This may be due to inaccurate findings, market
changes, model errors, or data pollution. Data mining can only guide decisions and not ensure outcomes.
• High cost:
There is also a cost component to data mining. Data tools may require costly subscriptions, and some data may be
expensive to obtain. Security and privacy concerns can be addressed, though the additional IT infrastructure may be costly as
well. Data mining may also be most effective when using huge data sets; however, these data sets must be stored and
require heavy computational power to analyze.
Stages of Data Mining
1. Cleaning of Incomplete Data:
The first step in data mining is cleaning incomplete or dirty data so that it meets the expected quality standard.
2. Integration of Data:
In the second step, the specialists perform data integration, which means combining multiple data sources and data sets so
that they can be analyzed together.
3. Reduction of Data:
With cleaning and integration complete, the data is reduced to improve its quality further. The specialists work with a
smaller representation of the data that still sums up its main message.
4. Transformation of Data:
Every data mining task has its own mining goals, which are clarified in the fourth step. In this phase the specialists consolidate
the prepared data through methods such as data mapping, normalization, aggregation and others.
5. Data Mining:
Though the entire process is known as data mining, this step specifically covers the mining tasks themselves.
6. Pattern Analysis:
In this step, the specialists examine the patterns of relationships that the mining step found in the data.
7. Sharing Final Report:
After discussing the findings, the specialists usually present a final report that includes all relevant information about the process,
including their insights on overall business performance and its recurring problems.
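The first four stages can be illustrated with a minimal sketch. The records in `crm` and `sales`, the field names, and the choice of min-max scaling as the normalization method are all assumptions for this example; real projects typically rely on dedicated data preparation tools.

```python
# Hypothetical records from two sources, linked by a shared customer id.
crm = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},  # incomplete record
    {"id": 3, "age": 51},
    {"id": 4, "age": 42},
]
sales = {1: 120.0, 2: 80.0, 3: 300.0, 4: 210.0}

# Stage 1, cleaning: drop records that contain missing values.
clean = [r for r in crm if all(v is not None for v in r.values())]

# Stage 2, integration: combine the two sources on the shared id.
merged = [{**r, "spend": sales[r["id"]]} for r in clean if r["id"] in sales]

# Stage 3, reduction: keep only the attributes needed for the analysis.
reduced = [{"age": r["age"], "spend": r["spend"]} for r in merged]

def min_max(values):
    """Scale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Stage 4, transformation: normalize spend so attributes become comparable.
normalized_spend = min_max([r["spend"] for r in reduced])

print(reduced)
print(normalized_spend)
```

Dropping incomplete records is only one cleaning strategy; filling in missing values is another common option, and which one fits depends on the data.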
Careers in data mining
Job | Description | Average Pay
Data Analyst | Transform and manipulate large datasets for analysis | $67,350
Business Intelligence Developer | Design and develop strategies to assist business users in finding information | $87,120
Statistician | Collect, analyze, and interpret data to inform organizational decision-making | $88,240
Data Engineer | Build and maintain data pipelines | $96,690
Data Scientist | Analyze large amounts of complex data to find patterns that benefit an organization | $101,530
Data Architect | Ensure data solutions are built for performance and design analytics applications for multiple platforms | $132,230
Machine Learning Engineer | Create and maintain machine learning systems | $148,310
Steps of Data Mining
Step 1: Understand the Business
Before any data is touched, extracted, cleaned, or analyzed, it is important to understand the underlying entity and the project at hand.
Step 2: Understand the Data
Once the business problem has been clearly defined, it's time to start thinking about data. This includes what sources are available, how
they will be secured and stored, how the information will be gathered, and what the final outcome or analysis may look like.
Step 3: Prepare the Data
Data is gathered, uploaded, extracted, or calculated. It is then cleaned, standardized, scrubbed for outliers, assessed for mistakes, and
checked for reasonableness.
Step 4: Build the Model
With a clean data set in hand, it's time to crunch the numbers. Data scientists use data mining techniques to search for
relationships, trends, associations, or sequential patterns.
Step 5: Evaluate the Results
The data-centered aspect of data mining concludes by assessing the findings of the data model or models.
Step 6: Implement Change and Monitor
The data mining process concludes with management taking steps in response to the findings of the analysis.
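Steps 3 to 5 can be sketched on a toy data set. Everything here is an assumption made for illustration: the `(ad_spend, revenue)` observations, the plausibility limit of 100.0, the train/test split, and the choice of a straight-line least-squares fit as the "model".

```python
from statistics import mean

# Hypothetical (ad_spend, revenue) observations; the last row is an entry error.
data = [
    (1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0),
    (5.0, 11.0), (6.0, 13.0), (3.5, 500.0),
]

# Step 3, prepare: scrub rows that fail a reasonableness check (limit is assumed).
prepared = [(x, y) for x, y in data if y <= 100.0]

# Hold back the last two rows so the model is evaluated on unseen data.
train, test = prepared[:4], prepared[4:]

def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in points) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar

# Step 4, build the model.
slope, intercept = fit_line(train)

# Step 5, evaluate: mean absolute error on the held-back rows.
mae = mean([abs(y - (slope * x + intercept)) for x, y in test])

print(slope, intercept, mae)
```

The toy data is perfectly linear, so the error is essentially zero; real data would not fit this cleanly, which is why the evaluation step matters before any change is implemented.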
