Process Phase: Data Cleaning Features and Techniques (Lab Topics)

Topics for Lab:

1. Data Cleaning Tools and Techniques
2. Data Cleaning Features in Spreadsheet
3. Sorting and Filtering
4. Data Cleaning, Verifying and Reporting Results
5. Capturing Cleaning Changes

1) Data Cleaning Tools and Techniques

https://www.bing.com/search?q=data%20cleaning%20tools%20and
%20techniques&qs=SYC&showconv=1&sendquery=1&FORM=ASCHT2&sp=6&lq=0&fbclid=
IwAR34znhYKs4VRC3Hd08InlMAhc70qN2h7LwkEVom9Iuefw_H9pwKfX3Uvec

All collected data must be cleaned as part of data processing.

Data cleaning, a step in data preprocessing, involves identifying and
correcting or removing inaccurate, incomplete, or irrelevant data from
a dataset.

Techniques in Cleaning Data:

1) Removing duplicates: identifying and removing identical records
from a dataset.
2) Removing irrelevant data: identifying and removing data that is
not relevant to the analysis.
3) Standardizing capitalization: converting all text to a consistent
case format.
4) Converting data types: converting data from one type to another,
such as converting text to numbers.
5) Clearing formatting: removing any formatting from the data, such
as bold or italicized text.
6) Fixing errors: identifying and correcting errors in the data.
7) Language translation: translating data from one language to
another.
8) Handling missing values: identifying missing data in the dataset
and deciding how to treat it.
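
Several of the techniques above (standardizing capitalization, converting data types, removing duplicates, and handling missing values) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the sample records, the field names, and the helper names `clean_row` and `remove_duplicates` are invented for this example and do not come from any particular tool.

```python
# Hypothetical sample records; the field names and values are invented for illustration.
raw_rows = [
    {"name": "alice SMITH", "city": "manila", "age": "34"},
    {"name": "Alice Smith", "city": "Manila", "age": "34"},  # duplicate once case is standardized
    {"name": "bob cruz", "city": "CEBU", "age": ""},         # missing age
    {"name": "Carla Reyes", "city": "Davao", "age": "n/a"},  # age cannot be converted
]

def clean_row(row):
    """Standardize capitalization and convert the age text to a number."""
    age_text = row["age"].strip()
    return {
        "name": row["name"].strip().title(),
        "city": row["city"].strip().title(),
        # Missing or non-numeric ages become None rather than a guessed value.
        "age": int(age_text) if age_text.isdigit() else None,
    }

def remove_duplicates(rows):
    """Keep the first occurrence of each identical record, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

cleaned = remove_duplicates([clean_row(r) for r in raw_rows])

# Handling missing values starts with identifying them.
missing_age = [r["name"] for r in cleaned if r["age"] is None]
```

Note that duplicates are removed after capitalization is standardized; in the sample data the first two records only become identical once their case matches.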

Most Popular Tools Available in Cleaning Data
(open source and SaaS tools):

1) OpenRefine: A free, open-source tool for working with messy data.
2) RapidMiner: A data science platform that includes data cleaning
and preparation tools.
3) Talend Data Preparation: A cloud-based data preparation tool that
allows users to clean and prepare data for analysis.
4) Data Ladder Cleansing Tool: A data cleaning tool that uses machine
learning algorithms to identify and correct errors in data.
5) Rattle: A free, open-source data mining tool that includes data
cleaning and preparation tools.

2) Data Cleaning Features in Spreadsheet

Microsoft Excel provides several features to help clean data, such
as:

1) Fill data automatically in worksheet cells: fills cells
automatically based on patterns in the existing data.
2) Create and format tables: organizes a range of cells as a table,
which makes large datasets easier to work with.
3) Create a macro: automates repetitive tasks in your spreadsheet.
4) Check spelling and grammar: checks the text in your data for
spelling and grammar mistakes.
5) Filter for unique values or remove duplicate values: filtering
shows only the unique values, while removing duplicates deletes the
repeated values from your data.
6) Find and replace text: locates a piece of text and replaces it
throughout your data, for example to correct a recurring misspelling.
7) Change the case of text: converts text to upper, lower, or proper
case (the UPPER, LOWER, and PROPER functions).
8) Remove spaces and nonprinting characters from text: strips extra
spaces and nonprinting characters from text in your data (the TRIM
and CLEAN functions).
9) Fix numbers and number signs: converts numbers stored as text into
numeric values and corrects misplaced signs.
10) Fix dates and times: converts dates and times stored as text or
in inconsistent formats into proper date and time values.
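
To make items 7 to 9 concrete, here is a minimal Python sketch of the same ideas outside a spreadsheet. The functions only approximate the behavior of Excel's CLEAN, TRIM, and PROPER; they are not Excel's implementation, and the names `remove_nonprinting`, `trim_spaces`, `tidy_text`, and `fix_trailing_minus` are hypothetical helpers made up for this example.

```python
import re

def remove_nonprinting(text):
    """Drop ASCII control characters (codes 0 to 31), similar in spirit to CLEAN."""
    return "".join(ch for ch in text if ord(ch) >= 32)

def trim_spaces(text):
    """Strip leading/trailing spaces and collapse repeated spaces, similar in spirit to TRIM."""
    return re.sub(r" +", " ", text).strip(" ")

def tidy_text(text):
    """Replace non-breaking spaces, remove control characters, trim, then apply proper case."""
    text = text.replace("\u00a0", " ")
    return trim_spaces(remove_nonprinting(text)).title()

def fix_trailing_minus(text):
    """Convert number text with a trailing minus sign, such as '100-', to a negative float."""
    text = text.strip()
    if text.endswith("-"):
        return -float(text[:-1])
    return float(text)

sample = "  jOHN\t   de\u00a0la   CRUZ \n"
result = tidy_text(sample)          # "John De La Cruz"
amount = fix_trailing_minus("100-")  # -100.0
```

The non-breaking space (`\u00a0`) is replaced explicitly because it is not an ordinary space and would otherwise survive the trimming step.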

3) Sorting and Filtering

Sorting and filtering are powerful techniques for managing and
analyzing data in spreadsheets.

• Sorting arranges data in a specific order, revealing patterns and
trends.
• Filtering helps you focus on specific subsets of data.
  o Advanced filtering techniques provide even greater control over
  the data analysis.
  o These include:
    - custom number and text filters,
    - wildcards,
    - date filters, and
    - filtering by color or icon.
• Combining sorting and filtering helps you draw insights and make
data-driven decisions quickly, making these skills essential for
anyone working with spreadsheets.

Other Special Features of Sorting and Filtering:
• Sorting displays data in a specific order, often to reveal
patterns, trends, or relationships.
• Filtering displays only the rows that meet your specific criteria,
effectively hiding the rows that do not match your conditions.
• Both sorting and filtering allow you to focus on specific subsets
of your data, making it easier to analyze the data and draw insights.
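
The same combination of a wildcard text filter, a custom number filter, and a sort can be sketched in Python. The `orders` rows and their field names are hypothetical stand-ins for a spreadsheet range. One difference to keep in mind: `fnmatchcase` is case-sensitive, whereas spreadsheet text filters usually are not.

```python
from fnmatch import fnmatchcase

# Hypothetical rows standing in for a spreadsheet range.
orders = [
    {"customer": "Santos Trading", "region": "North", "amount": 1200},
    {"customer": "San Miguel Store", "region": "South", "amount": 450},
    {"customer": "Reyes Supply", "region": "North", "amount": 980},
    {"customer": "Santiago Foods", "region": "South", "amount": 1500},
]

# Custom text filter with a wildcard: customers whose name starts with "San".
san_rows = [o for o in orders if fnmatchcase(o["customer"], "San*")]

# Custom number filter: amounts greater than or equal to 1000.
large_rows = [o for o in san_rows if o["amount"] >= 1000]

# Sort the filtered rows from the largest amount to the smallest.
ranked = sorted(large_rows, key=lambda o: o["amount"], reverse=True)
names = [o["customer"] for o in ranked]
```

Like a spreadsheet filter, none of these steps changes `orders` itself; each step only produces a narrower or reordered view of the rows.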

4) Data Cleaning, Verifying and Reporting Results

Data cleaning is the process of identifying and resolving potential
data inconsistencies or errors to improve the quality of your data.

• It involves reviewing, analyzing, detecting, modifying, or removing
‘dirty’ data to make your dataset ‘clean’.

Data validation at the time of data entry or collection helps you
minimize the amount of data cleaning you’ll need to do.

• After data collection, you can use data standardization and data
transformation to clean your data.

Data verification is the process of ensuring that the data is
accurate, complete, and consistent. It involves checking the data for
errors, inconsistencies, and missing values.
• Data verification is an essential step in ensuring that the data is
reliable and can be used for analysis and decision-making.
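
As a rough illustration of verification, the sketch below checks a cleaned dataset for missing values, unexpected category values, and duplicate identifiers, then counts the problems by type so they can be reported. The `verify` function, the field names, and the sample rows are assumptions made for this example, not a standard procedure.

```python
from collections import Counter

def verify(rows, required, allowed_regions):
    """Return (row_number, problem) pairs; an empty list means no issue was found."""
    problems = []
    seen_ids = set()
    for number, row in enumerate(rows, start=1):
        # Completeness: every required field must have a value.
        for field in required:
            if row.get(field) in (None, ""):
                problems.append((number, f"missing {field}"))
        # Consistency: a region that is present must be one of the allowed values.
        region = row.get("region")
        if region not in (None, "") and region not in allowed_regions:
            problems.append((number, "unexpected region"))
        # Accuracy: an id should appear only once.
        if row.get("id") in seen_ids:
            problems.append((number, "duplicate id"))
        seen_ids.add(row.get("id"))
    return problems

# Hypothetical cleaned rows to verify.
rows = [
    {"id": 1, "region": "North", "amount": 100},
    {"id": 2, "region": "Nrth", "amount": 250},
    {"id": 2, "region": "South", "amount": None},
]

problems = verify(rows, required=("id", "region", "amount"),
                  allowed_regions={"North", "South"})
summary = Counter(problem for _, problem in problems)
```

The `summary` counts give a compact figure per problem type that can be carried into a report on the results of cleaning.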
Reporting results is the process of presenting the findings of your
data analysis. It involves summarizing the data, identifying patterns
and trends, and drawing conclusions.
• The goal of reporting results is to communicate the insights gained
from the data analysis to stakeholders in a clear and concise manner.

5) Capturing Cleaning Changes

The Benefits of Effective Data Cleansing

https://www.techtarget.com/searchdatamanagement/definition/data-scrubbing

When it is done well, data cleansing benefits data management and the
business or organization in general.

1) Improved decision-making.

With more accurate data, analytics applications can produce better
results. That enables organizations to make more informed decisions
on business strategies and operations, as well as on things like
patient care and government programs.

2) More effective marketing and sales.

Customer data is often wrong, inconsistent or out of date. Many
customers are not concerned with data integrity or quality and simply
provide whatever comes to mind, and many dislike being asked for
information at all. Cleaning up the data in customer relationship
management and sales systems is therefore very important, because it
helps improve the effectiveness of marketing campaigns and sales
efforts.

3) Better operational performance.

Clean, high-quality data helps organizations avoid inventory
shortages, delivery snafus and other business problems that can
result in higher costs, lower revenues and even damaged relationships
with customers.

4) Increased use of data.

Data has become a key corporate asset, but it can't generate business
value if it isn't used. By making data more trustworthy, data
cleansing helps convince business managers and workers to rely on it
as part of their jobs.

5) Reduced data costs.

Data cleansing stops data errors and issues from propagating further
in systems and analytics applications. In the long term, that saves
time and money, because IT (Information Technology) and data
management teams do not have to keep fixing the same errors in data
sets.
