Weka-: Data Warehousing and Data Mining Lab Manual-Week 9

Here are the descriptions of the data sets mentioned in the document: Data Set: diabetes Multivariate, Real, Medical Domain Number of Instances: 768 Number of Attributes: 8 Missing Values?: Yes Associated Tasks: Classification Characteristics: This dataset is about diagnosing diabetes based on diagnostic measurements included in the dataset. Data Set: glass Multivariate, Real, Materials Domain Number of Instances: 214 Number of Attributes: 9 Missing Values?: No Associated Tasks: Classification Characteristics: This dataset contains glass identification data where the goal is to identify the type of glass from physical measurements. Data Set: iris Multivariate, Real

Uploaded by

pakizaamin436

Data Warehousing and Data Mining

Lab Manual- Week 9

WEKA-
DATA MINING AND MACHINE LEARNING TOOL

09 Nov, 2020

What is WEKA?
• Waikato Environment for Knowledge Analysis
• It is a data mining/machine learning tool developed by the Department of
Computer Science, University of Waikato, New Zealand.
• Weka is also a bird found only on the islands of New Zealand.

Download and Install WEKA

• Website: http://www.cs.waikato.ac.nz/~ml/weka/index.html
• Supports multiple platforms (written in Java): Windows, Mac OS X and Linux

The GUI Chooser consists of four buttons—one for each of the
four major Weka applications—and four menus.

The buttons can be used to start the following applications:


Explorer: An environment for exploring data with WEKA (the rest of this
documentation deals with this application in more detail).
Experimenter: An environment for performing experiments and conducting
statistical tests between learning schemes.
Knowledge Flow: This environment supports essentially the same functions as the
Explorer but with a drag-and-drop interface. One advantage is that it supports
incremental learning.
SimpleCLI: Provides a simple command-line interface that allows direct execution
of WEKA commands.

Explorer
The Graphical User Interface
Section Tabs
At the very top of the window, just below the title bar, is a row of tabs. When the
Explorer is first started only the first tab is active; the others are grayed out. This is
because it is necessary to open (and potentially pre-process) a data set before
starting to explore the data.
The tabs are as follows:
1. Preprocess. Choose and modify the data being acted on.
2. Classify. Train and test learning schemes that classify or perform regression.
3. Cluster. Learn clusters for the data.
4. Associate. Learn association rules for the data.
5. Select attributes. Select the most relevant attributes in the data.
6. Visualize. View an interactive 2D plot of the data.
Once the tabs are active, clicking on them flicks between different screens, on
which the respective actions can be performed. The bottom area of the window
(including the status box, the log button, and the Weka bird) stays visible
regardless of which section you are in. The Explorer can be easily extended with
custom tabs.

Some Sample Weka Data Sets


Below are some sample WEKA data sets, in ARFF format.
• diabetes.arff
• glass.arff
• iris.arff
• supermarket.arff
• vote.arff
• weather.arff
• weather.nominal.arff

Load a data set (e.g., Weather dataset, Iris dataset, etc.)

Ans: Steps to load the Weather data set.
1. Open the WEKA tool.
2. Click on WEKA Explorer.
3. Click on the Open file button.
4. Choose the WEKA folder in the C drive.
5. Select and open the data folder.
6. Choose the weather.arff file and open it.

Steps to load the Iris data set.
1. Open the WEKA tool.
2. Click on WEKA Explorer.
3. Click on the Open file button.
4. Choose the WEKA folder in the C drive.
5. Select and open the data folder.
6. Choose the iris.arff file and open it.

Lab Task:
1. Make an Employee Table with the data mining tool WEKA.

Steps:
1) Open Start → Programs → Accessories → Notepad.
2) Type the following training data set for the Employee Table in Notepad.

@relation employee
@attribute name {x,y,z,a,b}
@attribute id numeric
@attribute salary {low,medium,high}
@attribute exp numeric
@attribute gender {male,female}
@attribute phone numeric

@data
x,101,low,2,male,250311
y,102,high,3,female,251665
z,103,medium,1,male,240238
a,104,low,5,female,200200
b,105,high,2,male,240240

3) Save the file with the .arff extension (Employee.arff).
4) Minimize the ARFF file and then open Start → Programs → weka-3-4.
5) Click on weka-3-4; the Weka dialog box is displayed on the screen.
6) The dialog box offers four modes; click on Explorer.
7) The Explorer shows many options. Click on 'Open file' and select the ARFF file.
8) Click on the Edit button, which shows the Employee Table in Weka.
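
To make the structure of the ARFF file more concrete, here is a minimal sketch of how such a file could be read outside Weka. It is not Weka's own loader: the function name `parse_arff` is hypothetical, and the parser assumes only the simple features used in the Employee Table (numeric and nominal attributes, comma-separated rows, `?` for a missing value).

```python
# Minimal ARFF reader (illustrative sketch, not Weka's loader).
# Assumes: no quoted values, no sparse rows, only numeric/nominal attributes.
EMPLOYEE_ARFF = """\
@relation employee
@attribute name {x,y,z,a,b}
@attribute id numeric
@attribute salary {low,medium,high}
@attribute exp numeric
@attribute gender {male,female}
@attribute phone numeric

@data
x,101,low,2,male,250311
y,102,high,3,female,251665
z,103,medium,1,male,240238
a,104,low,5,female,200200
b,105,high,2,male,240240
"""


def parse_arff(text):
    """Return (relation, attributes, rows).

    attributes is a list of (name, kind), where kind is the string
    "numeric" or a list of allowed nominal labels.
    """
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        lower = line.lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            row = []
            for (_, kind), value in zip(attributes, values):
                if value == "?":              # ARFF marker for a missing value
                    row.append(None)
                elif kind == "numeric":
                    row.append(float(value))
                else:
                    row.append(value)
            rows.append(row)
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):          # nominal: {label1,label2,...}
                kind = [v.strip() for v in spec.strip("{}").split(",")]
            else:
                kind = spec.lower()
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_arff(EMPLOYEE_ARFF)
print(relation, [name for name, _ in attributes])
for row in rows:
    print(row)
```

The header lines describe the columns and everything after @data is one instance per line, which is the same table the Edit button displays in the Explorer.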

Apply the following pre-processing techniques to the training data set of the Employee Table

1) Add
2) Remove
3) Normalization

Add Pre-Processing Technique:

Procedure:
1) Start → Programs → Weka-3-4 → Weka-3-4
2) Click on Explorer.
3) Click on Open file.
4) Select the Employee.arff file and click on Open.
5) Click on the Choose button and expand the filters folder.
6) Under filters there are two groups: supervised and unsupervised.
7) Click on unsupervised.
8) Under attribute, select Add.
9) Click on the filter name shown next to the Choose button; a new window opens.
10) In that window, enter the attribute index, type, date format, and nominal label values for the new attribute, Address.
11) Click on OK.
12) Press the Apply button; a new attribute is added to the Employee Table.
13) Save the file.
14) Click on the Edit button; it shows the new Employee Table in Weka.
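
The effect of the Add procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `add_attribute` is a hypothetical helper, the table is held as plain Python lists, and the new column is filled with missing values (`None`), which is how a freshly added attribute appears until values are entered.

```python
# Sketch of what adding an attribute does to a table (not Weka code).
header = ["name", "id", "salary", "exp", "gender", "phone"]
rows = [
    ["x", 101, "low", 2, "male", 250311],
    ["y", 102, "high", 3, "female", 251665],
    ["z", 103, "medium", 1, "male", 240238],
    ["a", 104, "low", 5, "female", 200200],
    ["b", 105, "high", 2, "male", 240240],
]


def add_attribute(header, rows, name, index=None):
    """Return a new (header, rows) with `name` inserted at `index`.

    index=None appends the attribute as the last column. Every instance
    gets None (missing) for the new attribute. The inputs are not modified.
    """
    if name in header:
        raise ValueError(f"attribute {name!r} already exists")
    pos = len(header) if index is None else index
    new_header = header[:pos] + [name] + header[pos:]
    new_rows = [row[:pos] + [None] + row[pos:] for row in rows]
    return new_header, new_rows


new_header, new_rows = add_attribute(header, rows, "address")
print(new_header)
print(new_rows[0])
```

The attribute index in step 10 plays the role of `index` here: it decides where the new column is placed.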

Remove Pre-Processing Technique:

Procedure:
1) Start → Programs → Weka-3-4 → Weka-3-4
2) Click on Explorer.
3) Click on Open file.
4) Select the Employee.arff file and click on Open.
5) Click on the Choose button and expand the filters folder.
6) Under filters there are two groups: supervised and unsupervised.
7) Click on unsupervised.
8) Under attribute, select Remove.
9) Select the attributes salary and gender to remove.
10) Click on the Remove button and then Save.
11) Click on the Edit button; it shows the new Employee Table in Weka.
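
As a rough sketch of the result of the Remove procedure above, the following Python drops the salary and gender columns from the Employee Table. The helper name `remove_attributes` and the list-based table are assumptions made for illustration; this is not how Weka implements the filter.

```python
# Sketch of removing attributes from a table (not Weka code).
header = ["name", "id", "salary", "exp", "gender", "phone"]
rows = [
    ["x", 101, "low", 2, "male", 250311],
    ["y", 102, "high", 3, "female", 251665],
    ["z", 103, "medium", 1, "male", 240238],
    ["a", 104, "low", 5, "female", 200200],
    ["b", 105, "high", 2, "male", 240240],
]


def remove_attributes(header, rows, names):
    """Return a new (header, rows) without the attributes listed in `names`."""
    unknown = [n for n in names if n not in header]
    if unknown:
        raise KeyError(f"unknown attributes: {unknown}")
    keep = [i for i, h in enumerate(header) if h not in names]
    new_header = [header[i] for i in keep]
    new_rows = [[row[i] for i in keep] for row in rows]
    return new_header, new_rows


reduced_header, reduced_rows = remove_attributes(header, rows, ["salary", "gender"])
print(reduced_header)
for row in reduced_rows:
    print(row)
```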

Normalize Pre-Processing Technique:

Procedure:
1) Start → Programs → Weka-3-4 → Weka-3-4
2) Click on Explorer.
3) Click on Open file.
4) Select the Employee.arff file and click on Open.
5) Click on the Choose button and expand the filters folder.
6) Under filters there are two groups: supervised and unsupervised.
7) Click on unsupervised.
8) Under attribute, select Normalize.
9) Select the attributes id, exp (experience), and phone to normalize.
10) Click on the Apply button and then Save.
11) Click on the Edit button; it shows the new Employee Table with normalized values in Weka.
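
The normalized values that appear after this procedure can be checked by hand. The sketch below assumes min-max scaling of each numeric attribute to the range [0, 1], that is, (value - min) / (max - min). The helper `normalize_columns` is hypothetical, and mapping a constant column to 0.0 is a choice made here, not documented Weka behaviour.

```python
# Sketch of min-max normalization to [0, 1] (not Weka code).
header = ["name", "id", "salary", "exp", "gender", "phone"]
rows = [
    ["x", 101, "low", 2, "male", 250311],
    ["y", 102, "high", 3, "female", 251665],
    ["z", 103, "medium", 1, "male", 240238],
    ["a", 104, "low", 5, "female", 200200],
    ["b", 105, "high", 2, "male", 240240],
]


def normalize_columns(header, rows, names):
    """Return new rows with each attribute in `names` rescaled to [0, 1]."""
    out = [list(row) for row in rows]      # copy so the input stays unchanged
    for name in names:
        i = header.index(name)
        column = [row[i] for row in rows]
        lo, hi = min(column), max(column)
        span = hi - lo
        for row in out:
            # Assumption: a constant column (span == 0) is mapped to 0.0.
            row[i] = 0.0 if span == 0 else (row[i] - lo) / span
    return out


normalized = normalize_columns(header, rows, ["id", "exp", "phone"])
for row in normalized:
    print(row)
```

For example, exp ranges from 1 to 5, so an experience of 3 becomes (3 - 1) / (5 - 1) = 0.5.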

2. We are familiar with the spam dataset by now. To load the spam dataset, select
the Preprocess tab and then select Open file....
1. Go to the Cluster tab: Select the SimpleKMeans clusterer, bring up its
options window and set numClusters to 2.
2. In the Cluster mode panel, select Classes to clusters evaluation and
hit Start. This option evaluates clusters with respect to a class. More
specifically, in the mode Classes to clusters evaluation Weka first
ignores the class attribute and generates the clustering. Then during
the test phase it assigns classes to the clusters, based on the majority
value of the class attribute within each cluster. Then it computes the
classification error, based on this assignment and also shows the
corresponding confusion matrix.
3. Look at the Classes to Clusters confusion matrix. Clearly, we don't
have a perfect correspondence between classes and clusters.
a. How successful has the clustering been in this regard?
b. Looking at each class individually, can you spot the particular
class that is well identified by the clustering? Classes that are
poorly identified?
c. Which classes are mostly confused with each other?
4. Visualize the cluster assignments. To do this, right-click on the cluster in
the Result list panel and select Visualize cluster assignments. Plot Class
against Cluster. All the data points will lie on top of each other, so
increase the Jitter slide bar to about halfway to add random noise to
each point. This allows us to see more clearly where the bulk of the
data points lies. In this scatter plot each row represents a class and each
column a cluster.
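
The Classes to clusters evaluation described in step 2 can be sketched as follows. This is a toy illustration, not the spam dataset and not Weka's SimpleKMeans: the eight 2-D points and their labels are made up, the initial centroids are fixed to the first and last point so the run is repeatable, and each cluster is labelled with its majority class exactly as the step describes.

```python
from collections import Counter

# Hypothetical toy data: two numeric features per instance plus a class label.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.2),
          (5.0, 5.0), (5.2, 4.8), (4.9, 5.1), (4.8, 5.3)]
labels = ["spam", "spam", "spam", "spam", "ham", "ham", "ham", "ham"]


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, max_iter=100):
    """Plain k-means; the class labels are never looked at here."""
    centroids = [tuple(c) for c in centroids]
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in points
        ]
        if new_assignment == assignment:   # converged: nothing moved
            break
        assignment = new_assignment
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:                    # keep old centroid if cluster is empty
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignment


def classes_to_clusters(labels, assignment):
    """Label each cluster by majority class; return mapping, error rate, matrix."""
    per_cluster = {}
    for label, cluster in zip(labels, assignment):
        per_cluster.setdefault(cluster, Counter())[label] += 1
    mapping = {c: counts.most_common(1)[0][0] for c, counts in per_cluster.items()}
    errors = sum(1 for l, c in zip(labels, assignment) if mapping[c] != l)
    clusters = sorted(per_cluster)
    # Confusion matrix: one row per class, one column per cluster.
    matrix = {cls: [per_cluster[c][cls] for c in clusters]
              for cls in sorted(set(labels))}
    return mapping, errors / len(labels), matrix


# Simplification: start from the first and last point instead of a random seed.
assignment = kmeans(points, [points[0], points[-1]])
mapping, error_rate, matrix = classes_to_clusters(labels, assignment)
print(mapping)
print(error_rate)
for cls, counts in matrix.items():
    print(cls, counts)
```

In this toy run one spam instance sits among the ham points, so it ends up in the ham cluster and counts as the single classification error. Reading the matrix row by row is the same exercise as questions a to c above.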
3. You are required to provide a description of each data set
provided in the file named "Data Sets". Visit the UCI Machine
Learning website to find the descriptions of the data sets.
A sample is provided here; describe each data set in the
same way.

Data Set Characteristics:   Multivariate, Domain-Theory   Number of Instances:   7200   Area:                Life
Attribute Characteristics:  Categorical, Real             Number of Attributes:  21     Date Donated:        01/01/1987
Associated Tasks:           Classification                Missing Values?        N/A    Number of Web Hits:  245242

• If there are missing values in the data set, use Weka to impute
(fill in) the missing values.
• Use Weka to discretize the data sets if your classifier
works on discretized data sets.
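
To show what imputation and discretization do to a column, here is a small sketch under stated assumptions: missing values are filled with the mean (numeric) or the most frequent value (nominal), and discretization uses equal-width bins. The function names and the sample columns are hypothetical; in Weka the corresponding unsupervised attribute filters are ReplaceMissingValues and Discretize.

```python
from collections import Counter


def impute(column):
    """Fill None entries with the mean (numeric) or the mode (nominal)."""
    present = [v for v in column if v is not None]
    if not present:                        # nothing to learn a fill value from
        return list(column)
    if all(isinstance(v, (int, float)) for v in present):
        fill = sum(present) / len(present)
    else:
        fill = Counter(present).most_common(1)[0][0]
    return [fill if v is None else v for v in column]


def discretize_equal_width(column, bins):
    """Map each numeric value to a bin index 0..bins-1 of equal width."""
    lo, hi = min(column), max(column)
    width = (hi - lo) / bins
    if width == 0:                         # constant column: a single bin
        return [0] * len(column)
    # The maximum value would land in bin `bins`, so clamp it to the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in column]


# Hypothetical columns with one missing value each.
ages = [20, None, 40, 60, 80]
answers = ["yes", None, "no", "yes"]

ages_filled = impute(ages)
answers_filled = impute(answers)
age_bins = discretize_equal_width(ages_filled, bins=3)

print(ages_filled)
print(answers_filled)
print(age_bins)
```

Imputation has to come first: the binning step needs every value to be a number before it can compute the bin boundaries.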
