WEB CONTENT MINING
By
Saumya Aggarwal(0232083107---IT)
Richa Sharma(0732082707---CSE)
WHAT IS WEB MINING…???
Web mining is the application of data mining techniques
to extract knowledge from web data, including web
documents, hyperlinks between documents, usage logs
of web sites, etc.
Data mining views: process-centric and data-centric.
WHY WEB MINING..???
The amount of information on the Web is huge and diverse.
Much of the Web information is redundant. The same piece
of information or its variants may appear in many pages.
A Web page typically contains a mixture of many kinds of
information, e.g., main content, advertisements,
navigation panels, copyright notices, etc.
The Web is dynamic. Information on the Web changes
constantly. Keeping up with the changes and monitoring the
changes are important issues.
Above all, the Web is a virtual society. It is not only about
data, information and services, but also about interactions
among people, organizations and automatic systems.
TAXONOMY IN WEB MINING
Web Mining is a very broad term which has been
classified into three major streams:
Web Content Mining: the process of extracting useful information from the web.
Web Structure Mining: the process of discovering useful knowledge from the structures and hyperlinks of the web.
Web Usage Mining: the process of discovering interesting usage patterns from the web.
WEB CONTENT MINING
Web content mining is the process of extracting useful
information from the contents of web documents.
It includes mining, extraction of data, and integration of knowledge
from Web page contents.
The content data may consist of text, images, audio, video,
or structured records such as lists and tables.
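As a minimal sketch of content extraction, the code below pulls free text and list items out of an HTML page using Python's standard html.parser module. The class name ContentExtractor, the helper extract_content, and the sample page are illustrative assumptions, not part of the original slides; real pages would also need handling for tables, images and noisy markup.

```python
from html.parser import HTMLParser


class ContentExtractor(HTMLParser):
    """Collects visible text and list items from an HTML page (illustrative)."""

    SKIPPED_TAGS = {"script", "style"}  # markup that is not page content

    def __init__(self):
        super().__init__()
        self.text_parts = []   # free text fragments outside lists
        self.list_items = []   # structured records taken from <li> elements
        self._skip_depth = 0   # > 0 while inside <script> or <style>
        self._in_item = False  # True while inside an <li>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1
        elif tag == "li":
            self._in_item = True
            self.list_items.append("")

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "li":
            self._in_item = False

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip_depth:
            return
        if self._in_item:
            # Join fragments split by inline tags such as <b>.
            self.list_items[-1] = (self.list_items[-1] + " " + text).strip()
        else:
            self.text_parts.append(text)


def extract_content(html):
    """Return the page's free text and its list items."""
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    return {"text": " ".join(parser.text_parts), "items": parser.list_items}


# Hypothetical sample page used only for illustration.
SAMPLE_PAGE = """
<html><head><style>p {color: red}</style></head>
<body><p>Course list</p>
<ul><li>Data mining</li><li>Web <b>mining</b></li></ul>
</body></html>
"""

if __name__ == "__main__":
    print(extract_content(SAMPLE_PAGE))
```

Separating free text from list items mirrors the distinction above between unstructured content and structured records.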
WEB STRUCTURE MINING
Web structure mining is the process of discovering
structure information from the web.
Web graph: web pages are the nodes, and hyperlinks are the edges that connect them.
Categories (based on the kind of structure information): hyperlinks and document structure.
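The web graph idea can be sketched in a few lines: pages become nodes, hyperlinks become directed edges, and counting incoming links gives a first, very rough signal of which pages are referenced most. The page names in LINKS and the function names are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical pages and the pages each one links to.
LINKS = {
    "home.html": ["about.html", "courses.html"],
    "about.html": ["home.html"],
    "courses.html": ["home.html", "about.html"],
}


def build_web_graph(links):
    """Return (nodes, edges); each edge is a (source, target) hyperlink."""
    nodes = set(links)
    edges = []
    for source, targets in links.items():
        for target in targets:
            nodes.add(target)  # a linked page is a node even if never crawled
            edges.append((source, target))
    return nodes, edges


def in_degree(edges):
    """Count how many hyperlinks point at each page."""
    counts = defaultdict(int)
    for _, target in edges:
        counts[target] += 1
    return dict(counts)


if __name__ == "__main__":
    nodes, edges = build_web_graph(LINKS)
    print(sorted(nodes))
    print(in_degree(edges))
```

In-degree is only the simplest structural measure; link-analysis methods build on the same graph representation.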
WEB USAGE MINING
It discovers interesting usage patterns from web usage
data.
The goal is to understand and better serve the needs of web-based
applications.
Usage data captures the identity or origin of web users
and their browsing behaviour at a web site.
Classification based on the kind of usage data:
web server logs, application server logs, and application-level logs.
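A minimal sketch of mining web server logs follows. It assumes the log lines are in the Common Log Format, which the slides do not specify; the sample lines, the regular expression, and the function names are illustrative assumptions. It counts successful hits per page and lists the pages each visitor requested.

```python
import re
from collections import Counter

# Assumed layout: Common Log Format (host, identity, user, time, request, status, size).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

# Hypothetical log lines used only for illustration.
SAMPLE_LOG = [
    '10.0.0.1 - - [10/Oct/2008:13:55:36 +0000] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.2 - - [10/Oct/2008:13:56:01 +0000] "GET /courses.html HTTP/1.0" 200 1500',
    '10.0.0.1 - - [10/Oct/2008:13:57:12 +0000] "GET /courses.html HTTP/1.0" 200 1500',
    'malformed line',
]


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def page_hits(lines):
    """Count successful (status 200) requests per page."""
    hits = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry and entry["status"] == "200":
            hits[entry["path"]] += 1
    return hits


def pages_per_visitor(lines):
    """Map each visitor's host to the pages requested, in log order."""
    visits = {}
    for line in lines:
        entry = parse_line(line)
        if entry:
            visits.setdefault(entry["host"], []).append(entry["path"])
    return visits


if __name__ == "__main__":
    print(page_hits(SAMPLE_LOG))
    print(pages_per_visitor(SAMPLE_LOG))
```

The host field stands in for the "identity or origin" of a user, and the per-visitor page sequence is a simple form of browsing behaviour.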
SEARCH ENGINE
A search engine is a software program that searches for sites based on
the words that you designate as search terms.
Search engines look through their own databases of information in
order to find what it is that you are looking for.
“Search engine” is the popular term for an Information Retrieval
(IR) system.
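One common way to organise such a database, assumed here rather than taken from the slides, is an inverted index that maps each word to the documents containing it. The sketch below builds a toy index over three hypothetical documents and answers queries by requiring every search term to be present; real search engines add crawling, ranking and much more.

```python
import re
from collections import defaultdict

# Hypothetical documents standing in for the engine's stored pages.
DOCUMENTS = {
    "page1": "Web mining applies data mining to web data",
    "page2": "Search engines index web pages",
    "page3": "Usage logs record browsing behaviour",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Build an inverted index: term -> set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return sorted(results)


if __name__ == "__main__":
    index = build_index(DOCUMENTS)
    print(search(index, "web"))
    print(search(index, "web mining"))
```

Looking terms up in the index avoids scanning every stored page for each query.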
HOW DOES A SEARCH ENGINE WORK
WHAT NEXT…???
Search engines play an important role in accessing content on the
Internet: they retrieve the pages that the user asks for.
An in-depth comparative study of today's major search engines:
Google
Yahoo
MSN
A study of the information retrieval models that have been
developed so far.
The need for better search engines only increases.
THANK YOU