Semantic Web Fundamentals Explained
Introduction
… an intermediate one …
… or this one
… on flickr …
… on Google …
… Dopplr,
… Twine,
… LinkedIn,
A “mashup” example:
was not…
• It is that simple…
• Of course, the devil is in the details
• a common model has to be provided for machines to
describe, query, etc, the data and their connections
• the “classification” of the terms can become very complex
for specific knowledge areas: this is where ontologies,
thesauri, etc, enter the game…
In what follows…
• We will use a simplistic example to introduce the
main technical concepts
• The details will be for later during the course
Row  A                    B
6    ID                   Auteur
7    ISBN-0-00-651409-X   A12
11   Nom
12   Ghosh, Amitav
13   Besse, Christianne
2nd: export your second set of data
3rd: start merging your data
3rd: start merging your data (cont.)
3rd: merge identical resources
Is that surprising?
• It may look like it but, in fact, it should not be…
• What happened via automatic means is done every
day by Web users!
• The difference: a bit of extra rigour so that
machines could do this, too
RDF triples
• Let us begin to formalize what we did!
• we “connected” the data…
• but a simple connection is not enough… data should be
named somehow
• hence the RDF Triples: a labelled connection between two
resources
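As a rough illustration of the idea, a triple can be modelled in Python as a plain 3-tuple and a graph as a set of such tuples. This is only a sketch: the `EX` prefix and every URI below are hypothetical placeholders, not the ones used in the slides.

```python
# A triple is a labelled connection: (subject, property, object).
# All URIs here are made-up placeholders for illustration.
EX = "http://example.org/"

book = EX + "isbn/000651409X"
translation = EX + "isbn/2020386682"

graph = {
    (translation, EX + "original", book),       # resource -> resource
    (book, EX + "title", "The Glass Palace"),   # resource -> literal
}

# The label (the property) is what tells one connection from another.
labels = {p for (_, p, _) in graph}
```

Using a set means that stating the same triple twice has no effect, which matches the way an RDF graph behaves.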
(<http://…isbn…6682>, <http://…/original>, <http://…isbn…409X>)
<rdf:Description rdf:about="http://…/isbn/2020386682">
  <f:titre xml:lang="fr">Le palais des miroirs</f:titre>
  <f:original rdf:resource="http://…/isbn/000651409X"/>
</rdf:Description>
<http://…/isbn/2020386682>
    f:titre "Le palais des miroirs"@fr ;
    f:original <http://…/isbn/000651409X> .
“Internal” nodes
• Consider the following statement:
• “the publisher is a «thing» that has a name and an address”
• Until now, nodes were identified with a URI. But…
• …what is the URI of «thing»?
<http://…/isbn/2020386682> a:publisher _:A234.
_:A234 a:p_name "HarperCollins".
Same in Turtle
<http://…/isbn/000651409X> a:publisher [
    a:p_name "HarperCollins";
    …
].
Jena example
// package name for current Apache Jena; 2009-era releases used com.hp.hpl.jena.rdf.model
import org.apache.jena.rdf.model.*;
// create a model (ModelMem in old Jena releases)
Model model = ModelFactory.createDefaultModel();
Resource subject = model.createResource("URI_of_Subject");
// 'in' refers to the input file (an InputStream); null means no base URI
model.read(in, null);
// null acts as a wildcard for the property and the object
StmtIterator iter = model.listStatements(subject, null, (RDFNode) null);
while (iter.hasNext()) {
    Statement st = iter.nextStatement();
    Property p = st.getPredicate();
    RDFNode o = st.getObject();
    do_something(p, o);
}
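For readers without a Jena installation, the same pattern lookup can be sketched in plain Python. This is an illustration only, not Jena's API: the function name `list_statements`, the tuple representation and the `ex:` identifiers are assumptions made for the example, and `None` plays the role of the `null` wildcard.

```python
def list_statements(graph, s=None, p=None, o=None):
    """Yield the triples that match the pattern; None is a wildcard."""
    for triple in graph:
        if all(want is None or want == got
               for want, got in zip((s, p, o), triple)):
            yield triple

# Hypothetical data, written with made-up "ex:" identifiers.
graph = {
    ("ex:book", "ex:author", "ex:ghosh"),
    ("ex:book", "ex:title", "The Glass Palace"),
    ("ex:ghosh", "ex:name", "Ghosh, Amitav"),
}

# Equivalent in spirit to listStatements(subject, null, null).
for _, prop, obj in sorted(list_statements(graph, s="ex:book")):
    print(prop, obj)
```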
Merge in practice
• Environments merge graphs automatically
• e.g., in Jena, the Model can load several files
• the load merges the new statements automatically
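Why merging is automatic can be sketched with sets of tuples: if two data sets use the same URI for the same resource, their union is already the merged graph. The abbreviated identifiers below (`isbn:…`, `ex:ghosh`) are hypothetical stand-ins for full URIs.

```python
# Two independently exported data sets (identifiers are illustrative).
english = {
    ("isbn:000651409X", "a:title", "The Glass Palace"),
    ("isbn:000651409X", "a:author", "ex:ghosh"),
}
french = {
    ("isbn:2020386682", "f:original", "isbn:000651409X"),
}

# Merging is a set union; the shared URI joins the two graphs.
merged = english | french

# Follow a link that crosses the two original data sets.
original = next(o for s, p, o in merged if p == "f:original")
titles = [o for s, p, o in merged if s == original and p == "a:title"]
```

Here `titles` holds the English title reached from the French record, something neither data set could answer on its own.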
Courtesy of Nigel Wilkinson, Lee Harland, Pfizer Ltd, Melliyal Annamalai, Oracle (SWEO Case Study)
Classes, resources, …
• Think of well known traditional ontologies or
taxonomies:
• use the term “novel”
• “every novel is a fiction”
• “«The Glass Palace» is a novel”
• etc.
• RDFS defines resources and classes:
• everything in RDF is a “resource”
• “classes” are also resources, but…
• …they are also a collection of possible resources (i.e.,
“individuals”)
• “fiction”, “novel”, …
<rdf:Description rdf:about="http://…/isbn/000651409X">
  <rdf:type rdf:resource="http://…/bookSchema.rdf#Novel"/>
</rdf:Description>
Inferred properties
(<http://…/isbn/000651409X> rdf:type #Fiction)
If:
    uuu rdfs:subClassOf xxx .
    vvv rdf:type uuu .
Then add:
    vvv rdf:type xxx .
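The rule with uuu, vvv and xxx can be applied mechanically. The sketch below, assuming triples are stored as Python tuples, repeats the rule until nothing new appears; the function name `apply_subclass_rule` and the abbreviated identifiers are hypothetical.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"

def apply_subclass_rule(graph):
    """Add 'vvv rdf:type xxx' until a fixed point is reached."""
    graph = set(graph)
    while True:
        new = {
            (v, TYPE, x)
            for (v, p1, u) in graph if p1 == TYPE
            for (u2, p2, x) in graph if p2 == SUBCLASS and u2 == u
        } - graph
        if not new:
            return graph
        graph |= new

facts = {
    (":Novel", SUBCLASS, ":Fiction"),
    ("isbn:000651409X", TYPE, ":Novel"),
}
closed = apply_subclass_rule(facts)
```

After the call, `closed` also contains the inferred statement that the book is a `:Fiction`. A real RDFS reasoner applies several more rules than this one.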
Properties
• Property is a special class (rdf:Property)
• properties are also resources identified by URI-s
• There is also a possibility for a “sub-property”
• all resources bound by the “sub” are also bound by the other
• Range and domain of properties can be specified
• i.e., what type of resources serve as object and subject
• In Turtle:
:title
    rdf:type rdf:Property;
    rdfs:domain :Fiction;
    rdfs:range rdfs:Literal.
<http://…/isbn/000651409X> rdf:type :Fiction .
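In RDFS, a domain declaration licenses exactly this kind of conclusion: any resource that appears as the subject of a property is inferred to belong to that property's domain. A minimal sketch, assuming tuple-based triples and a hypothetical helper name `apply_domain_rule`:

```python
def apply_domain_rule(graph):
    """If 'p rdfs:domain C' and 's p o' hold, add 's rdf:type C'."""
    domains = {(p, c) for (p, q, c) in graph if q == "rdfs:domain"}
    inferred = {
        (s, "rdf:type", c)
        for (s, p, _) in graph
        for (dp, c) in domains if dp == p
    }
    return set(graph) | inferred

# Illustrative data with abbreviated identifiers.
facts = {
    (":title", "rdfs:domain", ":Fiction"),
    ("isbn:000651409X", ":title", "The Glass Palace"),
}
result = apply_domain_rule(facts)
```

Note that the domain acts as an inference rule here, not as a validation constraint: nothing is rejected, a type is added.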
Literals
• Literals may have a data type
• floats, integers, booleans, etc, defined in XML Schemas
• full XML fragments
• (Natural) language can also be specified
<http://…/isbn/000651409X>
    :page_number "543"^^xsd:integer ;
    :publ_date "2000"^^xsd:gYear ;
    :price "6.99"^^xsd:float .
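What a datatype buys an application can be sketched as follows: the `^^` suffix tells a consumer how to turn the lexical form into a native value. The converter table and the function `parse_literal` are hypothetical, cover only a few datatypes, and ignore language tags.

```python
# Hypothetical converter table: datatype name -> Python callable.
CONVERTERS = {
    "xsd:integer": int,
    "xsd:float": float,
    "xsd:boolean": lambda s: s in ("true", "1"),
    "xsd:gYear": int,  # simplification: ignores time zones and negative years
}

def parse_literal(text):
    """Convert a Turtle-style literal such as "543"^^xsd:integer."""
    lexical, _, datatype = text.partition("^^")
    value = lexical.strip('"')
    # Untyped or unknown literals are kept as plain strings.
    return CONVERTERS.get(datatype, str)(value)

pages = parse_literal('"543"^^xsd:integer')
year = parse_literal('"2000"^^xsd:gYear')
price = parse_literal('"6.99"^^xsd:float')
plain = parse_literal('"The Glass Palace"')
```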
Michael Grove, Clark & Parsia, LLC, and Andrew Schain, NASA, (SWEO Case Study)
Simple approach
• Write RDF/XML or Turtle “manually”
• In some cases that is necessary, but it really does
not scale…
Extract RDF
• Use intelligent “scrapers” or “wrappers” to extract a
structure (hence RDF) from Web pages or XML
files…
• … and then generate RDF automatically (e.g., via
an XSLT script)
GRDDL
• The transformation itself has to be provided for
each set of conventions
• A more general syntax is defined for XML formats
in general (e.g., via the namespace document)
• a method to get data in other formats to RDF (e.g., XBRL)
RDFa
• RDFa extends (X)HTML a bit by:
• defining general attributes to add metadata to any element
• providing an almost complete “serialization” of RDF in
XHTML
• It is a bit like the microformats/GRDDL approach
but fully generic
RDFa example
• For example:
<div about="http://uri.to.newsitem">
  <span property="dc:date">March 23, 2004</span>
  <span property="dc:title">Rollers hit casino for £1.3m</span>
  By <span property="dc:creator">Steve Bird</span>. See
  <a href="http://www.a.b.c/d.avi" rel="dcmtype:MovingImage">
    also video footage</a>…
</div>
London Gazette
Linking Data
dbpedia:Amsterdam
    dbterm:officialName "Amsterdam" ;
    dbterm:longd "4" ;
    dbterm:longm "53" ;
    dbterm:longs "32" ;
    ...
    dbterm:leaderTitle "Mayor" ;
    dbterm:leaderName dbpedia:Job_Cohen ;
    ...
    dbterm:areaTotalKm "219" ;
    ...
dbpedia:ABN_AMRO
    dbterm:location dbpedia:Amsterdam ;
    ...
<http://sws.geonames.org/2759793>
    owl:sameAs <http://dbpedia.org/resource/Amsterdam> ;
    wgs84_pos:lat "52.3666667" ;
    wgs84_pos:long "4.8833333" ;
    geo:inCountry <http://www.geonames.org/countries/#NL> ;
    ...
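One way an application might exploit such an owl:sameAs link is to rewrite both URIs to a single representative, so that facts from the two data sets end up on one node. The sketch below does this with a small union-find; collapsing nodes like this is a simplification of the OWL semantics, and the function name `canonicalise` and the abbreviated identifiers are assumptions.

```python
def canonicalise(graph):
    """Rewrite URIs linked by owl:sameAs to one representative each."""
    parent = {}

    def find(x):
        # Follow parent links until a representative is reached.
        while parent.get(x, x) != x:
            x = parent[x]
        return x

    for s, p, o in graph:
        if p == "owl:sameAs":
            a, b = find(s), find(o)
            if a != b:
                parent[max(a, b)] = min(a, b)

    return {(find(s), p, find(o)) for s, p, o in graph if p != "owl:sameAs"}

graph = {
    ("geonames:2759793", "owl:sameAs", "dbpedia:Amsterdam"),
    ("geonames:2759793", "wgs84_pos:lat", "52.3666667"),
    ("dbpedia:Amsterdam", "dbterm:leaderTitle", "Mayor"),
}
merged = canonicalise(graph)
```

After the call, both the latitude and the leader title are attached to the same subject.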
• Returns:
[[<..49X>,33,£], [<..49X>,50,€], [<..6682>,60,€],
[<..6682>,78,$]]
Pattern constraints
SELECT ?isbn ?price ?currency # note: not ?x!
WHERE { ?isbn a:price ?x. ?x rdf:value ?price. ?x p:currency ?currency.
        FILTER(?currency = :€) }
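What the graph pattern and the FILTER do can be imitated in a few lines of Python. This is a sketch of the matching logic only, not a SPARQL engine; the abbreviated ISBN identifiers, the `_:p…` node names and the function `select_prices` are hypothetical, and the prices are the ones from the result list above.

```python
# Each price is an intermediate node carrying a value and a currency.
graph = {
    ("isbn:49X", "a:price", "_:p1"), ("_:p1", "rdf:value", 33), ("_:p1", "p:currency", "£"),
    ("isbn:49X", "a:price", "_:p2"), ("_:p2", "rdf:value", 50), ("_:p2", "p:currency", "€"),
    ("isbn:6682", "a:price", "_:p3"), ("_:p3", "rdf:value", 60), ("_:p3", "p:currency", "€"),
    ("isbn:6682", "a:price", "_:p4"), ("_:p4", "rdf:value", 78), ("_:p4", "p:currency", "$"),
}

def select_prices(graph, currency=None):
    """Match ?isbn a:price ?x . ?x rdf:value ?price . ?x p:currency ?currency."""
    rows = []
    for isbn, p, x in graph:
        if p != "a:price":
            continue
        prices = [o for s, q, o in graph if s == x and q == "rdf:value"]
        currencies = [o for s, q, o in graph if s == x and q == "p:currency"]
        for price in prices:
            for cur in currencies:
                # The optional filter keeps only one currency.
                if currency is None or cur == currency:
                    rows.append((isbn, price, cur))
    return sorted(rows)

all_rows = select_prices(graph)
euro_rows = select_prices(graph, currency="€")
```

As in the query, the intermediate node (`?x`) is used for joining but is not part of the returned rows.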
Ontologies (OWL)
Ontologies
• RDFS is useful, but does not solve all possible
requirements
• Complex applications may want more possibilities:
• characterization of properties
• identification of objects with different URI-s
• disjointness or equivalence of classes
• construct classes, not only name them
• can a program reason about some terms? E.g.:
• “if «Person» resources «A» and «B» have the same
«foaf:email» property, then «A» and «B» are identical”
• etc.
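The e-mail example can be sketched as a small rule: group resources by the value of the identifying property and declare the members of each group identical. The function name `same_by_key` and the `ex:` and `mailto:` values are hypothetical.

```python
def same_by_key(graph, key="foaf:email"):
    """Resources sharing a value for 'key' are declared identical."""
    owners = {}
    for s, p, o in graph:
        if p == key:
            owners.setdefault(o, set()).add(s)
    return {
        (a, "owl:sameAs", b)
        for group in owners.values()
        for a in group for b in group if a < b
    }

graph = {
    ("ex:A", "foaf:email", "mailto:a@example.org"),
    ("ex:B", "foaf:email", "mailto:a@example.org"),
    ("ex:C", "foaf:email", "mailto:c@example.org"),
}
identical = same_by_key(graph)
```

Only `ex:A` and `ex:B` are linked; `ex:C` has a different address and is left alone.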
Ontologies (cont.)
• The term ontologies is used in this respect:
“defines the concepts and relationships used to describe
and represent an area of knowledge”
OWL is complex…
• OWL is a large set of additional terms
• We will not cover the whole thing here…
Term equivalences
• For classes:
• owl:equivalentClass: two classes have the same
individuals
• owl:disjointWith: no individuals in common
• For properties:
• owl:equivalentProperty
• remember the a:author vs. f:auteur
• owl:propertyDisjointWith
• For individuals:
• owl:sameAs: two URIs refer to the same concept
(“individual”)
• owl:differentFrom: negation of owl:sameAs
Connecting to French…
<http://dbpedia.org/resource/Amsterdam>
    owl:sameAs <http://sws.geonames.org/2759793>;
Property characterization
• In OWL, one can characterize the behaviour of
properties (symmetric, transitive, functional, inverse
functional…)
• One property may be the inverse of another
• OWL also separates data and object properties
• “datatype property” means that its range consists of typed literals
Classes in OWL
• In RDFS, you can subclass existing classes…
that’s all
• In OWL, you can construct classes from existing
ones:
• enumerate its content
• through intersection, union, complement
• Etc
ex:Person rdf:type owl:Class.
<uri-for-Amitav-Ghosh>
    rdf:type owl:Thing;
    rdf:type ex:Person .
:£ rdf:type owl:Thing.
:€ rdf:type owl:Thing.
:$ rdf:type owl:Thing.
:Currency
    rdf:type owl:Class;
    owl:oneOf (:€ :£ :$).
:Novel rdf:type owl:Class.
:Short_Story rdf:type owl:Class.
:Poetry rdf:type owl:Class.
:Literature rdf:type owl:Class;
    owl:unionOf (:Novel :Short_Story :Poetry).
For example…
If:
:Novel rdf:type owl:Class.
:Short_Story rdf:type owl:Class.
:Poetry rdf:type owl:Class.
:Literature rdf:type owl:Class;
    owl:unionOf (:Novel :Short_Story :Poetry).
<myWork> rdf:type :Novel .
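Under the usual reading of owl:unionOf, every member of :Novel, :Short_Story or :Poetry is also a member of :Literature, so a reasoner can conclude that `<myWork>` is a :Literature. A minimal sketch of that one inference, assuming tuple-based triples and a hypothetical helper `apply_union`:

```python
# Union definitions: class -> list of member classes.
UNIONS = {":Literature": [":Novel", ":Short_Story", ":Poetry"]}

def apply_union(graph, unions):
    """An instance of any member class is an instance of the union."""
    inferred = {
        (s, "rdf:type", union)
        for (s, p, cls) in graph if p == "rdf:type"
        for union, members in unions.items() if cls in members
    }
    return set(graph) | inferred

facts = {("<myWork>", "rdf:type", ":Novel")}
result = apply_union(facts, UNIONS)
```

The sketch covers only one direction of the definition; a full OWL reasoner draws further conclusions from a union.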
Restrictions formally
• Defines a class of type owl:Restriction with a
• reference to the property that is constrained
• definition of the constraint itself
• One can, e.g., subclass from this node when
defining a particular class
:Listed_Price rdfs:subClassOf [
    rdf:type owl:Restriction;
    owl:onProperty p:currency;
    owl:allValuesFrom :Currency
].
Possible usage…
If:
:Listed_Price rdfs:subClassOf [
    rdf:type owl:Restriction;
    owl:onProperty p:currency;
    owl:allValuesFrom :Currency
].
:price rdf:type :Listed_Price .
:price p:currency <something> .
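With an owl:allValuesFrom restriction, a reasoner can conclude that `<something>` is a :Currency, because every p:currency value of a :Listed_Price must belong to that class. A minimal sketch of that inference, assuming the restriction has already been extracted into a small table; the table layout and the helper `apply_all_values_from` are hypothetical.

```python
# Restricted class -> (property, class that all values must belong to).
RESTRICTIONS = {":Listed_Price": ("p:currency", ":Currency")}

def apply_all_values_from(graph, restrictions):
    """Type the property values of every instance of a restricted class."""
    inferred = set()
    for s, p, cls in graph:
        if p == "rdf:type" and cls in restrictions:
            prop, target = restrictions[cls]
            inferred |= {
                (o, "rdf:type", target)
                for s2, p2, o in graph if s2 == s and p2 == prop
            }
    return set(graph) | inferred

facts = {
    (":price", "rdf:type", ":Listed_Price"),
    (":price", "p:currency", "<something>"),
}
result = apply_all_values_from(facts, RESTRICTIONS)
```

As with domains in RDFS, the restriction yields a new fact and does not reject the data.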
Other restrictions
OWL “species”
• OWL species come to the fore:
• restricting which terms can be used and under what
circumstances (restrictions)
• if one abides by those restrictions, then simpler inference
engines can be used
• They reflect compromises: expressibility vs.
implementability
OWL Full
• No constraints on any of the constructs
• owl:Class is just syntactic sugar for rdfs:Class
• owl:Thing is equivalent to rdfs:Resource
• this means that:
• Class can also be an individual, a URI can denote a property
as well as a Class
• e.g., it is possible to talk about class of classes, apply properties
on them
• etc.
• Extension of RDFS in all respects
• But: no system may exist that infers everything one
might expect
OWL DL
• A number of restrictions are defined
• classes, individuals, object and datatype properties, etc, are
fairly strictly separated
• object properties must be used with individuals
• i.e., properties are really used to create relationships between
individuals
• no characterization of datatype properties
• …
• But: well known inference algorithms exist!
<q> rdf:type <A>.       # A is a class, q is an individual
<r> rdf:type <q>.       # error: q cannot be used for a class, too
<A> ex:something <B>.   # error: properties are for individuals only
<q> ex:something <s>.   # error: same property cannot be used as
<p> ex:something "54".  # object and datatype property
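Two of these restrictions are easy to check mechanically. The sketch below flags a term used both as a class and as an individual, and a property used with both resources and literals. It is a partial illustration, not an OWL DL validator; the function name `dl_violations` and the convention that literals start with a double quote are assumptions.

```python
def dl_violations(graph):
    """Report two kinds of OWL DL separation problems."""
    classes = {o for s, p, o in graph if p == "rdf:type"}
    individuals = {s for s, p, o in graph if p == "rdf:type"}
    problems = [
        f"{term} is used both as a class and as an individual"
        for term in sorted(classes & individuals)
    ]
    # Literals are assumed to be written with a leading double quote.
    with_literals = {p for s, p, o in graph
                     if p != "rdf:type" and o.startswith('"')}
    with_resources = {p for s, p, o in graph
                      if p != "rdf:type" and not o.startswith('"')}
    problems += [
        f"{prop} is used both as an object and as a datatype property"
        for prop in sorted(with_literals & with_resources)
    ]
    return problems

graph = [
    ("<q>", "rdf:type", "<A>"),
    ("<r>", "rdf:type", "<q>"),
    ("<A>", "ex:something", "<B>"),
    ("<q>", "ex:something", "<s>"),
    ("<p>", "ex:something", '"54"'),
]
problems = dl_violations(graph)
```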
OWL DL usage
• Abiding by the restrictions means that very large
ontologies can be developed that require precise
procedures
• e.g., in the medical domain, biological research, energy
industry, financial services (e.g., XBRL), etc.
• the number of classes and properties described this way
can go up to the many thousands
• OWL DL has become a language of choice to
define and manage formal ontologies in general
• even if their usage is not necessarily on the Web
Ontology development
• The hard work is to create the ontologies
• requires a good knowledge of the area to be described
• some communities have good expertise already (e.g.,
librarians)
• OWL is just a tool to formalize ontologies
• large scale ontologies are often developed in a community
process
• Ontologies should be shared and reused
• can be via the simple namespace mechanisms…
• …or via explicit import
Ontologies examples
• eClassOwl: eBusiness ontology for products and
services, 75,000 classes and 5,500 properties
• National Cancer Institute’s ontology: about 58,000
classes
• Open Biomedical Ontologies Foundry: a collection
of ontologies, including the Gene Ontology to
describe gene and gene product attributes in any
organism or protein sequence and annotation
terminology and data (UniProt)
• BioPAX: for biological pathway data
Other SW technologies
• There are other technologies that we do not have
time for here
• find RDF data associated with general URI-s: POWDER
• bridge to thesauri, glossaries, etc: SKOS
• use Rule engines on RDF data
• Integration of relevant data in Zaragoza (using RDF and ontologies)
• Use rules on the RDF data to provide a proper itinerary
Courtesy of Jesús Fernández, Mun. of Zaragoza, and Antonio Campos, CTIC (SWEO Use Case)
“Core” vocabularies
• There are also a number of widely used “core
vocabularies”
• Dublin Core: about information resources, digital libraries,
with extensions for rights, permissions, digital rights
management
• FOAF: about people and their organizations
• DOAP: on the descriptions of software projects
• SIOC: Semantically-Interlinked Online Communities
• vCard in RDF
• …
• One should never forget: ontologies/vocabularies
must be shared and reused!
Some books
• G. Antoniou and F. van Harmelen: Semantic Web
Primer, 2nd edition in 2008
• D. Allemang and J. Hendler: Semantic Web for the
Working Ontologist, 2008
• Jeffrey Pollock: Semantic Web for Dummies, 2009
• …
Further information
• Planet RDF aggregates a number of SW blogs:
• http://planetrdf.com/
• Semantic Web Interest Group
• a forum for developers with an archived (and public) mailing list,
and a constant IRC presence on freenode.net#swig
• anybody can sign up on the list:
• http://www.w3.org/2001/sw/interest/
Conclusions
• The Semantic Web is about creating a Web of
Data
• There is a great and very active user and
developer community, with new applications
• witness the size and diversity of this event
http://www.w3.org/2009/Talks/0615-SanJose-tutorial-IH/