An Introduction to Spatial Databases
Definition
•A spatial database is a database that is optimized to
store and query data related to objects in space,
including points, lines and polygons.
•While typical databases can understand various numeric
and character types of data, additional functionality
needs to be added for databases to process spatial data
types
Definition…
• Ralf Hartmut Güting defines Spatial DBMS:
– A spatial database system is a database system
– It offers spatial data types in its data model and
query language
– It supports spatial data types in its
implementation, providing at least spatial indexing
and efficient algorithms for spatial join.
Introduction
• A spatial DB contains objects “in” space
• A spatial database system:
• Is a database system
– A DBMS with additional capabilities for handling spatial data
• Offers spatial data types (SDTs) in its data model and query
language
– Structure in space: e.g., POINT, LINE, REGION
– Relationships among them: (l intersects r)
• Supports SDT in its implementation providing at least
– spatial indexing (retrieving objects in particular area without scanning
the whole space)
– efficient algorithms for spatial joins (not simply filtering the cartesian
product)
Three meanings of the
acronym GIS
• Geographic Information Services.
– Web‐sites and service centers for casual users, e.g.
travelers
– Example: Service (e.g. AAA, mapquest) for route planning
• Geographic Information Systems.
– Software for professional users, e.g. cartographers
– Example: ESRI ArcView, ArcInfo, and ArcEditor software.
• Geographic Information Science.
– Concepts, frameworks, theories to formalize use and
development of geographic information systems and services
– Example: design spatial data types and operations for
querying
Modeling
• Objects in space: cities, forests, or rivers
– single objects
• Coverage/Field: say something about every
point in space (e.g., partitions, thematic maps)
– spatially related collections of objects
Modeling: spatial primitives for objects
• Point: object represented only by its location
in space, e.g. center of a region
• Line (actually a curve or polyline):
representation of movement through, or
connections in, space, e.g. road, river
• Region: representation of an extent in 2d‐
space, e.g. lake, city
Modeling: coverages
• Partition: set of region objects that are
required to be disjoint (adjacency, i.e. region
objects with common boundaries, is of
particular interest), e.g. thematic maps
• Networks: embedded graph in plane
consisting of set of points (vertices) and lines
(edges) objects, e.g. highways, power supply
lines, rivers
Modeling: a sample spatial type system (1)
EXT={lines, regions}, GEO={points, lines, regions}
Spatial predicates for topological relationships:
inside: geo × regions → bool
intersect, meets: ext1 × ext2 → bool
adjacent, encloses: regions × regions → bool
Operations returning atomic spatial data types:
intersection: lines × lines → points
intersection: regions × regions → regions
plus, minus: geo × geo → geo
contour: regions → lines
Modeling: a sample spatial type system (2)
• Spatial operators returning numbers
– dist: geo1 × geo2 → real
– perimeter, area: regions → real
• Spatial operations on sets of objects
– sum: set(obj) × (obj → geo) → geo
– A spatial aggregate function: the geometric union of all
attribute values, e.g. the union of a set of provinces
determines the area of the country
– closest: set(obj) × (obj → geo1) × geo2 → set(obj)
– Determines, within a set of objects, those whose
spatial attribute value has minimal distance from the
geometric query object
– Other complex operations: overlay, buffering, …
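The dist and closest signatures above can be illustrated with a small Python sketch. It assumes point geometries stored as (x, y) tuples and planar Euclidean distance; the record layout and the names `cities` and `loc` are hypothetical and only serve the illustration.

```python
import math

def dist(p, q):
    """dist: geo1 x geo2 -> real, here restricted to two points (x, y)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def closest(objects, geo_of, query):
    """closest: set(obj) x (obj -> geo1) x geo2 -> set(obj).

    Returns every object whose spatial attribute has minimal distance
    from the query geometry (ties are all returned).
    """
    if not objects:
        return []
    best = min(dist(geo_of(o), query) for o in objects)
    return [o for o in objects if math.isclose(dist(geo_of(o), query), best)]

# Hypothetical objects with a point-valued spatial attribute "loc".
cities = [
    {"name": "A", "loc": (0.0, 0.0)},
    {"name": "B", "loc": (3.0, 4.0)},
    {"name": "C", "loc": (6.0, 8.0)},
]
nearest = closest(cities, lambda c: c["loc"], (4.0, 4.0))
print([c["name"] for c in nearest])
```

The second argument plays the role of the (obj → geo1) function in the signature: it extracts the spatial attribute from each object.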
Modeling: Spatial Relationships
Most important operations of spatial algebras (predicates).
E.g. find all objects in a given relationship to a query
object.
• topological relationship: inside, intersects, adjacent
…(invariant under translation, rotation, scaling)
• Direction relationship: above, below, north_of, ...
• metric: distance < 100
Topological relationships have been studied in some depth. Are there
completeness criteria?
Yes! See Egenhofer 89 and subsequent work, originally for simple
regions only (connected, no holes).
Spatial Relations
Topological Relations: containment, overlapping, …
[Figure: pairs of regions A and B illustrating spatial relations; one panel is labelled 1 Km]
Topological Relations
Topological relations are defined using point‐
set topology concepts, such as boundary and
interior.
Topological Relations (Cont.)
• For example:
– the boundary of a region consists of a set
of curves that separate the region from the
rest of the coordinate space
– The interior of a region consists of all points
in the region that are not on its boundary
• Given this, two regions are said to be
adjacent if they share part of a boundary
but do not share any points in their interior
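The adjacency definition above can be sketched for the simplest case, axis-aligned rectangles. This is an illustration under stated assumptions, not a general region algorithm: rectangles are (xmin, ymin, xmax, ymax) tuples, and any shared boundary point, including a single corner, counts as sharing part of a boundary.

```python
def closed_intersect(a, b):
    """True if the closed rectangles (interior plus boundary) share a point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def interiors_intersect(a, b):
    """True if the open interiors share a point (strict inequalities)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def adjacent(a, b):
    """Boundaries meet, but no interior point is shared."""
    return closed_intersect(a, b) and not interiors_intersect(a, b)

left = (0, 0, 2, 2)
right = (2, 0, 4, 2)      # shares the edge x = 2 with left
overlapping = (1, 1, 3, 3)
far_away = (5, 5, 6, 6)

print(adjacent(left, right), adjacent(left, overlapping), adjacent(left, far_away))
```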
Modeling: SDBMS data model
• DBMS data model must be extended by SDTs
at the level of atomic data types (such as
integer, string), or better be open for user‐
defined types
Querying
• Two main issues:
1. Connecting the operations of a spatial algebra
(including predicates for spatial relationships) to
the facilities of a DBMS query language.
Fundamental spatial algebra operations are:
– Spatial selection
– Spatial join
2. Providing graphical presentation of spatial data
(i.e. results of queries), and graphical input of
SDT values used in queries.
Querying: spatial selection
• Spatial selection: returning those objects satisfying a
spatial predicate with the query object
– “List all cities in Gojjam”
– “List all big cities not more than 350 km from Addis
Ababa”
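The second query combines an ordinary predicate ("big") with a metric spatial predicate. A minimal Python sketch follows, assuming cities are points given as (latitude, longitude) in degrees and great-circle distance on a sphere of radius 6371 km. The city rows X, Y, Z and the population threshold are hypothetical; only the query point approximates Addis Ababa.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation

def haversine_km(p, q):
    """Great-circle distance between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def spatial_selection(cities, query_point, max_km, min_pop):
    """Return the cities that are big enough and within max_km of query_point."""
    return [c for c in cities
            if c["pop"] >= min_pop
            and haversine_km(c["loc"], query_point) <= max_km]

addis_ababa = (9.03, 38.74)  # approximate coordinates
cities = [                   # hypothetical rows, for illustration only
    {"name": "X", "loc": (9.03, 39.74), "pop": 500_000},
    {"name": "Y", "loc": (15.0, 38.74), "pop": 800_000},
    {"name": "Z", "loc": (9.5, 38.9), "pop": 20_000},
]
result = spatial_selection(cities, addis_ababa, max_km=350, min_pop=100_000)
print([c["name"] for c in result])
```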
Spatial Joins
• In practice, spatial join operations are divided
into a filter step and a refinement step to
efficiently process complex spatial data types
such as point collections in a row instance. In
the filter step, the spatial objects are
represented by simpler approximations such
as their Minimum Bounding Rectangle or Box
(MBR or MBB).
Spatial Join
A spatial join associates two tables based on a
spatial relationship, rather than on the classic
non-spatial relational attribute. A spatial join
operation is used to combine two or more datasets
with respect to a spatial predicate or spatial
operation. Predicates can be a combination of
directional, distance, and topological spatial
relations (e.g. overlap, contains). In a non-spatial
join, the joining attributes must be of the
same type, but in a spatial join they can be of
different types.
Spatial Join Example 1
Query: For all the rivers listed in the River table, find the countries
through which they pass.
The spatial predicate “Cross” is used to join the River and Country tables.
To view the result we would add asbinary(R.the_geom,C.the_geom).
Note: as we do not have a country table, this will not work with our Irish data
set.
Spatial Joins Example 2
• A spatial join associates two tables based on a
spatial relationship, rather than an attribute
relationship. For example the query:
Summarize the election results
by municipality.
• Could be answered using the following SQL:
SELECT
m.name,
sum(v.fiannfail) AS FF,
sum(v.gp) AS Green,
sum(v.finegael) AS FG,
sum(v.vtotal) AS total
FROM voting_areas v, municipality m
WHERE
intersects(v.the_geom, m.the_geom)
GROUP BY m.name
ORDER BY m.name;
http://www.oireachtas.ie/documents/publications/dublin2.pdf
http://www.oireachtas.ie/viewdoc.asp?DocID=10838
Dynamic and Static Data
• Static non‐spatial data is usually maintained in the
table with the geometry (e.g. county name or unique
identifier). In some cases this data, along with the
geometry, is considered immutable.
• Dynamic non‐spatial data is often maintained in a
separate table e.g. land ownership.
• There can be more than one dynamic table for a
geometry table.
• Dynamic spatial data can include moving objects or a
changing world (temporality requires different
treatment)
Traditional Joins
• Pure data tables can be joined with the
geometry tables for querying purposes
• A primary key is used to relate the 2 tables
together
• A primary key is a unique identifier for each
row in a table
Primary Key
• Spatial join: A join which compares any two
joined objects based on a predicate on their
spatial attribute values.
– “For each river that passes through Gojjam, find all
cities within 50 km of the river”
• Spatial data structures either store points
or rectangles (for line or region values)
• Operations on those structures: insert,
delete, member
• Query types for points:
– Range query: all points within a query rectangle
– Nearest neighbor: point closest to a query point
– Distance scan: enumerate points in increasing distance
from a query point.
• Query types for rectangles
– Intersection query
– Containment query
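The three point-query types above can be sketched with a brute-force scan over a list of points. This only shows what each query returns; a spatial index exists precisely to avoid visiting every point. The sample points and the inclusive rectangle boundary are assumptions made for the illustration.

```python
import math

def range_query(points, rect):
    """All points inside the query rectangle (xmin, ymin, xmax, ymax), boundary included."""
    xmin, ymin, xmax, ymax = rect
    return [p for p in points if xmin <= p[0] <= xmax and ymin <= p[1] <= ymax]

def nearest_neighbor(points, q):
    """The point closest to the query point q."""
    return min(points, key=lambda p: math.dist(p, q))

def distance_scan(points, q):
    """Enumerate points in order of increasing distance from q."""
    yield from sorted(points, key=lambda p: math.dist(p, q))

points = [(1, 1), (2, 5), (4, 4), (7, 2)]
print(range_query(points, (0, 0, 4, 4)))
print(nearest_neighbor(points, (3, 4)))
print(list(distance_scan(points, (3, 4))))
```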
Spatial Indexing: idea… approximate!
• A fundamental idea: use of approximations
as keys
1) continuous (e.g. bounding box)
2) Grid (a geometric entity as a set of cells).
• Filter and refine strategy for query
processing:
1. Filter: returns a set of candidate objects which is
a superset of the objects fulfilling a predicate
2. Refine: for each candidate, the exact geometry is
checked
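The filter and refine strategy can be sketched for a point-in-region query. The sketch assumes regions are simple polygons given as vertex lists, uses the bounding box as the approximation, and uses a ray-casting test as the exact check; the polygon names and coordinates are hypothetical.

```python
def mbr(polygon):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of a vertex list."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)

def in_box(pt, box):
    """Cheap approximate test used by the filter step."""
    return box[0] <= pt[0] <= box[2] and box[1] <= pt[1] <= box[3]

def point_in_polygon(pt, polygon):
    """Exact test used by the refine step (ray casting).

    Points lying exactly on the boundary are not treated specially.
    """
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def filter_and_refine(pt, polygons):
    """Return (candidates after filtering, exact hits after refinement)."""
    candidates = [name for name, poly in polygons.items() if in_box(pt, mbr(poly))]
    hits = [name for name in candidates if point_in_polygon(pt, polygons[name])]
    return candidates, hits

polygons = {
    "triangle": [(0, 0), (10, 0), (0, 10)],
    "square": [(20, 20), (30, 20), (30, 30), (20, 30)],
}
print(filter_and_refine((8, 8), polygons))  # in the triangle's box, outside the triangle
print(filter_and_refine((2, 2), polygons))
```

The point (8, 8) is a "false hit": it passes the filter because it lies in the triangle's bounding box, and only the refine step rejects it.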
Spatial Indexing: memory organization
• A spatial index structure organizes points into buckets.
• Each bucket has an associated bucket region, a part of space
containing all objects stored in that bucket.
• For point data structures, the regions are disjoint & partition
space so that each point belongs to precisely one bucket.
• For rectangle data structures, bucket regions may overlap.
[Figure: a kd‐tree partitioning of 2d‐space where each bucket can hold up to 3 points]
Spatial Indexing: 1‐D Grid approx. (1/2)
• One dimensional embedding: z‐order or bit‐
interleaving
– Find a linear order for the cells of the grid while
maintaining “locality” (i.e., cells close to each other in space
are also close to each other in the linear order)
– Define this order recursively for a grid that is
obtained by hierarchical subdivision of space
[Figure: z‐order cell codes 00, 01, 10, 11 on a 2×2 grid, refined recursively to 4‐bit codes from 0000 to 1111]
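Bit interleaving can be sketched in a few lines of Python. The sketch assumes integer cell coordinates (x, y) on a 2^bits by 2^bits grid and places the x bit before the y bit at each level; the opposite bit order is equally valid and only mirrors the curve.

```python
def z_value(x, y, bits):
    """Interleave the bits of x and y (x bit first), most significant level first."""
    z = 0
    for i in range(bits - 1, -1, -1):
        z = (z << 1) | ((x >> i) & 1)
        z = (z << 1) | ((y >> i) & 1)
    return z

# Codes for the four cells of a 2x2 grid (one subdivision level).
for y in (1, 0):  # print the top row first
    print(" ".join(format(z_value(x, y, 1), "02b") for x in (0, 1)))

# Sorting cells by z_value yields the linear order used as the index key.
cells = [(x, y) for x in range(4) for y in range(4)]
z_order = sorted(cells, key=lambda c: z_value(c[0], c[1], 2))
```

Because each level contributes two bits, the four cells of any quadrant share a common code prefix, which is what keeps nearby cells close together in the linear order.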
Spatial Join
• Traditional join methods such as hash join or
sort/merge join are not applicable.
• Filtering cartesian product is expensive.
• Two general classes:
1. Grid approximation/bounding box
2. None/one/both operands are represented in a spatial index
structure
• Grid approximations and overlap predicate:
– A parallel scan of two sets of z‐elements corresponding to
two sets of spatial objects is performed
– Too fine a grid, too many z‐elements per object (inefficient)
– Too coarse a grid, too many “false hits” in a spatial join
Spatial Join
• Bounding boxes: for two sets of rectangles R, S all
pairs (r,s), r in R, s in S, such that r intersects s:
– No spatial index on R and S: bb_join which uses a
computational geometry algorithm to detect rectangle
intersection, similar to external merge sorting
– Spatial index on either R or S: index join scans the non‐
indexed operand and, for each object, uses the bounding box of
its SDT attribute as a search argument on the
indexed operand (only efficient if the non‐indexed operand is
not too big; otherwise bb_join might be better)
– Both R and S are indexed: synchronized traversal of both
structures so that pairs of cells of their respective
partitions covering the same part of space are
encountered together.
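For the case without any index, a rectangle join can be sketched as a sort-and-sweep over both inputs. This is one common sort-based approach and is not necessarily the bb_join algorithm referred to above. Rectangles are assumed to be (xmin, ymin, xmax, ymax) tuples, touching rectangles count as intersecting, and a nested-loop join is included only as a reference to compare against.

```python
def intersects(a, b):
    """Closed-rectangle intersection test."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def nested_loop_join(R, S):
    """Reference: filter the full cartesian product."""
    return [(r, s) for r in R for s in S if intersects(r, s)]

def sweep_join(R, S):
    """Sort both inputs by xmin, then sweep left to right.

    The rectangle with the smaller xmin is compared only against
    rectangles of the other input whose xmin does not exceed its xmax.
    """
    R, S = sorted(R), sorted(S)
    out = []
    i = j = 0
    while i < len(R) and j < len(S):
        if R[i][0] <= S[j][0]:
            r, k = R[i], j
            while k < len(S) and S[k][0] <= r[2]:
                if r[1] <= S[k][3] and S[k][1] <= r[3]:  # y-ranges overlap
                    out.append((r, S[k]))
                k += 1
            i += 1
        else:
            s, k = S[j], i
            while k < len(R) and R[k][0] <= s[2]:
                if s[1] <= R[k][3] and R[k][1] <= s[3]:
                    out.append((R[k], s))
                k += 1
            j += 1
    return out

R = [(0, 0, 2, 2), (5, 5, 7, 7), (1, 3, 4, 6)]
S = [(1, 1, 3, 3), (6, 0, 8, 1), (4, 4, 6, 6)]
pairs = sweep_join(R, S)
print(len(pairs))
```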
Geometry Hierarchy
Geometry Hierarchy…
• Geometries can be broadly categorized into
two groups, as follows.
– Single geometries contain one discrete geometric
element. The most basic single geometry is a
Point. There are also three types of curve
(LineString, CircularString, and CompoundCurve)
and two types of surface (Polygon and
CurvePolygon).
– Geometry collections are compound elements,
containing one or more of the individual
geometries listed above.
Geometry Hierarchy…
– A geometry collection may be homogeneous or
heterogeneous.
– A homogeneous geometry collection contains
several items of the same type of single geometry
only (e.g., a MultiPoint is a geometry collection
containing only Points).
– A heterogeneous geometry collection contains
one or more of several different sorts of geometry,
such as a collection containing a LineString and a
Polygon
Points
• A Point is the most fundamental type of
geometry, and is used to represent a singular
position in space.
• Defining a Point
– A Point is defined by a pair of coordinate values,
either an x‐coordinate value and a y‐coordinate
value from a planar coordinate system, or a
latitude and longitude coordinate value from a
geographic coordinate system.
POINT(3 5)
Characteristics of Points
• All Point geometries share the following
characteristics.
– A Point is zero‐dimensional, which means that it
has no length in any direction and there is no area
contained within a Point.
– A Point has no boundary.
– The interior of a Point is the Point itself. Everything
other than that Point is the exterior.
– Points are always classified as "simple"
geometries.
LineStrings
• Take a series of Points and draw the path segments that directly
connect each one to the next in the series. This path
defines a LineString.
LINESTRING(2 3, 4 6, 6 6, 10 4)
DECLARE @LineString geometry = 'LINESTRING(0 0, 8 6)';
DECLARE @CircularString1 geometry = 'CIRCULARSTRING(0 0, 4 3, 8 6)';
-- The three CircularString points are collinear, so both values describe the same straight path.
SELECT
@LineString.STEquals(@CircularString1); -- Returns 1 (true)
Polygons
• A Polygon is a type of surface; that is, a
Polygon is a two‐dimensional geometry that
contains an area of space. The outer extent of
the area of space contained within a Polygon
is defined by a closed LineString, called the
exterior ring.
Defining polygons
• The Well‐Known Text for a Polygon begins with the POLYGON
keyword, followed by a set of round brackets.
POLYGON((1 1, 3 1, 3 7, 1 7, 1 1))
• The following code listing demonstrates the WKT syntax for a
triangular Polygon containing an interior ring.
POLYGON((10 1, 10 9, 4 9, 10 1), (9 4, 9 8, 6 8, 9 4))
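To make the ring structure concrete, the following Python sketch parses the two WKT Polygons above and computes their areas with the shoelace formula, subtracting each interior ring from the exterior ring. The parser is a deliberately minimal assumption: it handles only well-formed two-dimensional POLYGON text and is not a general WKT reader.

```python
def parse_polygon_wkt(wkt):
    """Return a list of rings; each ring is a list of (x, y) tuples."""
    text = wkt.strip()
    if not text.upper().startswith("POLYGON"):
        raise ValueError("only POLYGON is supported in this sketch")
    inner = text[text.index("(") + 1 : text.rindex(")")]
    rings = []
    for ring_text in inner.split("),"):
        ring_text = ring_text.strip().strip("()")
        rings.append([tuple(float(v) for v in pair.split())
                      for pair in ring_text.split(",")])
    return rings

def ring_area(ring):
    """Unsigned area of a closed ring (first point repeated last), shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def polygon_area(wkt):
    """Exterior ring area minus the areas of all interior rings."""
    exterior, *interiors = parse_polygon_wkt(wkt)
    return ring_area(exterior) - sum(ring_area(r) for r in interiors)

print(polygon_area("POLYGON((1 1, 3 1, 3 7, 1 7, 1 1))"))
print(polygon_area("POLYGON((10 1, 10 9, 4 9, 10 1), (9 4, 9 8, 6 8, 9 4))"))
```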
Implementing spatial data in SQL
Examples: world database
• Country(Name: varchar(35), Cont:
varchar(30), Pop: integer, GDP: integer,
Life‐exp: integer, Shape: char(9))
• City(Name: varchar(30), Country: varchar(35),
Pop: integer, capital: varchar(30), Shape:
varchar(9))
• River(Name: varchar(35), Origin: varchar(35),
Length: integer, Shape: char(13))
Datatypes
• Some common SQL data types
Spatial Data Entity Creation
• Form an entity to hold county names, states, populations, and
geographies
CREATE TABLE County(
Name varchar(30),
State varchar(30),
Pop Integer,
Shape Polygon);
• Form an entity to hold river names, sources, lengths, and
geographies
CREATE TABLE River(
Name varchar(30),
Source varchar(30),
Distance Integer,
Shape LineString);
Query
Find all the counties that border on Contra Costa county
SELECT C1.Name
FROM County C1, County C2
WHERE Touch(C1.Shape, C2.Shape) = 1 AND C2.Name = 'Contra Costa';
Find all the counties through which the Merced river runs
SELECT C.Name, R.Name
FROM County C, River R
WHERE Intersect(C.Shape, R.Shape) = 1 AND R.Name = 'Merced';
• Operations fall into three categories:
– Apply to all geometry types
• SpatialReference, Envelope, Export,IsSimple, Boundary
– Predicates for Topological relationships
• Equal, Disjoint, Intersect, Touch, Cross, Within, Contains
– Spatial Data Analysis
• Distance,Buffer,Union, Intersection, ConvexHull, SymDiff
– Table 3.9 (pp. 66) details spatial operations
Spatial Queries with SQL/OGIS
• SQL/OGIS - General Information
•Both standards are being adopted by many vendors
•The choice of spatial data types and operations is similar
•Syntax differs from vendor to vendor
• Readers may need to alter SQL/OGIS queries given in text to make
them run on specific commercial products
• Using OGIS with SQL
• Spatial data types can be used in DDL to type columns
• Spatial operations can be used in DML
• Scope of discussion
• Illustrate use of spatial data types with SQL
• Via a set of examples
Spatial Operations with SQL/OGIS
• Basic Functions
• SpatialReference(): Returns the underlying coordinate system.
• Envelope(): Returns the Minimum Bounding Rectangle (MBR).
• Export(): Returns the geometry in a different representation
• IsEmpty(): Returns true if the geometry is the null set.
• IsSimple(): true if no self intersections
• Boundary(): returns the boundary of the geometry.
Spatial Operations with SQL
• Topological and Set comparison Operations
• Equal: True if the interior and boundary of two geometries are equal.
• Disjoint: True if the boundaries and interiors do not intersect
• Intersect: true if geometries are not disjoint
• Touch: true if the boundaries of two surfaces intersect but their interiors
do not
• Cross: true if the interior of a surface intersects with a curve.
• Within: returns true if the interior of a given geometry does not intersect
with the exterior of another geometry
• Contains: tests whether the given geometry contains another given
geometry.
• Overlap: returns true if the interiors of two geometries have non‐empty
intersections.
Spatial Operations with SQL/OGIS
• Spatial Analysis
• Distance: returns the shortest distance between two geometries
• Buffer: returns zone around some geometries
• ConvexHull: returns the smallest convex geometric set enclosing a
geometry.
• Intersection: Returns the intersection of two geometries
• Union: Returns the union of two geometries
• Difference: returns the portion of a geometry that does not intersect with
another given geometry.
• SymDiff: returns the portions of two geometries that do not intersect with
each other.
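As an illustration of one analysis operation from this list, ConvexHull can be sketched for a set of points with Andrew's monotone chain algorithm. This is a standard textbook technique chosen for the example, not a statement about how any particular SQL/OGIS implementation computes the hull; the input points are hypothetical.

```python
def cross(o, a, b):
    """Cross product of vectors OA and OB; positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Hull vertices in counter-clockwise order, starting from the lowest-x point."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop points that would create a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]

points = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 3)]
hull = convex_hull(points)
print(hull)
```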
List of Spatial Query Examples
• Simple SQL SELECT_FROM_WHERE examples
•Spatial analysis operations
•Unary operator: Area (Q5, pp.68)
•Binary operator: Distance (Q3)
•Boolean Topological spatial operations - WHERE clause
•Touch (Q1, pp. 67)
•Cross (Q2, pp. 68)
•Using spatial analysis and topological operations
•Buffer, overlap (Q4)
•Complex SQL examples
• Aggregate SQL queries
• Nested queries
Using spatial operation in SELECT clause
Query: List the GDP and the distance of a country’s capital city to the
equator for all countries.
Note:
Point(0,Ci.Shape.y)
means lat = 0,
long = Ci.Shape.y
Using Spatial Operation in WHERE
clause
Query: Find the names of all countries which are neighbors of the
United States (USA) in the Country table.
Query: The St. Lawrence River can supply water to cities that
are within 300 km. List the cities that can use water from the
St. Lawrence.
SELECT Ci.Name
FROM City Ci, River R
WHERE Overlap(Ci.Shape, Buffer(R.Shape,300))=1
AND R.Name = 'St. Lawrence';
• Learning Objectives (LO)
– LO1: Understand concept of a query language
– LO2 : Learn to use standard query language (SQL)
– LO3: Learn to use spatial ADTs with SQL
– LO4: Learn about the trends in query languages
• Facilities for user defined data types in SQL3
• Mapping Sections to learning objectives
– LO2 ‐ 3.2, 3.3
– LO3 ‐ 3.4, 3.5
– LO4 ‐ 3.6
Defining Spatial Data Types in SQL3
• SQL3 User defined data type - Overview
• CREATE TYPE statements
• Defines a new data type
• Attributes and methods are defined
• Separate statements for interface and implementation
•Examples of interface in Table 3.12 (pp. 74)
– CREATE TYPE statement