Basic Data Models
3.1 The Real World
The real world in which we live is so complex, irregular and dynamic that an accurate and precise representation of it is a Herculean task. GIS, through mapping tools and techniques, has been able to represent a simplified view of the world. Despite the heterogeneity of the information that can be stored in a GIS, there are only a few common methods of representing spatial information in a GIS database. In developing a GIS application, real world features need to be translated into simplified representations that can be stored and manipulated in a computer. Data models provide a way of thinking about the world, a way of organizing the phenomena that interest us. They can be thought of as an abstract language, a collection of words along with a grammar by which we describe the subject.
A data model is a description or view of the real world. It is a conceptual idea, as opposed to the way that the data is actually stored in the computer, which is the data structure. A data model is a mathematical formalism consisting of two parts (Ullman 1988):
o A notation for describing data, and
o A set of operations used to manipulate that data.
Tsichritzis and Lochovsky (1977) define a data model as "a set of guidelines for the representation of the logical organization of the data in a database, consisting of named logical units of data and the relationships between them." A data model is a formal system in which a set of precisely defined objects can be manipulated in accordance with a set of precisely predefined rules, without any regard for the 'meaning' or real-world interpretation of those objects or rules. Reality is an informal system, a system of immense complexity and a system with an infinite amount of information. The difficulty in defining a comprehensive and useful data model is to find a formal system whose behavior represents the informal behavior of the real world as closely as possible.
The representation of reality is a reflection of the individual's perception and understanding of the phenomenon and of the area of application. This means that a data model is tailored to a specific problem. For example, soil may be a storehouse of minerals for the farmer, but its strength is what matters to a civil engineer, while a hydrologist is concerned about the rate and amount of water percolation, etc. This complexity and broad spectrum of interpretation suggests that a GIS will bear the imprint of its creator or developer. It represents the user's perception of the real world, and is termed the conceptual data model. This perception is based on the level of information and understanding of the user. This implies that the conceptual model is database independent. A major benefit of a data model is that it provides a theoretical foundation for the development of physical models. From the theory emerges the power of analysis, the ability to extract inferences and to create deductions that emerge from the raw data. Different models provide different conceptualizations of the world; they have different outlooks and different perspectives.
In the most general sense, a map is a data model, as it is a set of cartographic features that have some physical or cultural manifestation on the surface or subsurface. Peuquet (1984) defined a data model as "a general description of specific groups of entities and the relationship between these groups of entities". But the real world is transformed into a data model at a fairly high level of abstraction, using various generalization operators such as selection and simplification.
Data are observations of the real world that are converted into information. Since a Geographical Information System is a tool for creating, accessing, monitoring, displaying and analyzing a domain specific geographic database for planning, management and decision-making, it relies heavily on two and three-dimensional geographic data. The traditional method for presenting and analyzing spatial data is the map, which uses symbols to represent the real world. Geographic data have physical dimensions and a geographical location on the earth's surface, i.e. a spatial component and an attribute component. GIS integrates these two sets of data and allows new data to be derived for analysis and query building.
3.2. Spatial Data
Spatial data is that which has physical dimension and geographic location on the surface of the earth and is available in the form of maps. Spatial data stored in maps can be described by three concepts: entity, attribute and relationship. By definition an entity is a distinct spatial object. An attribute is a description of some aspect of the entity, which may be qualitative or quantitative in nature, and a relationship is the spatial association among entities. For example, a river is an entity; its width, length and name are the attributes; and its connection with another water body is a relation. That is, the relation represents the link between two different entities.
A map depicts a variety of information, e.g. position, spatial relation, type of feature and measurable quantities. All geographical data must be represented in a simplified form so that they can be stored in the computer. Hence they are represented by three geometric entities, namely point, line and area or polygon, and a label.
A point is specified by a location x, y in the Cartesian coordinate system, a line by a sequence of connected points known as vertices, and the polygon, a
closed area specified by a polyline having the same starting and ending points.
The selection of the geometric feature used to model a geographic entity depends to a large extent on the scale and size of the map. For example, Delhi may be represented as a point on a small scale map, but a (large scale) map of Delhi should represent it as a polygon, which may contain a definite number of points (landmarks), lines (roads and railways) and polygons such as built-up areas, industrial areas etc. Generally two types of polygons are identified, i.e. adjacent polygons and island polygons. The adjacent polygons are those which represent the common boundaries of two or more polygons, e.g. plot boundaries, administrative boundaries, properties etc. The island polygons occur in a number of situations, for example, pasture lands within forests or woodland in a field.
Figure 3.3 (diagram labels: Geographical data; Geometric data; Geometry; Attribute data; Qualitative data; Quantitative data)
[...] and displayed (Burrough, 1986). Computers require descriptive information [...] instructions to turn the spatial and [...] entities into graphical form. The construction of spatial models passes through numerous stages of data abstraction. In database terminology, the way of representing data is known as a data model. Real world features exist in two forms: objects and phenomena. Objects are discrete, such as buildings, parks and cities, whereas phenomena are distributed over a large area, such as temperature, soil, etc. These two forms lead to two distinct approaches: object based and field based models (Goodchild 1992, Wang and Howarth 1994). Object based spatial databases (those obtained by field surveying, remote sensing image analysis, photo interpretation, digitization etc.) are generally represented in the form of coordinate lines and are termed vector data models. When the spatial database is structured on the field-based model, the basic spatial units are different forms of tessellation (regular as in a DEM or irregular as in a TIN), and these are termed raster data models.
Identification of features from the real world, followed by the selection of an appropriate spatial data model, raster or vector, and its structure, is the key to the spatial model building process. All of the major GIS and CAD software packages are either vector or raster based, with some extended functions to support other data structures.
When a picture is stored in a computer the raster data model is used. The function of each grid cell is determined by the information it contains, and the characteristics of each grid cell can be defined. In the raster data model, the accuracy of the map depends on the scale of the map and the resolution. The area covered by a cell in a grid is termed the resolution. The raster model provides the simplest way of storing spatial data. The smaller the area of land that each cell represents, the higher the resolution of the data, and [...]
Raster data are available from a number of sources, but the most important source is satellite imagery. Satellite data is generally stored in three different formats, viz. .bil, .bip and .bsq. These formats are discussed in the chapter on data sources. When we scan a vector map, it is also rasterized. However, in doing so there is some loss of information.
The strength of the raster data model is its simplicity. Many operations on geographic data are easier to implement and execute faster in a raster GIS. Modeling of continuous data, as is often done with elevation or hydrological data, is usually performed with a raster GIS. One disadvantage is that there is a trade-off between the size of the resulting raster data sets and the precision with which spatial features can be represented. A very fine raster grid will represent all curves in a boundary with sufficient detail, but will require a large amount of disk space. The storage of such voluminous data in the computer was a challenging task, which resulted in the development of a number of techniques for the compaction of raster data.
Data compaction techniques
The techniques considered here for the purpose are chain coding, run length coding, block coding and quad trees.
Chain coding
[...]
[...] examined in turn, and pixels having the same value, that is, homogeneous pixels, are grouped together. It uses a 1D method of grouping pixels with similar or identical values. Sequences of pixels with similar values are replaced in memory by the positional reference to the first pixel in the grouping and by the number of pixels in the grouping. For clarity we have selected 1 and 0 as numbers indicating the presence or absence of a feature. As evident from Figure 3.6, the feature in question is absent in the first row, thus the first row is represented as 0 (absent), 10 (number of cells). The procedure is repeated till all the cells of the entity are taken into account.
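The row-by-row grouping described above is what the list of compaction techniques calls run length coding. The following is a minimal sketch, assuming each run is stored as a (value, number of cells) pair, as in the "0 (absent), 10 (number of cells)" example; the 3 x 10 grid and the function names are hypothetical and are not taken from Figure 3.6.

```python
from itertools import groupby


def run_length_encode_row(row):
    """Collapse one raster row into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(row)]


def run_length_decode_row(pairs):
    """Expand (value, run_length) pairs back into the original row."""
    row = []
    for value, count in pairs:
        row.extend([value] * count)
    return row


# Hypothetical presence (1) / absence (0) grid, 10 cells per row.
grid = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 1, 1, 0, 0],
]

encoded = [run_length_encode_row(row) for row in grid]
for pairs in encoded:
    print(pairs)
```

The first row, where the feature is absent everywhere, collapses to a single pair, while rows that alternate often gain little; this is the usual reason the method suits maps with large homogeneous areas.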
Quadtree
[...] method. It works best on bitmaps with no noise, that is, black and white images with solid shapes, but it can also handle colour images. A quadtree is a term used [...]
IMGRID model
The IMGRID model is a slight improvement over the grid model. To represent a thematic map of land use that contains four categories (recreation, agriculture, industry and residence), each of these features has to be separated out as an individual layer. In the layer that represents agriculture, 1 or 0 will represent the presence or absence of crops respectively. The rest of the layers will be represented in [...]
(1) Never use a raster to represent a sewer network if the application requires accurate connectivity. Suppose cells are coded 1 where a sewer is present and 0 elsewhere. If two adjacent cells both have 1, that is no guarantee that the sewers they contain are connected.
(2) Never use a raster to represent land ownership parcels. By definition the boundary between two survey points is a mathematically straight line, and the jagged appearance of a raster representation would be unacceptable.
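Returning to the IMGRID model described above, the separation of a four-category land use map into one presence/absence layer per category can be sketched as follows. The numeric category codes, the small grid and the function name are assumptions made for illustration only.

```python
# Hypothetical category codes; the source names the categories but not the codes.
LAND_USE_CODES = {"recreation": 1, "agriculture": 2, "industry": 3, "residence": 4}

# Hypothetical 3 x 4 land use grid using the codes above.
land_use = [
    [1, 1, 2, 2],
    [1, 2, 2, 3],
    [4, 4, 3, 3],
]


def split_into_binary_layers(grid, codes):
    """Return one layer per category, with 1 for presence and 0 for absence."""
    layers = {}
    for name, code in codes.items():
        layers[name] = [[1 if cell == code else 0 for cell in row] for row in grid]
    return layers


layers = split_into_binary_layers(land_use, LAND_USE_CODES)
for row in layers["agriculture"]:
    print(row)
```

Because every cell belongs to exactly one category here, the layers sum to 1 at every cell, which gives a simple consistency check on the layered representation.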
3.5. Vector Data Model
A vector GIS is simply a generic name to describe a class of GIS that use the vector data structure to describe, represent and use spatial objects. A vector is a physical quantity that requires both magnitude and direction for its description. This means a vector can be represented graphically by a directed line segment. An important property of vectors is that when they are oriented in the same or opposite directions, they can be added or subtracted, just like numbers with plus and minus signs.
The vector data model is an object based approach to the representation of real world features and is best used to represent discrete objects. That is, it represents anything from a simple number to a complex entity. It is the familiar coordinate geometry representation, specifying a point, line or polygon by two-dimensional Cartesian coordinates, and rests on the assumption that the real world can be divided into clearly defined elements.
By using one or more pairs of coordinates, spatial objects are identified and represented by one of the three forms of basic graphical elements. Such spatial objects have locations depicted simply as points, lines or areas. In addition, such spatial objects may have attributes. Point entities are represented by a single (x, y) coordinate pair. Lines and polygons are represented by a series of connecting points. [...] have more than two pairs. In the case of an area it is a combination of line segments having common starting and ending points. The vector entities are the units that carry the information. The selection of an appropriate number of points for the construction of an entity is an important step, as too many points will cause repetition and too few will lead to over generalization of the feature. Paper maps are the typical example of the vector model. The vector data structure produces a smaller file size than raster images, because a raster image needs space for all its pixels while only point coordinates are stored in the vector representation. When geometric shapes need to be represented precisely in a GIS or CAD, the vector data structure is the option to use, because it is not limited by spatial resolution or pixel size and mathematical formulae can be used for regular shapes and smooth curves.
Since most thematic mapping for Geographic Information Systems is based on polygons, their storage and manipulation has received considerable attention. The characteristic of a polygon is that it must have a unique shape, area and perimeter. The representation of a simple polygon with few points is straightforward, but when complex entities such as more than two polygons are stored, the adjoining boundaries will be entered twice. This causes duplication of adjacent lines and creates a matching problem. Besides, it occupies more space in the computer. To overcome this and the related problems, two popular models are considered here.
Spaghetti model
Vector data that have been collected but not structured are said to be in the spaghetti data model. It is also called the non-topological, geometric or path topological model. As the name suggests, it is a direct line-by-line translation of the vector map. Each entity, i.e. point, line or polygon, becomes a logical record in the digital file and is defined as a string of x, y coordinates. What we get is something that looks like a map or other drawing when we display it as a graphic, but it has no underlying structure. In other words a polygon is drawn by just making lines. There is no correspondence between these strings and the boundaries of individual polygons. It stores graphical elements, not graphical entities.
The data storage and access is sequential, thus the search time is often long. Besides, all the polygons are stored as independent polygons; as a result the intervening boundaries are digitized twice, thus occupying more space in the computer.
Figure 3.14: Path Topological/Spaghetti Model
Table 3.1
Polygon   Coordinate List
P1        (x5, y5), (x6, y6), (x7, y7), (x8, y8), (x9, y9), (x10, y10), (x11, y11), (x17, y17), (x5, y5)
P2        (x3, y3), (x16, y16), (x17, y17), (x11, y11), (x12, y12), (x13, y13), (x14, y14), (x15, y15), (x1, y1), (x2, y2), (x3, y3)
P3        (x3, y3), (x4, y4), (x5, y5), (x17, y17), (x16, y16), (x3, y3)
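The redundancy of the spaghetti storage in Table 3.1 can be made concrete with a small sketch. The numeric coordinates below are hypothetical placeholders for (x1, y1) to (x17, y17) and do not describe real geometry; the helper `ring` only builds the example data, and each polygon ends up holding its own independent copy of every coordinate pair, as the model prescribes.

```python
# Hypothetical stand-ins for the symbolic pairs (x1, y1) ... (x17, y17).
XY = {i: (float(i), float((3 * i) % 5)) for i in range(1, 18)}


def ring(*ids):
    """Build a closed coordinate string from point numbers (example data only)."""
    return [XY[i] for i in ids]


# Spaghetti storage: every polygon is an independent string of x, y pairs.
spaghetti = {
    "P1": ring(5, 6, 7, 8, 9, 10, 11, 17, 5),
    "P2": ring(3, 16, 17, 11, 12, 13, 14, 15, 1, 2, 3),
    "P3": ring(3, 4, 5, 17, 16, 3),
}


def stored_pairs(polygons):
    """Number of coordinate pairs physically stored."""
    return sum(len(coords) for coords in polygons.values())


def distinct_pairs(polygons):
    """Number of different locations those pairs describe."""
    return len({xy for coords in polygons.values() for xy in coords})


print(stored_pairs(spaghetti), "pairs stored for", distinct_pairs(spaghetti), "distinct points")
```

The gap between the two counts comes from the closing vertex of each ring and from the boundaries shared by neighbouring polygons, which are stored once per polygon.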
The spatial relationships between the features are not retained. In other words it is a collection of points and line segments with no real connection. All the information necessary to draw a map is in place, but is randomly organized ("unlinked"). This makes organization of the data easy, but makes it very difficult to use for analysis.
The spaghetti model is essentially based on the one-to-one translation of the graphic image we normally encounter on a map. The map is maintained as the conceptual model. For adjacent polygons, separate pieces of spaghetti are placed for adjacent sides. No two polygons share the same string of spaghetti. Each side of the polygon is defined by its own set of lines and coordinate pairs. Although polygon sides are recorded separately, they have the same coordinates. This means there is considerable redundancy in this data model.
Because of its one-for-one translation structure, the relationships between and among objects are implied rather than explicit. The result of this lack of topology is enormous computational overhead, making measurements and analysis difficult. The positive aspect of the spaghetti model is that it is relatively efficient as a method of cartographic display and is used in Computer Assisted Cartography (CAD), where analysis is not the primary purpose.
A marginal improvement over the path topological (spaghetti) model is the point dictionary model. In this model the transformation between points and numerical coordinates is incorporated. All coordinate pairs are numbered sequentially and stored in random access allocation. Each polygon is stored as a circular list of point IDs that act as the addresses for accessing the appropriate coordinates. Thus the point ID values are first retrieved from the polygon list, and these in turn are used to retrieve the respective coordinates. The storage requirements are slightly reduced, as x, y pairs are stored only once. However, each point ID is still stored twice for common boundaries. Though this model saves space in the system, the adjacency problem remains.
Table 3.2: Point dictionary representation
Polygon   Point list
P1        p5, p6, p7, p8, p9, p10, p11, p17, p5
P2        p3, p16, p17, p11, p12, p13, p14, p15, p1, p2, p3
P3        p3, p4, p5, p17, p16, p3
Point     Coordinates
P1        (x1, y1)
P2        (x2, y2)
P3        (x3, y3)
P4        (x4, y4)
...
P17       (x17, y17)
To overcome the adjacency problem, the chain dictionary model is applied. In this model each polygon is given a circular list of chains, and each chain in turn is a list of points.
Figure 3.16: Chain Dictionary Model
This model has reduced the space requirement and the access time. The polygon retrieval is now a three-step procedure: retrieval of the polygon ID, the point IDs and the corresponding coordinates.
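The point dictionary of Table 3.2 can be sketched as two lookup tables: one from point ID to coordinates and one from polygon to its circular list of point IDs. The numeric coordinates are hypothetical placeholders for (x1, y1) to (x17, y17), and the function names are illustrative assumptions.

```python
# Point dictionary: each coordinate pair is stored exactly once.
# The numeric values are hypothetical stand-ins for (x1, y1) ... (x17, y17).
points = {f"p{i}": (float(i), float((3 * i) % 5)) for i in range(1, 18)}

# Polygon table: circular lists of point IDs, as in Table 3.2.
polygons = {
    "P1": ["p5", "p6", "p7", "p8", "p9", "p10", "p11", "p17", "p5"],
    "P2": ["p3", "p16", "p17", "p11", "p12", "p13", "p14", "p15", "p1", "p2", "p3"],
    "P3": ["p3", "p4", "p5", "p17", "p16", "p3"],
}


def polygon_coordinates(polygon_id):
    """Retrieve the point IDs first, then look up each coordinate pair."""
    return [points[point_id] for point_id in polygons[polygon_id]]


def shared_point_ids(first, second):
    """Point IDs that appear in both polygons, i.e. IDs that are stored twice."""
    common = set(polygons[first]) & set(polygons[second])
    return sorted(common, key=lambda point_id: int(point_id[1:]))


print(polygon_coordinates("P3"))
print(shared_point_ids("P2", "P3"))
```

The coordinates are now stored once, but the shared IDs still appear in more than one polygon list, and finding them requires comparing whole lists; this is the adjacency problem that the text says remains.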
Figure 3.15: Point Dictionary model
Topology model
The spaghetti model looks fine if we just want to see a picture, but it does not really satisfy the needs of a GIS, because it does not retain any explicit relation. For example, in the spaghetti model, finding a possible route from A to
B will be a cumbersome and time-consuming process. This is because the system will need to look at every single coordinate pair in the database and compare it to every line segment in the line. The problem becomes more complicated if at any moment another line crosses ours. In such a case we have to test every single segment of that line against all the segments of our line, to see if they cross or meet. If there is a crossing or meeting, we can tag that line and keep going, but every such query requires a complete search and test for the entire database. As most GIS databases are fairly large, this is horribly inefficient. We can only consider it in a small system, such as MapInfo, and even there we give the system a few hints to make things simpler. For a big GIS, we use a thing called topology.
Geometric and topologic characteristics
The primary role of GIS is to manipulate and analyze large quantities of spatial data. It requires an understanding of the nature of the spatial data. To date the accepted theoretical solution is topologically structured spatial data. It is believed that the topologic data model best reflects the geography of the real world and provides an effective mathematical foundation for encoding spatial relationships, providing a data model for manipulating and analyzing vector based data.
Topology is the term used to describe the making of spatial relationships. In fact, topology adds 'intelligence' to the database in a GIS by explicitly building relationships. In the process of topology building, the relationships among points, lines and areas are calculated and encoded. Topological data define the logical connection between points, lines and areas for geographical description and analysis. Connections between spatial objects, for example, information on the areas which bound a line segment, are considered to be topological data. From this, adjacent spatial objects may also be identified. The topology of any line would thus include its starting node, its destination node and the left and right polygons on either side of the line.
Topology is the branch of mathematics, based on graph theory, which deals with geometric properties that remain unchanged under certain transformations such as bending or stretching. The characteristic of this model is that it explicitly records adjacency information among spatial entities, viz. points, lines and polygons. Mathematical topology assumes that geographic features occur on a two-dimensional plane. Through planar enforcement, spatial features can be represented through nodes (0-dimensional cells); edges, sometimes called arcs (one-dimensional cells); or polygons (two-dimensional cells). Because features can exist only on a plane, lines that cross are broken into separate lines that terminate at nodes representing intersections rather than simple vertices. A GIS with the aid of topology not only has the power to record location and simple attribute information but can also examine spatial relationships based upon location, as well as functional and logical relationships among geographic features.
Technically, and in GIS terms, topology defines the spatial relationships between features. It is the "glue" that holds everything together and is based on sets of rules. Topology rules are embedded in spatially intelligent software. The ability to store these topological rules and relationships efficiently is one of the essential functions of a GIS product. Topology is defined as the mathematics of connectivity or adjacency of points or lines that determines spatial relationships in a GIS. The topological data structure logically determines exactly how and where points and lines connect on a map by means of nodes (topological junctions). The order of connectivity defines the shape of an arc or polygon. The computer stores this information in various tables of the database structure. By storing information in a logical and ordered relationship, missing information, e.g. a missing line segment of a polygon, is readily apparent. A GIS manipulates, analyzes and uses topological data in determining data relationships.
In GIS, topology is implemented through the data structure. The ArcInfo coverage is a familiar topological data structure. A coverage explicitly stores topological relationships among neighboring polygons in the Arc Attribute Table (AAT) by storing the adjacent polygon IDs in the LPoly and RPoly fields. Adjacent lines are connected through nodes, and this information is stored in the arc-node table. The ArcInfo commands CLEAN and BUILD enforce planar topology on data and update the topology tables.
The five axioms in topology are as follows:
o All arcs end in points or nodes.
o Arcs cannot intersect except at their nodes.
o Areas are completely enclosed by arcs.
o Areas do not overlap.
o Every location is within some area.
In other words, topology defines connections between features, identifies adjacent polygons and can define one feature, such as an area, as a set of lines.
The rule of thumb for topological data structures is that anything of interest on a map must be explicitly defined as a point, line or area in order for systems to perform any sort of spatial analysis on the data.
The geometric characteristics of a spatial entity may be described in terms of its two-dimensional shape, its distance from like or different entities, how it is connected to other entities and what entities occupy the space adjacent to it. Such characteristics may be easily described either in words or by a system of coordinate grids on a map. Spatial entities can be described in terms of their dimensional characteristics. If these entities were to be changed by some kind of transformation, all the characteristics of these spatial entities would likewise change. Shapes can be made smaller, lines made longer and so on. In other words, there is a fundamental change to the entity itself. If a map is stretched and distorted, some of its properties may change, for example, distance, direction (angles) and relative location of objects. However, other properties such as 'next to', 'is contained in' and 'crosses' remain unchanged. Therefore, a strict topological property is one that remains unchanged by geometric distortions or transformations of the surface. In topology the neighborhood function and adjacency [...]
Figure 3.17: Topological relationships between Spatial Objects
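The arc-node and left-right polygon tables discussed in this section can be sketched with a small in-memory arc table. This is an illustrative structure only, not the ArcInfo file format: the field names `left_poly` and `right_poly` merely echo the LPoly and RPoly fields mentioned in the text, and the two-polygon example (polygons A and B sharing arc 3 between nodes N1 and N2) is hypothetical.

```python
from collections import namedtuple

# One record per arc: its end nodes and the polygons on either side of it.
Arc = namedtuple("Arc", "arc_id from_node to_node left_poly right_poly")

OUTSIDE = "outside"  # label for the area surrounding the mapped polygons

# Hypothetical example: A lies west of B; arc 3 is their common boundary.
arcs = [
    Arc(1, "N1", "N2", OUTSIDE, "A"),  # outer boundary of A
    Arc(2, "N2", "N1", OUTSIDE, "B"),  # outer boundary of B
    Arc(3, "N1", "N2", "A", "B"),      # boundary shared by A and B
]


def neighbours(polygon_id):
    """Polygons adjacent to polygon_id, read directly from the arc table."""
    found = set()
    for arc in arcs:
        if arc.left_poly == polygon_id:
            found.add(arc.right_poly)
        elif arc.right_poly == polygon_id:
            found.add(arc.left_poly)
    found.discard(OUTSIDE)
    return found


def arcs_bounding(polygon_id):
    """IDs of the arcs that enclose polygon_id."""
    return [a.arc_id for a in arcs if polygon_id in (a.left_poly, a.right_poly)]


def arcs_at_node(node_id):
    """IDs of the arcs that start or end at node_id (connectivity)."""
    return [a.arc_id for a in arcs if node_id in (a.from_node, a.to_node)]


print(neighbours("A"), arcs_bounding("A"), arcs_at_node("N1"))
```

Adjacency and connectivity are answered by scanning one small table, and the shared boundary is stored once, which is the contrast with the spaghetti model that the text draws.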
Figure 3.23: Left-right Topology (table columns: Arc ID, From, To; Arc List, Left Polygon, Right Polygon)
3.6. Errors in GIS
There are six general types of errors:
o Extra entities have been digitized;
o The entities are not in the right place and are not of the correct shape and size;
o Entities that are supposed to be connected to each other are not properly connected;
o Not all polygons have a single label point to identify them;
o Not all entities are within the outside boundary identified with registration marks.
At every stage of input, storage, retrieval and manipulation, error is being introduced. The objective of editing is to attempt to manage these errors rather than eliminate them. Here the focus is on errors arising from digitizing maps and their associated terminology in the context of Arc/Info. Common errors of the digitizing process are associated with nodes, arcs and polygons. A polygon by definition should have closed boundaries, and the boundaries must be continuous. Where this is not the case, the source of the error may lie in encoding, where an arc is independent of the polygon, or in a digitizing error where the nodes have not been connected to form the polygon.
Dangling nodes: A common node error, called the dangling node, can be defined as a single node connected to a single line entity. Dangling nodes (usually called dangles) can result from three possible mistakes: failure to close a polygon, failure to connect the node to the object it was supposed to be connected to (called an undershoot), or going beyond the entity you were supposed to connect to (called an overshoot).
Figure (labels: Overshoot; Undershoot)
Dangling node errors are generally identified by a graphic symbol different from the one used for pseudo nodes. Corrections are quite simple. For undershoots, the node is identified and is moved or "snapped" to the object to which it should have been connected. Overshoot errors are corrected by identifying the intended line intersection point and "clipping" the line so that it connects where it is supposed to. In the case of an open polygon, merely move one of the nodes to connect with the other.
Another common error related to nodes is the pseudo node. Nodes are not just points between line segments that show directional changes in the line; in addition, they have specific topological meaning. Nodes may be used to identify the existence of an intersection between two streets, or a connection between a stream and a lake, but they should not occur at every line segment along a line or a polygon. Thus the first type of error that can be detected is the false node, called a pseudo node, which occurs where a line connects with itself (island pseudo node) or where two lines intersect along a parallel path rather than crossing. A pseudo node connecting a line with itself may simply be the beginning and ending of an island polygon (sometimes called a spatial pseudo node) or may join two line segments with an intervening node (also known as an attribute pseudo node).
Figure 3.25 (labels: Node; Pseudo Node; Dangle)
As with pseudo nodes, some dangles are intentionally input into the GIS for a particular purpose. Most often these nodes serve as indicators of something important at the end of a line or arc.
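The node errors described in this section can be flagged by counting how many arc ends meet at each node. The sketch below assumes the common degree-based convention: a node touched by one arc end is treated as a dangle, a node touched by exactly two as a pseudo node (including a single arc that closes on itself), and anything else as a true node. The arc list and names are hypothetical, and the check cannot tell an overshoot from an undershoot or an error from an intentional dangle.

```python
from collections import Counter


def classify_nodes(arcs):
    """Label each node by the number of arc ends that meet at it.

    arcs is a list of (from_node, to_node) pairs; a closed arc has
    from_node == to_node and contributes two ends to the same node.
    """
    ends = Counter()
    for from_node, to_node in arcs:
        ends[from_node] += 1
        ends[to_node] += 1

    labels = {}
    for node, count in ends.items():
        if count == 1:
            labels[node] = "dangle"
        elif count == 2:
            labels[node] = "pseudo"
        else:
            labels[node] = "node"
    return labels


# Hypothetical digitized arcs.
arcs = [
    ("n1", "n2"),  # n1 is a free end
    ("n2", "n3"),  # n2 only joins two arcs
    ("n3", "n4"),
    ("n3", "n5"),  # three arcs meet at n3
    ("n6", "n6"),  # island polygon closing on itself
]

for node, label in sorted(classify_nodes(arcs).items()):
    print(node, label)
```

A list of flagged nodes such as this would normally be reviewed by an operator, since the text notes that some dangles and pseudo nodes are entered on purpose.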
Another type of common digitizing error occurs when the software uses a vector data model that treats each polygon as a separate entity. In such cases the adjacent lines between polygons have to be digitized more than once. Failure to place the digitizing puck at exactly the correct location for each point along that line will often result in a series of tiny graphic polygons called sliver polygons.
[...] registration of each tile of the map, the map edges of adjacent sheets may not match. Maps that are digitized at different times, or in different coordinate systems, also often appear distorted at their edges.
Figure (labels: Before Edge Matching; After Edge Matching; Map Sheet A; Map Sheet B)