
Basic Data Models

The document discusses the complexity of data models in Geographic Information Systems (GIS), emphasizing their role in representing the real world through various abstractions. It explains the distinction between spatial and attribute data, detailing how data models can be either vector or raster-based, each with its own advantages and limitations. The text highlights the importance of selecting appropriate data models for effective analysis and decision-making in geographic contexts.

Uploaded by

manikulislam176

Introduction to Geographical Information Systems / Basic Data Models / 69

3 Basic Data Models

3.1 The Real World

The real world, the world in which we live, is so irregular, complex and dynamic that an accurate and precise representation of it is a Herculean task. GIS, through mapping tools and techniques, has been able to represent a simplified view of the world. Despite the heterogeneity of the information that can be stored in a GIS, there are only a few common methods of representing spatial information in a GIS database. When a GIS application is developed, real world features need to be translated into simplified representations that can be stored and manipulated in a computer.

Data models provide a way of thinking about the world, a way of organizing the phenomena that interest us. They can be thought of as an abstract language, a collection of words along with a grammar by which we describe the subject. A data model is a conceptual idea or view of the real world, as opposed to the way that the data is actually stored in the computer, which is the data structure. A data model is a mathematical formalism consisting of two parts (Ullman 1988):

o A notation for describing data, and
o A set of operations used to manipulate that data.

Tsichritzis and Lochovsky (1977) define a data model as "a set of guidelines for the representation of the logical organization of the data in a database, consisting of named logical units of data and the relationships between them." A data model is a formal system in which a set of precisely defined objects can be manipulated in accordance with a set of precisely predefined rules, without any regard for the 'meaning' or real-world interpretation of those objects or rules. Reality, by contrast, is an informal system: a system of immense complexity with an infinite amount of information. The difficulty in defining a comprehensive and useful data model is to find a formal system whose behavior represents the behavior of the real world (an informal system) as closely as possible.

The representation of reality is a reflection of the individual's perception and understanding of the phenomenon and of the area of application. The data model is therefore tailored to a specific problem. For example, soil may be a storehouse of minerals for the farmer, but its strength is what matters to a civil engineer, and a hydrologist is concerned with the rate and amount of water percolation. This complexity and broad spectrum of interpretation suggests that a GIS will carry the imprint of its creator/developer. It represents the user's perception of the real world and is termed the conceptual data model. This perception is based on the level of information and understanding of the user, which implies that the conceptual model is database independent. A major benefit of a data model is that it provides the theoretical foundation for the development of physical models. From the theory emerges the power of analysis: the ability to extract inferences and to create deductions from the raw data. Different models provide different conceptualizations of the world; they have different outlooks and different perspectives.

In the most general sense, a map is a data model, as it is a set of cartographic features that have some physical or cultural manifestation on the surface or subsurface. Peuquet (1984) defined a data model as "a general description of specific groups of entities and the relationship between these groups of entities". The real world is transformed into a data model at a fairly high level of abstraction, using generalization operators such as selection and simplification.

Data are observations of the real world that are converted into information. A Geographical Information System is a tool for creating, accessing, monitoring, displaying and analyzing domain-specific geographic databases for planning, management and decision-making. It therefore relies heavily on two- and three-dimensional geographic data. The traditional method for presenting and analyzing spatial data is the map, which uses symbols to represent the real world. Geographic data have physical dimensions and a geographical location on the earth's surface, and consist of two sets of data, spatial and attribute. GIS integrates these two sets of data and allows new data to be derived for analysis and query building.

3.2. Spatial Data

Spatial data are data that have physical dimensions and a geographic location on the earth, and they are available in the form of maps. Spatial data stored in maps can be described by three concepts: entity, attribute and relationship. By definition an entity is a distinct spatial object. An attribute is a description of some aspect of the entity, which may be qualitative or quantitative in nature. A relationship is the spatial association among entities. For example, a river is an entity, its width, length and name are its attributes, and its connection with another water body is a relationship. That is, a relationship represents the link between two different entities.

A map depicts a variety of information, e.g. position, spatial relation, type of feature and measurable quantities. All geographical data must be represented in a simplified form so that they can be stored in the computer. Hence they are represented by three geometric entities, point, line and area (or polygon), together with a label. A point is specified by its location (x, y) in the Cartesian coordinate system. A line is specified by a sequence of connected points known as vertices. A polygon is a closed area specified by a polyline having the same starting and ending points.
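The following minimal Python sketch makes the entity, attribute and relationship concepts and the three geometric primitives concrete. The class names (`Point`, `Line`, `Polygon`, `Entity`, `Relationship`) and the example river and lake are hypothetical and serve only as illustrations. They are not the data structures of any particular GIS.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Point:
    """A location specified by x, y in a Cartesian coordinate system."""
    x: float
    y: float

@dataclass
class Line:
    """A sequence of connected points (vertices)."""
    vertices: list[Point]

@dataclass
class Polygon:
    """A closed area bounded by a polyline whose first and last points coincide."""
    boundary: list[Point]

    def is_closed(self) -> bool:
        # Needs at least four entries (three vertices plus the repeated start point).
        return len(self.boundary) >= 4 and self.boundary[0] == self.boundary[-1]

@dataclass
class Entity:
    """A distinct spatial object with its geometry and descriptive attributes."""
    name: str
    geometry: Point | Line | Polygon
    attributes: dict[str, object] = field(default_factory=dict)

@dataclass
class Relationship:
    """A spatial association linking two entities."""
    kind: str
    source: Entity
    target: Entity

# Hypothetical example following the river illustration in the text.
river = Entity("River A", Line([Point(0, 0), Point(2, 1), Point(5, 1)]),
               {"width_m": 40, "length_km": 12.5})
lake = Entity("Lake B", Polygon([Point(5, 1), Point(7, 1), Point(7, 3), Point(5, 1)]))
joins = Relationship("flows into", river, lake)

print(lake.geometry.is_closed())  # True
```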
The geometric feature chosen to model a geographic entity depends to a large extent on the scale and size of the map. For example, Delhi may be represented as a point on a small scale map, but a (large scale) map of Delhi should represent it as a polygon. Generally, a map contains a number of points (landmarks), lines (roads and railways) and polygons (built-up area, industrial area etc.). Two types of polygons are identified: adjacent polygons and island polygons. Adjacent polygons are those whose boundaries are common to two or more polygons, e.g. plot boundaries, administrative boundaries, property boundaries etc. Island polygons occur in a number of situations, for example, pasture lands within forests, or woodland in a field.

Figure 3.1: Geographical data: attribute data (qualitative and quantitative; ordinal, interval and ratio) and geometric data (point, line, area)

3.3. Attribute Data

Attributes are descriptive information about specified spatial objects. They often have no direct spatial location but can be linked to the spatial objects they describe. Attributes are therefore often called "non-spatial" or "aspatial" information (Modarres, 1998). In the words of Heywood, "spatial data are where the things are and attribute data are what the things are". Since not all information can be represented by geometry alone, it becomes necessary to attach attribute data to the spatial data. Attribute data attach qualities such as a name or number to the spatial data and describe some aspect of them. The problem is that attribute data are domain specific. The spatial data of a region remain unchanged across different analyses, whereas the attributes keep changing as per requirement. For example, siting a waste disposal site for Delhi requires attribute data on terrain, population, transport network, distance from collection centers, nature of waste and quantity of waste etc. For siting health centers, however, the attributes will be population, demographic structure, prevalent disease, sanitation and communication system etc.

3.4. Vector and Raster Data

The human eye is highly efficient at recognizing shapes and forms, but the computer needs to be instructed exactly how spatial patterns should be handled and displayed (Burrough, 1986). Computers require descriptive instructions to turn spatial entities into graphical form. The construction of spatial models passes through numerous stages of data abstraction. In database terminology, the way of representing data is known as a data model. Real world features exist in two forms: objects and phenomena. Objects are discrete and may have definite boundaries, such as buildings, parks and cities etc. Phenomena are distributed over a large area, such as temperature, soil etc. These two forms lead to two distinct approaches: object-based and field-based models (Goodchild 1992, Wang and Howarth 1994). Object-based spatial databases (those obtained by field surveying, remote sensing image analysis, photo interpretation, digitization etc.) are generally represented in the form of coordinate lines and are termed vector data models. When the spatial database is structured on the field-based model, the basic spatial units are different forms of tessellation (regular, as in a DEM, or irregular, as in a TIN), and these are termed raster data models.

The key to the spatial model building process is to identify features in the real world and then select an appropriate spatial data model (raster or vector) and its structure. All of the major GIS and CAD software packages are either vector or raster based, with some extended functions to support other data structures.

3.4.1. Raster data model

The raster data model is one of the variants of the field-based model of geographic data representation. Rasters are defined by a set of uniform, adjacent pixels covering the whole database extent. The model is also referred to as a lattice or tessellation model. A tessellation is an infinitely repeatable pattern over space (Coxeter, 1961). Each pixel has a value or values indicating the characteristics of the phenomenon it represents.

Raster data are tessellations, and they perform a discretization of the geometric area of interest. A tessellation is a space-filling mesh. The mesh may have explicit boundaries, as a mesh of polygons, or it may be implicit, as defined by a matrix of values in the logical model. A tessellation may be either regular (mesh elements are all of the same size and shape) or irregular. Elements of a regular mesh could be squares (raster), rectangles or hexagons. This means reality is generalized: the surface may be triangular or hexagonal but is represented by the square shape of the pixel (picture element), the smallest non-divisible element of an image that can be processed and displayed individually. Cells are interconnected to create planar surfaces representing all the space of a single area of study. The matrix of cells, organized into rows and columns, is called a grid. In the raster data model the focus is more on location, and a raster is more like a photograph than a map. Geographic features are represented in grid cells, or pixels, filled with values. The linear dimension of each cell defines the spatial resolution of the data, which is determined by the size of the smallest object to be mapped. This size is also known as the minimum mapping unit (MMU). The minimum mapping unit at 1:50000 scale is 3 mm × 3 mm, i.e. 150 m × 150 m on the ground, or about 2.25 hectares.
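The minimum mapping unit figure can be checked with simple scale arithmetic: a map distance multiplied by the scale denominator gives the ground distance. The sketch below assumes the 3 mm criterion quoted above. The function names are illustrative only.

```python
def ground_distance_m(map_mm, scale_denominator):
    """Ground distance in metres for a distance measured on the map in millimetres."""
    return map_mm * scale_denominator / 1000.0

def mmu_hectares(map_mm, scale_denominator):
    """Area in hectares of a square minimum mapping unit with side map_mm on the map."""
    side_m = ground_distance_m(map_mm, scale_denominator)
    return side_m * side_m / 10_000.0  # 1 ha = 10 000 m^2

print(ground_distance_m(3, 50_000))  # 150.0
print(mmu_hectares(3, 50_000))       # 2.25
print(mmu_hectares(3, 25_000))       # 0.5625 (same 3 mm rule, assumed)
```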
The value assigned to each cell is known as the gray value. Typically there are 256 gray values in an (8-bit) raster image, ranging from 0 to 255. Each grid cell is located by a row and a column number. Topological relations in raster data are not very strong. The location of a neighboring cell can be calculated only when the row and column values of the current cell are known. In other words, a cell's relation to the origin of the matrix (e.g. the top left corner of the image) defines the 'topology' of the pixels.

Raster data are not easily amenable to the association of attribute data with spatial features such as points, lines or polygons. This is because the basic entity in the raster approach is the grid cell, and real-world entities are not recognized as objects in their own right. Entities that are smaller than the areal extent of a pixel (the resolution) can only be approximated in their location, which causes positional error.

A single cell is assigned only one value. Objects that have several attributes are therefore represented by a number of raster layers, one for each attribute. It often happens that a single cell covers part of two or more objects or values. In such a case, the value assigned is normally one of the following: that of the object occupying the larger part of the cell, that of the object at the middle of the cell, or an average for the whole of the cell.
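The three rules for a cell that covers more than one object can be compared on a small example. The sketch assumes that a coarse cell is examined through a hypothetical 3 × 3 sample of finer sub-cells. Each rule gives a different answer for the same cell:

```python
from collections import Counter

# Hypothetical example: one coarse raster cell, sampled at a 3 x 3 finer grid.
# 1 and 2 are two objects (e.g. two land-use classes) that share the cell.
sub_grid = [
    [1, 1, 1],
    [1, 2, 1],
    [2, 2, 2],
]

def flatten(grid):
    """Return the sub-cell values of a cell as one list."""
    return [v for row in grid for v in row]

def largest_share(grid):
    """Assign the value of the object that occupies the larger part of the cell."""
    return Counter(flatten(grid)).most_common(1)[0][0]

def centre_of_cell(grid):
    """Assign the value of the object found at the middle of the cell."""
    mid = len(grid) // 2
    return grid[mid][mid]

def cell_average(grid):
    """Assign the average over the whole cell (meaningful only for numeric data)."""
    values = flatten(grid)
    return sum(values) / len(values)

print(largest_share(sub_grid))   # 1 (five of nine sub-cells)
print(centre_of_cell(sub_grid))  # 2
print(cell_average(sub_grid))    # 13/9, about 1.444
```

The largest-share and centre rules suit categorical data such as land-use classes. Averaging makes sense only for quantitative values such as elevation.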
Figure 3.2: Raster data structure: fixed spatial resolution (grid: square, triangular, hexagonal) and variable spatial resolution (hierarchical: quadtree; irregular: Thiessen)

The class to which a pixel belongs depends on the classification method used. Virtually all data have to be classified to be useful for further analysis. Some programs are able to classify pixels automatically, but expert knowledge and experience are very important in raster classification techniques. Raster GIS is "best for analysis-oriented processing" (Berry, 1987). In particular, it is useful when the phenomenon being studied is best represented by a continuous surface (Chrisman, 1997; DeMers, 1997). As such, it is suitable for modelling phenomena in which the location and its surroundings are more important than the object's geometric makeup. Individual cells are considered when creating the image of points, lines, areas etc. Figure 3.4 shows how the different features appear in raster format.

Figure 3.3: Tessellations (square, triangular, hexagonal, rosette)

Figure 3.4: Raster representation of a point, a line and a polygon
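Figure 3.4 can be imitated with a short rasterization routine. The sketch assumes unit cells and an origin at the top left corner, with rows increasing downward. It also assumes a common convention that the text does not state: a cell belongs to a polygon when its centre falls inside it, which is tested with the even-odd ray-casting rule.

```python
def point_to_cell(x, y, cell_size=1.0):
    """Return (row, col) of the cell containing point (x, y), y measured downward."""
    return int(y // cell_size), int(x // cell_size)

def inside(px, py, polygon):
    """Even-odd (ray-casting) test of point (px, py) against an open or closed vertex list."""
    result = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Count crossings of a ray running from the point towards +x.
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                result = not result
    return result

def rasterize_polygon(polygon, rows, cols, cell_size=1.0):
    """Code 1 in every cell whose centre lies inside the polygon, 0 elsewhere."""
    grid = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            centre_x = (c + 0.5) * cell_size
            centre_y = (r + 0.5) * cell_size
            if inside(centre_x, centre_y, polygon):
                grid[r][c] = 1
    return grid

square = [(1, 1), (4, 1), (4, 3), (1, 3)]  # hypothetical polygon
for row in rasterize_polygon(square, rows=5, cols=5):
    print(row)
print(point_to_cell(2.7, 0.2))  # (0, 2)
```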

When a picture is stored in a computer, the raster data model is used. The function of each grid cell is determined by the information it contains, and the characteristics of each grid cell can be defined. In the raster data model, the accuracy of the map depends on the scale of the map and the resolution. The area covered by a cell in a grid is termed its resolution. The raster model provides the simplest way of storing spatial data. The smaller the area of land that each cell represents, the higher the resolution of the data. Higher resolution also means higher spatial accuracy and a larger file to store the data.

Raster data are available from a number of sources, but satellites are the most important source. Satellite data are generally stored in one of three formats: .bil (band interleaved by line), .bip (band interleaved by pixel) and .bsq (band sequential). These formats are discussed in the chapter on data sources. When we scan a vector map, it is also rasterized, but some information is lost in the process.

The strength of the raster data model is its simplicity. Many operations on geographic data are easier to implement and execute faster in a raster GIS. Modeling of continuous data, as is often done with elevation or hydrological data, is usually performed with a raster GIS. One disadvantage is that there is a trade-off between the size of the resulting raster data sets and the precision with which spatial features can be represented. A very fine raster grid will represent all curves in a boundary with sufficient detail, but it will require a great deal of disk space. Storing such voluminous data in the computer was a challenging task, and this led to the development of a number of techniques for the compaction of raster data.
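The trade-off between resolution and data volume is easy to quantify. The sketch below assumes a hypothetical 10 km × 10 km study area, no compression, and one byte per cell (enough for 256 gray values). It counts the cells at three resolutions:

```python
import math

def raster_size(extent_x_m, extent_y_m, cell_size_m, bytes_per_cell=1):
    """Rows, columns and uncompressed size in bytes of a grid covering the extent."""
    cols = math.ceil(extent_x_m / cell_size_m)
    rows = math.ceil(extent_y_m / cell_size_m)
    return rows, cols, rows * cols * bytes_per_cell

for cell in (100, 50, 10):
    rows, cols, nbytes = raster_size(10_000, 10_000, cell)
    print(f"{cell:>4} m cells: {rows} x {cols} grid, {nbytes:,} bytes")
```

Halving the cell size quadruples the number of cells. This growth in data volume is what the compaction techniques that follow are meant to control.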
Data compaction techniques

The techniques considered here for the purpose are chain coding, run-length coding, block coding and quadtrees.

Chain coding

Chain coding, also sometimes called Freeman chain coding, is a compact method of storing raster data. It is useful for detecting sharp turns and for estimating area, but it introduces redundancy because intervening (shared) boundaries are repeated. The technique is simple and straightforward. In a chain-code representation of a map, the recording starts at any point on the border of an object. From there, the sequence of directions of the cells that make up the object's boundary is recorded systematically in a clockwise direction. The polygon is defined in terms of unit cells measured in these directions. The directions are identified by numbers: 0 to 7 when eight directions (including diagonals) are used, or 0 to 3 when only the four cardinal directions are used. For example, East may be identified as 0, North as 1, West as 2 and South as 3. You may select your own numbering system. Once a move in the direction of the line is made and recorded, the locational grid is re-centered over the new location and the next move is defined in the same way. Consider Figure 3.5 and find out how many cells are required for storing the given entity.

Figure 3.5: Chain coding (entity model and cell values)

The resultant file structure will be: N2, E5, S1, E1, S2, W1, S2, W1, S1, N2, W3, N1, W1. In this way the total number of cells required for storing the given entity is 24.

Run-length codes

Run-length coding is an improvement over the conventional chain-code technique and is suitable for personal computers with limited storage capacity. It stores a single value for a group of cells instead of storing the value for each individual cell. This method exploits the fact that many datasets have large homogeneous regions. In this procedure, adjacent cells along a row that have the same value are treated as a group and termed a run. Each row in the grid (one pixel wide) is examined in turn, and pixels having the same value are grouped together. It is thus a one-dimensional method of grouping pixels with similar or identical values. In memory, each sequence of such pixels is replaced by two numbers: the positional reference of the first pixel in the group and the number of pixels in the group. For clarity we have selected 1 and 0 as the numbers indicating the presence or absence of a feature. As is evident from Figure 3.6, the feature in question is absent in the first row, so the first row is represented as 0 (absent), 10 (number of cells). The procedure is repeated until all the cells of the entity are taken into account.

Figure 3.6: Run-length codes (each row of a 10 × 10 grid coded as value, run-length pairs; e.g. the empty first row is coded 0,10)

Figure 3.6(a): Run-length coding. Run-length codes store only the beginning and end cells in each row of the shaded figure: Row c = 3,4; Row d = 2,5; Row e = 1,6; Row f = 2,2 and 4,5; Row g = 5,5 (Source: Heywood, 2002)

In run-length encoded data, pairs replace the original data. A limitation of this method is that pixel groups are identified only in one direction (parallel to the x-axis). As a result, neighbouring rows (those above and below the current row) that may have the same group values are represented separately. Its usefulness in practice is therefore rather limited, because it does not lead to a good compression ratio.
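The chain-coding and run-length procedures described above can be sketched in a few lines of Python. The sketch assumes the four-direction numbering given in the text (East 0, North 1, West 2, South 3), with y increasing northward. It stores runs as (value, run length) pairs in the style of Figure 3.6. The example shape and row are hypothetical and are not the entities in Figures 3.5 and 3.6.

```python
DIRECTIONS = {0: (1, 0), 1: (0, 1), 2: (-1, 0), 3: (0, -1)}  # code -> (dx, dy)
NAMES = {0: "E", 1: "N", 2: "W", 3: "S"}

def chain_decode(start, codes):
    """Follow a chain code from a start cell and return the cells visited."""
    x, y = start
    path = [(x, y)]
    for code in codes:
        dx, dy = DIRECTIONS[code]
        x, y = x + dx, y + dy
        path.append((x, y))
    return path

def compress_moves(codes):
    """Group repeated moves, e.g. [1, 1, 0] -> ['N2', 'E1'], as in the text."""
    groups = []
    for code in codes:
        if groups and groups[-1][0] == code:
            groups[-1][1] += 1
        else:
            groups.append([code, 1])
    return [f"{NAMES[c]}{n}" for c, n in groups]

def run_length_encode(row):
    """Encode one raster row as (value, run length) pairs."""
    runs = []
    for value in row:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return [tuple(r) for r in runs]

def run_length_decode(runs):
    """Expand (value, run length) pairs back into the original row."""
    return [value for value, length in runs for _ in range(length)]

# Boundary of a hypothetical 3 x 3 block traced clockwise from its lower-left cell.
codes = [1, 1, 0, 0, 3, 3, 2, 2]
print(compress_moves(codes))           # ['N2', 'E2', 'S2', 'W2']
print(chain_decode((0, 0), codes)[-1]) # (0, 0): the boundary closes

row = [0, 0, 1, 1, 1, 1, 1, 1, 0, 0]
print(run_length_encode(row))          # [(0, 2), (1, 6), (0, 2)]
```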
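Block codes

Block codes are a two-dimensional extension of run-length codes. The method assumes that the cells are recorded in the form of square blocks. It uses square blocks to tile the area to be mapped, obtaining as many large square blocks as possible. The blocks are obtained through the Medial Axis Transformation (MAT) (Burrough, 1986) and are arranged in hierarchical form. A unit square represents one cell, a 4-square block represents 2×2 cells, a 9-square block represents 3×3 cells, and so on. Each block is coded only with the location of one cell (the lower left cell of the block) and the side length of the block. The larger the squares that can be fitted into a region, the more efficient block coding becomes. This method has largely been applied in performing union and intersection of regions and for detecting properties such as elongation.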
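A block code can be produced with a simple greedy tiling. The cells are scanned from the lower left, and at each uncovered cell of the entity the largest square that still fits is grown. This is an illustrative simplification, not the Medial Axis Transformation itself, and it does not guarantee the minimum number of blocks. Row 0 is assumed to be the bottom row, so that each block is recorded by its lower-left cell and side length as described above.

```python
def block_code(grid):
    """Greedy block coding of a binary raster.

    grid[r][c] == 1 marks cells of the entity; row 0 is the bottom row.
    Returns (col, row, side) tuples: lower-left cell and side length of each block.
    """
    rows, cols = len(grid), len(grid[0])
    covered = [[False] * cols for _ in range(rows)]
    blocks = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or covered[r][c]:
                continue
            side = 1
            # Grow the square while the enlarged square stays inside the entity
            # and does not overlap a block that has already been recorded.
            while r + side < rows and c + side < cols and all(
                grid[r + i][c + j] == 1 and not covered[r + i][c + j]
                for i in range(side + 1)
                for j in range(side + 1)
            ):
                side += 1
            for i in range(side):
                for j in range(side):
                    covered[r + i][c + j] = True
            blocks.append((c, r, side))
    return blocks

# Hypothetical entity of 10 cells (row 0 printed first, i.e. bottom row first).
entity = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
]
blocks = block_code(entity)
print(blocks)  # [(0, 0, 3), (3, 2, 1)]
```

Here ten cells are stored as two blocks: one 3×3 block and one unit square.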
Figure 3.7: Block code (entity model, cell values and file structure listing block size, block number and cell coordinates). Source: Heywood (2003)

Figure 3.8: Quadtree subdivision (1 = pixel; each square is divided into four parts)

Figure 3.9: This is how the tree is built. Start with the large squares and end with the smallest squares/pixels, working from top left to top right to bottom left to bottom right.

Quadtree

The quadtree is a variable spatial resolution model, sometimes equated with the hierarchical tessellation model. The quadtree is a simple and lossless compression method. It works best on bitmaps with no noise, such as black and white images with solid shapes, but it can handle colour graphic images too. The term quadtree describes a hierarchical or tree-based data structure whose common property is that it is based on the principle of recursive decomposition of space. It can represent both vector and raster data. Quadtrees are differentiated by:

o The type of data they may be used to represent
o The principles governing the decomposition process
o The type of resolution (variable or fixed)

The quadtree structure can be used for point data, regions, surfaces and volumes. Typically a region is always split into four sub-regions, hence the name quad. It divides a geographical area into square cells whose size varies from large down to that of the smallest cell. The lowest limit of the division is the single cell. The structure of the quadtree resembles an inverted tree, whose leaves are pointers to the attributes of homogeneous squares and whose branches are pointers to smaller squares.

The idea is to take a large square block of pixels, for example 16×16 pixels. If all of them are white, assign "1". If not, assign "0" and split the image into four parts of 8×8 pixels. If an 8×8 part is white, assign a "1". If not, assign a "0" and split the part into four parts of 4×4 pixels, and so on.

One of the advantages of the raster data model is that each cell can be subdivided into smaller cells of the same shape and orientation (Peuquet, 1990).

Figure 3.10: Fill the colour information

Figure 3.11: Numbering of quadrants and sub-quadrants (e.g. 00, 01, 02, 03 and 030, 031, 032, 033)
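The recursive splitting described above can be sketched as a region quadtree. This hypothetical version does not use the 1/0 flags of the text. Instead, it stores the value of each homogeneous square as a leaf and splits mixed squares into four children in the order top left, top right, bottom left, bottom right (the order given for Figure 3.9). It assumes a square image whose side is a power of two.

```python
def build_quadtree(img, row=0, col=0, size=None):
    """Return a leaf value for a homogeneous square, else a list of four subtrees."""
    if size is None:
        size = len(img)
    first = img[row][col]
    if all(img[r][c] == first
           for r in range(row, row + size)
           for c in range(col, col + size)):
        return first  # homogeneous square (a single cell always is)
    half = size // 2
    return [
        build_quadtree(img, row, col, half),                # top left
        build_quadtree(img, row, col + half, half),         # top right
        build_quadtree(img, row + half, col, half),         # bottom left
        build_quadtree(img, row + half, col + half, half),  # bottom right
    ]

def count_leaves(tree):
    """Number of homogeneous squares that the quadtree stores."""
    if isinstance(tree, list):
        return sum(count_leaves(t) for t in tree)
    return 1

image = [  # hypothetical 4 x 4 binary image
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
tree = build_quadtree(image)
print(tree)                # [0, 1, 0, [0, 1, 1, 1]]
print(count_leaves(tree))  # 7 leaves instead of 16 pixels
```

Only the mixed bottom-right quadrant is subdivided down to single cells, which shows the variable resolution described in the text.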
Figure 3.12: Quadtree (quadrants and sub-quadrants labelled 031, 032, 033, 0311, 0313 etc.)

The quadtree model addresses both the resolution and the redundancy issues directly. The quadtree is a more compact structure for data storage and has the advantage of using a variable-sized grid cell. Thus, instead of dividing an area into cells of one size, finer and finer subdivisions are used only in those regions where finer detail has been recorded. Operations such as point-in-polygon searches can be performed on quadtree-structured data. However, Burrough (1984) is of the view that the largest problem associated with quadtrees is that "the tree representation is not translation-invariant: two regions of the same shape and size may have quite different quadtrees, so consequently shape analysis and pattern recognition are not straightforward". A major disadvantage is that creating a quadtree data structure is time consuming. This implies that changes to the original data will require the entire quadtree to be rebuilt.

Raster GIS models

GRID model

In the GRID model each cell of a layer is referenced and addressed individually. It is associated with the identically positioned grid cell in all the subsequent layers. For example, if a grid cell in the first layer is identified as x5, y10, it will be compared and associated with the same grid cell location in the other layers. This makes the comparison of multiple layers comparatively easy. At the same time, it makes group comparison problematic and requires more storage space in the computer. Comparisons between coverages are performed on a single column of cell values at a time. The advantage of this data model is that it facilitates multiple-coverage analysis for single cells. However, it limits the examination of spatial relationships between entire groups or themes in different coverages. This model was popular in early GIS.

IMGRID model

The IMGRID model is a slight improvement over the GRID model. Consider a thematic map of land use that contains four categories: recreation, agriculture, industry and residence. To represent it, each of these features has to be separated out as an individual layer. In the layer that represents agriculture, 1 or 0 represents the presence or absence of crops respectively. The rest of the layers are represented in the same way, with each variable referenced directly. The major advantage of IMGRID is its two-dimensional array of numbers resembling a map-like structure. The binary character of the information in each coverage simplifies long computations and eliminates the need for complex map legends. Since each coverage feature is uniquely identified, there is no limitation of assigning a single attribute value to a single grid cell. On the other hand, the main problem with information storage in an IMGRID structure is the excessive volume of data stored. Each grid cell will contain 1 or 0 values from more than one coverage, and a large number of coverages are needed to store different types of information.

Map Analysis Package (MAP)

This type of data structure integrates the two structures discussed previously. In this raster structure, each thematic coverage is recorded and accessed separately by map name or title. Each variable, or mapping unit, of the coverage's theme is recorded as a separate number code or label, which can be accessed individually when the coverage is retrieved. The label corresponds to a portion of the legend and has its own symbol assigned to it. This structure facilitates operations on individual grid cells and on groups of similar cells. The resulting changes in value require rewriting only a single number per mapping unit, which simplifies the computations. The MAP data structure allows the manipulation of information in a many-to-one relationship between the attribute values and the sets of grids.

Raster Applications

There are many practical applications of raster data within and outside GIS. The important applications of raster data are:

o A computer display is a raster.
o Digital cameras use raster.
o Images on the Web are raster.
o Scanned images are raster.
o Fax is a raster.

Certain kinds of data always come in raster form:

o Digital Elevation Models
o Remote Sensing Images

A word of caution!

(1) Never use a raster to represent a sewer network if the application requires accurate connectivity. Suppose 1 is coded in cells where a sewer is present and 0 elsewhere. If two adjacent cells both have 1, that is no guarantee that the sewers they contain are connected.
(2) Never use a raster to represent land ownership parcels. By definition the boundary between two survey points is a mathematically straight line, and the jagged appearance of a raster representation would be unacceptable.
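The difference between the GRID, IMGRID and MAP organizations can be illustrated on a small land-use map that uses the four categories of the IMGRID example. The grid, the codes and the helper names below are hypothetical. The MAP side is reduced to its essential idea: one mapping-unit code per cell plus a legend.

```python
# Hypothetical 3 x 4 land-use coverage; codes follow the four categories in the text.
LEGEND = {1: "recreation", 2: "agriculture", 3: "industry", 4: "residence"}
land_use = [
    [2, 2, 4, 4],
    [2, 1, 4, 3],
    [1, 1, 3, 3],
]

def imgrid_layers(grid, legend):
    """IMGRID: one binary layer per category (1 = present, 0 = absent)."""
    return {
        name: [[1 if v == code else 0 for v in row] for row in grid]
        for code, name in legend.items()
    }

def cell_stack(layers, row, col):
    """GRID-style access: the column of values for one cell across all layers."""
    return {name: layer[row][col] for name, layer in layers.items()}

layers = imgrid_layers(land_use, LEGEND)
print(layers["agriculture"])     # [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
print(cell_stack(layers, 1, 3))  # {'recreation': 0, 'agriculture': 0, 'industry': 1, 'residence': 0}

# Storage: IMGRID keeps one value per cell per layer, whereas a MAP-style
# coverage keeps one mapping-unit code per cell plus a single legend.
cells = len(land_use) * len(land_use[0])
print(cells * len(layers), "IMGRID values vs", cells, "MAP codes")  # 48 IMGRID values vs 12 MAP codes
```

In this simplified sketch, merging two mapping units in the MAP-style coverage only requires rewriting the legend entry for one code. In IMGRID, the same change requires combining two binary layers cell by cell.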
80/ Introduction to Geographical Information Systems Basic Data Models 181
straightforward but when complex
3.5. Vector Data Model entities
stored, the adjoining boundaries will be such as more than two polygons are
of GIS that use adjacent line and will create matching entered twice. It will cause dupiication of
A vector GIS is simply a generic name to describe a class problem.
Beside, it wil
objects. Avector is space inthe computer. To overcome this
the vector data structure to describe, represent and use spatial models are considered here and the related problems,Occupypopalar
more
that requires both magnitude and direction for its description. two
a physical quantity
segment. Spaghetti model
This means, a vector can be represented graphically by a directed line
the same or
An important property of vectors is that when they are oriented in plus
Vector data that have been collected but
opposite directions, they can be added or subtracted, just like numbers with is also called as not structured are said to be spagheti
data model. It
and minus signs. model. As the name suggests, it isnon-topological or geometric or path topological
a direct line-by-line translation of vector map.
(The vector data model is an object based approach to the representation of Fach entity i.e. point, line or polygon becomes alogical record in the
the real world features and is best used to represent discrete objects. That is, it amd is defined as string of x, y coordinates. digital ile
It means what we get is something that
represents anything from a simple number to a complex entity. It is the familiar looks like a map or other drawing when we display it as a graphic,
coordinate geometry representation of specifying a point, line, or polygon by two-dimensional Cartesian coordinates because of the assumption that the real world can be divided into clearly defined elements. By using one or more pairs of coordinates, spatial objects are identified and represented by one of the three basic forms of graphical elements. Such spatial objects have locations depicted simply as points, lines or areas. In addition, such spatial objects may have attributes. Point entities are represented by a single (x, y) coordinate pair. Lines and polygons are represented by a series of connected points.

Figure 3.13: The Vector View

A straight line has only two pairs of x, y coordinates, but a polyline has more than two pairs. An area is a combination of line segments having common starting and ending points. The vector entities are the units that carry the information. Selecting the appropriate number of points for the construction of an entity is an important step, as too many points will cause repetition and too few will lead to over-generalization of the feature. Paper maps are the typical example of the vector model. The vector data structure produces smaller files than raster images, because a raster image needs space for all its pixels while only point coordinates are stored in a vector representation. When geometric shapes need to be represented precisely in a GIS or CAD, the vector data structure is the natural option, because it is not limited by spatial resolution or pixel size, and mathematical formulae can be used for regular shapes and smooth curves.

Since most of the thematic mapping in Geographic Information Systems is based on polygons, their storage and manipulation have received considerable attention. The characteristic of a polygon is that it must have a unique shape, area and perimeter. The representation of a simple polygon with few points is straightforward, but it has no underlying structure. In other words, a polygon is drawn just by making lines. There is no correspondence between these strings and the boundaries of individual polygons, which means that the model stores graphical elements, not graphical entities. Data storage and access are sequential, so the search time is often long. Besides, all the polygons are stored as independent polygons; as a result the intervening boundaries are digitized twice, occupying more space in the computer.

Figure 3.14: Path Topological/Spaghetti Model

Table 3.1: Coordinate List
Polygon   Coordinates
P1        (x5, y5), (x6, y6), (x7, y7), (x8, y8), (x9, y9), (x10, y10), (x11, y11), (x17, y17), (x5, y5)
P2        (x3, y3), (x16, y16), (x17, y17), (x11, y11), (x12, y12), (x13, y13), (x14, y14), (x15, y15), (x1, y1), (x2, y2), (x3, y3)
P3        (x3, y3), (x4, y4), (x5, y5), (x17, y17), (x16, y16), (x3, y3)
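The coordinate-list idea can be made concrete with a small sketch. It assumes, as in Table 3.1, that each polygon is stored as a closed list of (x, y) pairs whose last pair repeats the first. The shoelace formula used for area is a standard technique rather than something this text prescribes, and all coordinates are hypothetical. The last function counts vertices that occur in more than one polygon's list, which is the duplication of intervening boundaries described above.

```python
import math
from collections import Counter

Point = tuple[float, float]


def polygon_area(ring: list[Point]) -> float:
    """Shoelace formula on a closed ring (first point repeated at the end)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(ring: list[Point]) -> float:
    """Sum of the straight-line lengths of consecutive segments."""
    return sum(math.dist(a, b) for a, b in zip(ring, ring[1:]))


def duplicated_vertices(polygons: dict[str, list[Point]]) -> set[Point]:
    """Vertices stored in more than one polygon's coordinate list."""
    counts: Counter = Counter()
    for ring in polygons.values():
        counts.update(set(ring[:-1]))  # ignore the closing repeat
    return {p for p, n in counts.items() if n > 1}


# Hypothetical data: a 4 x 3 rectangle, and two unit squares sharing an edge.
rect = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (0.0, 0.0)]
two_squares = {
    "A": [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.0, 0.0)],
    "B": [(1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 1.0), (1.0, 0.0)],
}

print(polygon_area(rect))                # 12.0
print(polygon_perimeter(rect))           # 14.0
print(duplicated_vertices(two_squares))  # the two ends of the shared edge
```

In a spaghetti or coordinate-list store, both squares carry their own copy of the shared edge, so its end points are recorded twice.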
The spatial relationships between the features are not retained. In other words, it is a collection of points and line segments with no real connection. All the information necessary to draw a map is in place, but it is randomly organized ("unlinked"). This makes organization of the data easy, but makes the data very difficult to use for analysis.

The spaghetti model is essentially based on the one-to-one translation of the graphic image we normally encounter on a map. The map is maintained as the conceptual model. For adjacent polygons, separate pieces of spaghetti are placed for adjacent sides; no two polygons share the same string of spaghetti. Each side of a polygon is defined by its own set of lines and coordinate pairs. Although polygon sides are recorded separately, they have the same coordinates, which means there is considerable redundancy in this data model. Because of its one-for-one translation structure, the relationships between and among measurements and objects are implied rather than explicit. The result of this lack of topology is enormous computational overhead, making analysis difficult. The positive aspect of the spaghetti model is that it is relatively efficient as a method of cartographic display, and it is used in Computer Assisted Cartography (CAC), where analysis is not the primary purpose.

A marginal improvement over the path topological (spaghetti) model is the point dictionary model. In this model the transformation between points and numerical coordinates is incorporated. All coordinate pairs are numbered sequentially and stored in random access allocation. Each polygon is stored as a circular list of point IDs, which act as addresses for accessing the appropriate coordinates. Thus the storage requirements are slightly reduced, as x, y pairs are stored only once. However, each point ID is still stored twice for common boundaries. Though this model saves space in the system, the adjacency problem remains. To draw a polygon, the point ID values are first retrieved from the polygon list, and these are in turn used to retrieve the respective coordinates.

Figure 3.15: Point Dictionary Model

Table 3.2: Point dictionary representation
Polygon point list
P1   p5, p6, p7, p8, p9, p10, p11, p17, p5
P2   p3, p16, p17, p11, p12, p13, p14, p15, p1, p2, p3
P3   p3, p4, p5, p17, p16, p3

Point   Coordinates
p1      (x1, y1)
p2      (x2, y2)
p3      (x3, y3)
p4      (x4, y4)
…
p17     (x17, y17)

To overcome the adjacency problem, the chain dictionary model is applied. In this model each polygon is given a circular list of chains, and each chain in turn is a list of points.

Figure 3.16: Chain Dictionary Model

This model has reduced the space requirement and the access time. Polygon retrieval is now a three-step procedure: retrieval of the polygon ID, the point IDs and the corresponding coordinates.
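A minimal sketch of the point dictionary model, using the polygon point lists of Table 3.2. The text gives only symbolic pairs (x1, y1) to (x17, y17), so the coordinate values below are placeholders. The sketch shows the two-step lookup (point IDs first, then coordinates) and which point IDs are still stored more than once because they lie on common boundaries.

```python
# Polygon point lists from Table 3.2 (the closing point ID repeats the first).
POLYGONS: dict[str, list[str]] = {
    "P1": ["p5", "p6", "p7", "p8", "p9", "p10", "p11", "p17", "p5"],
    "P2": ["p3", "p16", "p17", "p11", "p12", "p13", "p14", "p15", "p1", "p2", "p3"],
    "P3": ["p3", "p4", "p5", "p17", "p16", "p3"],
}

# Point dictionary: each coordinate pair stored once, keyed by point ID.
# Placeholder values: the text gives only symbolic pairs (xi, yi).
POINTS: dict[str, tuple[float, float]] = {
    f"p{i}": (float(i), float(i)) for i in range(1, 18)
}


def polygon_coordinates(polygon_id: str) -> list[tuple[float, float]]:
    """Two-step retrieval: point IDs from the polygon list, then coordinates."""
    point_ids = POLYGONS[polygon_id]
    return [POINTS[pid] for pid in point_ids]


def shared_point_ids() -> set[str]:
    """Point IDs listed by more than one polygon (common boundaries)."""
    owners: dict[str, set[str]] = {}
    for polygon_id, point_ids in POLYGONS.items():
        for pid in point_ids:
            owners.setdefault(pid, set()).add(polygon_id)
    return {pid for pid, polys in owners.items() if len(polys) > 1}


def id_references() -> int:
    """Point-ID entries stored across all lists, ignoring closing repeats."""
    return sum(len(ids) - 1 for ids in POLYGONS.values())


def unique_points() -> int:
    """Distinct coordinate pairs actually needed."""
    return len({pid for ids in POLYGONS.values() for pid in ids})


print(polygon_coordinates("P3"))
print(sorted(shared_point_ids()))  # IDs on common boundaries
print(id_references(), "ID entries for", unique_points(), "coordinate pairs")
```

For these three polygons a coordinate list would hold 23 coordinate pairs (ignoring closing repeats), while the point dictionary holds 17 pairs plus 23 short IDs; p3, p5, p11, p16 and p17 are the IDs that still appear in more than one list.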
Topology model
The spaghetti model looks fine for simply viewing a picture, but it does not really satisfy the needs of a GIS because it does not retain any explicit relationships. For example, in the spaghetti model, finding a possible route from A to B will be a cumbersome and time-consuming process. This is because the system will need to look at every single coordinate pair in the database and compare it to every line segment in the line. The problem becomes more complicated if at any moment another line crosses ours. In such a case we have to test every single segment of that line against all the segments of our line, to see if they cross or meet. If there is a crossing or meeting, we can tag that line and keep going, but every such query requires a complete search and test of the entire database. As most GIS databases are fairly large, this is horribly inefficient. We can only consider it in a small system, such as MapInfo, and even there we give the system a few hints to make things simpler. For big GIS, we use a thing called topology.

The primary role of GIS is to manipulate and analyze large quantities of spatial data. This requires an understanding of the nature of the spatial data. To date, the accepted theoretical solution is topologically structured spatial data. It is believed that the topological data model best reflects the geography of the real world and provides an effective mathematical foundation for encoding spatial relationships, providing a data model for manipulating and analyzing vector based data. Topology is the term used to describe spatial relationships. In fact, topology adds 'intelligence' to the database in a GIS by explicitly building relationships. In the process of topology building, the relationships among points, lines and areas are calculated and encoded. Topological data define the logical connections between points, lines and areas for geographical description and analysis. Connections between spatial objects, for example, information on the areas which bound a line segment, are considered to be topological data. From this, adjacent spatial objects may also be identified. The topology of any line would thus include its starting node, its destination node and the polygons on its left and right.

The rule of thumb for topological data structures is that anything of interest on a map must be explicitly defined as a point, line or area in order for the system to perform any sort of spatial analysis on the data.

Topology is the branch of mathematics, based on graph theory, which deals with geometric properties that remain unchanged under certain transformations, such as bending or stretching. The characteristic of this model is that it explicitly records adjacency information among spatial entities, viz. points, lines and polygons. Mathematical topology assumes that geographic features occur on a two-dimensional plane. Through planar enforcement, spatial features can be represented through nodes (0-dimensional cells), edges, sometimes called arcs (one-dimensional cells), or polygons (two-dimensional cells). Because features can exist only on a plane, lines that cross are broken into separate lines that terminate at nodes representing intersections rather than simple vertices. With the aid of topology, a GIS not only has the power to record location and simple attribute information but can also examine spatial relationships based upon location, as well as functional and logical relationships among geographic features. Technically, and in GIS terms, topology defines the spatial relationships between features. It is the "glue" that holds everything together and is based on sets of rules. Topology rules are embedded in spatially intelligent software. The ability to store these topological rules and relationships efficiently is one of the essential functions of a GIS product. Topology is defined as the mathematics of connectivity or adjacency of points or lines that determines spatial relationships in a GIS. The topological data structure logically determines exactly how and where points and lines connect on a map by means of nodes (topological junctions). The order of connectivity defines the shape of an arc or polygon. The computer stores this information in various tables of the database structure. By storing information in a logical and ordered relationship, missing information, e.g., a line segment of a polygon, is readily apparent. A GIS manipulates, analyzes, and uses topological data in determining data relationships.

In GIS, topology is implemented through the data structure. The ArcInfo coverage is a familiar topological data structure. A coverage explicitly stores topological relationships among neighboring polygons in the Arc Attribute Table (AAT) by storing the adjacent polygon IDs in the LPoly and RPoly fields. Adjacent lines are connected through nodes, and this information is stored in the arc-node table. The ArcInfo commands CLEAN and BUILD enforce planar topology on data and update topology tables.

The five axioms in topology are as follows:
All arcs end in points or nodes.
Arcs cannot intersect except at their nodes.
Areas are completely enclosed by arcs.
Areas do not overlap.
Every location is within some area.

In other words, topology defines connections between features, identifies adjacent polygons and can define one feature, such as an area, as a set of lines.

Geometric and topologic characteristics
The geometric characteristics of a spatial entity may be described in terms of its two-dimensional shape, its distance from like or different entities, how it is connected to other entities and what entities occupy the space adjacent to it. Such characteristics may be easily described either in words or by a system of coordinate grids on a map. Spatial entities can be described in terms of their dimensional characteristics. If these entities were changed by some kind of transformation, all their characteristics would likewise change: shapes can be made smaller, lines made longer and so on. In other words, there is a fundamental change to the entity itself. If a map is stretched and distorted, some of its properties may change, for example, distance, direction (angles) and the relative location of objects. However, other properties such as 'next to', 'is contained in' and 'crosses' remain unchanged. Therefore, a strict topological property is one that remains unchanged by geometric distortions or transformations of the surface. In topology the neighborhood function and adjacency properties are important features. They are important for performing spatial analyses because we need to know the position of a feature both in absolute space and with respect to its neighboring features. Many of the methods of solving mathematical and geometric relationships work better if we know which areas share common boundaries. Some systems store boundaries as several individual line segments and include arc attributes (or pointers), which indicate which polygon falls on each side of the line segment. By storing common boundaries instead of complete polygon boundaries, duplication in digitizing is avoided, as is the problem where two versions of each common boundary do not coincide.

Figure 3.16: Concept of topological map (a cartographic map decomposed into point, line and polygon layers, each with associated tables)
Source: Lo, C.P. and Yeung (2003)

Figure 3.17: Topological relationships between spatial objects (relationships between points, lines and areas such as 'is within', 'nearest to', 'cross', 'intersect', 'flow into', 'border', 'overlap', 'inside' and 'adjacent')

The first attempt to build a topological relationship into a geographical data structure was made by the mathematician James Corbett. It was modified by George Farnsworth as Dual Independent Map Encoding (DIME) and used by the U.S. Bureau of the Census for digitally storing the linear block faces of the Standard Metropolitan Statistical Areas of the United States. The DIME file was an outgrowth of the ACG (Address Coding Guide). A good example of the use of topology is TIGER (Topologically Integrated Geographic Encoding and Referencing), used by the U.S. Bureau of the Census. Commercial GIS vendors such as ESRI and Intergraph have proprietary topological data structures. But in recent years non-topological data formats have emerged as the standard. Important packages in this category are ArcView, MapInfo and GeoMedia.

The graphical data in a topology can be derived from existing maps, scanned, or digitized. Raw digitized data always contain some errors and omissions that are not evident until logical connections are established. Some errors might occur as a result of computational miscalculations and rounding errors in the GIS software, but most database errors result from improper input. Even with the most meticulous map preparation procedures, the finest equipment and the best trained technicians, mistakes will happen. A GIS database is subject to errors involving entity, attribute and entity-attribute agreement. Entity error (positional error) is present primarily in vector systems. It is generally found in three different forms: missing entities, incorrectly placed entities and disordered entities. Attribute error and entity-attribute error form a second group that has no influence on topology creation.
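The arc-node idea can be sketched with a small table of arcs, each carrying its from and to nodes and the polygons on its left and right, in the spirit of the LPoly and RPoly fields of an ArcInfo coverage described above. The class, field names and data are hypothetical: two polygons, A and B, share one boundary arc, and OUTSIDE stands for the area surrounding them. Adjacency and connectivity are read straight off the table, with no coordinate geometry at all.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    arc_id: int
    from_node: str
    to_node: str
    lpoly: str  # polygon on the left, looking from from_node to to_node
    rpoly: str  # polygon on the right


# Hypothetical coverage: A (west) and B (east) share arc 1.
ARCS = [
    Arc(1, "n1", "n2", lpoly="B", rpoly="A"),  # shared boundary, stored once
    Arc(2, "n2", "n1", lpoly="OUTSIDE", rpoly="A"),
    Arc(3, "n1", "n2", lpoly="OUTSIDE", rpoly="B"),
]


def polygon_arc_list(arcs: list[Arc]) -> dict[str, list[int]]:
    """Polygon-arc table derived from the left/right fields."""
    table: defaultdict[str, list[int]] = defaultdict(list)
    for arc in arcs:
        table[arc.lpoly].append(arc.arc_id)
        table[arc.rpoly].append(arc.arc_id)
    return dict(table)


def neighbors(arcs: list[Arc], poly: str) -> set[str]:
    """Polygons adjacent to `poly`: whatever lies on the other side of its arcs."""
    result: set[str] = set()
    for arc in arcs:
        if arc.lpoly == poly:
            result.add(arc.rpoly)
        elif arc.rpoly == poly:
            result.add(arc.lpoly)
    return result


def node_arc_list(arcs: list[Arc]) -> dict[str, list[int]]:
    """Arc-node connectivity: which arcs meet at each node."""
    table: defaultdict[str, list[int]] = defaultdict(list)
    for arc in arcs:
        table[arc.from_node].append(arc.arc_id)
        table[arc.to_node].append(arc.arc_id)
    return dict(table)


print(polygon_arc_list(ARCS))  # {'B': [1, 3], 'A': [1, 2], 'OUTSIDE': [2, 3]}
print(neighbors(ARCS, "A"))    # B and OUTSIDE
print(node_arc_list(ARCS))     # {'n1': [1, 2, 3], 'n2': [1, 2, 3]}
```

Because the shared boundary (arc 1) is stored once, with A and B recorded on its two sides, finding neighbors is a table lookup rather than a segment-by-segment geometric comparison.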
Upon completion of digitization, GIS systems need to perform an operation that builds topology, unless this was part of the digitizing procedure itself. In either case, the topology, by providing explicit information about the types of relationships of the entities in the database, should make it possible to identify entity errors in GIS coverages. Usually, some of these errors will be pointed out through graphics displayed on the screen or by inspecting text-based error flags.

Why Topology Matters
According to Aronoff (1989), the topological data structure has the advantage of allowing spatial analysis to proceed without using the coordinate data. Topological analyses, such as connectivity relationships, may be performed quickly since there is no need to derive the spatial relationships from the geographic coordinates. However, the use of the topological data structure comes at a cost. Every time a new map is entered or an existing map is altered or changed, topology must be rebuilt. The procedure of rebuilding and updating relationships, whether performed in 'real time' or in 'batch mode', can be relatively time-consuming. However, where the data structure does not include topology, more powerful programs, algorithms and computers are required to perform these more complex analyses. For this reason nearly all vector based GISs now use the topological data model as the principal data model.

Creating and storing topological relationships has a number of advantages. It helps in storing the data efficiently and enhances processing speed, even for large volumes of data. Besides, it is useful for detecting errors in digital maps. It can detect lines that do not connect properly, polygons that are not closed and other errors of digitization. These errors may be corrected automatically by setting user-defined tolerances in the topology building process. Topology building is generally a repetitive process of editing data: after correcting the errors, the topology must be rebuilt. Topology also reduces data redundancy, as the shared boundary between two polygons is stored only once, not twice. Moreover, updating the polygons becomes easy. And lastly, it is topology which separates GIS from CAD (Computer Aided Design).

Topological data models are provided with information that can help in obtaining solutions to common operations in advanced GIS analytical techniques. This is done by explicitly recording adjacency information in the data structure, eliminating the need to determine it for multiple operations. The line, which is the basic logical entity in topological data structures, begins and ends when it either contacts or intersects another line, or when there is a change in direction of the line. The topological data model more closely approximates how we as map readers identify the spatial relationships contained in an analog map document. Identifying topological relationships also means that analytical functions such as modeling traffic flows through a road network, combining adjacent polygons with similar characteristics, and overlaying geographic features can be accomplished much faster.

In order to handle the topological characteristics of spatial data, several forms of vector data structures are applied. Some of the important topological structures are discussed here.

Whole Polygon Structure
In this structure each layer in the database is divided into a set of polygons and recorded as a series of point locations that define the boundaries of each closed area in a specified coordinate system. Since each polygon is stored as a separate entity, there is no topology or relationship between the different spatial objects. Attribute information related to the polygons is stored with the coordinate list. There is some redundancy at the edges of the polygons, where the boundaries or vertices are recorded twice; similarly, points shared by two or more polygons are encoded as many times as there are polygons.

Figure 3.18: Whole Polygon Structure
Source: Star & Estes (1990)

DIME (Dual Independent Map Encoding)
DIME, developed by the US Bureau of the Census, was the first model that incorporated the adjacency relations between polygons. This model was used for digitally storing the linear block faces of the Standard Metropolitan Statistical Areas of the United States.

The basic element in a DIME file is a line segment defined by two end points. The implicit assumption is that the line segment is straight and uncrossed by any other line, so a complex line will be made up of a series of line segments. The two end points of a segment are recorded as the "from" and "to" points, giving a direction to the segment. Given this direction, the polygon IDs for the areas to the right and left of the segment are recorded accordingly. Neighborhood relationships are not made explicit. The DIME structure is cumbersome to use when there are areas made up of complex lines.

By design, DIME consists of the following:
1. Segment name to identify the segment, for example the name of a street
2. Beginning node number and coordinates of the node (from-node)
3. Ending node number and coordinates of the node (to-node)
4. Identifier for polygon on left side of segment (LPOLY)
5. Identifier for polygon on right side of segment (RPOLY)
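The five DIME fields listed above map naturally onto a record type. The sketch below is hypothetical (street names, node numbers, coordinates and block IDs are invented) and shows one consequence of the from/to convention: traversing a segment in the opposite direction swaps its left and right polygons.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DimeSegment:
    name: str                     # 1. segment name, e.g. a street name
    from_node: int                # 2. beginning node number...
    from_xy: tuple[float, float]  #    ...and its coordinates
    to_node: int                  # 3. ending node number...
    to_xy: tuple[float, float]    #    ...and its coordinates
    lpoly: str                    # 4. polygon (block) on the left
    rpoly: str                    # 5. polygon (block) on the right

    def reversed(self) -> "DimeSegment":
        """Same segment traversed the other way: ends swap, and so do sides."""
        return DimeSegment(self.name, self.to_node, self.to_xy,
                           self.from_node, self.from_xy,
                           lpoly=self.rpoly, rpoly=self.lpoly)


def blocks_facing(segments: list[DimeSegment], street: str) -> set[str]:
    """Blocks on either side of every segment with the given name."""
    blocks: set[str] = set()
    for seg in segments:
        if seg.name == street:
            blocks.update((seg.lpoly, seg.rpoly))
    return blocks


# Hypothetical data: Main St runs east, Oak Ave runs north from node 2.
STREET_SEGMENTS = [
    DimeSegment("Main St", 1, (0.0, 0.0), 2, (100.0, 0.0), "B101", "B102"),
    DimeSegment("Main St", 2, (100.0, 0.0), 3, (200.0, 0.0), "B103", "B104"),
    DimeSegment("Oak Ave", 2, (100.0, 0.0), 5, (100.0, 100.0), "B101", "B103"),
]

print(sorted(blocks_facing(STREET_SEGMENTS, "Main St")))  # ['B101', 'B102', 'B103', 'B104']
print(STREET_SEGMENTS[0].reversed())
```

Each record stands alone, which is why DIME needs a full search of all segments to reassemble a block outline.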
Figure 3.19: DIME Model

Though this model solved the problem of adjacency to a large extent, the retrieval of a polygon outline is still complicated. Polygon retrieval is a three-stage problem. In stage one an extensive search is made to retrieve all the segments for which a given polygon lies either to the right or to the left. The following table presents the list of segments belonging to the boundary of polygon P3.

Table 3.3: Extraction of polygon outline from a DIME model
Stage 1: retrieve all segments belonging to P3
Line segment   From point   To point   Left polygon   Right polygon
L3             p3           p4         P4             P3
L4             p4           p5         P4             P3
L16            p3           p16        P3             P2
L17            p16          p17        P3             P2
L18            p5           p17        P1             P3

In the next stage, these segments are ordered into a circular list so that the "to" end point of one segment is the "from" end point of the next segment, and the end point of the last segment is the first node of the head segment in the list. In this process the "from" and "to" end point IDs (and with them the left and right polygon IDs) are switched when necessary, so that the polygon always lies to the right of each segment (Table 3.4).

Table 3.4
Line segment   From point   To point   Left polygon   Right polygon
L3             p3           p4         P4             P3
L4             p4           p5         P4             P3
L18            p5           p17        P1             P3
L17            p17          p16        P2             P3
L16            p16          p3         P2             P3

POLYVRT (POLYgon conVERT)
The POLYVRT model used in GIS was developed by Peucker and Chrisman in 1975. This model stores each geographical entity separately and is structured around a chain structure. It is a synthesis of the chain/point dictionary model and the DIME model. In this model, the same relations are given for each chain as for each DIME segment, except that the end point of a chain is now recognized to be a node rather than a point. The nodes are shared by both arcs and contiguous polygons. This model is capable of retrieving selective and specific entity types, and polygons can be accessed directly, saving time for searches. But the problem with this model is that it is difficult to detect an incorrect pointer for a given polygon until the polygon has actually been retrieved.

TIGER (Topologically Integrated Geographic Encoding and Referencing)
TIGER was used by the US Census in 1990 for evaluation of population change between the two censuses. In this model, the census blocks can be retrieved directly by block number rather than depending on the adjacency information. TIGER files also contain roads, rivers, railway lines, physical features and administrative boundaries for the entire country.

DLG Structure (Digital Line Graph)
DLG is a product of the US Geological Survey (USGS) and incorporates the concept of the whole polygon. It has an arc-node and relational structure in which all the data are topologically structured. The contents of DLG files are subdivided into four thematic layers: boundary information, hydrographic features, transport network and the Public Land Survey System. Each file has a header record that provides information on the date of creation of the file, the map projection and coordinate system used, and the number of points, lines and areas stored in the file.

Topological Structure
The use of topological relationships to represent geographic information is not new. Our cognition or perception of geographic space, usually referred to as mental maps, is based on conceptual topological relationships rather than on actual coordinates. The topological model is a widely used method of encoding spatial relationships. The objects in the database are structured hierarchically into points, arcs, nodes and polygons. The system is hierarchical because all entities are built up from nodes or points as the elemental components. The topology for each of these spatial elements (points, lines and areas) is stored in separate tables. The arc-node structure permits the geometry of the data to be encoded with little or no redundancy. The database can also include attribute data for each node, arc and polygon. These are expanded in the attribute table, which explicitly links them to the geometry of the spatial object.

The three basic topological relationships in ARC/INFO are as follows:
Connectivity (arcs connect to each other at nodes, which provides information about linkages among spatial features.)
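Returning to the DIME polygon-outline procedure, its first two stages (Tables 3.3 and 3.4) can be sketched as follows. The rows are those of Table 3.3 for polygon P3, plus one hypothetical segment, L5, that does not bound P3 and is therefore dropped by the stage-one search. The sketch assumes a simple polygon whose boundary segments each start at a different point.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    name: str
    frm: str    # "from" point ID ("from" is a Python keyword)
    to: str     # "to" point ID
    left: str   # polygon on the left of frm -> to
    right: str  # polygon on the right of frm -> to


# Rows of Table 3.3, plus a hypothetical segment L5 that does not bound P3.
TABLE_3_3 = [
    Segment("L3", "p3", "p4", "P4", "P3"),
    Segment("L4", "p4", "p5", "P4", "P3"),
    Segment("L5", "p5", "p6", "P4", "P1"),
    Segment("L16", "p3", "p16", "P3", "P2"),
    Segment("L17", "p16", "p17", "P3", "P2"),
    Segment("L18", "p5", "p17", "P1", "P3"),
]


def polygon_outline(segments: list[Segment], poly: str) -> list[Segment]:
    # Stage 1: exhaustive search for segments with `poly` on either side.
    found = [s for s in segments if poly in (s.left, s.right)]
    if not found:
        raise ValueError(f"no segments bound {poly}")
    # Switch from/to (and left/right) so that `poly` always lies on the right.
    oriented = [s if s.right == poly else Segment(s.name, s.to, s.frm, s.right, s.left)
                for s in found]
    # Stage 2: chain each segment's "to" point to the next segment's "from" point.
    by_start = {s.frm: s for s in oriented}
    ring = [oriented[0]]
    while ring[-1].to != ring[0].frm:
        nxt = by_start.get(ring[-1].to)
        if nxt is None or len(ring) >= len(oriented):
            raise ValueError(f"outline of {poly} does not close")
        ring.append(nxt)
    return ring


for seg in polygon_outline(TABLE_3_3, "P3"):
    print(seg.name, seg.frm, seg.to, seg.left, seg.right)
```

The printed order, L3, L4, L18, L17, L16, with L17 and L16 reversed and their left and right polygons swapped, reproduces Table 3.4. The error branch fires if the boundary has a gap, which is one way an unclosed polygon shows up.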
Figure 3.23: Left-right Topology (arc list with the left polygon and right polygon of each arc)

3.6. Errors in GIS
There are six general types of errors:
Extra entities have been digitized;
The entities are not in the right place and are not of the correct shape and size;
Entities that are supposed to be connected to each other are not properly connected;
Not all polygons have a single label point to identify them;
Not all entities are within the outside boundary identified with registration marks.

At every stage of input, storage, retrieval and manipulation, error is being introduced. The objective of editing is to manage these errors rather than to eliminate them. Here the focus is on errors arising from digitizing maps and their associated terminology in the context of Arc/Info. Common errors of the digitizing process are associated with nodes, arcs and polygons. A polygon by definition should have closed boundaries, and the boundaries must be continuous. Where this is not the case, the source of the error may lie in encoding, where an arc is independent of the polygon, or in digitizing, where the nodes have not been connected to form the polygon.

Dangling nodes: A common node error, called the dangling node, can be defined as a single node connected to a single line entity. Dangling nodes (usually called dangles) can result from three possible mistakes: failure to close a polygon, failure to connect the node to the object it was supposed to be connected to (called an undershoot), or going beyond the entity you were supposed to connect to (called an overshoot).

Figure 3.24: Measuring Error (overshoot and undershoot)

Dangling node errors generally are identified by a graphic symbol different from the one used for pseudo nodes. Corrections are quite simple. For undershoots, the node is identified and moved or "snapped" to the object to which it should have been connected. Overshoot errors are corrected by identifying the intended line intersection point and "clipping" the line so that it connects where it is supposed to. In the case of an open polygon, one of the nodes is simply moved to connect with the other.

Another common error related to nodes is the pseudo node. Nodes are not just points between line segments that show directional changes in the line; in addition, they have specific topological meaning. Nodes may be used to identify the existence of an intersection between two streets, or a connection between a stream and a lake, but they should not occur at every line segment along a line or a polygon. Thus the first type of error that can be detected is false nodes, called pseudo nodes, which occur where a line connects with itself (island pseudo node) or where two lines intersect along a parallel path rather than crossing. A pseudo node may simply mark the beginning and end of an island polygon (sometimes called a spatial pseudo node), or it may join two line segments with an intervening node (also known as an attribute pseudo node).

Figure 3.25: Node, pseudo node and dangle

As with pseudo nodes, some dangles are intentionally input to the GIS for a particular purpose. Most often these nodes serve as indicators of something important at the end of a line or arc.
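Dangles and pseudo nodes can be flagged purely from the arc-node table by counting how many arc ends meet at each node: one arc end means a dangle, exactly two means a pseudo node. This is a minimal sketch with hypothetical arcs. It cannot tell an undershoot from an overshoot (that needs geometry), and since some dangles and pseudo nodes are intentional, flagged nodes should be reviewed rather than deleted automatically.

```python
from collections import Counter

# Hypothetical arc-node table: arc ID -> (from node, to node).
ARCS = {
    "a1": ("n1", "n2"),
    "a2": ("n2", "n1"),
    "a3": ("n1", "n7"),  # a3 and a7 form one line needlessly split at n7
    "a7": ("n7", "n2"),
    "a4": ("n2", "n4"),  # n4 touches nothing else: possibly an undershoot
    "a5": ("n8", "n8"),  # closed island arc: starts and ends at n8
}


def node_degrees(arcs: dict[str, tuple[str, str]]) -> Counter:
    """Number of arc ends meeting at each node."""
    degree: Counter = Counter()
    for from_node, to_node in arcs.values():
        degree[from_node] += 1
        degree[to_node] += 1  # a closed arc counts twice at its node
    return degree


def flag_nodes(arcs: dict[str, tuple[str, str]]) -> tuple[set[str], set[str]]:
    """Return (dangling nodes, pseudo nodes) for review."""
    degree = node_degrees(arcs)
    dangles = {node for node, d in degree.items() if d == 1}
    pseudo = {node for node, d in degree.items() if d == 2}
    return dangles, pseudo


dangles, pseudo = flag_nodes(ARCS)
print("dangles:", sorted(dangles))      # ['n4']
print("pseudo nodes:", sorted(pseudo))  # ['n7', 'n8']
```

Node n7 is an attribute-style pseudo node (two arcs meeting end to end), while n8 is an island pseudo node, matching the two cases described above.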
Another type of common digitizing error occurs when the software uses a vector data model that treats each polygon as a separate entity. In such cases the adjacent lines between polygons have to be digitized more than once. Failure to place the digitizing puck at exactly the correct location for each point along that line will often result in a series of tiny graphic polygons called sliver polygons.

Figure 3.26: Sliver Polygon

Sliver polygons can also occur as a result of overlay operations, or when each of two adjacent maps is input through a separate projection. Problems associated with registration arise when map sheets overlaid on top of each other have features that do not match. The problem may lie in the digitizing process, or it may be caused by the original maps themselves, where the map manuscripts were produced on non-stable material. Expansion, contraction and creases due to folding may produce difficulties with registration. In the digital map, however, the non-registration of separate layers or coverages may be caused by errors in the tic-points, producing unacceptably large RMS (root mean square) errors. The RMS error is simply an indication of the calculated difference between the digitized location and the specified location on a map: the higher the RMS value, the greater the error. Typically an acceptable RMS error is of the order of 0.003 or less. Detection of sliver polygons is very difficult. The only way to find them is to search for suspect polygon boundaries and then zoom in to see the slivers.

Label errors: While polygons are digitized, a point inside each polygon acts as a locator for a label displaying text information about the polygon. Only one label point is needed per polygon. Two types of errors can occur relating to label points in polygons: missing labels and too many labels. Both these errors are most often caused by a failure to keep track of the digitizing process.

Weird polygons: Weird polygons are defined as polygons with missing nodes. In this case the polygon is a graphic artifact that appears to be a true polygon but is missing one or more nodes. Generally this occurs when two or more lines cross over, producing the semblance of a polygon. The most frequent cause of this error is a point digitized in the wrong place or in the wrong order.

Edge matching: Edge matching is the term given to the process that attempts to create a seamless join across two or more maps. There may be instances where a large map sheet has to be partitioned or 'cut up' into individual manageable-sized sheets in order to lay it on a digitizing tablet. After digitizing all the individual sheets, these may be merged digitally to form the original large map sheet. However, because of slight inaccuracies either in digitizing or in the registration of each tile of the map, the map edges may not match. Maps that are digitized at different times, or adjacent sheets digitized in different coordinate systems, also often appear distorted at their edges.

Figure 3.27: Edge Matching (Map Sheet A and Map Sheet B, before and after edge matching)

Objects such as roads, pipelines and topographic features that cross from one map area to another can appear discontinuous or irregular. Edge matching problems also occur in raster systems, at least for those that operate on projected surfaces rather than simply in flat space. The problem is solved by an editing procedure called edge matching, after which the map digitized in pieces is presented as one continuous map sheet.

Rubber sheeting: Rubber sheeting is a non-uniform adjustment of a data set based on the movement of known control points to new locations. For example, data collected by aerial survey may be inaccurate because of flight alignment and camera inaccuracies. By comparing these data with accurate ground survey data, the aerial data can be stretched, or rubber sheeted, over the accurate data using control points common to both data sets.

Figure 3.28: Rubber sheeting (fixed points, control points, original boundary and rubber sheeted boundary)
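The RMS error described above is commonly computed as the square root of the mean squared distance between each digitized tic and its specified location. A minimal sketch with hypothetical tic coordinates follows; the 0.003 threshold from the text is applied in whatever units the coordinates use, since the text does not state them.

```python
import math

Point = tuple[float, float]


def rms_error(digitized: list[Point], specified: list[Point]) -> float:
    """Root mean square of the distances between digitized and specified tics."""
    if not digitized or len(digitized) != len(specified):
        raise ValueError("need two non-empty tic lists of equal length")
    squared = [(xd - xs) ** 2 + (yd - ys) ** 2
               for (xd, yd), (xs, ys) in zip(digitized, specified)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical tics: only the first one was digitized slightly off.
SPECIFIED = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
DIGITIZED = [(0.003, 0.004), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]

rms = rms_error(DIGITIZED, SPECIFIED)
print(round(rms, 6), "acceptable" if rms <= 0.003 else "re-register")  # 0.0025 acceptable
```

A single badly placed tic raises the RMS for the whole coverage, which is why the text ties large RMS values to non-registration of layers.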
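Rubber sheeting can be implemented in several ways. The sketch below uses one simple approach, chosen here for illustration and not taken from the text or from any particular GIS: every point is shifted by an inverse-distance-weighted average of the control-point displacements, so fixed points (zero displacement) hold their neighborhood in place while moved control points pull nearby features with them. Triangulation-based (piecewise) methods are a common alternative. The function name, the `power` parameter and the data are hypothetical.

```python
import math

Point = tuple[float, float]


def rubber_sheet(point: Point, controls: list[tuple[Point, Point]],
                 power: float = 2.0) -> Point:
    """Shift `point` by an inverse-distance-weighted mean of control displacements.

    `controls` holds (source, target) pairs; fixed points have source == target.
    """
    if not controls:
        raise ValueError("at least one control point is required")
    px, py = point
    weight_sum = dx_sum = dy_sum = 0.0
    for (sx, sy), (tx, ty) in controls:
        d = math.hypot(px - sx, py - sy)
        if d == 0.0:
            return (tx, ty)  # a control point lands exactly on its target
        w = 1.0 / d ** power
        weight_sum += w
        dx_sum += w * (tx - sx)
        dy_sum += w * (ty - sy)
    return (px + dx_sum / weight_sum, py + dy_sum / weight_sum)


# Hypothetical frame: three fixed corners, one control point moved 1 unit east.
CONTROLS = [
    ((0.0, 0.0), (0.0, 0.0)),
    ((10.0, 0.0), (10.0, 0.0)),
    ((10.0, 10.0), (11.0, 10.0)),
    ((0.0, 10.0), (0.0, 10.0)),
]

print(rubber_sheet((10.0, 10.0), CONTROLS))  # (11.0, 10.0)
print(rubber_sheet((5.0, 5.0), CONTROLS))    # about (5.25, 5.0)
```

The adjustment is non-uniform: a point next to the moved control point shifts almost the full unit, while the center of the frame shifts only a quarter of it, which is also why the text warns that rubber sheeting can degrade relative accuracy.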

Use rubber sheeting only:
- when absolutely necessary, because it can severely affect the relative accuracy
  of the data;
- as a last resort, after exhausting other methods of object editing and
  coordinate adjustment.

A typical example of the application of rubber sheeting is stretching a new
subdivision map to fit a preexisting parcel map.

Possible sources of error

A major commercial GIS should be able to provide these general topological
relationships, which can be used to identify errors. A useful procedure for
comparing digitized entities with the original map document is to produce a …
monitor display. Many GIS systems provide a number of symbols that indicate
some errors. The following reproduced table summarizes the sources of error.

Table 3.5 : Common sources of error encountered in using a GIS

Stage               Sources of error
Data collection     Errors in field data collection
                    Errors in existing maps used as source data
                    Errors in the analysis of remotely sensed data
Data input          Inaccuracies in digitizing caused by operator and equipment
                    Inaccuracies inherent in the geographic feature (for example,
                    edges, such as forest edges, that do not occur as sharp boundaries)
Data storage        Insufficient numerical precision
                    Insufficient spatial precision
Data manipulation   Inappropriate class intervals
                    Boundary errors
                    Error propagation as multiple overlays are combined
                    Slivers caused by problems in polygon overlay procedures
Data output         Scaling inaccuracies
                    Error caused by inaccuracy of the output device
                    Error caused by instability of the medium
Use of results      The information may be incorrectly understood
                    The information may be inappropriately used

Source : Aronoff (1989: 142).

Advantage of vector data

Vector data usually take less storage space than raster data, because vector
features are defined and stored only as nodes and vertices. The vector data
structure seems to be the system of choice for many GIS users, particularly
those whose projects require control over data accuracy and outcome. Some users
prefer the simpler and less expensive raster format, especially for
introductory training.

Raster versus Vector

Raster:
- The primary focus of the raster data model is location.
- Raster is of limited use in describing non-continuous entities such as roads
  and boundaries (lines) or houses (points). Line and point data in raster will
  always be less precise than in vector format.
- A raster-based GIS allows satellite data to be readily incorporated. It is
  well suited to many spatial modeling operations such as optimum corridor route
  selection, modeling surface storm runoff, and modeling forest fire spread.
- Raster data have reduced spatial accuracy; that is, the raster model
  represents a more generalized view. The raster model can also represent
  gradual transitions between features and surfaces, such as soil classification
  and elevation.
- The raster model represents location as cells, also in a Cartesian coordinate
  system, and stores rows and columns of cell values.
- In a raster model, each element of the raster can be considered a GIS
  "object", but with a highly structured definition and very limited attributes.
- A raster consists of evenly spaced data that cover a certain area. Because of
  distortion, however, data that are spaced evenly in one attribute may in fact
  be unevenly spaced in another.
- Calculation of area and distance is easy, but calculation of perimeter is
  more difficult.
- The raster data model answers the question, "What geographic phenomenon occurs
  at this location?"
- Rasters use a lot of space because they contain a great deal of information;
  hence, data compression is used.
- The more compressible the raster, the less suited the model is to the
  information. A truly variable surface cannot be easily compressed.

Chrisman (1994) said that the maxim "raster is faster, but vector is corrector"
is true to a certain extent.

Vector:
- The primary focus of the vector data model is the geographic feature.
- The vector data model is more suited to the question, "What do I know about
  this geographic feature?"
- The vector data model represents location as x, y coordinates in a Cartesian
  coordinate system.
- The vector model represents feature shape accurately.
- Observation units are "end points" and/or variable line or polygon magnitudes.
  The emphasis is on the relationships and distribution of geographic features.
  The vector model represents features with well-defined boundaries.
- Vectors represent discrete, finite things: where the value of an area remains
  the same within a region, that region is more accurately described as a vector.
- Calculation of area, distance and perimeter is easy and precise.
- Vectors use minimal space for storage of spatial data.
- The vector model is used for high-quality cartography and wherever accuracy
  and precision are important, such as in cadastral (property) applications.
- Real-world observations and phenomena are generally related to specific
  locations: point measurements, lines, boundaries, and defined areas. For most
  of these, vector models are most appropriate.
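The contrast drawn on area and perimeter (easy in both models for area, harder in raster for perimeter, easy and precise in vector) can be checked with a small experiment. The Python sketch below computes the exact area and perimeter of a triangle from its vertices. It then rasterizes the same triangle onto a grid of 1 by 1 cells and measures area by counting cells and perimeter by counting boundary cell edges. The rasterization rule (a cell is inside when its centre falls inside the polygon), the grid layout and the example triangle are assumptions chosen for illustration. Real GIS rasterizers offer other rules.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_area(ring: List[Point]) -> float:
    """Exact (vector) area by the shoelace formula; closing vertex not repeated."""
    n = len(ring)
    s = sum(ring[i][0] * ring[(i + 1) % n][1] - ring[(i + 1) % n][0] * ring[i][1]
            for i in range(n))
    return abs(s) / 2.0


def polygon_perimeter(ring: List[Point]) -> float:
    """Exact (vector) perimeter: sum of Euclidean edge lengths."""
    n = len(ring)
    return sum(math.dist(ring[i], ring[(i + 1) % n]) for i in range(n))


def point_in_polygon(x: float, y: float, ring: List[Point]) -> bool:
    """Even-odd ray casting: count edge crossings of a ray running in the +x direction."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(ring: List[Point], n_cols: int, n_rows: int, cell_size: float) -> List[List[bool]]:
    """Cell (r, c) is inside if its centre is inside the polygon; grid origin at (0, 0), row 0 at the bottom."""
    return [[point_in_polygon((c + 0.5) * cell_size, (r + 0.5) * cell_size, ring)
             for c in range(n_cols)] for r in range(n_rows)]


def raster_area(mask: List[List[bool]], cell_size: float) -> float:
    """Area as (number of inside cells) x (cell area)."""
    return sum(row.count(True) for row in mask) * cell_size ** 2


def raster_perimeter(mask: List[List[bool]], cell_size: float) -> float:
    """Count cell edges that separate an inside cell from an outside or off-grid cell."""
    n_rows, n_cols = len(mask), len(mask[0])
    edges = 0
    for r in range(n_rows):
        for c in range(n_cols):
            if not mask[r][c]:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if not (0 <= rr < n_rows and 0 <= cc < n_cols) or not mask[rr][cc]:
                    edges += 1
    return edges * cell_size


triangle = [(0.0, 0.0), (10.0, 0.0), (0.0, 7.0)]
mask = rasterize(triangle, n_cols=10, n_rows=10, cell_size=1.0)
print(polygon_area(triangle), raster_area(mask, 1.0))                      # 35.0 35.0
print(round(polygon_perimeter(triangle), 2), raster_perimeter(mask, 1.0))  # 29.21 32.0
```

In this case the cell count happens to reproduce the exact area. The stepped outline of the raster, however, makes its perimeter (32.0) noticeably longer than the true perimeter (about 29.21). Making the cells smaller does not remove this staircase effect, because a stepped diagonal always measures the sum of its horizontal and vertical runs.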
Most standard GIS operations can be performed on both data models. Which data
model is appropriate depends on the application. For census and many other
socioeconomic applications, the vector model is more appropriate, as it allows a
compact representation of points and polygons. The close connection to database
management systems supports socioeconomic applications that are characterized
by a large amount of attribute information (for example, hundreds of census or
survey variables) tied to a fixed number of spatial features such as census
districts, villages, or survey clusters. Finally, printed output from vector GIS
databases more closely resembles maps produced using traditional cartographic
techniques.
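The link between a fixed set of spatial features and a large attribute table amounts to a join on a shared key. Each polygon carries an identifier, and the attribute records (which in practice would come from a database) use the same identifier. The minimal Python sketch below joins the two and derives a population density from the vector area. The district codes, population figures and planar coordinates are hypothetical, and the dictionaries stand in for whatever storage a real GIS uses.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Hypothetical district polygons in planar coordinates (e.g. kilometres), keyed by district code.
district_geometry: Dict[str, List[Point]] = {
    "D01": [(0, 0), (2, 0), (2, 1), (0, 1)],
    "D02": [(2, 0), (5, 0), (5, 2), (2, 2)],
}

# Hypothetical attribute table keyed by the same code; a real one might hold hundreds of variables.
census_attributes: Dict[str, Dict[str, float]] = {
    "D01": {"population": 1200, "households": 300},
    "D02": {"population": 1800, "households": 450},
}


def shoelace_area(ring: List[Point]) -> float:
    """Planar polygon area from its vertices (closing vertex not repeated)."""
    n = len(ring)
    s = sum(ring[i][0] * ring[(i + 1) % n][1] - ring[(i + 1) % n][0] * ring[i][1]
            for i in range(n))
    return abs(s) / 2.0


def join_with_density(geometry: Dict[str, List[Point]],
                      attributes: Dict[str, Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Attach each feature's attribute record by code and derive population density."""
    joined = {}
    for code, ring in geometry.items():
        record = attributes.get(code)
        if record is None:
            continue  # feature without an attribute row is dropped, as in an inner join
        area = shoelace_area(ring)
        joined[code] = {**record, "area": area, "density": record["population"] / area}
    return joined


for code, row in join_with_density(district_geometry, census_attributes).items():
    print(code, row["area"], row["density"])
# D01 2.0 600.0
# D02 6.0 300.0
```

Because the geometry is stored once and only the attribute records grow, adding further census variables does not change the spatial side of the model.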
Even so, the capability to handle raster data is of increasing importance in
population applications. Some of the input data that are useful for delineating
enumeration area boundaries come in raster form. Fortunately, the choice between
data models usually does not have to be 'either-or'. Many GIS packages now
support both types of spatial data. This allows, for example, raster data to be
used as a background onto which line and polygon features can be drawn. Thus,
remotely sensed images or elevation surfaces can be displayed on a computer
screen together with other relevant information to aid the delineation of
enumeration areas.
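Drawing vector features over a raster background, or reading the raster value beneath a vector point, requires converting an x, y coordinate into a row and column of the grid. The minimal Python sketch below assumes a north-up grid with square cells whose top-left corner is at a known origin. The origin, cell size, elevation values and village locations are all hypothetical.

```python
from typing import List, Optional, Tuple


def world_to_cell(x: float, y: float, origin_x: float, origin_y: float,
                  cell_size: float) -> Tuple[int, int]:
    """Convert map coordinates to (row, col) in a north-up grid.

    (origin_x, origin_y) is the top-left corner of the grid; rows increase southward.
    Floor division sends points left of or above the grid to negative indices.
    """
    col = int((x - origin_x) // cell_size)
    row = int((origin_y - y) // cell_size)
    return row, col


def sample(grid: List[List[int]], x: float, y: float, origin_x: float,
           origin_y: float, cell_size: float) -> Optional[int]:
    """Return the raster value beneath point (x, y), or None if it falls outside the grid."""
    row, col = world_to_cell(x, y, origin_x, origin_y, cell_size)
    if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
        return grid[row][col]
    return None


# Hypothetical 3 x 3 elevation raster with 10-unit cells, top-left corner at (100, 300).
elevation = [
    [10, 12, 15],
    [11, 14, 18],
    [13, 17, 21],
]

# Hypothetical vector points (e.g. village locations) drawn over the raster.
villages = {"A": (115.0, 285.0), "B": (125.0, 275.0), "C": (95.0, 285.0)}
for name, (x, y) in villages.items():
    print(name, sample(elevation, x, y, origin_x=100.0, origin_y=300.0, cell_size=10.0))
# A 14
# B 21
# C None
```

Cells here are treated as half-open, so a point lying exactly on a shared cell edge is assigned to the cell to its right or below. Other conventions are equally valid, provided they are applied consistently.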
In short, a raster model is best suited for showing geographic variation of
phenomena, while a vector model is best suited to documentation of phenomena.
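The storage trade-off noted earlier for rasters can be illustrated with run-length encoding: rasters use a lot of space, are often compressed, and compress poorly when the surface varies from cell to cell. Run-length encoding is one common raster compression scheme. It is chosen here as an example, not as the method any particular GIS uses. In the Python sketch below, each row is stored as (value, run length) pairs. The land-use grid and the varying "surface" grid are made-up examples.

```python
from typing import List, Tuple

Run = Tuple[int, int]  # (cell value, number of consecutive cells with that value)


def rle_encode(row: List[int]) -> List[Run]:
    """Run-length encode one raster row."""
    runs: List[Run] = []
    for value in row:
        if runs and runs[-1][0] == value:
            runs[-1] = (value, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((value, 1))  # start a new run
    return runs


def rle_decode(runs: List[Run]) -> List[int]:
    """Expand (value, length) pairs back into a full row of cells."""
    row: List[int] = []
    for value, length in runs:
        row.extend([value] * length)
    return row


def stored_pairs(grid: List[List[int]]) -> int:
    """Total number of (value, length) pairs needed to store the grid row by row."""
    return sum(len(rle_encode(row)) for row in grid)


# Hypothetical land-use raster: two classes in large homogeneous blocks (4 rows x 10 columns).
land_use = [[1] * 6 + [2] * 4 for _ in range(4)]
# Hypothetical continuously varying surface: every cell differs from its neighbour.
surface = [[r * 10 + c for c in range(10)] for r in range(4)]

print(stored_pairs(land_use))  # 8 pairs (16 numbers) instead of 40 cells
print(stored_pairs(surface))   # 40 pairs (80 numbers), more than the 40 raw cells
```

The homogeneous grid shrinks considerably, while the continuously varying grid grows. This is the behaviour the text describes: a raster that compresses well holds little cell-to-cell variation, and a truly variable surface cannot be compressed this way.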
