1 Data Entry & Presentation
Contents
a) Spatial Data Input
Direct Spatial Acquisition
Digitising Paper Maps
Obtaining Spatial Data Elsewhere
b) Data Preparation
1.1.1 Data Entry Process
1. Data Capture
2. Geocoding (Spatial Data)
3. Data Media Conversion (Digitizing, Scanning)
4. Data Registration
5. Data Format/Structure conversion (Vectorization & Rasterization)
6. Data Preparation, Cleaning & Integration
7. Data Validation-Accuracy & Quality
1.2 Data Capture vs Data Entry
Data Capture: the process of collecting data from various sources such as maps, satellite
images and field surveys.
Data Entry: on the other hand, involves converting spatial and non-spatial data into digital format
using techniques such as digitizing, scanning and GPS data collection. It keeps the GIS
database accurate and up to date.
1. Geocoding (Spatial Data)
Geocoding involves assigning geographic coordinates to points. It is perhaps the most basic form of
spatial data entry.
Spatial Data Input:
The procedure of encoding data into a computer-readable form and writing it to the GIS
database. It involves capturing, acquiring and entering data that has a geographic (spatial)
component into a GIS. Data is input from sources such as GPS devices, aerial photographs and
satellite imagery, and GIS software provides a range of tools for editing and analyzing it in
order to create and manipulate maps and other geographic visualizations.
1.2.1 Direct Spatial Data Acquisition
Direct observation of relevant geographic phenomena
Primary data is obtained.
With primary data, the core concern in knowing their properties is to know:
• the process by which they were captured,
• the parameters of any instruments used, and
• the rigor with which quality requirements were
observed.
1.2.1.1 Examples Include
1. Global Positioning System
Collecting location data using GPS devices, then importing the data into the GIS.
A GPS is a set of hardware and software designed to determine accurate locations on the
earth using signals received from selected satellites.
Location data and associated attribute data can then be transferred to mapping and GIS software.
2. Remote Sensing
This involves using satellite or airborne sensors to capture images of the earth’s surface and
then extracting spatial data from these images.
3. Photogrammetry
The science of making reliable measurements from photographs, especially aerial
photographs.
4. Surveying
Making relatively large-scale, accurate measurements of the Earth's surface using special
equipment, e.g. total stations.
1.2.2 Indirect Spatial Data Acquisition
This type of data is known as secondary data, i.e. data derived from existing sources that was collected
for other purposes, often not connected with the investigation at hand.
Also known as Data Media Conversion. The data is derived by:
• scanning existing printed maps
• digitizing from a satellite image
• purchasing processed data from data-capture firms or international agencies
Pros & Cons of each method of acquisition
Direct Data
PROS
• You are able to obtain the exact data you need
• It allows for the collection of accurate data since it provides control over data quality
• Creates opportunity for new insights
CONS
• In practice, it is not always feasible to obtain spatial data by direct capture
• Factors of cost and available time may be a hindrance
• Limited scope
Indirect Data
PROS
• May be time and cost saving
CONS
• Other data are only available commercially, e.g. satellite imagery
• High quality data remain both costly and time consuming to collect and verify
1.3 Data Media Conversion
1.3.1 Spatial Data Elsewhere
Spatial data obtained elsewhere comes in different formats and standards and with varying quality.
It can be obtained over the internet or on CD-ROM.
It is useful:
a) When conducting research for a selected area that is being compared with other areas for which
data is already accessible.
b) When the cost of collecting primary data from the study area in question is higher than
the cost of acquiring it from data clearing houses.
c) When it is more convenient to have such data. Unlike primary data, it can be searched for.
It can be sourced from Spatial Data Clearing Houses in forms such as Base Map Data, Natural
Resources, Digital Elevation Data and Census-Related Data.
1.3.1.1 Spatial Data Clearing Houses
A centralized place where data providers deposit their data, creating a marketplace for potential
data users.
Features:
a) Data is available at a price
b) Price is a function of the nature, scale & date of production
c) Provides searchable descriptions of available datasets
Examples:
i. [Link](US)
ii. Global Earth Observation System of Systems
iii. Geoportal Uganda
iv. Open Data Uganda
v. MUGIC
1.3.2 Digitisation
Involves the use of a digitizer, which converts analog maps, plans and images into digital format by
tracing features with a cursor or stylus. It is used when the available data is in a format that
cannot be immediately integrated with other GIS data.
1.3.2.1 Process of Digitizing
i. Create/use an existing data set to store the digitized [Link] the geometry type (point,
line or polygon) of the data set
ii. Load the source data into the GIS program (sometimes through scanning)
iii. Register the source to earth-based coordinates (if not already)
iv. Manually/Automatically digitize the features into the digital data set
v. Save the newly digitized features.
1.3.2.2 Ways of Digitising (How it is Done)
1. Manual
Human-guided coordinate capture (tracing) from a map or image source using a puck (a mouse-like
device). The operator uses the cursor to trace the coordinates of point/line/polygon features, and
the coordinates are relayed to the computer to be stored in digital form.
It can be either:
Heads-up Digitizing (on-screen digitizing)/Semi-automatic
i. Done on a computer screen using a digital map or image as a backdrop.
ii. Basically converts Raster -> Vector.
iii. Used when there is a simple document with some interpretation required.
Hardcopy Digitizing/Heads-down Digitizing
i. The original source is taped to a digitizing table which is connected to a computer.
ii. The features are traced with the puck.
iii. Converts Hardcopy -> Vector.
iv. Used when there is a complex document that requires interpretation.
Similarities between the two methods:
i. Both require a digitizing device (mouse, pen/tablet) to trace over the features
ii. Both require a certain level of skill (mapping concepts), expertise (tools used) and
experience (to produce accurate results)
iii. Both require post-processing/manual editing to finalize the digitized map
iv. Both require georeferencing
Differences
ASPECT | ON-SCREEN | HARDCOPY
1. ACCURACY | Higher, because one can zoom & make precise measurements | Less accuracy and precision
2. HANDLING THE MAP | Not required | Basis of the process (can damage old maps)
3. TIME REQUIRED (not taking into account size & complexity of map) | Faster | Slower
4. EQUIPMENT NEEDED | Digitizing software & computer | Digitizing tablet
5. SKILL LEVEL | Not so skilled | Requires expertise in drafting/digitizing
6. MAP QUALITY | For new maps with clear & well defined features | For older maps with faded/distorted features
7. EXPENSE | More expensive (software) | Cheap (only digitizing tablet)
8. COMFORT | More comfortable for user | Less comfortable
2. Scan/Automated
A computer program is trained to identify features in the input data source and then
automatically traces those features. It uses algorithms, together with a stylus (cursor) capable of
signaling to the computer that a point has been measured.
1.3.2.3 What to Digitize as What
1. Points, Lines & Polygons
These are the fundamental vector types in GIS.
They use points and their associated X,Y coordinate pairs to represent the vertices of
spatial features.
The attribute data of these features is then stored in a separate database management system.
The spatial and attribute information are linked via a simple identification number that is
given to each feature on the map.
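The link between spatial and attribute information can be sketched in a few lines of Python. This is only an illustration: the two dictionaries stand in for the geometry store and the attribute database, and the feature IDs, coordinates, attribute names and the function join_feature are all made up for the example.

```python
# Spatial table: feature ID -> geometry (vertices as X,Y coordinate pairs).
# All IDs, coordinates and attribute values here are made up for illustration.
geometries = {
    101: {"type": "point", "coords": [(32.58, 0.35)]},
    102: {"type": "line", "coords": [(32.50, 0.30), (32.55, 0.32), (32.60, 0.31)]},
}

# Attribute table: kept separately, keyed by the same feature ID.
attributes = {
    101: {"name": "Borehole A", "depth_m": 45},
    102: {"name": "Feeder road", "surface": "murram"},
}


def join_feature(feature_id):
    """Return geometry and attributes for one feature, linked by its ID."""
    record = {"id": feature_id}
    record.update(geometries[feature_id])
    record.update(attributes.get(feature_id, {}))
    return record


feature = join_feature(102)
print(feature["name"], feature["type"], len(feature["coords"]))
```

Keeping the two tables separate means the attributes can be edited or queried without touching the geometry, as long as the ID stays the same.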
2. Points
Points are zero-dimensional objects that contain only a single coordinate pair.
Points are typically used to model singular, discrete features such as buildings, wells,
power poles, sample locations, etc.
Points have only the property of location.
Other types of point features include the node and the vertex.
Points can be spatially linked to form more complex features.
3. Lines
Lines are one-dimensional features composed of multiple, explicitly connected points.
Lines are used to represent linear features such as roads, streams, faults, boundaries,
etc.
Lines have the property of length.
Lines that directly connect two nodes are sometimes referred to as chains, edges,
segments, or arcs.
4. Polygons
Polygons are two-dimensional features created by multiple lines that loop back to create a
“closed” feature.
Polygons are used to represent features such as city boundaries, geologic formations,
lakes, soil associations, vegetation communities, etc.
Polygons have the properties of area and perimeter.
Polygons are also called areas.
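The properties listed above (location for points, length for lines, area and perimeter for polygons) can be computed directly from the coordinate pairs. The sketch below assumes planar (projected) coordinates in consistent units and a simple, non-self-intersecting polygon; the function names and sample coordinates are illustrative only.

```python
import math


def line_length(coords):
    """Length of a line: sum of the straight segments between its vertices."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))


def polygon_perimeter(ring):
    """Perimeter of a polygon given as a list of vertices (closed automatically)."""
    return line_length(ring + [ring[0]])


def polygon_area(ring):
    """Area of a simple polygon using the shoelace formula."""
    closed = ring + [ring[0]]
    twice_area = sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(closed, closed[1:]))
    return abs(twice_area) / 2


well = (3.0, 4.0)                                        # point: location only
road = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]             # line: has length
plot = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]  # polygon: area and perimeter

print(line_length(road))
print(polygon_perimeter(plot))
print(polygon_area(plot))
```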
1.3.2.4 Ways of Digitising (How it is Done)
1. Point-Mode Digitising
Process of creating a digital representation of a physical map or drawing by manually clicking on
the points that represent features or data points
How is it done?
1. User selects a point on the map or drawing
2. Then clicks on the corresponding point on the digital screen
3. Process is repeated for each point that needs to be digitized until the entire map or
drawing has been converted into digital format.
2. Stream-Mode Digitising
It is the digitizing of maps or drawings in which the user traces the features on the map or
drawing using a cursor or a pen tablet.
Advantages of this method
• It allows for the continuous tracing of features, such as lines or polygons, without
having to stop.
• It is a more efficient and accurate digitization process for complex features with many curves
and angles.
How is it done?
1. User moves the cursor or pen along the feature that needs to be digitized
2. Simultaneously, the software records the movement and creates a digital representation of
the feature.
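The difference between the two modes can be sketched as follows. The notes do not say how the software decides when to record a vertex in stream mode; this sketch assumes a minimum-distance rule (a vertex is stored each time the cursor has moved at least min_distance), which is one common approach. The function names and the cursor track are hypothetical.

```python
import math


def stream_mode(cursor_track, min_distance):
    """Keep a vertex each time the cursor has moved at least min_distance."""
    vertices = [cursor_track[0]]
    for position in cursor_track[1:]:
        if math.dist(vertices[-1], position) >= min_distance:
            vertices.append(position)
    return vertices


def point_mode(clicks):
    """Point mode stores exactly the positions the operator clicked."""
    return list(clicks)


# Hypothetical cursor positions sampled while tracing a straight road.
track = [(0.0, 0.0), (0.4, 0.0), (1.1, 0.0), (1.3, 0.0), (2.2, 0.0), (3.0, 0.0)]

print(point_mode([(0.0, 0.0), (3.0, 0.0)]))
print(stream_mode(track, min_distance=1.0))
```

A smaller min_distance follows curves more closely but stores more vertices, so the setting is a trade-off between detail and file size.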
1.3.2.5 Digitizing Errors in GIS
1. Attribute errors:
Occur where the information about the feature is inaccurate or false.
2. Positional errors:
Occur when the feature is not captured properly.
a) Dangles/Dangling Nodes
Lines that should be connected but are not, leaving gaps. They also occur when a
digitized polygon does not link back to itself, producing an open polygon.
b) Switchbacks, knots & loops
Caused by unsteadiness of the operator's hand: the cursor is moved
in a way that creates extra vertices/nodes.
Switchbacks create extra vertices, hence bent lines.
Knots and loops make the line fold back on itself, creating an unintended polygon.
c) Overshoots & Undershoots
Occur when the digitized line doesn't connect properly with the neighboring line that it should intersect:
an undershoot stops short of it, while an overshoot extends past it.
They occur when the snap distance is either not set or too low for the scale being digitized.
In some instances, such as a road GIS database, they lead to dead ends.
d) Slivers
A sliver is an area where two adjacent polygons overlap in error, or
where the adjoining polygons have gaps between them.
Caused by setting the wrong parameters for snap tolerance.
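The role of the snap distance can be illustrated with a small sketch. It is a simplified assumption of how snapping behaves (an endpoint is moved onto the nearest existing node if that node lies within the snap distance); real GIS packages also snap to edges and vertices. The function names and coordinates are hypothetical.

```python
import math


def snap_endpoint(endpoint, target_nodes, snap_distance):
    """Move an endpoint onto the nearest node if it lies within snap_distance."""
    nearest = min(target_nodes, key=lambda node: math.dist(endpoint, node))
    if math.dist(endpoint, nearest) <= snap_distance:
        return nearest, True
    return endpoint, False


def is_closed(ring):
    """A polygon ring is closed when its last vertex equals its first."""
    return ring[0] == ring[-1]


junction_nodes = [(10.0, 10.0), (20.0, 10.0)]

# Undershoot: the line stops 0.3 units short of the junction at (10, 10).
print(snap_endpoint((9.7, 10.0), junction_nodes, snap_distance=0.5))
# Same gap, but the snap distance is too low, so a dangle remains.
print(snap_endpoint((9.7, 10.0), junction_nodes, snap_distance=0.1))

# A digitized polygon that does not link back to itself (open polygon).
open_ring = [(0, 0), (4, 0), (4, 3), (0.1, 0.1)]
print(is_closed(open_ring))
```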
1.3.2.6 Factors to consider when choosing a digitizing technique
i. Accuracy/Precision
ii. Data Complexity
iii. Time Constraints
iv. Availability of appropriate software & tools
v. Quality
vi. Contents of the input document
• Complex images: manually digitized
• Simple images: automatically digitized
• Topo maps & aerial photos (full of detail & symbols): manually digitized
• Fully automatic: maps with one type of information, e.g. contour lines &
cadastral boundaries.
1.3.3 Scanning
The process of converting a physical map or document into a digital format that can be used in a
GIS. It involves scanning documents (maps, aerial photographs or other paper documents) into the
GIS as raster images.
The steps involved are:
i. Preparation
ii. Scanning
iii. Geo-referencing
iv. Digitizing
v. Quality control
1.3.3.1 Scanner Output
This refers to the digital image or data produced by a scanner. The scanner output is only
a digital copy of the source document in raster cell values.
It can be saved in different formats depending on user preference, such as JPEG, TIFF,
PNG or PDF.
The scanned image or document, being in digital form, can be edited, displayed or printed by an
ordinary computer.
The quality of the scanner output depends on the resolution, color depth and settings used
during the scanning process.
1.3.3.2 Scanning Options
1. Line Art
Used for scanning black and white images with no shading or
tonal variations.
Produces high-contrast images suitable for technical drawings,
maps and diagrams.
It can create crisp, clean images with sharp edges, making
it useful for vector graphics.
Images have only white and black pixels.
2. Grey Scale
Captures images with varying levels of brightness but no color.
It is useful for capturing images and diagrams with tonal
variations.
It provides more detail than line art but requires more
storage.
256 grey values per pixel, with white & black as the extremes.
3. Colour Mode
It captures images in full color using red, green and
blue channels.
It is useful for capturing colorful images, maps, photographs etc. that require a high level
of detail and accuracy.
It provides the most detail but requires the most storage space.
256 different values each of red, green and blue per pixel, combined to give the final color.
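The storage difference between the three modes follows from the number of bits each pixel needs: 1 bit for line art, 8 bits for 256 grey values, and 3 x 8 = 24 bits for colour. The sketch below computes the raw (uncompressed) size for a hypothetical scan; real files differ because of headers and compression, and the function name and scan dimensions are illustrative.

```python
# Bits stored per pixel in each scanning mode described above.
BITS_PER_PIXEL = {
    "line_art": 1,      # black or white only
    "grey_scale": 8,    # 256 grey values = 2**8
    "colour": 24,       # 256 values each for red, green and blue = 3 * 8 bits
}


def uncompressed_size_bytes(width_px, height_px, mode):
    """Raw image size in bytes, ignoring file headers and compression."""
    total_bits = width_px * height_px * BITS_PER_PIXEL[mode]
    return total_bits // 8


# Hypothetical scan of 3000 x 2000 pixels.
for mode in BITS_PER_PIXEL:
    print(mode, uncompressed_size_bytes(3000, 2000, mode))
```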
Note: An image is a picture whose pixels represent the measured local reflectance values in
some designated part of the electromagnetic spectrum. In other words, an image
shows a particular part of the study area as a representation of its true colors, as
measured by local reflectance.
A raster, on the other hand, is an interpreted image: whatever the viewer deems most important in the
overall reflectance is what takes up the most space, so that when the values are averaged, it is only
that aspect that is represented.
1.3.3.3 Scanning Resolution
This refers to the level of detail captured when scanning.
It represents the number of pixels a scanner captures per unit length of the source document.
It can be expressed in:
a) Millimeters (pixel size)
b) Microns (pixel size)
c) Dots per inch (dpi)
A higher resolution results in a higher level of detail and a larger file, and vice versa.
Digital scanners have a fixed maximum resolution, expressed as the highest number of pixels
they can identify per inch; the unit is dots per inch (dpi).
The resolution to use depends on the requirements:
i. Manual on-screen digitising of a paper map (200-300 dpi)
ii. Manual digitizing of aerial photographs (800 dpi)
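The three ways of expressing resolution are related through the fact that one inch is 25.4 mm, so the pixel size in millimetres is 25.4 / dpi. The sketch below converts the dpi values mentioned above and estimates the pixel dimensions of a scan; the function names and the sheet size are hypothetical.

```python
MM_PER_INCH = 25.4


def pixel_size_mm(dpi):
    """Side length of one scanned pixel in millimetres."""
    return MM_PER_INCH / dpi


def pixel_size_microns(dpi):
    """Side length of one scanned pixel in microns (1 mm = 1000 microns)."""
    return pixel_size_mm(dpi) * 1000


def scan_dimensions_px(width_mm, height_mm, dpi):
    """Number of pixel columns and rows produced by scanning a sheet."""
    return round(width_mm / MM_PER_INCH * dpi), round(height_mm / MM_PER_INCH * dpi)


for dpi in (200, 300, 800):
    print(dpi, "dpi:", pixel_size_mm(dpi), "mm,", pixel_size_microns(dpi), "microns")

# Hypothetical 254 mm x 508 mm map sheet scanned at 300 dpi.
print(scan_dimensions_px(254, 508, 300))
```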
1.3.3.4 Point To Note:
The scanner output is only a digital copy of the source document in raster cell values (as
mentioned earlier).
The data are NOT structured into classified & coded objects.
To obtain such data, the output has to be vectorized or further structured, using additional methods to
recognize features and to associate categories and other thematic attributes with them.
1.4 Map Registration
The process of aligning two or more maps to a common coordinate system.
It involves identifying common reference points on the different maps and applying
mathematical transformations to align them.
It is used when you need to analyze data from different sources, to ensure they are aligned to a
common coordinate system.
1.4.1 Methods of Map Registration
1.4.1.1 Ground Control Point (GCP)
Involves identifying common features on both maps.
These are used as reference points to align both maps.
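One common mathematical transformation for this alignment is the affine transformation, X = a*x + b*y + c and Y = d*x + e*y + f, whose six parameters can be solved from three control points. The notes do not name a specific transformation, so the sketch below is an assumption for illustration only. It uses exactly three non-collinear GCPs (in practice more are usually collected and fitted by least squares), and the function names and coordinates are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve3(matrix, rhs):
    """Solve a 3x3 linear system with Cramer's rule."""
    d = det3(matrix)
    if abs(d) < 1e-12:
        raise ValueError("control points are collinear; choose better-spread GCPs")
    solution = []
    for col in range(3):
        replaced = [row[:col] + [rhs[r]] + row[col + 1:] for r, row in enumerate(matrix)]
        solution.append(det3(replaced) / d)
    return solution


def fit_affine(gcps):
    """Fit X = a*x + b*y + c, Y = d*x + e*y + f from exactly three GCPs.

    Each GCP is ((x, y) on the scanned map, (X, Y) in ground coordinates).
    """
    matrix = [[x, y, 1.0] for (x, y), _ in gcps]
    a, b, c = solve3(matrix, [X for _, (X, _) in gcps])
    d, e, f = solve3(matrix, [Y for _, (_, Y) in gcps])
    return a, b, c, d, e, f


def apply_affine(params, point):
    """Transform a map position into ground coordinates."""
    a, b, c, d, e, f = params
    x, y = point
    return a * x + b * y + c, d * x + e * y + f


# Hypothetical GCPs: pixel position on the scan -> ground coordinates in metres.
gcps = [((0, 0), (500000.0, 4000.0)),
        ((100, 0), (501000.0, 4000.0)),
        ((0, 100), (500000.0, 3000.0))]
params = fit_affine(gcps)
print(apply_affine(params, (50, 50)))
```

Once the parameters are known, every digitized coordinate can be passed through apply_affine so that the whole map ends up in the ground coordinate system.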
1.4.1.2 Image Matching
Involves comparing features in two images and
finding the best transformation that aligns them.
Algorithms are used to do this.
1.4.1.3 GPS Data
Provides accurate location information used to align maps.
Used where GCPs or other reference data are unavailable.
Used in remote areas (where traditional methods are expensive).
1.4.2 Reasons why one would carry out Map Registration
1. For map projection transformations
To align maps that use different projections when converting one map projection to another
2. For accurate analysis & decision making
3. Framework for accurate digitization
Establishes reference points needed to align the digital version of the map with other
datasets/maps.
1.5 Data structure conversion
This refers to the process of transforming data from one format or structure to another,
while preserving the content and the meaning of the data.
GIS data can be stored in various formats such as vector and raster.
1.5.1 Reasons for carrying this out
1. Interoperability; different GIS software packages may use different data formats.
Converting data to a common format can enable data sharing and integration into
different systems.
2. Analysis; Certain GIS analysis tools may require data to be in a particular format and
therefore converting data to a required format can facilitate the analysis.
3. Data management; sometimes the data is converted to different formats so that it can
occupy less storage space
1.5.2 Methods
1.5.2.1 VECTORISATION
Raster data (geographic features as grid pixels) is converted into vector data (points, lines &
polygons).
It is useful when analyzing discrete features (discrete fields & objects), such as roads or buildings,
that are best represented by vector data.
It works by identifying features in the raster data and tracing their boundaries to create vector data.
It involves two steps:
1. Skeletonising/Thinning
i. As scanned lines may be several pixels wide, they are often first
thinned to retain only the centerline.
ii. The remaining centerline pixels are converted to a series of (x,y) coordinate pairs,
defining a polyline.
2. Feature Forming
Subsequently, features are formed and attributes are attached to them (the coordinates).
i. It involves splitting lines to form line segments & nodes
ii. Joining line segments to form polygons & features
iii. Feature coding
Note: This process may be entirely automated or performed semi-automatically, with the
assistance of an operator.
1.5.2.2 Ways of carrying out Vectorization
1. Digitizing
Manually tracing over raster images to create vector features such as points, lines & polygons.
2. Automatic Vectorization
Using software algorithms to automatically convert raster data to vector format. It is faster &
more efficient but less accurate, especially for complex features.
3. Hybrid Vectorisation
Combining manual & automatic methods to achieve a balance between accuracy and efficiency.
1.5.2.3 Importance of Vectorisation of data
The vectorisation of data is important for the following reasons:
1. Vector data is more compact than raster data, which makes it easier to store, process and
transfer.
2. Vector data is more precise than raster data, which makes it more suitable for many GIS
applications, such as spatial analysis and modelling.
3. Vector data is more flexible than raster data.
4. Vector data can be easily integrated with other GIS data sets.
1.5.3 Rasterisation
Conversion from vector data to raster data. Assigning point, line and polygon attribute values to
raster cells.
1.5.3.1 Why would one want to do this?
Raster maps are better at representing continuous data.
It facilitates the integration of the two data types. Discrete data, e.g. forestry stands, is
accommodated equally well.
The inherent nature of raster maps, e.g. attribute maps, is ideally suited for mathematical
modeling and quantitative analysis.
Due to the nature of the data storage technique data analysis is usually easy to program
and quick to perform.
When using raster-based output devices, e.g. electrostatic plotters, graphic terminals.
No geographic coordinates stored. The geographic location of each cell is implied by its
position in the cell matrix. Accordingly, other than an origin point, e.g. bottom left
corner, no geographic coordinates are stored.
Since most input data is in vector form, data must undergo vector-to-raster conversion.
1.5.3.2 How Is It Done?
1. vector to raster: point
node x,y assigned to closest raster cell
locational shift almost inevitable; error depends on raster size.
two points in one cell indistinguishable
not transitive; cannot retrieve original data without error
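The point case can be sketched as follows. The sketch assumes a raster whose origin is the bottom-left corner with square cells, and takes the cell centre as the location implied by a cell; the function names and coordinates are illustrative. It shows both the locational shift and how two distinct points falling in one cell become indistinguishable.

```python
import math


def point_to_cell(x, y, origin_x, origin_y, cell_size):
    """Return (row, col) of the cell containing a point.

    Assumes the raster origin is the bottom-left corner, with rows counted
    upwards from it; other conventions (e.g. top-left origin) are also common.
    """
    col = math.floor((x - origin_x) / cell_size)
    row = math.floor((y - origin_y) / cell_size)
    return row, col


def cell_centre(row, col, origin_x, origin_y, cell_size):
    """Coordinates implied by a cell's position in the matrix (its centre)."""
    return (origin_x + (col + 0.5) * cell_size,
            origin_y + (row + 0.5) * cell_size)


origin_x, origin_y, cell_size = 0.0, 0.0, 10.0
points = [(12.0, 27.0), (18.0, 21.0)]   # two distinct hypothetical points

for x, y in points:
    row, col = point_to_cell(x, y, origin_x, origin_y, cell_size)
    cx, cy = cell_centre(row, col, origin_x, origin_y, cell_size)
    shift = math.dist((x, y), (cx, cy))
    print((x, y), "-> cell", (row, col), "-> centre", (cx, cy), "shift", round(shift, 2))
```

Converting the cell back to coordinates returns the cell centre, not the original point, which is why the original data cannot be retrieved without error. The largest possible shift is half the cell diagonal, so it grows with the cell size.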
2. vector to raster: line
cells assigned if touched by line
stair step appearance of diagonal lines (called aliasing)
can be visually improved through anti aliasing: brightness of cells varied based on
fraction of cell covered by the line
Transitive: the ability to reproduce the original data after conversion.
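The stair-step effect in the line case can be seen with a short sketch. Note the substitution: instead of marking every cell touched by the line, it uses Bresenham's line algorithm, a standard rasterisation method that picks one cell per step and can therefore skip cells the line only clips. Anti-aliasing is not shown. The function names and the sample line are illustrative.

```python
def bresenham(col0, row0, col1, row1):
    """Cells along a line between two cells, using Bresenham's algorithm."""
    cells = []
    d_col, d_row = abs(col1 - col0), abs(row1 - row0)
    step_col = 1 if col0 < col1 else -1
    step_row = 1 if row0 < row1 else -1
    error = d_col - d_row
    col, row = col0, row0
    while True:
        cells.append((row, col))
        if (col, row) == (col1, row1):
            break
        doubled = 2 * error
        if doubled > -d_row:
            error -= d_row
            col += step_col
        if doubled < d_col:
            error += d_col
            row += step_row
    return cells


def draw(cells, n_rows, n_cols):
    """Render the raster as text, row 0 at the top."""
    filled = set(cells)
    return "\n".join(
        "".join("#" if (r, c) in filled else "." for c in range(n_cols))
        for r in range(n_rows)
    )


line_cells = bresenham(0, 0, 7, 3)
print(draw(line_cells, 4, 8))
```

Printing the grid shows the diagonal line as a series of short horizontal runs, which is the aliasing described above.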
1.6 Spatial Data Preparation
The process of cleaning and editing spatial data before it can be used for analysis or
visualization.
1.6.1 Process of Spatial Data Preparation
The process of spatial data preparation typically involves several steps such as:
1. Data collection: Collecting spatial data from various sources such as remote sensing, GPS,
surveys or public databases.
2. Data cleaning: Removing any errors, inconsistencies, or outliers from the spatial data to
ensure it is accurate and reliable.
3. Data integration: Combining different datasets to create a comprehensive spatial
database.
4. Data transformation: converting the data into a standardized format that can be used by a
variety of GIS software and tools
After that:
5. Data analysis: Using spatial analysis tools to explore and identify patterns, relationships
and trends in the data.
6. Data visualization: Presenting the spatial data in a way that is easy to understand and
communicate to others.
1.6.2 Reasons for Spatial Data Preparation
• Vector data may require editing (removal of errors) and generating polygons.
• Images may need enhancements & (re)classification.
• It ensures spatial data is accurate, reliable and appropriate for the intended analysis or
application, so that it can be used to make informed decisions in a variety of fields such as
environmental science, urban planning & public health.
• It ensures spatial data is in a format that can be used for visualization.
1.6.3 Data validation/checkup:
This involves checking the data for errors, inconsistencies and missing values. Data validation
may involve statistical analysis, visual inspection or comparing the data to external sources.
1.6.4 Data cleaning:
This involves correcting errors and inconsistencies in the data. Data cleaning may involve
removing duplicate records, filling in missing values, correcting errors and standardizing data
formats.
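A minimal sketch of validation followed by cleaning is given below. The record layout (id, lat, lon, land_use), the rules applied and the function names are assumptions made for the example, not a prescribed workflow: records with missing or out-of-range coordinates are dropped, duplicates at the same location are removed, missing values are filled with a default and text is standardized.

```python
def validate(record):
    """Return a list of problems found in one record (empty list if none)."""
    problems = []
    lat, lon = record.get("lat"), record.get("lon")
    if lat is None or lon is None:
        problems.append("missing coordinate")
    elif not (-90 <= lat <= 90 and -180 <= lon <= 180):
        problems.append("coordinate out of range")
    if not record.get("land_use"):
        problems.append("missing land_use")
    return problems


def clean(records, default_land_use="unknown"):
    """Drop duplicates and invalid coordinates, fill gaps, standardize text."""
    seen, cleaned = set(), []
    for record in records:
        issues = validate(record)
        if "missing coordinate" in issues or "coordinate out of range" in issues:
            continue                      # cannot be located, so it is dropped
        key = (record["lat"], record["lon"])
        if key in seen:
            continue                      # duplicate record
        seen.add(key)
        land_use = (record.get("land_use") or default_land_use).strip().lower()
        cleaned.append({**record, "land_use": land_use})
    return cleaned


# Hypothetical survey records with typical problems.
raw = [
    {"id": 1, "lat": 0.35, "lon": 32.58, "land_use": " Residential "},
    {"id": 2, "lat": 0.35, "lon": 32.58, "land_use": "residential"},   # duplicate
    {"id": 3, "lat": 95.0, "lon": 32.60, "land_use": "forest"},        # bad latitude
    {"id": 4, "lat": 0.40, "lon": 32.61, "land_use": None},            # missing value
]
for row in clean(raw):
    print(row)
```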
1.7 Data Quality
The quality of data in GIS determines the quality of system analysis and the success or failure of
the whole application. The analytical methods of spatial data provided by GIS are widely used in
various fields, and the quality requirements for data in decision-making fields should be known
or predictable.
Data quality in simple terms can be defined as degree to which data is accurate, complete,
reliable, and relevant for its intended purpose.
1.7.1 It involves
Ensuring data quality requires a range of activities, such as data profiling, data cleansing and
data validation. It is an ongoing process that involves continuous monitoring and improvement.
Data quality is a pillar of any GIS implementation and application, as reliable data are
indispensable for the user to obtain meaningful results.
Poor data quality leads to higher processing cost: the rule of ten states that it costs ten times as
much to complete a unit of work when the data is flawed as when the data is perfect.
You cannot rely on just one metric to measure data quality. Consider multiple attributes
of the data to get the correct context and measurement approach.
Each data quality element describes a certain aspect required for a dataset to be usable and accurate.
GIS data has different components to its quality. As defined by the International Organization for
Standardization (ISO), these components include the following:
Completeness
Logical consistency
Spatial accuracy
Thematic accuracy
Temporal quality
Data usability
1.7.2 Components of Data Quality
1.7.2.1 Completeness
The presence or absence of features, their attributes, and relationships in a data model.
1.7.2.2 Logical consistency
The degree of adherence to pre-established rules of a data model's structure, attribution and
relationships as defined by an organization or industry. Many industries follow standards that are
reflected in a geospatial data model as value domains, data formats and topological consistency
of how the data is being stored.
Is data that is inherently vector or raster represented as such? For the attribute data,
can one tell what it is by looking at the units? Have the geographical dimensions been represented
according to the relevant standards?
1.7.2.3 Spatial accuracy - Geometrical
The accuracy of the position of features in relation to the Earth. For consistent analysis, the spatial
reference of the data sets (datum, projection & coordinate system) should be the same. This is where you
ask: geometrically, are the coordinates correct? This is what necessitates map registration.
1.7.2.4 Thematic accuracy - Semantic & Topological
The accuracy of attributes within features and their appropriate relationships. This is where topological
meaning comes in. Does the data make sense (semantic)? Land, climate, soils: do they
reflect the correct relationships (topological)?
Semantics: Semantics is a crucial aspect of data quality as it pertains to the meaning or
interpretation of the data. In other words, it's the extent to which the data accurately and
consistently represents the real-world concepts, relationships, and entities it's meant to describe.
Data that lacks proper semantics can lead to confusion, errors, and misinterpretation when used
for analysis, decision-making, or other purposes. For example, data that uses different terms to
describe the same concept or duplicates data can affect the accuracy and consistency of data.
Ensuring that data has high semantic quality involves standardizing the terminology and
definitions used in the data, checking for data inconsistencies and errors, and ensuring that the
data accurately reflects the real-world entities it describes. This is especially important in
industries such as healthcare, finance, and legal, where precise and accurate data is essential.
1.7.2.5 Temporal quality
The quality of the temporal attributes and temporal relationships of features. If one is representing a
particular area, for which time is that representation accurate? Kampala in 2008, after CHOGM, is not
the same as Kampala in 2023. That has to be reflected.
1.7.2.6 Data usability/Relevance
Adherence of a dataset to a specific set of requirements related to a use case. It is the degree to
which data is useful and applicable to the purpose for which it is being used.
1.8 Questions to ask the man:
1. How come in the processes of scanning & digitizing, each is found in the other? To
digitise automatically, you first scan, to scan (you digitise at a point)
2. Difference between image and a raster. Didn’t quite understand it or why it was relevant
to this topic.
3. Would like a clarification as to whether spatial data preparation is a step in data input