DeMers Chapter 6: Data Storage and Editing

Data Storage and Editing
(Entity and attribute)
DeMers Chapter 6
http://www.iupui.edu/~jeswilso/g438/lecture5/
Introduction
• Any analysis performed must be based on good data,
correctly organized and in the proper format.
• In raster, we may need to display each coverage to
isolate illogical or out-of-place grid cells as we compare
them to the input document
• In vector systems, we may have to build in topology
after the initial data input, to pinpoint any digitization
errors
• In case of entity-attribute agreement, we may need to
output sample portions of our map for comparison
against the original input material
Storage of GIS Databases
• Raster: Attribute values for grid cells are the primary
data stored in the computer. The values make up the actual
grid, and the positions of grid cells are catalogued relative to
the order in which they appear, e.g., if you store the
origin of the grid, the cell size, and the number of rows and
columns, all you need beyond that is the cell values
• Vector: It is common for GISs to store vector entities
and their associated attributes in separate files (a reason for
using an RDBMS). For example, in the ArcView shapefile format,
entities are stored in one file, attributes in another, and
projection information in a third file; an Arc/Info coverage
uses a workspace, an entity directory, and an info directory
• Tiling - storage of individual sections (tiles) in
predefined subsections. The purpose is to reduce
volume of data needed for analysis of any particular
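The raster storage idea above can be sketched in a few lines: once the origin, cell size, and row and column counts are stored, a cell's position follows from its place in the value list. This is a minimal sketch, assuming a row-major value list and an origin at the upper-left corner of the grid; the class name `RasterGrid` and the sample numbers are hypothetical and not taken from DeMers.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RasterGrid:
    """Grid header plus a flat, row-major list of cell values.

    Assumption: (origin_x, origin_y) is the upper-left corner of the grid.
    """
    origin_x: float
    origin_y: float
    cell_size: float
    n_rows: int
    n_cols: int
    values: List[int]

    def index_of(self, x: float, y: float) -> int:
        """Return the position in `values` of the cell containing (x, y)."""
        col = int((x - self.origin_x) // self.cell_size)
        row = int((self.origin_y - y) // self.cell_size)
        if not (0 <= row < self.n_rows and 0 <= col < self.n_cols):
            raise ValueError("coordinate lies outside the grid")
        return row * self.n_cols + col

    def value_at(self, x: float, y: float) -> int:
        """Look up the attribute value stored for the cell containing (x, y)."""
        return self.values[self.index_of(x, y)]


# Hypothetical 2 x 3 grid with 10-unit cells
grid = RasterGrid(origin_x=100.0, origin_y=200.0, cell_size=10.0,
                  n_rows=2, n_cols=3, values=[1, 1, 2, 3, 3, 2])
print(grid.value_at(125.0, 185.0))
```

No coordinates are stored per cell; only the header and the ordered values are kept, which is the point the raster bullet makes.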
The Importance of Editing the GIS Database
• Most errors result from improper input
• Generally, at least some errors will always occur and
require editing, e.g., pushing the wrong digitizer
button (vertex instead of node), pushing the wrong
keyboard button when entering attribute information,
and position errors in digitizing (shaky hand)
• 3 general types of error
• Entity error (position error): primarily associated
with the vector model (missing entities, incorrectly placed
entities, disordered entities)
• Attribute error: occurs in both vector and raster
models (typing errors, misspellings, etc.)
• Entity-attribute agreement error (a.k.a. logical
consistency): correctly typed codes attached to the wrong
entities
Accuracy
• The degree to which information on a map or in a
digital database matches true or accepted values
• An issue pertaining to the quality of data and the
number of errors contained in a data set or map
• It is possible to consider horizontal and vertical
accuracy with respect to geographic position
• Attribute accuracy - conceptual, and logical accuracy
• Level of accuracy required for particular applications
varies greatly. Highly accurate data can be very
difficult and costly to produce and compile
• e.g., mapping standards employed by the United States
Geological Survey (USGS): "requirements for meeting
horizontal accuracy as 90 per cent of all measurable
points must be within 1/30th of an inch for maps at a
scale of 1:20,000 or larger, and 1/50th of an inch for
maps at scales smaller than 1:20,000."
Accuracy Standards for Various Scale Maps
• 1:1,200 ± 3.33 feet
• 1:2,400 ± 6.67 feet
• 1:4,800 ± 13.33 feet
• 1:10,000 ± 27.78 feet
• 1:12,000 ± 33.33 feet
• 1:24,000 ± 40.00 feet
• 1:63,360 ± 105.60 feet
• 1:100,000 ± 166.67 feet
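The ground tolerances listed above follow from the quoted standard by simple arithmetic: the allowed error on the map (1/30 or 1/50 of an inch) is multiplied by the scale denominator and converted from inches to feet. A minimal sketch, assuming the threshold wording quoted in the slide (1/30 inch at 1:20,000 or larger, 1/50 inch at smaller scales); the function name is hypothetical.

```python
def horizontal_tolerance_feet(scale_denominator: int) -> float:
    """Ground distance (feet) matching the allowed map error at a given scale.

    Assumption: follows the wording quoted in the slide, i.e. 1/30 inch for
    scales of 1:20,000 or larger and 1/50 inch for smaller scales.
    """
    divisor = 30 if scale_denominator <= 20_000 else 50
    map_error_ground_inches = scale_denominator / divisor
    return map_error_ground_inches / 12  # inches to feet


for denominator in (1_200, 2_400, 4_800, 10_000, 12_000, 24_000, 63_360, 100_000):
    print(f"1:{denominator:,}  ± {horizontal_tolerance_feet(denominator):.2f} feet")
```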
Precision
• Refers to the level of measurement and exactness of
description in a GIS database (e.g., number of
decimal places)
• Precise locational data may measure position to a
fraction of a unit e.g. to the millimeter
• Precise attribute information may specify the
characteristics of features in great detail
• Important to realize, however, that precise data--no
matter how carefully measured--may be inaccurate
• Level of precision required for particular applications
varies greatly. Engineering projects such as road
construction require very precise information
measured to the millimeter. Demographic analyses of
marketing or electoral trends can often make do with
less, say to the closest zip code or precinct boundary
Why be concerned about error? - The
Problems of Propagation and Cascading
• The discussion to this point has focused on errors that may be
present in single sets of data
• "Doing" GIS usually involves comparisons of many sets
of data. If errors exist in one or all of the data layers,
the solution to the GIS problem generated from them
may itself be erroneous
• Inaccuracy, imprecision, and error may be
compounded in GIS that employ many data sources
DIGITIZATION (continued)
[Figure labels: tics numbered 1 to 4, geographic features]
Error Propagation and Cascading
• Occurs when one error leads to another
• Means that erroneous, imprecise, and inaccurate
information will skew a GIS solution when
information is combined
• DeMers - "error prone data will lead to error prone
analysis"
• e.g., if a map registration point has been mis-digitized
in one coverage and is then used to register a
second coverage
• Result = the second coverage will propagate the first
mistake
• In this way, a single error may lead to others and
spread until it corrupts data throughout the entire
GIS project
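The registration example above can be illustrated numerically. This is a deliberately simplified sketch, assuming registration by pure translation from a single tic (real registration uses several tics and a fuller transformation); all coordinates and names are hypothetical.

```python
def register_by_translation(points, tic_digitized, tic_target):
    """Shift points so the digitized tic lands on its target map position."""
    dx = tic_target[0] - tic_digitized[0]
    dy = tic_target[1] - tic_digitized[1]
    return [(x + dx, y + dy) for x, y in points]


true_tic = (1000.0, 2000.0)          # where the registration point really is
stored_tic_cov1 = (1003.0, 1998.0)   # mis-digitized position held in coverage 1

cov2_tablet = [(2.0, 3.0), (4.0, 1.0)]   # features of the second coverage
cov2_tic_tablet = (1.0, 1.0)             # the same tic as digitized for coverage 2

# Registering against the true position vs. against coverage 1's bad tic
correct = register_by_translation(cov2_tablet, cov2_tic_tablet, true_tic)
propagated = register_by_translation(cov2_tablet, cov2_tic_tablet, stored_tic_cov1)

# Every feature in coverage 2 inherits the tic error of coverage 1
errors = [(px - cx, py - cy) for (px, py), (cx, cy) in zip(propagated, correct)]
print(errors)
```

The single bad tic shifts every feature of the second coverage by the same offset, which is how one error spreads to data that was itself digitized correctly.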
Entity Errors: Vector
• Six categories identified by DeMers/ESRI
– All entities that should have been entered are
present
– No extra entities have been entered
– Entities are in the right place and are of the correct
shape and size
– Entities that are supposed to be connected to each
other are connected
– All polygons have a single label point
which identifies them
– All entities are within the outside boundary
identified
Nodes and Vertices
• Specific types of entity errors in vector GIS
– can involve points, lines, polygons, nodes, vertices,
label points
– nodes - denote ends of lines or point where polygon
closes on itself
– vertices - denote change of direction within a line
– points -> lines -> polys
• Nodes are used to show specific topological
relationships, e.g.:
– intersection of roads or streams
– intersection between a stream and a lake
– node errors include pseudo nodes and
dangle nodes
Pseudo nodes
• Occur where a line connects with itself or with one other line
• A line connecting with itself to form a polygon gives an
island pseudo node (fig. 6.1a, p. 161)
• Also occur where two lines meet (rather than
cross) (fig. 6.1b)
• Pseudo nodes are not necessarily errors, but indicate
the potential location of errors
• e.g., a pseudo node in the middle of a line representing a
road can be used to separate the road into two different
speed limit zones
• Others may indicate an error (pushed the wrong button
when digitizing, placed the cursor at the wrong location)
Digitization errors - Pseudo node (diamond)
• A pseudo node connects two and only two arcs
• A pseudo node does not represent a serious error
[Figure labels: pseudo node, error]
Dangle nodes
• A single node connected to a single line
• Again, not necessarily an error, but may be
• Can result from three possible mistakes (fig. 6.2, p. 162):
– Failure to close a polygon
– Undershoot
– Overshoot
– Sometimes result from incorrect placement,
sometimes from the fuzzy tolerance and snapping
distance settings
• One method of general error detection is comparing the
digitized data to the original document at equivalent scales;
this is good for broad-scale, obvious errors, but not for finer-scale
errors
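The definitions of pseudo nodes and dangle nodes above amount to counting how many arc ends meet at each node. A minimal sketch, assuming arcs are given as pairs of node identifiers; the function name and the sample arcs are hypothetical, and flagged nodes are candidates for review rather than confirmed errors.

```python
from collections import defaultdict


def classify_nodes(arcs):
    """Classify each node by the number of arc ends that meet there.

    arcs: dict mapping arc id -> (from_node, to_node)
    Returns a dict mapping node -> "dangle", "pseudo", or "normal".
    """
    arc_ends = defaultdict(list)
    for arc_id, (from_node, to_node) in arcs.items():
        arc_ends[from_node].append(arc_id)
        arc_ends[to_node].append(arc_id)

    result = {}
    for node, ends in arc_ends.items():
        if len(ends) == 1:
            result[node] = "dangle"   # a single node connected to a single line
        elif len(ends) == 2:
            # Two arc ends: two different arcs, or one arc closing on itself (island)
            result[node] = "pseudo"
        else:
            result[node] = "normal"
    return result


arcs = {
    1: ("A", "B"),
    2: ("B", "C"),
    3: ("C", "D"),
    4: ("C", "E"),
    5: ("F", "F"),   # a line that closes on itself
}
print(classify_nodes(arcs))
```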
DIGITIZATION
• For linear features such as rivers, roads, and
railways it is important to digitize each
section separately (with a start node and an end node
for each specified section) or to use routes later
[Figure labels: node, Road1, Road2]
Digitization errors - Dangle error (square)
[Figure labels: dangle error, overshoot, undershoot, closed polygon, natural feature, road, acceptable dangle node (e.g. end of roads)]
Label point and sliver errors
• Polygon label point errors (points -> lines ->
polys)
• The label point is used to associate a polygon with its
attributes
• If the label point is missing, or there is more than one,
this indicates an error, e.g., fig. 6.4, p. 163
• Sliver polygon errors
• Commonly result from the incorrect practice of double
digitizing
• Can also result from overlay or merging operations
which join coverages from different sources
• Can be removed manually or by dissolving polygons
smaller than a certain area, and/or found by comparing the intended
number of polys with the actual number (Fig. 6.5, p. 164)
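The area-based approach to slivers mentioned above can be sketched as a simple filter. This is a minimal illustration, assuming polygons are stored as lists of (x, y) vertices and that a single area threshold is adequate; the names and coordinates are hypothetical, and the threshold would need tuning for real data.

```python
def polygon_area(ring):
    """Area of a simple polygon by the shoelace formula.

    ring: list of (x, y) vertices; the closing vertex is not repeated.
    """
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def find_slivers(polygons, min_area):
    """Return ids of polygons whose area falls below min_area."""
    return [pid for pid, ring in polygons.items() if polygon_area(ring) < min_area]


polygons = {
    "parcel": [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)],
    "sliver": [(10.0, 0.0), (10.05, 5.0), (10.0, 10.0)],  # thin gap from double digitizing
}
print(find_slivers(polygons, min_area=1.0))

# Comparing the intended number of polygons with the actual number
intended_count = 1
print(len(polygons) - intended_count, "unexpected polygon(s)")
```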
Digitization errors - Labels
• Missing labels or too many labels
[Figure labels: too many labels, missing labels, sliver polygon errors]
How to correct digitization errors?
• List digitization errors using the commands
Nodeerrors and Labelerrors
• Use ARCEDIT to edit the coverage, then
use the edit feature (ef) command, e.g.
ef label, ef node, ef arc
• Use a series of commands such as
nodesnap, arcsnap, reshape, split, add,
delete, move, copy, rotate, extend, and
unsplit
• For labels use Createlabels
Topology
• Topology is the process of projecting
complex surfaces onto simple ones
• Topology is a procedure for explicitly
defining spatial relationships connecting
adjacent features (e.g., arcs, nodes,
polygons, and points).
• Different types of spatial relationships are
expressed as lists of features, e.g.
• An area is defined by the arcs comprising
its border
• An arc is defined by a set of points (X,Y)
Topology-Main Concepts
• The three major topological concepts are:
• Connectivity: Arcs connected to each other
at nodes
• Contiguity/Adjacency: Arcs have direction
and left and right sides
• Area Definition: Arcs connected to
surround an area define a polygon (area)
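The three concepts above can be read directly off an arc table that records, for each arc, its end nodes and the polygons on its left and right. A minimal sketch, assuming two adjacent polygons A and B that share one arc; the table layout, names, and values are hypothetical and do not reproduce the Arc/Info file format.

```python
from collections import defaultdict
from typing import NamedTuple


class Arc(NamedTuple):
    from_node: int
    to_node: int
    left_poly: str
    right_poly: str


# Polygons A and B share arc 3; arcs 1 and 2 are their outer boundaries.
ARCS = {
    1: Arc(1, 2, "outside", "A"),
    2: Arc(2, 1, "outside", "B"),
    3: Arc(1, 2, "A", "B"),
}


def arcs_at_nodes(arcs):
    """Connectivity: which arcs meet at each node."""
    result = defaultdict(list)
    for arc_id, arc in sorted(arcs.items()):
        for node in (arc.from_node, arc.to_node):
            result[node].append(arc_id)
    return dict(result)


def adjacent_polygons(arcs):
    """Contiguity: polygon pairs that lie on either side of some arc."""
    return {frozenset((arc.left_poly, arc.right_poly)) for arc in arcs.values()}


def polygon_arcs(arcs):
    """Area definition: the arcs that surround each polygon."""
    result = defaultdict(list)
    for arc_id, arc in sorted(arcs.items()):
        result[arc.left_poly].append(arc_id)
        result[arc.right_poly].append(arc_id)
    return dict(result)


print(arcs_at_nodes(ARCS))
print(adjacent_polygons(ARCS))
print(polygon_arcs(ARCS))
```

Because the shared arc is stored once and referenced by both polygons, the boundary between A and B is not duplicated.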
[Figure: Spatial relationships (topology): area definition, connectivity, adjacency; polygon topology]
Advantages of Topology
• Check for digitization errors (overshoot,
undershoot, unclosed polygon, missing labels, too
many labels)
• Store data more efficiently
(eliminate data redundancy; normalization)
• Make spatial analysis faster
Topology
• Topological data structures dominate GIS
software.
• Topology allows automated error detection and
elimination.
• Rarely are maps topologically clean when
digitized or imported.
• A GIS has to be able to build topology from
unconnected arcs.
• Nodes that are close together are snapped.
• Slivers due to double digitizing and overlay are
eliminated.
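Snapping nodes that are close together can be sketched as follows. This is a simplified, greedy illustration (each node snaps to the first earlier node within the tolerance) and is not a description of how any particular GIS package implements it; the function name and tolerance value are hypothetical.

```python
import math


def snap_nodes(nodes, tolerance):
    """Snap each node to the first previously kept node within `tolerance`.

    nodes: list of (x, y) tuples in digitizing order.
    Returns a list of the same length with snapped coordinates.
    """
    anchors = []   # nodes kept at their original position
    snapped = []
    for x, y in nodes:
        for ax, ay in anchors:
            if math.hypot(x - ax, y - ay) <= tolerance:
                snapped.append((ax, ay))
                break
        else:
            # No anchor close enough: this node becomes a new anchor
            anchors.append((x, y))
            snapped.append((x, y))
    return snapped


nodes = [(0.0, 0.0), (0.02, 0.01), (5.0, 5.0)]
print(snap_nodes(nodes, tolerance=0.05))
```

A tolerance that is too large would merge nodes that are meant to be distinct, so the value has to be chosen relative to the detail of the source map.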
Creating topology in Arc/Info
• After digitization and correction of
digitization errors, topology can be built
• The command BUILD is used for point,
line, or polygon coverages
• The command CLEAN is used for line and
polygon coverages
• CLEAN never creates topology for point
coverages
• BUILD never detects intersections of arcs and
polygons
Topology commands
• C:\[ARC] CLEAN [in-cov] {out-cov} {dangle-length} {fuzzy-tol}
• C:\[ARC] CLEAN road1 road2 # 3.4
• C:\[ARC] BUILD [in-cov] {POLY/LINE/POINT}
• C:\[ARC] BUILD cities POINT
• For features that have no intersection such
as contours, BUILD with line option can be
used
• For features that have intersection such as
roads and lots, it is better to first use
CLEAN and then use BUILD
Tables created by topology
• Arc Attribute Table (AAT)
• Polygon Attribute Table (PAT)
• Point Attribute Table (PAT): area and perimeter = 0
• Route Attribute Table (RAT)
• Feature Attribute Table (FAT)
• Node Attribute Table (NAT)
Hints for topology
• Make a copy of the original data before starting
to build topology
• Adopt a consistent strategy for naming
coverages
• For example, names of raw coverages start
with R, e.g. Rroads and Rlanduse
• Keep coverage names to 8 characters or fewer
and without an extension (8.3)
Coordinate Transformation
The tablet coordinates must be converted to
real world (map) coordinates
The commands used for coordinate
transformation are:
• CREATE or GENERATE - used to create a
master coverage
The (X,Y) of the tic file (Tic.dbf) must be
set to map coordinates.
• TRANSFORM - used to transform the
coverage
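The conversion from tablet to map coordinates can be sketched as an affine transformation fitted to tics whose positions are known in both systems. This is a minimal sketch, assuming exactly three non-collinear tics and solving by Cramer's rule; it is not the TRANSFORM command's actual algorithm, and production tools normally fit more tics and report a residual error. All names and tic values are illustrative.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix given as three row tuples."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve_affine(tablet_tics, map_tics):
    """Fit X = a*x + b*y + c and Y = d*x + e*y + f from exactly three tics."""
    m = [(x, y, 1.0) for x, y in tablet_tics]
    det = det3(m)
    if abs(det) < 1e-12:
        raise ValueError("tics are collinear; the transformation is undefined")

    def solve(rhs):
        # Cramer's rule: replace one column at a time with the right-hand side
        coeffs = []
        for col in range(3):
            replaced = [tuple(rhs[r] if k == col else m[r][k] for k in range(3))
                        for r in range(3)]
            coeffs.append(det3(replaced) / det)
        return coeffs

    a, b, c = solve([X for X, _ in map_tics])
    d, e, f = solve([Y for _, Y in map_tics])
    return a, b, c, d, e, f


def apply_affine(params, point):
    """Transform one tablet coordinate into map coordinates."""
    a, b, c, d, e, f = params
    x, y = point
    return (a * x + b * y + c, d * x + e * y + f)


# Illustrative tics: tablet units on the left, map units on the right
tablet_tics = [(0.0, 0.0), (5.0, 0.0), (0.0, 8.0)]
map_tics = [(0.0, 0.0), (50.0, 0.0), (0.0, 80.0)]
params = solve_affine(tablet_tics, map_tics)
print(apply_affine(params, (5.0, 8.0)))
```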
Coordinate Transformation (continued)
• Latitude (φ) and longitude (λ) must be
converted to decimal degrees (DD), e.g.
Latitude 13° 45' 55" = 13 + 45/60 + 55/3600 ≈ 13.7653°
• Project the decimal degrees to plane
coordinate e.g. UTM
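The degrees, minutes, seconds conversion can be written as a small helper: minutes are divided by 60 and seconds by 3600. A minimal sketch; the function name and the hemisphere convention (south and west negative) are assumptions made for illustration. Projecting the result to UTM is a separate step that is normally left to a projection library and is not shown here.

```python
def dms_to_decimal_degrees(degrees, minutes, seconds, hemisphere="N"):
    """Convert degrees, minutes, seconds to decimal degrees.

    Assumption: southern and western hemispheres are returned as negative values.
    """
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError("minutes and seconds must be in the range [0, 60)")
    value = abs(degrees) + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value


print(round(dms_to_decimal_degrees(13, 45, 55), 4))
```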
[Figure: digitizer coordinates and map coordinates, each with origin (0,0); corner points labeled (5,8) and (50,80)]
Generate
• Generate can create a coverage from raw
coordinates (ID, X, Y), e.g. from GPS
• Create a file of tic coordinates, e.g. Tic1,
which is ASCII with (TICID, X, Y)
• Create a file of polygon coordinates, e.g.
Poly1
• GENERATE: INPUT Tic1, then TICS
• GENERATE: INPUT Poly1, then POLYS, then QUIT
Attribute Errors: Raster and Vector
• Attribute errors are generally more difficult to detect
• Types include:
• Missing attributes:
perhaps the only kind of attribute error traceable without
comparison to source material, e.g., plot all polygons
and color them according to a certain attribute; if the color
is missing, the attribute is missing
• Incorrect attribute values or text:
more difficult to detect; one method is to plot all
polygons and color them according to a certain
attribute; if only one polygon has a certain attribute
value and there should be others, it may stick out. In general,
detection involves direct comparison with source material
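The two checks described above have simple tabular equivalents: list the records with no value, and list values that occur only once. A minimal sketch, assuming attributes are held as one dictionary per polygon; the field name `landuse` and the sample records are hypothetical. A value flagged as rare is only a candidate for review, since a legitimate class may genuinely occur once.

```python
from collections import Counter


def missing_attributes(records, field):
    """Ids of records whose value for `field` is absent or empty."""
    return [pid for pid, attrs in records.items() if attrs.get(field) in (None, "")]


def rare_values(records, field, max_count=1):
    """Values of `field` that occur at most `max_count` times, with their record ids."""
    present = {pid: attrs[field] for pid, attrs in records.items()
               if attrs.get(field) not in (None, "")}
    counts = Counter(present.values())
    return {value: [pid for pid, v in present.items() if v == value]
            for value, n in counts.items() if n <= max_count}


records = {
    1: {"landuse": "forest"},
    2: {"landuse": "forest"},
    3: {"landuse": "forset"},   # typing error
    4: {"landuse": None},       # missing attribute
    5: {"landuse": "urban"},
    6: {"landuse": "urban"},
}
print(missing_attributes(records, "landuse"))
print(rare_values(records, "landuse"))
```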
Dealing With Projection Changes
• Often, regardless of input method, separate GIS
data inputs for a project will be based on different
projection systems
• It is necessary to transform all data to a common system
before use in integrated modeling (examples in
ArcView)
• Joining Adjacent Coverages: Edge Matching (Union)
• Joining two adjacent coverages (usually of the same
theme) together to produce a single data set that covers
a broader region; edge matching is also done in raster
systems
Conflation and Rubber Sheeting
• Conflation and Rubber Sheeting: Refers to the
registration (georeferencing) of two maps (vector or
raster) in a non-linear way (overlaying two maps)
• Used to make maps from different sources spatially
correspond with one another. Most often used with raster
data using ground control points (GCPs). Conflation
and rubber sheeting are synonymous terms according
to DeMers (Figure 6.1, p. 174)
• Rubber sheeting addresses the need to georeference the
internal objects themselves, not just the map corners
• Templating: "cookie cutting"
• If you have multiple coverages of different extents, the
template is used to "cookie cut" them all to the same
extent
Exercise
• Characteristics of data storage in raster and vector
• 3 general types of error in spatial databases
• Accuracy vs. precision
• Error propagation and cascading of error in GIS
• Types of errors in vector GIS
• Types of errors in attribute data
• The concept of topology - what is it, what types of
relationships are stored for point, line, and poly
features, why do we need it in GIS?