DATA FORMATS AND STRUCTURES

Land Resource Information Systems
LECTURES 4-5
Geographic data carry information made up of 4 components:
- geographic position (spatial reference),
- attribute information,
- spatial relationships, and
- time.
These components can also be expressed as the questions (QUERIES): where, what, how are they related, and when?
1. Geographic Position
• Geographic data are a form of spatial data.
• Every object has a specific location.
• Geographic position can be expressed as:
– Cartesian coordinates or azimuth,
– identification relative to neighbours,
– a linear location reference,
– a particular space,
– a coded place name,
– a reference to a particular object
2. Geographic Attributes
• Attributes describe the spatial data.
• Attributes are multidimensional: one object has many identities.
• E.g. a slope at a given position has the identities: slope length, aspect, shape, etc.
• Attribute data = non-spatial data = tabular data
3. Spatial Relationships
• Spatial relationships, i.e. relationships between objects, are highly varied, complex, and important.
• There are 4 categories of relationship (also described as having topology):
o connectivity,
o orientation,
o adjacency, and
o containment (position within a space).
4. Time
• Time is dynamic.
• Knowing WHEN the data were acquired determines whether the data are used correctly.
– E.g. an area that was rice fields 1 year ago is now a residential/industrial area.
– Cropping patterns in the rainy season (MH) differ from those in the dry season (MK).
• Temporal data are described in terms of:
o duration: the time span covered by the existing database,
o resolution: the interval at which data are collected, or the aggregation period of data collection,
o frequency and speed of data collection.
• There are 2 types of database:
• spatial database
– digital base maps describing earth’s surface, layered on
top one another
– individual layers called themes/coverages
• attribute database
– describes characteristics/qualities of features on base
maps
– linked to spatial database by common item
• The “where” of GIS is determined by
coordinate (map) data structures, but …
• The “what” of GIS is determined by tabular
(relational database) data structures
• Thus, tabular data are just as important as
coordinate data
Database systems
• Three types of database models
– Relational model (what we just learned)
– Flat file
– Hierarchical model
– Network model
Hierarchical Model
• Model has a tree structure, with a root and children entities
• Encodes many-to-one or one-to-one relationships
• Advantages
– Very efficient searching for queries encoded within the tree
– Easy to understand and update
• Disadvantages
– Searches requiring more than one entity are not efficient
– Tree architecture must be optimized for types of queries to
be done
– Inflexibility of searching makes it less than ideal for GIS
systems, where queries are often exploratory and cannot be
predicted in advance
– Multiple parents are not allowed
[Figure: hierarchical model]
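The trade-off listed for the hierarchical model can be sketched with a nested dictionary. This is only an illustration: the province/district/village names are hypothetical and are not taken from the lecture. A query that follows the tree from the root is a direct lookup, while a query that is not encoded in the tree has to visit every branch.

```python
# Hypothetical administrative hierarchy: root -> province -> district -> villages.
tree = {
    "Province A": {
        "District A1": ["Village 1", "Village 2"],
        "District A2": ["Village 3"],
    },
    "Province B": {
        "District B1": ["Village 4"],
    },
}


def villages_in(province, district):
    """Query encoded in the tree: follow the path down from the root."""
    return tree[province][district]


def find_parents(village):
    """Query not encoded in the tree: every branch has to be visited."""
    for province, districts in tree.items():
        for district, villages in districts.items():
            if village in villages:
                return province, district
    return None


print(villages_in("Province A", "District A2"))
print(find_parents("Village 4"))
```

Each village sits under exactly one district here, which mirrors the rule that multiple parents are not allowed.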
Network Model
• Entities can have multiple parents as well as children
• Encodes many-to-one or one-to-one relationships;
many-to-many relationships can be indirectly
handled.
• Advantages
– Less redundant data storage than in the hierarchical
model
– Provides greater flexibility in searching
– Relationships are encoded in the database for
quick retrieval of certain queries
• Disadvantages
– Additional complexity to store linkages, updating is
more time-consuming
– Still not as flexible as the relational data model
[Figure: network model]
Relational Model
• No hierarchy of entities; records are simply stored as
a set of tuples
– A tuple is a set of permanently related fields (a record, or row). A set of tuples is
also called a table, or file.
• One-to-one or many-to-one relationships are
encoded
• Queries can be done using any field
– I.e. there is no distinction between key and attribute fields
• Tuples can be linked together using any field that is
common to both
– Relate or Join: a temporary link between two tables
– Relational join: permanent combination of two tables into
one file
[Figure: relational model]
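The difference between a temporary relate/join and a permanent relational join can be sketched with Python's built-in sqlite3 module. The table names, field names, and values below (parcels, soils, soil_code) are hypothetical and chosen only for illustration; the lecture itself refers to ArcView and INFO tables, which work differently in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parcels (parcel_id INTEGER, soil_code TEXT);
    CREATE TABLE soils (soil_code TEXT, soil_name TEXT);
    INSERT INTO parcels VALUES (1, 'S1'), (2, 'S2'), (3, 'S1');
    INSERT INTO soils VALUES ('S1', 'clay'), ('S2', 'sand');
""")

# Temporary link: the two tables are matched on the common field at query
# time and nothing new is stored.
joined = conn.execute("""
    SELECT p.parcel_id, s.soil_name
    FROM parcels AS p JOIN soils AS s ON p.soil_code = s.soil_code
    ORDER BY p.parcel_id
""").fetchall()

# Permanent combination: the joined result is written out as a new table.
conn.execute("""
    CREATE TABLE parcels_soils AS
    SELECT p.parcel_id, p.soil_code, s.soil_name
    FROM parcels AS p JOIN soils AS s ON p.soil_code = s.soil_code
""")
stored = conn.execute("SELECT COUNT(*) FROM parcels_soils").fetchone()[0]

# Any field can be used in a query, not only a designated key.
clay_parcels = [row[0] for row in conn.execute(
    "SELECT parcel_id FROM parcels_soils WHERE soil_name = 'clay' ORDER BY parcel_id"
)]

print(joined, stored, clay_parcels)
```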
Why use a Relational Model?
• Advantages
– Highest flexibility of the three models for processing and
queries
– Easily modified and added to
– Organization is simple to understand and communicate
– Same database can be represented with less redundancy
• Disadvantages
– More difficult to implement
– Slower performance due to absence of permanent links and
pointers
• Choice for GIS
– GIS systems overwhelmingly choose the Relational model for
its flexibility
– GIS analysis is typically exploratory, requiring the flexible
relational model
Relational Database Model & Attribute Data Structures
• Attribute data are stored in database tables
• Tables are composed of:
– fields and
– records
• You may already be familiar with some
types of relational databases
– dBase
– rBase
– MS Access
– MS Excel (database functionality)
– Oracle, INFORMIX, INGRES, SQL Server
– MySQL, PostgreSQL
– INFO (in ArcInfo)
• ArcView uses tabular data formats from dBase, ASCII text, and INFO files
• Tables are stored on the disk as
– .dbf files,
– .txt files, or
– binary files in INFO directories
• Each vector data source has an attribute table
• Tables can be linked and joined (“related”) by use of common values in fields
• Different types of data that may have attribute tables in ArcView
– Vector
• point attribute
• polygon attribute
• line attribute
• node attribute*
• text attribute*
• route & event*
• CAD attributes
– Raster
• value attribute tables
• Relationship between tabular and map data
– one-to-one between features and records
• when a selection is made, both the record and the feature are selected
Spatial Data Models
• Vector Data Model
– Points, lines, polygons
– Stores topology
• adjacencies
• connectivity
• containment
• Raster Data Model
– Based on cells or pixels
– Each cell contains single attribute value (z)
[Figure: raster and vector representations]
Form and Structure of Spatial Data
• Spatial data in a GIS consist of several data layers.
• Each layer consists of 3 types of data entity, namely points, lines, and polygons/areas, as well as 3-D forms (volumes): elevation/space.
o point: a geographic object associated with a coordinate pair (x,y). E.g. soil profiles, weather stations.
o line (end, arc, polyline): all linear features built from straight line segments formed by ≥ 2 coordinate points. E.g. rivers, roads, lineament patterns, etc.
o polygon: a region boundary or lake boundary, in the form of a closed line.
o volume objects (blocks, polyhedra, etc.) involve a third dimension: height/depth.
• These objects are stored in 2 main forms: raster or vector
[Figure: Concept of Vector and Raster. Panels: Real World; Raster Representation (10 x 10 grid with rows and columns numbered 0-9, cells coded R, T, and H); Vector Representation (point, line, polygon)]
Raster Data Structure
o Raster data: the simplest form of data in a GIS.
o Rasters are displayed as pixels:
• point: 1 pixel,
• line: a set of pixels in a row along 1 direction,
• polygon: a set of pixels extending in several directions within 1 area.
o A raster is made up of a set of cells, or pixels. Each pixel has a reference that identifies it, tied to its geographic location or to its column and row.
– Each pixel has a particular value.
– Data formed by a raster are 2-D.
[Figure: raster data model]
1.1. Simple Raster Grid
o Horizontal position: oriented west-east (following the remote sensing convention: numbered from left to right).
o Vertical position (along a column) is numbered from top to bottom, so numbering starts at the top left.
o This raster system has 2 weaknesses.
1. It is difficult to determine absolute position at a given scale (for a given pixel element). E.g. for an object lying on 1 or 2 pixels, the object does not fall exactly between the two pixels.
2. Adjacent cells are not separated; here the result depends on how adjacency is defined.
For example (Fig.), the distance from the centre cell to its left-right and top-bottom neighbours is 1, but the distance to its diagonal neighbours is 1.41 (the square root of 2).
Two cases are distinguished:
– 4-neighbour connectivity
– 8-neighbour connectivity (the distances are unequal; some neighbours touch only at a corner point).
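The 4-neighbour and 8-neighbour definitions can be sketched as follows. This is a minimal illustration that assumes square cells of unit size and measures distance between cell centres; the function name `neighbours` is hypothetical.

```python
import math

ORTHOGONAL = [(-1, 0), (0, -1), (0, 1), (1, 0)]
DIAGONAL = [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def neighbours(row, col, connectivity=4):
    """Map each neighbouring cell (row, col) to its centre-to-centre distance."""
    if connectivity not in (4, 8):
        raise ValueError("connectivity must be 4 or 8")
    offsets = ORTHOGONAL + (DIAGONAL if connectivity == 8 else [])
    return {(row + dr, col + dc): math.hypot(dr, dc) for dr, dc in offsets}


four = neighbours(1, 1, connectivity=4)
eight = neighbours(1, 1, connectivity=8)
print(sorted(set(four.values())))   # a single distance class
print(sorted(set(eight.values())))  # two distance classes: 1 and sqrt(2)
```

With 8-neighbour connectivity the diagonal cells are about 1.41 cell widths away, which is why any distance or adjacency result depends on which definition is chosen.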
2. Efficient Raster Data Storage
o Storing RASTER data takes a lot of space. The number of cells determines the storage space required.
o To reduce the storage burden, one can use
(1) run-length encoding and
(2) chain codes or value point encoding
o The figure (see Fig.) shows the same data stored in three ways:
A. Storage without compaction
B. Run-length encoding
Value   Length   Row
A       10       0
– The object with value A on row 0 occupies 10 pixels.
C. Value point encoding
Value   Point
A       23
B       29
– Pixels with value A occur on rows 0, 1, and 2; the last pixel with value A in this group is pixel 23 (row 2, column 3).
– Pixels with value B in this group end at row 2, column 9.
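Both encodings can be sketched on a small grid that reproduces the numbers in the example: 3 rows of 10 columns, with value A up to row 2, column 3 and value B from there to row 2, column 9. The grid itself is reconstructed from those numbers and is an assumption, as are the function names.

```python
def run_length_encode(grid):
    """Return (value, run length, row) for every run of equal values in a row."""
    runs = []
    for row_index, row in enumerate(grid):
        start = 0
        for col in range(1, len(row) + 1):
            # Close the run at the end of the row or when the value changes.
            if col == len(row) or row[col] != row[start]:
                runs.append((row[start], col - start, row_index))
                start = col
    return runs


def value_point_encode(grid):
    """Return (value, row, col) of the last cell of each run in row-major order."""
    n_cols = len(grid[0])
    cells = [value for row in grid for value in row]
    points = []
    for index, value in enumerate(cells):
        if index == len(cells) - 1 or cells[index + 1] != value:
            row, col = divmod(index, n_cols)
            points.append((value, row, col))
    return points


grid = ["AAAAAAAAAA", "AAAAAAAAAA", "AAAABBBBBB"]
print(run_length_encode(grid))
print(value_point_encode(grid))
```

The 30 stored cells shrink to 4 runs with run-length encoding and to 2 points with value point encoding, because this grid has very few value changes.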
Hierarchical Raster Data Structure
o The quadtree data system produces a more compact raster because it uses grid cells of various sizes. High resolution is used only where needed.
o Homogeneous areas get large grid cells; heterogeneous areas get small ones.
o The name quadtree comes from the 4-fold reduction in pixel size at each level (of the map): each cell is divided into 4 quadrants. The structure is drawn as a tree (Fig.).
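A quadtree can be sketched as a recursive subdivision into quarters. This minimal version assumes a square grid whose side is a power of 2; the sample grid, the nested-list representation, and the function names are illustrative choices, not the structure used by any particular GIS.

```python
def build_quadtree(grid, row=0, col=0, size=None):
    """Return a single value for a homogeneous block, else a list of 4 subtrees.

    Subtrees are ordered NW, NE, SW, SE.
    """
    if size is None:
        size = len(grid)
    first = grid[row][col]
    homogeneous = all(
        grid[r][c] == first
        for r in range(row, row + size)
        for c in range(col, col + size)
    )
    if homogeneous:
        return first
    half = size // 2
    return [
        build_quadtree(grid, row, col, half),
        build_quadtree(grid, row, col + half, half),
        build_quadtree(grid, row + half, col, half),
        build_quadtree(grid, row + half, col + half, half),
    ]


def count_leaves(node):
    """Count the stored blocks (leaves) in the tree."""
    if isinstance(node, list):
        return sum(count_leaves(child) for child in node)
    return 1


grid = ["AAAA", "AAAA", "AABB", "AABA"]
tree = build_quadtree(grid)
print(tree, count_leaves(tree))
```

Only the mixed south-east quarter is subdivided, so 7 blocks are stored instead of 16 cells.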
Vector Data Structure
Used for objects that require accurate positions.
Objects on the earth's surface are tied to map positions using an XY coordinate system.
Point objects are recorded as individual XY pairs, lines as series of XY coordinates, and areas as closed series of XY coordinate pairs.
There are 2 common ways to represent vector data:
– the Spaghetti data model
– the Topological data model.
Spatial Data Structures
• The Vector Data Model
– Linking spatial and attribute data
– Spaghetti vs Topological data models
– Constructing Topology
– Advanced topological structures: Routes and
Regions
• Surfaces in vector models
Aronoff, 1993.
Raster vs Vector Data Structures
• Raster data stored as an array of values
– Georeferencing is implicit in the structure
– Usually defined by one corner of the image and
the cell size
– Attributes are defined by the cell values (no
character data!)
– One attribute for each raster file
Vector data
• Vector data are stored as a series of xy coordinates
– Points are stored as a single xy coordinate pair
– Lines are a string of xy coordinates
– Polygons are composed of one or more lines and a label
– Attributes are attached to each feature through a unique numeric code
– Many attributes may be stored in each vector file
Topological models
• Spaghetti models store only the coordinate data
– Spatial relationships between features are not recorded
– Finding two adjacent polygons would require reading in all the data and comparing the coordinates of all the polygons.
– Thus, spaghetti models are very inefficient for doing spatial analysis
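The cost of finding adjacency in a spaghetti model can be sketched as a brute-force comparison of coordinates. The three squares below are hypothetical, and the check assumes that adjacent polygons share an edge with exactly matching vertices, which real digitized data often do not.

```python
from itertools import combinations


def edges(ring):
    """Edges of a closed ring as unordered vertex pairs."""
    return {
        frozenset((ring[i], ring[(i + 1) % len(ring)]))
        for i in range(len(ring))
    }


def adjacent_pairs(polygons):
    """Compare every polygon with every other one, edge by edge."""
    pairs = []
    for name_a, name_b in combinations(sorted(polygons), 2):
        if edges(polygons[name_a]) & edges(polygons[name_b]):
            pairs.append((name_a, name_b))
    return pairs


polygons = {
    "P1": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "P2": [(1, 0), (2, 0), (2, 1), (1, 1)],
    "P3": [(3, 0), (4, 0), (4, 1), (3, 1)],
}
print(adjacent_pairs(polygons))
```

Every pair of polygons has to be read and compared, so the work grows roughly with the square of the number of polygons.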
Topological models
• Topological models store
coordinate data and spatial
relationships
– Topology is the mathematical
method used to define spatial
relationships
– Spatial relationships between
features are explicitly
encoded, making it very easy
to determine if polygons are
adjacent, if arcs connect, etc.
– Highly desirable model if
spatial analysis is to be done
on the data
Topological Relationships
• Connectivity: property of being interconnected
– Applies to roads, streams, pipelines, electric grids
etc
– Used to evaluate travel or delivery routes for
efficiency, speed
– Used to optimize transportation scheduling, etc.
• Contiguity: the property of being adjacent
– Is this parcel of land adjacent to an industrial site?
– Adjacent habitats?
– Zoning districts?
– Evaluating clear cuts in a forest: don’t want two
adjacent sites cleared
Coding connectivity
Arc-polygon topology
• List arcs in each polygon
• Avoid storing coordinate data more than once
A = the world polygon
Coding Contiguity
• Telling if polygons are adjacent
• Arcs have direction
• Polys are adjacent if they share an arc
Need world polygon
for outside arcs
Constructing Topology
• Point data and line data are stored in files of spatial coordinates
• Additional tables are constructed to hold topology
– Method relies on each feature having a unique numeric code
• Polygon topology
– Polygon table lists the arcs that make up each polygon
– Each arc coordinate is stored once, even if it belongs to two
polygons
• Arc-node topology (establishes connectivity)
– Nodes are uniquely numbered
– Nodes are listed for each arc
– Arcs that share a node are connected
• Left-right topology (establishes contiguity)
– Arcs have direction established by assigning a from-node and a
to-node
– The arc topology stores the number of the polygons to the left
and right of the arc
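The topology tables described above can be sketched with a small arc table. The layout is hypothetical: two polygons B and C stacked north to south between nodes N1 and N2, with A as the world polygon, and each arc stores its from-node, to-node, left polygon, and right polygon. No coordinates are needed to answer the questions.

```python
from collections import namedtuple

Arc = namedtuple("Arc", "from_node to_node left_poly right_poly")

# All arcs run from N1 (west) to N2 (east), so "left" is the northern side.
arcs = {
    1: Arc("N1", "N2", "A", "B"),  # northern boundary of B
    2: Arc("N1", "N2", "B", "C"),  # shared boundary of B and C
    3: Arc("N1", "N2", "C", "A"),  # southern boundary of C
}


def polygon_arcs(poly):
    """Polygon topology: list the arcs that make up a polygon."""
    return sorted(i for i, arc in arcs.items()
                  if poly in (arc.left_poly, arc.right_poly))


def adjacent_polygons(poly):
    """Left-right topology: polygons on the other side of a shared arc."""
    result = set()
    for arc in arcs.values():
        if arc.left_poly == poly:
            result.add(arc.right_poly)
        elif arc.right_poly == poly:
            result.add(arc.left_poly)
    return result


def connected_arcs(arc_id):
    """Arc-node topology: arcs that share a node are connected."""
    nodes = {arcs[arc_id].from_node, arcs[arc_id].to_node}
    return {i for i, arc in arcs.items()
            if i != arc_id and nodes & {arc.from_node, arc.to_node}}


print(polygon_arcs("B"), adjacent_polygons("B"), connected_arcs(1))
```

Arc 2 is stored once even though it belongs to both B and C, and the world polygon A gives the outside arcs something to point at.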
Which is better?
• Spaghetti Model Advantages
– Data structures are simpler and more compact
• BUT
– More complex software needed to perform
spatial analysis
– Spatial analysis is more time-consuming
• Topological Models Advantages
– Spatial analysis does not require accessing
coordinate data, eliminating time-consuming
calculations to derive spatial relationships from
coordinates
• BUT
– Topology must be built or rebuilt whenever a map
is entered or edited. This process can be time
consuming for large coverages.
– Topological data models require more complex
data structures than the spaghetti models
– Editing must be done without topology (less
convenient), or with topology (slower and prone
to instabilities).
Networks (route systems)
– Routes are composed of sections. Each section is a
single arc, or piece of an arc.
– Allows different attributes to be stored along different
sections of the same arcs
– Allows attributes to be defined using length measures
(such as mileposts) instead of in x-y coordinates.
– Useful for utility, transportation, and hydrologic
modeling and analysis
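Locating attributes by length measure instead of x-y coordinates can be sketched as below. The route geometry, the section table, and the surface attribute are hypothetical, and measures are assumed to be in the same units as the coordinates.

```python
import math

# A route stored as a polyline, plus sections defined by from/to measures.
route = [(0, 0), (3, 0), (3, 4)]
sections = [(0.0, 3.0, "paved"), (3.0, 7.0, "gravel")]


def locate_measure(polyline, measure):
    """Return the x-y position at a given distance along the polyline."""
    if measure < 0:
        raise ValueError("measure must not be negative")
    travelled = 0.0
    for (x1, y1), (x2, y2) in zip(polyline, polyline[1:]):
        length = math.hypot(x2 - x1, y2 - y1)
        if length > 0 and travelled + length >= measure:
            t = (measure - travelled) / length
            return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
        travelled += length
    raise ValueError("measure is beyond the end of the route")


def attribute_at(measure):
    """Return the attribute of the section that contains the measure."""
    for start, end, value in sections:
        if start <= measure <= end:
            return value
    return None


print(locate_measure(route, 5.0), attribute_at(5.0))
```

The attribute lookup uses only the measure; the coordinates are needed only when the position has to be drawn on the map.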
Vector surface representations
• Contours
– Stored as an arc coverage with contour
values as attributes
– Familiar format for storing surfaces that
most people can read easily
– Difficult to perform analysis on contours
– Uses: Displaying surfaces on maps;
input to create TINs and raster surfaces
Sources of vector data
• Conversion from raster data (vectorization)
– Precision will never be better than original raster file
– Best when performed on very clean, simple scanned maps
• Digitizing of maps
• Importing data already in vector format
– DLGs = Digital Line Graphs, USGS (hypsography, hydrography,
boundaries, roads)
– TIGER = Census blocks and associated attribute data
– Data from other programs
• Classification of or mapping on raster images
– Including aerial photos, satellite images, remotely sensed data
• Generating coverages from lat-lon coordinates
– Creating a point coverage of wells from their lat-lon
coordinates
– Global Positioning Systems (GPS) are often used to
generate coordinates
Choose a vector model when
• Data are categorical and/or discrete, not
continuous
– distance to roads, versus soil types
• High precision is required
– land parcel databases--a meter matters!
• High quality map output is required
– vector files smaller, print better with less
memory involved
• Extensive attributes for features
• Analysis involves questions of contiguity,
adjacency, network behavior
Spatial Data Structures
• The Raster Data Model
– How raster data are stored
– The resolution problem
– Assignment strategies
– Compression techniques
• Surfaces
• Choosing a data model
Storing Raster Data
• Layout of a raster file
– rows, columns, and cells (pixels)
• Representation of features
– Individual points, lines, or polygons
– Maps and surfaces
Cells or pixels
Raster georeferencing
– Must know coordinates of one corner, and the cell size
– Some systems require all layers to be precisely
matched
– Usually one attribute per layer
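Implicit georeferencing can be sketched as a conversion between cell indices and map coordinates. This assumes the known corner is the upper-left one, a north-up grid, and square cells; the corner coordinates and the 30-unit cell size are made-up values.

```python
X_UL = 500000.0   # x of the upper-left corner (assumed value)
Y_UL = 9200000.0  # y of the upper-left corner (assumed value)
CELL_SIZE = 30.0  # cell size in map units (assumed value)


def cell_to_map(row, col):
    """Map coordinates of the centre of cell (row, col)."""
    x = X_UL + (col + 0.5) * CELL_SIZE
    y = Y_UL - (row + 0.5) * CELL_SIZE
    return x, y


def map_to_cell(x, y):
    """Row and column of the cell that contains the map point (x, y)."""
    col = int((x - X_UL) // CELL_SIZE)
    row = int((Y_UL - y) // CELL_SIZE)
    return row, col


print(cell_to_map(2, 3))
print(map_to_cell(500105.0, 9199925.0))
```

No coordinate is stored per cell: one corner and the cell size are enough to recover the position of any cell.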
Raster data types
• Data types
– ASCII: stored as arrays of integer or floating point numbers written out as characters
• Simple, readable by almost any software, but INEFFICIENT
– Binary: stored in binary format
• Not as transportable, but very efficient
– Values may be stored as integers or floating point data.
– Floating point takes about 4 times as much space (e.g. 4-byte floats versus 1-byte integers)!
– Often worthwhile to convert floating point to integer.
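The storage difference between data types can be checked with Python's struct module. The cell values are made up, and the comparison assumes 1-byte unsigned integers against 4-byte floats; other integer widths would give a different ratio.

```python
import struct

values = [12, 7, 255, 0, 31, 99]  # hypothetical cell values

# Binary, one unsigned byte per cell.
as_bytes = struct.pack(f"<{len(values)}B", *values)

# Binary, one 4-byte float per cell.
as_floats = struct.pack(f"<{len(values)}f", *(float(v) for v in values))

# ASCII text, values separated by spaces.
as_text = " ".join(str(v) for v in values).encode("ascii")

print(len(as_bytes), len(as_floats), len(as_text))
```

For these six cells the float version is exactly 4 times the size of the byte version, and the ASCII version also takes more room than the byte version.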
Images vs grids
• In ArcView, images are simple raster data for display
only
• Grids are a proprietary format for storing raster data
– may be analyzed and manipulated
– Only available for use with Spatial Analyst
• A grid may be opened as an image, but not vice versa!
• Other GIS packages will have their own “grid”
formats
Images
• Raw image format (raster)
– basic binary data arranged in arrays,
one row per line
– each pixel usually one byte (0-255),
• 0 is black, 255 is white
– images may have one or more
bands
• bands usually represent brightness
in different color ranges
• true-color images have
red,green,blue bands
• others may have 7, 14, 225
bands!!!
– header file gives important size and
georeferencing info
• rows, cols, bands, Upper Left, pixel
size, projection, etc.
[Figure: single band grayscale image]
Storing Raster Map Data
• Grid data are stored in special format for
display and analysis
• One grid layer per theme (not multiband)
– elevation, geology, rainfall, etc each in separate
file
Raster Data Resolution
• Rasterizing involves some idealization of the map or
objects
• Raster data are stored with a cell size that
determines the resolution
– Representing features precisely requires high
resolution (small cell size)
• Storage requirements increase by the square of the
image dimensions
– A 16x16 image has 256 pixels; a 32x32 image has
1024 pixels!
• Resolution can affect estimates of area and length
• The raster model is not suitable for applications
requiring high precision, such as land information
systems
[Figure: Raster resolution. Panels: vector map; 16x16 = 256 bytes; 32x32 = 1024 bytes]
Sources of raster data
• Conversion from vector data (rasterization)
– Nearly always involves a loss of resolution and precision
• Scanning of maps on a B&W or color scanner
– Usually requires some processing to assign categories
– Especially problematic with color images due to slight
variations in hue on map
• Importing data already in raster format
– DEMs = Digital Elevation Models, USGS
– GIRAS = Landuse Maps from the USGS
– Data from other programs, such as GRASS, ERDAS, IDRISI,
ERMapper
• Importing images from other formats
– Includes aerial photos, satellite images, remotely sensed
data
Raster Assignment Strategies
• Conversion from vector data may be done directly or using a
lookup table
– Lookup tables must be used when converting character data
• Feature boundaries never coincide exactly with cell boundaries
– So you must use some strategy for assigning pixel values
• Majority-area method
– The cell is assigned the value of the class with the largest
area in the cell
• Priority-weighting method
– You can assign weights to each class so some small areas
take precedence over large areas
– Used to preserve small or narrow regions of high interest,
such as streams
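The two assignment strategies can be sketched for a single cell. The class names, area fractions, and weights are hypothetical, and multiplying the area fraction by a class weight is only one possible way to implement priority weighting.

```python
def majority_area(fractions):
    """Assign the class that covers the largest share of the cell."""
    return max(fractions, key=fractions.get)


def priority_weighted(fractions, weights):
    """Assign the class with the largest weighted share (default weight 1)."""
    return max(fractions, key=lambda cls: fractions[cls] * weights.get(cls, 1.0))


# Share of one cell's area covered by each class.
cell = {"forest": 0.7, "stream": 0.1, "field": 0.2}
weights = {"stream": 10.0}

print(majority_area(cell))
print(priority_weighted(cell, weights))
```

With the majority rule the narrow stream disappears from the raster; with the weight it is preserved even though it covers only a tenth of the cell.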
Raster Compression Techniques
• Storage limits resolution, so strategies for compression are key
– Full Raster Encoding (no compression)
– Run-Length Encoding
– Value-Point Encoding
– Quadtrees
– Tiling (access strategy)
• Compression saves space, but requires time to save and extract
• Compression success varies with type of data
– Works best on data with low spatial variability and limited
possible values
– Works poorly with high spatial variability data or continuous
surfaces
• Thus, compression may actually increase storage space or access
time with some types of data.
Quadtree compression
• Add resolution only where needed by dividing into
quarters
Choosing the data model
• Raster advantages
– Simple data model
– Many spatial analysis functions often simpler and faster
– Efficient for data with high spatial variability
– Efficient for low spatial variability when compressed
– Easy to integrate with satellite and remotely sensed data
– However, topological relationships are not explicitly encoded, so some
analysis is more difficult
• Vector advantages
– Can store data efficiently with high precision
– Requires about 10% of the space needed to store the same data in raster
format
– Certain types of topological analysis are more efficient or
only possible with vector
– Gives much greater precision and accuracy
– Greater flexibility in storing and manipulating attribute data
Choosing a model for analysis
• Compare an arithmetic operation in raster and vector
– Raster
• Simply add two cells together
– Vector
• Must subdivide or intersect polygons first to build new
polygon coverage
• Then values in each new polygon may be added together
• Buffering (finding all areas within a certain distance of a
feature)
– Raster
• Change values of cells within specified range from the
target feature
– Vector
• Create circular polygons at regular intervals along the
feature arcs
• Intersect all the circles
• Dissolve arcs inside the circles
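The raster side of this comparison can be sketched in a few lines. The grids are tiny made-up examples, and the buffer measures distance between cell centres in cell units, which is an assumption of this sketch.

```python
import math


def add_layers(layer_a, layer_b):
    """Cell-by-cell addition of two rasters with the same shape."""
    return [[a + b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(layer_a, layer_b)]


def buffer(feature, distance):
    """Mark every cell within `distance` of a non-zero feature cell."""
    targets = [(r, c) for r, row in enumerate(feature)
               for c, value in enumerate(row) if value]
    return [
        [1 if any(math.hypot(r - tr, c - tc) <= distance for tr, tc in targets) else 0
         for c in range(len(row))]
        for r, row in enumerate(feature)
    ]


summed = add_layers([[1, 2], [3, 4]], [[10, 20], [30, 40]])
feature = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
buffered = buffer(feature, 1.0)
print(summed)
print(buffered)
```

Neither operation builds new geometry: the output grid has the same cells as the inputs, which is why these operations tend to be simpler in raster than in vector.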
• Operations best suited to raster analysis
– Overlays and arithmetic, boolean, and map algebra
operations
– Buffering
– Proximity, cost-distance
– Viewshed analysis (what parts of a surface can be seen)
– Any operations requiring continuous surfaces
– Projects involving data with high spatial variability
– Projects in which original data is raster (e.g. satellites)
• Operations best suited to vector analysis
– Connectivity, network modeling
– Point-in-polygon and Line-in-polygon overlays
– Overlays when many attributes are involved
– Evaluating contiguity
– Projects requiring high precision of stored data
– Projects in which attributes are primarily character data
Choosing a data model
Raster is faster, but vector is correcter
• Some GIS packages are primarily vector OR raster
– Raster: GRASS, IDRISI, MOSS
– Vector: Intergraph? MapInfo
– Integrated: Arc/Info, Arcview/Spatial Analyst
• Some allow you to convert between types with ease
• Often convenient to use one model primarily, but convert to the
other for certain operations
– Rasterization generally involves a loss of precision
– The precision loss is retained if data are re-converted to vector