Land Resource Information Systems
LECTURE 4-5
DATA FORMAT AND STRUCTURE

Geographic data carry information made up of 4 components: geographic position (spatial reference), attribute information, spatial relationships, and time. These components can also be expressed as the questions (QUERY): where, what, how is it related, and when?

1. Geographic Position
Geographic data are a form of spatial data.
· Every object has a specific location
· Geographic position can be expressed as:
  – Cartesian coordinates or azimuth,
  – identification relative to neighbours,
  – linear location reference,
  – a particular area (space),
  – coded names of particular places,
  – reference to a particular object

2. Geographic Attributes
Attributes describe the spatial data. Attributes are multidimensional; one object has many identities. E.g. a slope at a given position has the identities slope length, direction (aspect), shape, and so on.
Attribute data = non-spatial data = tabular data

3. Spatial Relationships
Spatial relationships = relationships between objects; they are highly varied, complex, and important. There are 4 categories of relationship (also described as having topology):
o connectivity,
o orientation,
o adjacency, and
o position within a space (containment).

4. Time
Time is dynamic. Knowing WHEN the data were acquired determines whether the data are used correctly.
  – E.g. an area that was paddy field 1 year ago is now a residential/industrial area.
  – Cropping patterns in the rainy season (MH) differ from those in the dry season (MK).
Temporal data are described in terms of:
o duration: the time span covered by the existing database,
o resolution: the time interval over which data are collected or aggregated,
o frequency and speed of data collection.
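The four components above can be pictured as the fields of a single record. The sketch below is only an illustration: the class name `GeoFeature`, its field names, the helper functions, and the sample coordinates and dates are all hypothetical, not part of any GIS package.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class GeoFeature:
    """One geographic object carrying the four components (hypothetical layout)."""
    feature_id: int
    position: tuple                                # WHERE: (x, y) coordinates
    attributes: dict                               # WHAT: non-spatial, tabular data
    neighbours: set = field(default_factory=set)   # HOW RELATED: ids of adjacent features
    observed: date = None                          # WHEN: date the data were acquired


def is_adjacent(a, b):
    """Answer the 'how is it related' query for the adjacency relationship."""
    return b.feature_id in a.neighbours


def age_in_days(feature, today):
    """Answer the 'when' query: how old is this observation?"""
    return (today - feature.observed).days


# Made-up sample data: two neighbouring land parcels.
paddy = GeoFeature(1, (105.25, -5.40), {"land_use": "paddy field"}, {2}, date(2023, 1, 1))
housing = GeoFeature(2, (105.26, -5.40), {"land_use": "residential"}, {1}, date(2024, 1, 1))
```

Each query then reads one field: `paddy.position` answers where, `paddy.attributes` answers what, `is_adjacent(paddy, housing)` answers how it is related, and `age_in_days(paddy, date(2024, 1, 1))` answers when.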
• There are 2 types of database:
  • spatial database
    – digital base maps describing the earth’s surface, layered on top of one another
    – individual layers are called themes/coverages
  • attribute database
    – describes characteristics/qualities of features on base maps
    – linked to the spatial database by a common item
• The “where” of GIS is determined by coordinate (map) data structures, but …
• The “what” of GIS is determined by tabular (relational database) data structures
• Thus, tabular data are just as important as coordinate data

Database systems
• Three types of database models, besides the simple flat file
  • Relational model (what we just learned)
  • Hierarchical model
  • Network model

Hierarchical Model
• Model has a tree structure, with a root and children entities
• Encodes many-to-one or one-to-one relationships
• Advantages
  – Very efficient searching for queries encoded within the tree
  – Easy to understand and update
• Disadvantages
  – Searches requiring more than one entity are not efficient
  – Tree architecture must be optimized for the types of queries to be done
  – Inflexibility of searching makes it less than ideal for GIS, where queries are often exploratory and cannot be predicted in advance
  – Multiple parents are not allowed
(Figure: hierarchical model)

Network Model
• Entities can have multiple parents as well as children
• Encodes many-to-one or one-to-one relationships; many-to-many relationships can be handled indirectly
• Advantages
  – Less redundant data storage than in the hierarchical model
  – Provides greater flexibility in searching
  – Relationships are encoded in the database for quick retrieval of certain queries
• Disadvantages
  – Additional complexity to store linkages; updating is more time-consuming
  – Still not as flexible as the relational data model
(Figure: network model)

Relational Model
• No hierarchy of entities; records are simply stored as a set of tuples
  – A tuple is a set of permanently related fields (one record, or row). A set of tuples is also called a table, or file.
• One-to-one or many-to-one relationships are encoded
• Queries can be done using any field
  – I.e. there is no distinction between key and attribute fields
• Tuples can be linked together using any field that is common to both
  – Relate or Join: a temporary link between two tables
  – Relational join: permanent combination of two tables into one file
(Figure: relational model)

Why use a Relational Model?
• Advantages
  – Highest flexibility of the three models for processing and queries
  – Easily modified and added to
  – Organization is simple to understand and communicate
  – Same database can be represented with less redundancy
• Disadvantages
  – More difficult to implement
  – Slower performance due to absence of permanent links and pointers
• Choice for GIS
  – GIS packages overwhelmingly choose the relational model for its flexibility
  – GIS analysis is typically exploratory, requiring the flexible relational model

Relational Database Model & Attribute Data Structures
• Attribute data are stored in database tables
• Tables are composed of:
  – fields and
  – records
• You may already be familiar with some types of relational databases
  – dBase
  – rBase
  – MS Access
  – MS Excel (database functionality)
  – Oracle, INFORMIX, INGRES, SQL Server
  – MySQL, PostgreSQL
  – INFO (in ArcInfo)
• ArcView uses tabular data formats from dBase, ASCII text, and INFO files; tables are stored on the disk as
  • .dbf files,
  • .txt files, or
  • binary files in INFO directories
• Each vector data source has an attribute table
• Tables can be linked and joined (“related”) by use of common values in fields
• Different types of data that may have attribute tables in ArcView
  – Vector
    • point attribute
    • polygon attribute
    • line attribute
    • node attribute*
    • text attribute*
    • route & event*
    • CAD attributes
  – Raster
    • value attribute tables
• Relationship between tabular and
  map data
  – one-to-one between features and records
    • when a selection is made, both the record and the feature are selected

Spatial Data Models
• Vector Data Model
  – Points, lines, polygons
  – Stores topology
    • adjacencies
    • connectivity
    • containment
• Raster Data Model
  – Based on cells or pixels
  – Each cell contains a single attribute value (z)
(Figure: raster and vector)

Form and Structure of Spatial Data
Spatial data in a GIS consist of several data layers. Each layer consists of 3 types of data entity, namely point, line, and polygon/area, plus 3-D forms (volume): elevation/space.
o point: a geographic object tied to a coordinate pair (x,y). E.g. soil profiles, weather stations.
o line (end, arc, polyline): all linear features built from straight line segments, each defined by 2 coordinate points. E.g. rivers, roads, lineament patterns, etc.
o polygon: the boundary of a region or a lake, in the form of a closed line.
o volume objects (blocks, polyhedra, etc.) involve a third dimension: height/depth.
• These objects are stored in 2 main forms: raster or vector

Concept of Vector and Raster
(Figure: a real-world scene shown as a raster representation, a grid indexed 0-9 on both axes with cells coded R, T, or H, and as a vector representation built from point, line, and polygon.)

Raster Data Structure
o Raster data: the simplest form of data in a GIS.
o A raster is displayed as pixels:
  • point: 1 pixel,
  • line: a set of pixels arranged in a row along one direction,
  • polygon: a set of pixels extending in several directions within one area.
o A raster is formed by a set of cells or pixels. Each pixel has a reference as its identity, tied to a geographic location or to a column and row.
  – Each pixel has a particular value.
  – Data formed by a raster have 2-D units.
(Figure: raster data model)

1.1. Simple Raster Grid
o Horizontal position: oriented west-east (following the remote sensing convention: numbered from left to right).
o Vertical position (rows) is numbered from top to bottom, so numbering starts at the upper left.
o This raster system has 2 weaknesses. 1.
Absolute position is hard to determine at a given scale (a given pixel size). E.g. an object that falls on 1 or 2 pixels does not lie exactly between the two pixels, so its position is known only to the nearest pixel.
2. Adjacent cells are not separated from one another; the result depends on how adjacency is defined. Example (Fig.): the distance from the centre cell to its left-right and up-down neighbours is 1, but the distance to a diagonal neighbour is 1.41 (the square root of 2). Two cases are distinguished:
  – 4-neighbour connectivity
  – 8-neighbour connectivity (the distances are not equal; some neighbours touch only at a corner point).

2. Efficient Raster Data Storage
o Storing RASTER data takes a lot of space. The number of cells determines the storage space required.
o To reduce the storage load, use (1) run-length encoding or (2) chain codes or value-point encoding.
(Figure: A. storage without compaction; B. run-length encoding; C. value-point encoding)
o Run-length encoding:
    Value   Length   Row
    A       10       0
  – The object with value A occupies 10 pixels in row 0.
o Chain codes or value-point encoding:
    Value   Point
    A       23
    B       29
  – Pixels with value A occur in rows 0, 1, and 2; the last pixel with value A in this group is pixel 23 (row 2, column 3).
  – Pixels with value B in this group end at row 2, column 9.

Hierarchical Raster Data Structure
o The quadtree data system produces a more compact raster representation because it uses several grid sizes. High resolution is used only where needed.
o Homogeneous areas get large grid cells; heterogeneous areas get small ones.
o The name quadtree comes from dividing each cell into 4 smaller cells at every level (layer). The structure is drawn as a tree (Fig.).

Vector Data Structure
Used for objects that need accurate positions. Objects on the earth's surface are tied to map positions using an XY coordinate system.
Point objects are recorded as individual XY pairs, lines as series of XY coordinates, and areas as closed series of XY coordinate pairs. There are 2 common ways to represent vector data:
  – the Spaghetti data model
  – the Topological data model

Spatial Data Structures
• The Vector Data Model
  – Linking spatial and attribute data
  – Spaghetti vs Topological data models
  – Constructing Topology
  – Advanced topological structures: Routes and Regions
• Surfaces in vector models
Aronoff, 1993.

Raster vs Vector Data Structures
• Raster data are stored as an array of values
  – Georeferencing is implicit in the structure
  – Usually defined by one corner of the image and the cell size
  – Attributes are defined by the cell values (no character data!)
  – One attribute for each raster file
• Vector data are stored as a series of xy coordinates
  – Points are stored as a single xy coordinate
  – Lines are a string of xy coordinates
  – Polygons are composed of one or more lines and a label
  – Attributes are attached to each feature through a unique numeric code
  – Many attributes may be stored in each vector file

Topological models
• Spaghetti models store only the coordinate data
  – Spatial relationships between features are not recorded
  – Finding two adjacent polygons would require reading in all the data and comparing the coordinates of all the polygons
  – Thus, spaghetti models are very inefficient for doing spatial analysis
• Topological models store coordinate data and spatial relationships
  – Topology is the mathematical method used to define spatial relationships
  – Spatial relationships between features are explicitly encoded, making it very easy to determine if polygons are adjacent, if arcs connect, etc.
  – Highly desirable model if spatial analysis is to be done on the data

Topological Relationships
• Connectivity: property of being interconnected
  – Applies to roads, streams, pipelines, electric grids, etc.
  – Used to evaluate travel or delivery routes for efficiency, speed
  – Used to optimize transportation scheduling, etc.
• Contiguity: the property of being adjacent
  – Is this parcel of land adjacent to an industrial site?
  – Adjacent habitats?
  – Zoning districts?
  – Evaluating clear cuts in a forest: don’t want two adjacent sites cleared

Coding connectivity
(Figure: coding connectivity)

Arc-polygon topology
• List arcs in each polygon
• Avoid storing coordinate data more than once
• A = the world polygon

Coding Contiguity
• Telling if polygons are adjacent
• Arcs have direction
• Polygons are adjacent if they share an arc
• Need the world polygon for outside arcs

Constructing Topology
• Point data and line data are stored in files of spatial coordinates
• Additional tables are constructed to hold topology
  – Method relies on each feature having a unique numeric code
• Polygon topology
  – Polygon table lists the arcs that make up each polygon
  – Each arc coordinate is stored once, even if it belongs to two polygons
• Arc-node topology (establishes connectivity)
  – Nodes are uniquely numbered
  – Nodes are listed for each arc
  – Arcs that share a node are connected
• Left-right topology (establishes contiguity)
  – Arcs have direction established by assigning a from-node and a to-node
  – The arc topology stores the number of the polygons to the left and right of the arc

Which is better?
• Spaghetti Model Advantages
  – Data structures are simpler and more compact
• BUT
  – More complex software needed to perform spatial analysis
  – Spatial analysis is more time-consuming
• Topological Models Advantages
  – Spatial analysis does not require accessing coordinate data, eliminating time-consuming calculations to derive spatial relationships from coordinates
• BUT
  – Topology must be built or rebuilt whenever a map is entered or edited. This process can be time-consuming for large coverages.
  – Topological data models require more complex data structures than the spaghetti models
  – Editing must be done without topology (less convenient), or with topology (slower and prone to instabilities)

Networks (route systems)
  – Routes are composed of sections. Each section is a single arc, or a piece of an arc.
  – Allows different attributes to be stored along different sections of the same arcs
  – Allows attributes to be defined using length measures (such as mileposts) instead of x-y coordinates
  – Useful for utility, transportation, and hydrologic modeling and analysis

Vector surface representations
• Contours
  – Stored as an arc coverage with contour values as attributes
  – Familiar format for storing surfaces that most people can read easily
  – Difficult to perform analysis on contours
  – Uses: displaying surfaces on maps; input to create TINs and raster surfaces

Sources of vector data
• Conversion from raster data (vectorization)
  – Precision will never be better than that of the original raster file
  – Best when performed on very clean, simple scanned maps
• Digitizing of maps
• Importing data already in vector format
  – DLGs = Digital Line Graphs, USGS (hypsography, hydrography, boundaries, roads)
  – TIGER = Census blocks and associated attribute data
  – Data from other programs
• Classification of or mapping on raster images
  – Including aerial photos, satellite images, remotely sensed data
• Generating coverages from lat-lon coordinates
  – Creating a point coverage of wells from their lat-lon coordinates
  – Global Positioning Systems (GPS) are often used to generate coordinates

Choose a vector model when
• Data are categorical and/or discrete, not continuous –
  e.g. soil types (categorical) rather than distance to roads (continuous)
• High precision is required
  – land parcel databases: a meter matters!
• High quality map output is required
  – vector files are smaller and print better with less memory involved
• Extensive attributes for features
• Analysis involves questions of contiguity, adjacency, network behavior

Spatial Data Structures
• The Raster Data Model
  – How raster data are stored
  – The resolution problem
  – Assignment strategies
  – Compression techniques
• Surfaces
• Choosing a data model

Storing Raster Data
• Layout of a raster file
  – rows, columns, and cells (pixels)
• Representation of features
  – Individual points, lines, or polygons
  – Maps and surfaces
(Figure: cells or pixels)

Raster georeferencing
  – Must know the coordinates of one corner, and the cell size
  – Some systems require all layers to be precisely matched
  – Usually one attribute per layer

Raster data types
• Data types
  – ASCII: stored as text, with arrays of integer or floating-point values written out as characters
    • Simple, readable by almost any software, but INEFFICIENT
  – Binary: stored in binary format
    • Not as transportable, but very efficient
  – Values may be stored as integers or floating-point data.
  – Floating point takes about 4 times as much space (e.g. a 4-byte float versus a 1-byte integer)!
  – Often worthwhile to convert floating point to integer.
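The georeferencing rule above (one corner plus the cell size) and the storage comparison can be sketched in a few lines. This is a minimal sketch under stated assumptions: the known corner is the upper-left one, rows count downward, cells are square, and the raster is north-up. The function names and the sample coordinates are hypothetical.

```python
def cell_to_map(row, col, x_ul, y_ul, cell_size):
    """Map coordinates of the centre of cell (row, col).

    Assumes (x_ul, y_ul) is the upper-left corner of the raster,
    rows increase downward, and cells are square.
    """
    x = x_ul + (col + 0.5) * cell_size
    y = y_ul - (row + 0.5) * cell_size
    return x, y


def map_to_cell(x, y, x_ul, y_ul, cell_size):
    """Row and column of the cell that contains map point (x, y)."""
    col = int((x - x_ul) // cell_size)
    row = int((y_ul - y) // cell_size)
    return row, col


def raster_bytes(rows, cols, bytes_per_cell):
    """Uncompressed binary storage for one raster layer."""
    return rows * cols * bytes_per_cell


# Made-up raster: upper-left corner at (1000, 2000), 30-unit cells.
centre = cell_to_map(2, 3, 1000.0, 2000.0, 30.0)      # (1105.0, 1925.0)
cell = map_to_cell(1105.0, 1925.0, 1000.0, 2000.0, 30.0)

# A 4-byte float layer against a 1-byte integer layer of the same size.
ratio = raster_bytes(100, 100, 4) / raster_bytes(100, 100, 1)
```

No coordinates are stored per cell; the position of every cell follows from its row and column, which is what "georeferencing is implicit in the structure" means.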
Images vs grids
• In ArcView, images are simple raster data for display only
• Grids are a proprietary format for storing raster data
  – may be analyzed and manipulated
  – Only available for use with Spatial Analyst
• A grid may be opened as an image, but not vice versa!
• Other GIS packages will have their own “grid” formats

Images
• Raw image format (raster)
  – basic binary data arranged in arrays, one row per line
  – each pixel is usually one byte (0-255)
    • 0 is black, 255 is white
  – images may have one or more bands
    • bands usually represent brightness in different color ranges
    • true-color images have red, green, blue bands
    • others may have 7, 14, 225 bands!!!
  – header file gives important size and georeferencing info
    • rows, cols, bands, upper left, pixel size, projection, etc.
(Figure: single band grayscale image)

Storing Raster Map Data
• Grid data are stored in a special format for display and analysis
• One grid layer per theme (not multiband)
  – elevation, geology, rainfall, etc. each in a separate file

Raster Data Resolution
• Rasterizing involves some idealization of the map or objects
• Raster data are stored with a cell size that determines the resolution
  – Representing features precisely requires high resolution (small cell size)
• Storage requirements increase with the square of the image dimensions
  – A 16x16 image has 256 pixels; a 32x32 image has 1024 pixels!
• Resolution can affect estimates of area and length
• The raster model is not suitable for applications requiring high precision, such as land information systems
(Figure: raster resolution. A vector map rasterized at 16x16 = 256 bytes and at 32x32 = 1024 bytes.)

Sources of raster data
• Conversion from vector data (rasterization)
  – Nearly always involves a loss of resolution and precision
• Scanning of maps on a B&W or color scanner
  – Usually requires some processing to assign categories
  – Especially problematic with color images due to slight variations in hue on the map
• Importing data already in raster format
  – DEMs = Digital Elevation Models, USGS
  – GIRAS = Land use maps from the USGS
  – Data from other programs, such as GRASS, ERDAS, IDRISI, ERMapper
• Importing images from other formats
  – Includes aerial photos, satellite images, remotely sensed data

Raster Assignment Strategies
• Conversion from vector data may be done directly or using a lookup table
  – Lookup tables must be used when converting character data
• Feature boundaries never coincide exactly with cell boundaries
  – So you must use some strategy for assigning pixel values
• Majority-area method
  – The cell is assigned the value of the class with the largest area in the cell
• Priority-weighting method
  – You can assign weights to each class so that some small areas take precedence over large areas
  – Used to preserve small or narrow regions of high interest, such as streams

Raster Compression Techniques
• Storage limits resolution, so strategies for compression are key
  – Full Raster Encoding (no compression)
  – Run-Length Encoding
  – Value-Point Encoding
  – Quadtrees
  – Tiling (access strategy)
• Compression saves space, but requires time to save and extract
• Compression success varies with the type of data
  – Works best on data with low spatial variability and limited possible values
  – Works poorly with high spatial variability data or continuous surfaces
• Thus, compression may actually increase storage space or access time with some types of data.
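Run-length encoding, and the warning that compression can increase storage, can be illustrated with a small sketch. It assumes each run is stored as a value plus a length for a single row; the function names and the sample rows are hypothetical, and real GIS formats differ in detail.

```python
def rle_encode(row):
    """Run-length encode one raster row as [value, run_length] pairs."""
    runs = []
    for value in row:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([value, 1])   # start a new run
    return runs


def rle_decode(runs):
    """Rebuild the original row from its [value, run_length] pairs."""
    row = []
    for value, length in runs:
        row.extend([value] * length)
    return row


def stored_items(runs):
    """Count the items kept on disk: one value and one length per run."""
    return 2 * len(runs)


uniform = ["A"] * 10                       # low spatial variability
mixed = ["A", "B", "A", "B", "A", "B"]     # high spatial variability

uniform_runs = rle_encode(uniform)         # one run of ten A cells
mixed_runs = rle_encode(mixed)             # six runs of one cell each
```

The uniform row shrinks from 10 stored cells to 2 items, while the alternating row grows from 6 cells to 12 items, which is the case where compression costs space instead of saving it.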
Quadtree compression
• Add resolution only where needed by dividing into quarters

Choosing the data model
• Raster advantages
  – Simple data model
  – Many spatial analysis functions are often simpler and faster
  – Efficient for data with high spatial variability
  – Efficient for low spatial variability when compressed
  – Easy to integrate with satellite and remotely sensed data
  – Drawback: topological relationships are not explicitly encoded, so some analysis is more difficult
• Vector advantages
  – Can store data efficiently with high precision
  – Requires about 10% of the space needed to store the same data in raster format
  – Certain types of topological analysis are more efficient or only possible with vector
  – Gives much greater precision and accuracy
  – Greater flexibility in storing and manipulating attribute data

Choosing a model for analysis
• Compare an arithmetic operation in raster and vector
  – Raster
    • Simply add two cells together
  – Vector
    • Must subdivide or intersect polygons first to build a new polygon coverage
    • Then values in each new polygon may be added together
• Buffering (finding all areas within a certain distance of a feature)
  – Raster
    • Change values of cells within a specified range from the target feature
  – Vector
    • Create circular polygons at regular intervals along the feature arcs
    • Intersect all the circles
    • Dissolve arcs inside the circles
• Operations best suited to raster analysis
  – Overlays and arithmetic, boolean, and map algebra operations
  – Buffering
  – Proximity, cost-distance
  – Viewshed analysis (what parts of a surface can be seen)
  – Any operations requiring continuous surfaces
  – Projects involving data with high spatial variability
  – Projects in which the original data are raster (e.g.
    satellites)
• Operations best suited to vector analysis
  – Connectivity, network modeling
  – Point-in-polygon and line-in-polygon overlays
  – Overlays when many attributes are involved
  – Evaluating contiguity
  – Projects requiring high precision of stored data
  – Projects in which attributes are primarily character data

Choosing a data model
Raster is faster but Vector is correcter
• Some GIS packages are primarily vector OR raster
  – Raster: GRASS, IDRISI, MOSS
  – Vector: Intergraph (?), MapInfo
  – Integrated: Arc/Info, ArcView/Spatial Analyst
• Some allow you to convert between types with ease
• Often convenient to use one model primarily, but convert to the other for certain operations
  – Rasterization generally involves a loss of precision
  – The precision loss is retained if data are re-converted to vector
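The last two points can be checked with a small round trip: rasterize a point, convert the cell back to a vector point, and measure how far it moved. This is a sketch under assumptions: the grid origin is at (0, 0), rows are counted upward along y, and a cell is converted back to its centre. The function names and the sample point are hypothetical.

```python
import math


def rasterize_point(x, y, cell_size):
    """Return (row, col) of the cell containing (x, y).

    Assumes a grid with its origin at (0, 0) and rows counted along +y.
    """
    return int(y // cell_size), int(x // cell_size)


def vectorize_cell(row, col, cell_size):
    """Return the centre of cell (row, col) as an (x, y) point."""
    return (col + 0.5) * cell_size, (row + 0.5) * cell_size


cell_size = 10.0
original = (12.3, 47.9)                       # made-up surveyed position

cell = rasterize_point(*original, cell_size)  # (4, 1)
restored = vectorize_cell(*cell, cell_size)   # (15.0, 45.0)

# Displacement introduced by the round trip, and its upper bound
# (half the cell diagonal).
error = math.dist(original, restored)
max_error = cell_size * math.sqrt(2) / 2
```

The restored point is the cell centre, not the original position, so converting back to vector does not recover the lost precision; a smaller cell size lowers the bound on the error but never removes it.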