Khartoum University
Faculty of Postgraduate
Geostatistics
Instructor: Dr. Samir Mahmoud
Email: samiradm59@yahoo.com
Phone: 0909864181

Course administration
⚫ Lectures: Tuesday 03:30 – 5:30
⚫ Classroom: Number
⚫ Course grading: Homework & Attendance 20%, Midterm 20%, Final Exam 60%
⚫ Homework: assignments, research
⚫ Tests: Midterm Exam: Final Exam:

Course Objectives
⚫ To provide an overview of the basic components of geostatistics, which include:
⚫ Measuring geographical distributions
⚫ (Semi)variogram analysis: characterization of spatial correlation
⚫ Kriging: optimal interpolation; generates the best linear unbiased estimate at each location; employs a semivariogram model

References
• P. Goovaerts, 1997, Geostatistics for Natural Resources Evaluation, Oxford University Press, 483 pp.
• E.H. Isaaks and R.M. Srivastava, 1989, An Introduction to Applied Geostatistics, Oxford University Press, 561 pp.
• P.K. Kitanidis, 1997, Introduction to Geostatistics: Applications in Hydrogeology, Cambridge University Press, 249 pp.
• R.A. Olea, 1999, Geostatistics for Engineers and Earth Scientists, Kluwer Academic Publishers, 303 pp.
• R. Webster and M.A. Oliver, 2007, Geostatistics for Environmental Scientists, Statistics in Practice, John Wiley & Sons, Chichester, 330 pp.
• D.G. Rossiter, 2008, Spatial Analysis and Geostatistics, lecture notes, ITC.
• T. Hengl, 2009, A Practical Guide to Geostatistical Mapping, The Office for Official Publications of the European Communities, Luxembourg (ISBN: 978-92-7906904-8), 270 pp.
Course Contents
1. Basic geostatistics
2. Measures of central tendency and dispersion
3. Probability
4. Measuring Geographic Distributions
5. Sampling concept
6. Point pattern analysis
7. Spatial Analysis and Geostatistical Analyst
8. Experimental Variogram & Variogram Modeling
9. Geostatistical Estimation (Kriging & Co-Kriging)

1-Basic Geostatistics

1.1-Origins of geostatistics
⚫ Geostatistics originated in the mining and petroleum industries, starting with the work of Danie Krige in the 1950s.
⚫ It was further developed by Georges Matheron in the 1960s.
⚫ In both industries, geostatistics is successfully applied to cases where decisions about expensive operations are based on interpretations from sparse data located in space.

1.2-Fields of application
Geostatistics has since been extended to many other fields in or related to the earth sciences, e.g. hydrogeology, hydrology, meteorology, oceanography, geochemistry, geography, soil sciences, forestry, landscape ecology, geology, mining and others.

1.3-What is geostatistics?

1.3.1-Branch of statistical sciences
“Geostatistics can be defined as the branch of statistical sciences that studies spatial/temporal phenomena and locations of spatial relationships to model possible values of variable(s) at unobserved, unsampled locations” (Caers, 2005)

1.3.2-Study of phenomena
“Geostatistics: study of phenomena that vary in space and/or time” (Deutsch, 2002).
1.3.3-Way of describing the spatial continuity
“Geostatistics offers a way of describing the spatial continuity of natural phenomena and provides adaptations of classical regression techniques to take advantage of this continuity.” (Isaaks and Srivastava, 1989)

1.3.4-Analysis and interpretation of geodata
⚫ “Geostatistics is a subset of statistics specialized in analysis and interpretation of geographically and temporally referenced data”
⚫ “Geostatistics is an analytical tool for statistical analysis of sampled field data”

1.4-Statistics & Geostatistics

1.4.1-Classical statistics
Classical statistics is generally devoted to the analysis and interpretation of uncertainties caused by limited sampling of a property under study.
Whereas classical statistics examines the statistical distribution of a set of sampled data, geostatistics incorporates both the statistical distribution of the sample data and the spatial correlation among the sample data.

1.4.2-Geostatistics
Geostatistics is not tied to a population distribution model that assumes, for example, that all samples of a population are normally distributed and independent from one another.
Most earth science data (e.g., rock properties, contaminant concentrations) often do not satisfy these assumptions, as they can be highly skewed and/or possess spatial correlation.

1.5-Geostatistics and autocorrelation
Autocorrelation: correlation between elements of a series and others from the same series separated from them by a given interval (Oxford American Dictionary). In the spatial case, data values from locations that are closer together tend to be more similar than data values from locations that are further apart.
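The dictionary definition of autocorrelation above can be illustrated with a short calculation. The following is only a sketch in Python: the function name `lag_autocorrelation` and the porosity readings are hypothetical and not part of the course material, and the estimator is the common one that divides the lagged cross-products by the total sum of squares.

```python
def lag_autocorrelation(series, lag):
    """Correlation between a series and itself shifted by `lag` positions."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    mean = sum(series) / n
    # Total variability of the series around its mean (denominator).
    total = sum((v - mean) ** 2 for v in series)
    if total == 0:
        raise ValueError("a constant series has no defined autocorrelation")
    # Co-variability of values separated by `lag` positions (numerator).
    lagged = sum((series[i] - mean) * (series[i + lag] - mean)
                 for i in range(n - lag))
    return lagged / total


# Hypothetical porosity readings (percent) at equal spacing along a transect.
porosity = [12.0, 12.5, 13.1, 13.0, 14.2, 15.0, 15.3, 16.1, 16.0, 17.2]
for lag in (1, 2, 3):
    print(lag, round(lag_autocorrelation(porosity, lag), 3))
```

For a smoothly varying series such as this one, the value is expected to fall as the lag grows, which is the "closer is more similar" behaviour described above.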
Examples of spatially autocorrelated parameters in hydrogeology: thickness, porosity, hydraulic conductivity.

1.6-Geostatistics and GIS
• Both deal with spatial data.
• There are tools that allow GIS and statistics to be integrated.
• GIS software provides tools for geostatistical analysis, such as spatial pattern analysis, Spatial Analyst and Geostatistical Analyst.

1.7-Spatial Variable in Space
⚫ How does a variable vary in space?
⚫ What controls its variation in space?
⚫ Where should samples be located to describe its spatial variability?
⚫ How many samples are needed to represent its spatial variability?
⚫ What is the value of a variable at some new location?
⚫ What is the uncertainty of the estimate?

1.8-Nature of variables
• Many variables refer directly to processes and are expressed as a quantity per unit of time, e.g. mm of rainfall per year.
• In ecology, the objects of interest (individual plants or animals) are often immeasurable in quantity. Animal species change their location dynamically, often in unpredictable directions and with unpredictable spatial patterns (nonlinear trajectories), and are commonly recorded as occurrence records.

1.9-Examples of environmental variables
Biology: distribution of species and biodiversity measures
Soil science: soil properties and types
Vegetation science: plant species and communities, land cover types
Climatology: climatic variables at the surface and beneath/above it
Hydrology: water quantities and conditions
Geology: rock types, rock porosity, element concentration

1.11-Spatial Variability
Spatial variability is the result of complex processes working at the same time and over long periods of time, rather than the effect of a single realization of a single factor.
It is the sum of two components: (a) the natural spatial variation and (b) the inherent noise.
1.10-Classification of Spatial variability
Spatial variability may be classified into:
• Geographical variation (2D)
• Vertical variation (3D)
• Temporal variation
• Variation at different scales (support size)

2.Measures of central tendency and dispersion

Objective
To learn how to find measures of central tendency in a set of raw data.

Relevance
To be able to calculate the most appropriate measure of center after analyzing the context of a study that might or might not contain extreme values.

Central Values
Often one number is used to describe an entire sample or population. Such a number is called an average, and there are many ways to compute one.
⚫ Four values are considered measures of the center:
1. Mean
2. Median
3. Mode
4. Midrange

Mean
⚫ The arithmetic average, the measure with which you are most familiar.

Mode
⚫ The value or values that occur most often in the data set.
⚫ Find the mode.
⚫ A. 0, 1, 2, 3, 4: no mode
⚫ B. 4, 4, 6, 7, 8, 9, 6, 9: the modes are 4, 6, and 9

Midrange
⚫ The number exactly midway between the lowest and highest values of the data set. It is found by averaging the low and high numbers.

Example
⚫ Find the midrange of the set.
⚫ 3, 3, 5, 6, 8

Example
a) Compute the mean for the entire sample.

Measures of Variation
⚫ Three values are used to measure the amount of dispersion or variation (the spread of the group):
1. Range
2. Variance
3. Standard Deviation

Range
⚫ The range is the difference between the highest value in the set and the lowest value in the set.
⚫ Range = High # - Low #

Example
⚫ Find the range of the data set.
⚫ 40, 30, 15, 2, 100, 37, 24, 99
⚫ Range = 100 - 2 = 98

Variance (Array)
⚫ Variance Formula

Standard Deviation
⚫ The standard deviation is the square root of the variance.

Example – Using Formula
⚫ Find the variance of 6, 3, 8, 5, 3.
⚫ Find the standard deviation.
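The measures of center and variation in this section can be checked with a few lines of Python. This is a sketch only: the helper names `modes`, `midrange` and `value_range` are illustrative, and because the slides do not show which variance formula is intended, both the sample form (divide by n - 1) and the population form (divide by n) are computed.

```python
import statistics
from collections import Counter


def modes(values):
    """Return the most frequent value(s); an empty list means 'no mode'."""
    counts = Counter(values)
    top = max(counts.values())
    # If several distinct values all occur equally often, there is no mode.
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == top)


def midrange(values):
    """Average of the lowest and highest values."""
    return (min(values) + max(values)) / 2


def value_range(values):
    """Difference between the highest and lowest values."""
    return max(values) - min(values)


print(modes([0, 1, 2, 3, 4]))                 # mode example A
print(modes([4, 4, 6, 7, 8, 9, 6, 9]))        # mode example B
print(midrange([3, 3, 5, 6, 8]))              # midrange example
print(value_range([40, 30, 15, 2, 100, 37, 24, 99]))

data = [6, 3, 8, 5, 3]                        # variance example
print(statistics.mean(data), statistics.median(data))
print(statistics.variance(data), statistics.stdev(data))    # sample form (n - 1)
print(statistics.pvariance(data), statistics.pstdev(data))  # population form (n)
```

Which variance form is appropriate depends on whether the five values are treated as a sample or as the whole population.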
3.Probability

3.1-Probability Definition
⚫ Definition: the chance or relative frequency of occurrence of an event.
⚫ It ranges between 0 and 1.
⚫ The probabilities of all possible (mutually exclusive) events of an experiment must sum to 1.

Binomial Probability Distribution
A binomial random variable X is defined as the number of “successes” in n independent trials where P(“success”) = p is constant.
Notation: X ~ BIN(n, p)

Binomial distribution, general formula:
$P(X = k) = \binom{n}{k} p^{k} q^{n-k}$, where $q = 1 - p$ and $k = 0, 1, \dots, n$.

Example on probability using the binomial distribution
⚫ The pass rate of students in a course is 0.8 and there are 15 students, so n = 15, p = 0.8 and q = 1 - 0.8 = 0.2. Find the probability that:
⚫ all 15 students pass: $P(X = 15) = 0.8^{15} \approx 0.0352$
⚫ exactly 8 students pass: $P(X = 8) = \binom{15}{8}\, 0.8^{8}\, 0.2^{7} \approx 0.0138$
⚫ exactly 6 students pass: $P(X = 6) = \binom{15}{6}\, 0.8^{6}\, 0.2^{9} \approx 0.00067$
⚫ no student passes: $P(X = 0) = 0.2^{15} \approx 3.3 \times 10^{-11}$

Binomial distribution: example
⚫ If I toss a coin 20 times, what is the probability of getting 2 or fewer heads?
⚫ $P(X \le 2) = \left[\binom{20}{0} + \binom{20}{1} + \binom{20}{2}\right] 0.5^{20} = 211 / 1048576 \approx 0.0002$

4.Measuring Geographic Distributions

Definition
⚫ Measuring the distribution of a set of features allows you to calculate a value that represents a characteristic of the distribution, such as its center, directional distribution, linear directional mean, or standard distance.
⚫ The Measuring Geographic Distributions toolset addresses questions such as:
⚫ Where is the center?
⚫ What is the shape and orientation of the data?
⚫ How dispersed are the features?

Measuring Geographic Distributions toolset
Tool | Description
Central Feature | Identifies the most centrally located feature in a point, line, or polygon feature class.
Directional Distribution | Creates standard deviational ellipses to summarize the spatial characteristics of geographic features: central tendency, dispersion, and directional trends.
Linear Directional Mean | Identifies the mean direction, length, and geographic center for a set of lines.
Mean Center | Identifies the geographic center (or the center of concentration) for a set of features.
Median Center | Identifies the location that minimizes overall Euclidean distance to the features in a dataset.
Standard Distance | Measures the degree to which features are concentrated or dispersed around the geometric mean center.

Central Feature (Spatial Statistics)
Definition: Identifies the most centrally located feature in a point, line, or polygon feature class.
Syntax: CentralFeature_stats (Input_Feature_Class, Output_Feature_Class, Distance_Method, {Weight_Field}, {Self_Potential_Weight_Field}, {Case_Field})
Parameter | Explanation | Data Type
Input_Feature_Class | The feature class containing a distribution of features from which to identify the most centrally located feature. | Feature Layer
Output_Feature_Class | The feature class that will contain the most centrally located feature in the Input Feature Class. | Feature Class
Distance_Method | Specifies how distances are calculated from each feature to neighboring features. EUCLIDEAN_DISTANCE: the straight-line distance between two points. MANHATTAN_DISTANCE: the distance between two points measured along axes at right angles. | String
Weight_Field (Optional) | The numeric field used to weight distances in the origin-destination distance matrix. | Field
Self_Potential_Weight_Field (Optional) | The field representing self-potential: the distance or weight between a feature and itself. | Field
Case_Field (Optional) | Field used to group features for separate central feature computations. The case field can be of integer, date, or string type. | Field

Directional Distribution
Definition: Creates standard deviational ellipses to summarize the spatial characteristics of geographic features: central tendency, dispersion, and directional trends.
Syntax: DirectionalDistribution_stats (Input_Feature_Class, Output_Ellipse_Feature_Class, Ellipse_Size, {Weight_Field}, {Case_Field})
Parameter | Explanation | Data Type
Input_Feature_Class | A feature class containing a distribution of features for which the standard deviational ellipse will be calculated. | Feature Layer
Output_Ellipse_Feature_Class | A polygon feature class that will contain the output ellipse feature. | Feature Class
Ellipse_Size | The size of output ellipses in standard deviations. The default ellipse size is 1; valid choices are 1_STANDARD_DEVIATION, 2_STANDARD_DEVIATIONS, 3_STANDARD_DEVIATIONS. | String
Weight_Field (Optional) | The numeric field used to weight locations according to their relative importance. | Field
Case_Field (Optional) | Field used to group features for separate directional distribution calculations. The case field can be of integer, date, or string type. | Field

Linear Directional Mean
Definition: Identifies the mean direction, length, and geographic center for a set of lines.
Syntax: DirectionalMean_stats (Input_Feature_Class, Output_Feature_Class, Orientation_Only, {Case_Field})
Parameter | Explanation | Data Type
Input_Feature_Class | The feature class containing vectors for which the mean direction will be calculated. | Feature Layer
Output_Feature_Class | A line feature class that will contain the features representing the mean directions of the input feature class. | Feature Class
Orientation_Only | DIRECTION: the From and To nodes are utilized in calculating the mean (default). ORIENTATION_ONLY: the From and To node information is ignored. | Boolean
Case_Field (Optional) | Field used to group features for separate directional mean calculations. The case field can be of integer, date, or string type. | Field

Mean Center
Definition: Identifies the geographic center (or the center of concentration) for a set of features.
Syntax: MeanCenter_stats (Input_Feature_Class, Output_Feature_Class, {Weight_Field}, {Case_Field}, {Dimension_Field})
Parameter | Explanation | Data Type
Input_Feature_Class | A feature class for which the mean center will be calculated. | Feature Layer
Output_Feature_Class | A point feature class that will contain the features representing the mean centers of the input feature class. | Feature Class
Weight_Field (Optional) | The numeric field used to create a weighted mean center. | Field
Case_Field (Optional) | Field used to group features for separate mean center calculations. The case field can be of integer, date, or string type. | Field
Dimension_Field (Optional) | A numeric field containing attribute values from which an average value will be calculated. | Field

Median Center
Definition: Identifies the location that minimizes overall Euclidean distance to the features in a dataset.
Syntax: MedianCenter_stats (Input_Feature_Class, Output_Feature_Class, {Weight_Field}, {Case_Field}, {Attribute_Field})
Parameter | Explanation | Data Type
Input_Feature_Class | A feature class for which the median center will be calculated. | Feature Layer
Output_Feature_Class | A point feature class that will contain the features representing the median centers of the input feature class. | Feature Class
Weight_Field (Optional) | The numeric field used to create a weighted median center. | Field
Case_Field (Optional) | Field used to group features for separate median center calculations. The case field can be of integer, date, or string type. | Field
Attribute_Field (Optional) | A numeric field containing attribute values from which a median value will be calculated. | Field

Standard Distance
Definition: Measures the degree to which features are concentrated or dispersed around the geometric mean center.
Syntax: StandardDistance_stats (Input_Feature_Class, Output_Standard_Distance_Feature_Class, Circle_Size, {Weight_Field}, {Case_Field})
Parameter | Explanation | Data Type
Input_Feature_Class | A feature class containing a distribution of features for which the standard distance will be calculated. | Feature Layer
Output_Standard_Distance_Feature_Class | A polygon feature class that will contain a circle polygon for each input center. These circle polygons graphically portray the standard distance at each center point. | Feature Class
Circle_Size | The size of output circles in standard deviations. The default circle size is 1; valid choices are 1_STANDARD_DEVIATION, 2_STANDARD_DEVIATIONS, 3_STANDARD_DEVIATIONS. | String
Weight_Field (Optional) | The numeric field used to weight locations according to their relative importance. | Field
Case_Field (Optional) | Field used to group features for separate standard distance calculations. The case field can be of integer, date, or string type. | Field

5-Sampling Concept

5.1-Sampling Concept
⚫ We want to know the attributes of some population: mean, median, range, variance, distribution, and so on.
⚫ It is usually not possible to observe all individuals.
⚫ Sampling means that we only observe a portion of the population.

5.2-Reasons for Sampling
⚫ Sampling can save money.
⚫ Sampling can save time.
⚫ For given resources, sampling can broaden the scope of the data set.
⚫ Because the research process is sometimes destructive, the sample can save product.
⚫ If accessing the population is impossible, sampling is the only option.
© 2002 Thomson / South-Western Slide 7-59

5.3-Reasons for Taking a Census (measuring the whole population)
⚫ To eliminate the possibility that a random sample is not representative of the population.
⚫ The person authorizing the study is uncomfortable with sample information.

5.4-Population Frame
⚫ A list, map, directory, or other source used to represent the population.
⚫ Overregistration: the frame contains all members of the target population and some additional elements. Example: using the chamber of commerce membership directory as the frame for a target population of member businesses owned by women.
⚫ Underregistration: the frame does not contain all members of the target population. Example: using the chamber of commerce membership directory as the frame for a target population of all businesses.

5.5-Random & Nonrandom Sampling
⚫ Random sampling
• Every unit of the population has the same probability of being included in the sample.
• A chance mechanism is used in the selection process.
• Eliminates bias in the selection process.
• Also known as probability sampling.
⚫ Nonrandom sampling
• Not every unit of the population has the same probability of being included in the sample.
• Open to selection bias.
• Not an appropriate data collection method for most statistical methods.
• Also known as nonprobability sampling.

5.6-Random Sampling Techniques
⚫ Simple Random Sample
⚫ Stratified (layered) Random Sample
⚫ Proportionate
⚫ Disproportionate
⚫ Systematic (following a fixed rule) Random Sample
⚫ Cluster (or Area) Sampling

5.7-Simple Random Sample
⚫ Number each frame unit from 1 to N.
⚫ Use a random number table or a random number generator to select n distinct numbers between 1 and N, inclusive.
⚫ Easier to perform for small populations.
⚫ Difficult for large populations.

5.8-Stratified (classified) Random Sample
⚫ The population is divided into nonoverlapping subpopulations called strata.
⚫ A random sample is selected from each stratum.
⚫ Potential for reducing sampling error.
⚫ Proportionate: the percentage of the sample taken from each stratum is proportionate to the percentage that each stratum is within the population.
⚫ Disproportionate: the proportions of the strata within the sample are different from the proportions of the strata within the population.

5.9-Cluster (grouped by area) Sampling
⚫ The population is divided into nonoverlapping clusters or areas.
⚫ Each cluster is a miniature, or microcosm, of the population.
⚫ A subset of the clusters is selected randomly for the sample.
⚫ If the number of elements in the subset of clusters is larger than the desired value of n, these clusters may be subdivided to form a new set of clusters and subjected to a random selection process.

5.10-Cluster Sampling Advantages and Disadvantages
● Advantages
• More convenient (suitable) for geographically dispersed populations.
• Reduced travel costs to contact sample elements.
• Simplified administration of the survey.
• Can be used when the unavailability of a sampling frame prohibits other random sampling methods.
● Disadvantages
• Statistically less efficient when the cluster elements are similar.
• Costs and problems of statistical analysis are greater than for simple random sampling.

5.10-Sampling Property
Sampling is used to learn the attributes of some population: mean, median, range, variance, distribution, and so on.
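The random sampling techniques listed in 5.6 to 5.9, used to estimate a population attribute such as the mean, can be sketched in Python as follows. This is an illustration under stated assumptions, not a prescribed procedure: the frame of 100 units with `zone`, `block` and `value` fields is invented, all function names are hypothetical, and the systematic rule (every k-th unit after a random start) is one common way to implement that technique.

```python
import random
from collections import defaultdict


def group_by(frame, key):
    """Split the frame into nonoverlapping groups (strata or clusters)."""
    groups = defaultdict(list)
    for unit in frame:
        groups[unit[key]].append(unit)
    return groups


def simple_random_sample(frame, n, rng):
    """Select n distinct units, each with the same chance of inclusion."""
    return rng.sample(frame, n)


def systematic_sample(frame, n, rng):
    """Take every k-th unit after a random start, with k = N // n."""
    k = len(frame) // n
    start = rng.randrange(k)
    return [frame[start + i * k] for i in range(n)]


def proportionate_stratified_sample(frame, stratum_key, n, rng):
    """Sample each stratum in proportion to its share of the population."""
    sample = []
    for members in group_by(frame, stratum_key).values():
        # Rounding may make the total differ slightly from n in general.
        k = round(n * len(members) / len(frame))
        sample.extend(rng.sample(members, k))
    return sample


def cluster_sample(frame, cluster_key, n_clusters, rng):
    """Pick whole clusters at random and keep every unit inside them."""
    clusters = group_by(frame, cluster_key)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for c in chosen for unit in clusters[c]]


def mean_value(units):
    return sum(u["value"] for u in units) / len(units)


# Hypothetical frame: 100 units, two zones (strata), ten blocks (clusters).
rng = random.Random(42)
frame = [{"id": i,
          "zone": "north" if i < 60 else "south",
          "block": i // 10,
          "value": rng.gauss(50, 10)} for i in range(100)]

print("population mean:", round(mean_value(frame), 2))
for name, sample in [
    ("simple", simple_random_sample(frame, 10, rng)),
    ("systematic", systematic_sample(frame, 10, rng)),
    ("stratified", proportionate_stratified_sample(frame, "zone", 10, rng)),
    ("cluster", cluster_sample(frame, "block", 2, rng)),
]:
    print(name, len(sample), round(mean_value(sample), 2))
```

Each sample mean is only an estimate of the population mean; repeating the draw with a different seed gives a different estimate.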
It is usually not possible to observe all individuals, so sampling means that we only observe a portion of the population.

5.10.2-Sampling allows inferences
1. The law of large numbers: the larger the sample compared to the population, the more closely the sample statistics approach the population parameters.
2. The concept of probability sampling: each individual has some defined chance of being sampled.
3. The concept of representativeness: selected individuals are “typical” of the population.

5.10.3-Spatial Sampling
Sampling in space: the location of the sampled individual is recorded and used in the analysis.
Extra inferences possible from spatial sampling:
• Prediction at unsampled locations
• Inference of spatial dependence: local or regional trends
• Point patterns: dispersion / clustering
• Directional statistics: alignment

9.Geostatistical Interpolation, Kriging & IDW (inverse distance weighted)

Interpolation concept
⚫ Interpolation predicts values for cells in a raster from a limited number of sample data points. It can be used to predict unknown values for any geographic point data, such as elevation, rainfall, chemical concentrations, and noise levels.

Interpolating an elevation surface
A typical use for point interpolation is to create an elevation surface from a set of sample measurements. In the following graphic, each symbol in the point layer represents a location where the elevation has been measured. By interpolating, the values for each cell between these input points will be predicted.

Kriging origin
• The word kriging comes from D.G. Krige, a South African mining engineer who in the 1950s developed empirical methods for predicting ore grades at unsampled locations using the known grades of samples from nearby sites.

How Kriging works
⚫ Kriging is an advanced geostatistical procedure that generates an estimated surface from a scattered set of points with z-values.
⚫ Unlike the other interpolation methods in the Interpolation toolset, using the Kriging tool effectively involves an interactive investigation of the spatial behavior of the phenomenon represented by the z-values before you select the best estimation method for generating the output surface.

The IDW (inverse distance weighted)
⚫ The IDW (inverse distance weighted) and Spline interpolation tools are referred to as deterministic interpolation methods because they are directly based on the surrounding measured values or on specified mathematical formulas that determine the smoothness of the resulting surface.

Kriging Assumption
⚫ Kriging assumes that the distance or direction between sample points reflects a spatial correlation that can be used to explain variation in the surface.
⚫ The Kriging tool fits a mathematical function to a specified number of points, or all points within a specified radius, to determine the output value for each location.
⚫ Kriging is a multistep process; it includes exploratory statistical analysis of the data, variogram modeling, creating the surface, and (optionally) exploring a variance surface.
⚫ Kriging is most appropriate when you know there is a spatially correlated distance or directional bias in the data. It is often used in soil science and geology.

The kriging formula
⚫ Kriging is similar to IDW in that it weights the surrounding measured values to derive a prediction for an unmeasured location. The general formula for both interpolators is formed as a weighted sum of the data:
$\hat{Z}(s_0) = \sum_{i=1}^{N} \lambda_i \, Z(s_i)$
where $Z(s_i)$ is the measured value at the i-th location, $\lambda_i$ is the weight for the measured value at the i-th location, $s_0$ is the prediction location, and $N$ is the number of measured values.

Creating a prediction surface map with kriging (PRACTICAL)
⚫ To make a prediction with the kriging interpolation method, two tasks are necessary:
⚫ Uncover the dependency rules.
⚫ Make the predictions.
To realize these two tasks, kriging goes through a two-step process:
⚫ It creates the variograms and covariance functions to estimate the statistical dependence (called spatial autocorrelation) values, which depend on the model of autocorrelation (fitting a model).
⚫ It predicts the unknown values (making a prediction).

Kriging methods (PRACTICAL)
⚫ There are two kriging methods: ordinary and universal.
⚫ Ordinary kriging is the most general and widely used of the kriging methods and is the default. It assumes the constant mean is unknown. This is a reasonable assumption unless there is a scientific reason to reject it.
⚫ Universal kriging assumes that there is an overriding trend in the data (for example, a prevailing wind) that can be modeled by a deterministic function, a polynomial. This polynomial is subtracted from the original measured points, and the autocorrelation is modeled from the random errors.

Variography model
Fitting a model, or spatial modeling, is also known as structural analysis, or variography. In spatial modeling of the structure of the measured points, you begin with a graph of the empirical semivariogram, computed with the following equation for all pairs of locations separated by distance h:

The semivariogram formula
$\text{Semivariogram}(\text{distance}_h) = 0.5 \times \text{average}\left((\text{value}_i - \text{value}_j)^2\right)$
⚫ The formula involves calculating the squared difference between the values of the paired locations.
⚫ The image below shows the pairing of one point (the red point) with all other measured locations. This process continues for each measured point.

Calculating the difference squared between the paired locations
⚫ Because each pair of locations usually has its own distance, the pairs are grouped into distance bins (lags). For example, compute the average semivariance for all pairs of points that are more than 40 meters but less than 50 meters apart.
⚫ The empirical semivariogram is a graph of the averaged semivariogram values on the y-axis and the distance (or lag) on the x-axis (see diagram below).
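The semivariogram formula and the grouping of pairs into distance bins described above can be sketched in Python. This is a minimal illustration, not the Kriging tool's implementation: the function name `empirical_semivariogram`, the bin convention (bin k holds distances from k × lag_width up to, but not including, (k + 1) × lag_width) and the sample points are all assumptions made for the example.

```python
import math
from itertools import combinations


def empirical_semivariogram(points, lag_width, n_lags):
    """Average 0.5 * (value_i - value_j)^2 over all pairs in each lag bin.

    points: list of (x, y, value) tuples.
    Returns a list of (mean pair distance, semivariance, number of pairs),
    one entry per non-empty bin.
    """
    sq_diff_sums = [0.0] * n_lags
    dist_sums = [0.0] * n_lags
    counts = [0] * n_lags
    # Pair every measured location with every other measured location.
    for (x1, y1, v1), (x2, y2, v2) in combinations(points, 2):
        h = math.hypot(x2 - x1, y2 - y1)
        b = int(h // lag_width)
        if b < n_lags:  # pairs farther apart than the last bin are ignored
            sq_diff_sums[b] += (v1 - v2) ** 2
            dist_sums[b] += h
            counts[b] += 1
    return [(dist_sums[b] / counts[b], 0.5 * sq_diff_sums[b] / counts[b], counts[b])
            for b in range(n_lags) if counts[b]]


# Hypothetical measurements: (x in meters, y in meters, value).
samples = [(0, 0, 10.2), (12, 5, 10.9), (25, 8, 11.8), (31, 30, 13.0),
           (44, 12, 12.7), (52, 41, 14.9), (60, 20, 14.1), (75, 35, 15.6)]
for distance, gamma, n_pairs in empirical_semivariogram(samples, 10.0, 8):
    print(f"lag ~{distance:5.1f} m  semivariance {gamma:6.3f}  pairs {n_pairs}")
```

With a lag width of 10 meters, the bin with index 4 corresponds to the "40 to 50 meters apart" group mentioned above.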
Empirical semivariogram graph example (PRACTICAL)
Spatial autocorrelation quantifies a basic principle of geography: things that are closer are more alike than things farther apart.

Fitting a model to the empirical semivariogram
⚫ To fit a model to the empirical semivariogram, select a function that serves as your model, for example a spherical type that rises and then levels off at distances beyond a certain range (see the spherical model example below).
⚫ The points on the empirical semivariogram deviate from the model; some points are above the model curve, and some are below.
⚫ However, if you add up the distances by which points lie above the line and the distances by which points lie below the line, the two totals should be similar.
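Fitting a spherical model as described above can be sketched in Python. The parameterization used here (nugget, partial sill, range) is the standard form of the spherical model, but the fitting method is a deliberate simplification: a brute-force least-squares search over candidate values, not the procedure used by any particular software. The function names and the empirical values are hypothetical.

```python
from itertools import product


def spherical(h, nugget, partial_sill, a):
    """Spherical semivariogram: rises until the range a, then stays level."""
    if h <= 0:
        return 0.0
    if h >= a:
        return nugget + partial_sill
    r = h / a
    return nugget + partial_sill * (1.5 * r - 0.5 * r ** 3)


def fit_spherical(lags, gammas, nuggets, partial_sills, ranges):
    """Try every candidate combination and keep the smallest squared error."""
    best_params, best_sse = None, float("inf")
    for params in product(nuggets, partial_sills, ranges):
        sse = sum((g - spherical(h, *params)) ** 2
                  for h, g in zip(lags, gammas))
        if sse < best_sse:
            best_params, best_sse = params, sse
    return best_params, best_sse


# Hypothetical empirical semivariogram (lag distance in meters, semivariance).
lags = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
gammas = [1.9, 2.8, 3.9, 4.3, 4.8, 5.2, 4.9, 5.1, 5.0, 4.9]

params, sse = fit_spherical(
    lags, gammas,
    nuggets=[0.0, 0.5, 1.0, 1.5],
    partial_sills=[3.0, 3.5, 4.0, 4.5],
    ranges=[40, 50, 60, 70, 80],
)
nugget, partial_sill, a = params
print("nugget:", nugget, "partial sill:", partial_sill, "range:", a)

# Compare the deviations above and below the fitted curve.
residuals = [g - spherical(h, *params) for h, g in zip(lags, gammas)]
above = sum(r for r in residuals if r > 0)
below = -sum(r for r in residuals if r < 0)
print("total above:", round(above, 3), "total below:", round(below, 3))
```

The last two printed totals correspond to the check in the final bullet: for a reasonable fit they should be of similar size.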