Soil Science Division
Digital Soil Mapping Primer
Tom D’Avello | Suzann Kienast-Brown
2.22.2018
USDA is an equal opportunity provider, employer, and lender.

Principles and Concepts
• Overview
• Why digital soil mapping?
• Soil-landscape models
• Environmental covariates
• Training data
• Prediction of soil classes or properties
• Validation and uncertainty

Overview
Digital Soil Mapping
– The generation of geographically referenced soil databases based on quantitative relationships between spatially explicit environmental data and measurements made in the field and laboratory (McBratney et al., 2003)
– The spatial prediction of soil classes or properties from point data using a statistical algorithm

Digital Soil Map
– A raster composed of two-dimensional cells (pixels) organized into a grid
– Each pixel has a specific geographic location and contains soil data

Conventional soil mapping
– "Where is the boundary between two soils?"
– Focus on the marginal areas
Digital soil mapping
– Central concept is well defined
– Variation expressed across the landscape

Why?
• Digital soil mapping
  – Illustrates the spatial distribution of soil classes or properties
  – Can document the uncertainty of the prediction
  – Documents tacit knowledge
  – Reproducible and consistent
  – Can be used to
    • Produce initial soil maps
    • Update or refine existing soil maps
    • Generate specific interpretations
    • Assess risk
  – Supports rapid inventory, re-inventory, and project-based management of lands in a changing environment
• FLEXIBLE

Soil-Landscape Models
Factors of Soil Formation (Jenny, 1941)
S = f(Cl, O, R, P, T)
– Soils are a function of 5 environmental factors
  • Cl: Climate
  • O: Organisms
  • R: Relief
  • P: Parent material
  • T: Time
– Conceptual model (tacit knowledge)

SCORPAN (McBratney et al., 2003)
S = f(S, C, O, R, P, A, N) + ε
– S (left side): soil at a specific point in space and time, as either
  • Soil classes, Sc
  • Soil attributes, Sa
– Empirical quantitative function of environmental covariates
  • S: Soil (class, or directly or remotely sensed property)
  • C: Climate
  • O: Organisms
  • R: Relief
  • P: Parent material
  • A: Age
  • N: Spatial position
– Plus an estimation of error or uncertainty (ε)

SCORPAN facilitates
– Quantification of the relationships between environmental covariates and soil classes or properties in a spatial context
– Estimation of the error or uncertainty of the spatial prediction of soil classes or properties

Environmental Covariates
Choosing covariates
– What factors influence soil formation?
  • Link to pedological knowledge
– What data do I have? Want? Need?
– Data should represent the multiple SCORPAN covariates that influence soil development in your project area
[Figure panels: Multi-path smoothed wetness index; Relative Position]

S = f(S, C, O, R, P, A, N)
– Soil
  • Legacy soils data, field measurements
– Climate
  • Temperature, precipitation, evapotranspiration (ET), solar radiation, etc.
– Organisms
  • Land cover, NDVI, spectral data and derivatives
– Relief
  • Elevation derivatives
– Parent material
  • Geology, spectral data and derivatives
– Age
  • Landform models
– Spatial position
  • Inherent in georeferenced data layers
[Figure: Elevation and spectral data/derivatives]

Choosing covariates
– Consider resolution (spatial, spectral, temporal, radiometric) and what is required to meet the project needs
  • Target soil feature
– Compare needs to the range of data available
– Elevation and spectral derivatives are a powerful combination for predicting soil classes or properties in most areas
  • Project needs and physical features of the area may require only one of these data sources

Sampling for Training Data
Digital soil mapping is dependent upon the relationship between
– Predictor variables (covariates)
– Target soil feature (soil class or property)
This is true for all modeling methods.
Samples of covariates representing the distribution of the target soil feature are required
– Training data
  • "Trains" the model to predict similar occurrences

Prediction is successful when
– Precise, observed locations of typical soil members are available
Directed (purposive) field samples
– Knowledge-based classification approaches
– Should not be used exclusively
Random or stratified sampling
– More robust
– Less prone to bias
– Sampling design appropriate for modeling objectives

DIGITAL SOIL MAPPING = FIELD WORK
[Photo: Soil scientist engaged in digital soil mapping]

Predict Soil Classes or Properties
1. Optimal set of SCORPAN covariates chosen
2. Training data collected
3. Apply a model to the data and predict soil classes or properties

Predicting Soil Classes
Classification is the process of predicting discrete classes
– Sorting pixels into a finite number of classes based on data values and their distribution in feature space
– If a pixel satisfies the class criteria, it is assigned to that class

Classification
Unsupervised
– Algorithm identifies clusters in the data
  • Non-informational clusters
– Analyst must interpret resulting clusters
  • Informational classes
– No training data required
– Exploratory
– K-means, ISODATA
Supervised
– Analyst selects training sites
– Algorithm creates decision boundaries from class statistics to classify the data
– Training data required
– Beyond exploratory
– Knowledge-based classification (ArcSIE), fuzzy classification, predictive modeling (random forests, neural networks)

Unsupervised Classification
ISODATA
– Shows natural clustering in the data and how potential classes may be distributed across the landscape
– Covariates include both terrain and spectral data derivatives
[Figure: Boundary Waters Canoe Area Wilderness, MN]

Supervised Classification
Predictive modeling (machine learning)
– Classes and probability surface for one class from random forests (tree-based)
– Covariates include both terrain and spectral derivatives
[Figure: Boundary Waters Canoe Area Wilderness, MN]

Predicting Soil Properties
Regression and interpolation methods predict continuous soil properties
Regression
– Estimates the relationship between the dependent variable (soil observations) and the independent variables (covariates)
– Linear regression, logistic regression, random forests
Interpolation
– Models spatial patterns based on values at known locations and the assumption that locations closer to one another are more similar than those farther apart
– Geostatistics (kriging)

Regression
Predictive modeling (machine learning)
– Properties and associated probability surface from a generalized linear model (regression)
[Figure: Beef Basin area, UT]

Predicting Soil Properties
Raster stack
– Continuous raster soil properties
– Key soil property layers at fixed depths
  • 0-5, 5-15, 15-30, 30-60, 60-100, and 100-200 cm
  • Target soil properties
    – Total profile depth (cm)
    – Plant exploitable soil depth (cm)
    – Organic carbon (g/kg)
    – pH (×10)
    – Sand (g/kg)
    – Silt (g/kg)
    – Clay (g/kg)
    – Gravel (m³/m³)
    – ECEC (cmol(c)/kg)
    – Bulk density of the fine earth (<2 mm) fraction, excluding gravel (Mg/m³)
    – Bulk density of the whole soil, including gravel (Mg/m³)
    – Available water holding capacity (mm)
    – Prediction uncertainty
[Figure: Concept soil property layers at 0-5 cm, 5-15 cm, 15-30 cm, and 30-60 cm]

Continuous soil property raster stack
– Interpretations for management and use
  • The data stack becomes the database
– Add slope, climate, and other layers needed for calculating interpretations
– Class data – taxonomic or technical

Interpolation
Geostatistics
– Ordinary kriging applied to predict soil potassium (K) concentrations
[Figure: Salt Lake Valley, UT]

Validation and Uncertainty
All soil mapping methods rely on models
– Conventional soil mapping
  • Qualitative
  • Relies on a conceptual model (CLORPT)
– Digital soil mapping
  • Quantitative
  • Relies on a quantitative model (SCORPAN)
All models approximate reality and are subject to error and uncertainty

The quantitative nature of digital soil mapping
– Lends itself to quantitative assessments of accuracy and uncertainty
Communicating the accuracy and uncertainty associated with soil spatial predictions is imperative
– An integral part of delivering modeled products, given their use in resource management decision making and risk assessment

Accuracy
Predicted values on a map will deviate from true values to some extent
Accuracy
– Assessed from the difference between the predicted value at a location and the measured value at the same location
– High prediction accuracy
  • Small difference between predicted and observed values
Accuracy measures quantify prediction quality using a validation data set

Soil classes
– User's, producer's, and overall accuracy
  • Confusion matrix
– Kappa, Tau index (weighted Tau), Brier score
Soil properties
– MSE, RMSE, standard error, modified R²

Uncertainty
Sources
– Positional accuracy of pedon location
– Covariate accuracy (e.g., vertical uncertainty of the DEM)
– Soil class or property measurement (e.g., taxonomic classification or lab measurement)
– Model structure (e.g., using a linear model for curvilinear data)

Soil classes
– Memberships or probabilities
– Class confusion measures (confusion index, distance files, entropy)
Soil properties
– Prediction intervals
Uncertainty measures can be generated during the modeling process or calculated from the results, and represented spatially

Soil classes – confusion index (CI)
– As CI approaches 1, confusion between classes increases
– Similar measures:
  • Distance files in ERDAS IMAGINE
  • Entropy
[Figure: Powder River Basin, WY (Brungard et al., 2015)]

Soil properties – prediction interval
– Often shown as the lower prediction limit, mean, and upper prediction limit (similar to low, representative value (RV), and high)
– Prediction interval width shows the spatial variability of uncertainty
[Figure panels, Beef Basin area, UT: Lower 90% Prediction Interval; Mean; Upper 90% Prediction Interval; Prediction Interval Width]

Digital Soil Mapping
Summary
– Predicting soil classes or properties from field observations and environmental covariates using a statistical algorithm
– Spatial representation of classes or properties
– Includes estimates of accuracy and uncertainty
– Consistency and reproducibility
– Flexible product to meet a variety of user needs
Focus
– Fundamental pedology
  • Knowledge of the soil resource as a natural body
  • Existing and newly acquired
– Field component
– Latest technological resources
  • Applied adaptively throughout the process and in combination with soil knowledge

Digital Soil Mapping Training
Foundational prerequisites, taken in the following order:
1. Spatial Analyst Workshop (NRCS-NEDC-000271)
2. Statistics for Soil Survey Part 1 (NRCS-NEDC-000400)
3. Intro to Digital Soil Mapping (NRCS-NEDC-000272)

Digital Soil Mapping with ArcSIE (NRCS-NEDC-000273)
• Prerequisites: all 3 foundational prerequisites
Statistics for Soil Survey Part 2 (NRCS-NEDC-000332)
• Prerequisites: Statistics for Soil Survey Part 1
Remote Sensing for Soil Survey Applications (NRCS-NEDC-000244)
• Prerequisites: all 3 foundational prerequisites, plus Intro to Digital Remote Sensing (available online from Michigan State University)
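The class accuracy measures listed under Accuracy (user's, producer's, and overall accuracy from a confusion matrix, plus Kappa) can be illustrated with a minimal, standard-library-only sketch. It is not the procedure used for the maps in this primer; it assumes rows of the matrix hold predicted classes and columns hold observed (reference) classes, the function names are hypothetical, and the 3-class matrix is invented for illustration. Tau and the Brier score are not implemented here.

```python
"""Accuracy measures for soil-class predictions (hypothetical example data).

Convention assumed here: cm[i][j] counts validation points that were
predicted as class i (row) and observed in the field as class j (column).
"""


def overall_accuracy(cm):
    """Share of validation points whose predicted class matches the observed class."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(len(cm)))
    return correct / total


def users_accuracy(cm):
    """Per class: correct / row total (how often a mapped class is right on the ground)."""
    return [cm[i][i] / sum(cm[i]) for i in range(len(cm))]


def producers_accuracy(cm):
    """Per class: correct / column total (how often an observed class was mapped correctly)."""
    n = len(cm)
    return [cm[j][j] / sum(cm[i][j] for i in range(n)) for j in range(n)]


def cohens_kappa(cm):
    """Agreement beyond chance: (p_o - p_e) / (1 - p_e)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    p_o = overall_accuracy(cm)
    row_totals = [sum(cm[i]) for i in range(n)]
    col_totals = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    # Expected agreement if predictions and observations were independent
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical 3-class confusion matrix (60 validation points)
    cm = [
        [20, 3, 2],
        [5, 15, 1],
        [0, 2, 12],
    ]
    print("Overall accuracy:", round(overall_accuracy(cm), 3))
    print("User's accuracy:", [round(a, 3) for a in users_accuracy(cm)])
    print("Producer's accuracy:", [round(a, 3) for a in producers_accuracy(cm)])
    print("Kappa:", round(cohens_kappa(cm), 3))
```

User's accuracy reflects errors of commission (row-wise) and producer's accuracy reflects errors of omission (column-wise), which is why the two can differ for the same class.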
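The confusion index and entropy listed under Uncertainty for soil classes can be sketched per pixel from class memberships or probabilities. This sketch assumes the commonly used definition CI = 1 - (mu_max - mu_second), where mu_max and mu_second are the largest and second-largest memberships at a pixel; the function names and probability values are hypothetical, and the entropy is normalized to [0, 1] by log(number of classes) as an illustrative choice.

```python
import math


def confusion_index(memberships):
    """Confusion index for one pixel: CI = 1 - (mu_max - mu_second).

    memberships: class memberships or probabilities for the pixel (>= 2 classes).
    CI near 0 -> one class clearly dominates; CI near 1 -> the two most
    likely classes are nearly tied, i.e., high class confusion.
    """
    if len(memberships) < 2:
        raise ValueError("need at least two classes")
    first, second = sorted(memberships, reverse=True)[:2]
    return 1.0 - (first - second)


def normalized_entropy(probs):
    """Shannon entropy of class probabilities, scaled to [0, 1] by log(n classes)."""
    if len(probs) < 2:
        raise ValueError("need at least two classes")
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))


if __name__ == "__main__":
    # Hypothetical class probabilities for three pixels (e.g., from random forests)
    pixels = {
        "clear": [0.90, 0.05, 0.05],
        "mixed": [0.45, 0.40, 0.15],
        "uniform": [1 / 3, 1 / 3, 1 / 3],
    }
    for name, p in pixels.items():
        print(name, round(confusion_index(p), 2), round(normalized_entropy(p), 2))
```

Applying either function to every pixel of a class-probability raster stack yields a map of class confusion like the one described for the Powder River Basin example.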
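For soil properties, RMSE (listed under Accuracy) and prediction interval width (listed under Uncertainty) can be computed directly from validation data. The minimal sketch below also checks interval coverage, an assumption-laden but common sanity check: the share of validation observations that fall inside their interval. The function names and the clay values are hypothetical and do not come from the Beef Basin example.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((p - o) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    return math.sqrt(mse)


def interval_width(lower, upper):
    """Prediction interval width per location (upper limit minus lower limit)."""
    return [u - l for l, u in zip(lower, upper)]


def interval_coverage(observed, lower, upper):
    """Fraction of validation observations falling inside their prediction interval."""
    inside = sum(1 for o, l, u in zip(observed, lower, upper) if l <= o <= u)
    return inside / len(observed)


if __name__ == "__main__":
    # Hypothetical clay content (g/kg) at four validation points
    observed = [220, 310, 180, 260]
    predicted = [200, 330, 190, 250]
    lower = [170, 280, 150, 230]  # lower 90% prediction limit
    upper = [240, 370, 230, 255]  # upper 90% prediction limit
    print("RMSE:", round(rmse(observed, predicted), 1))
    print("PI width:", interval_width(lower, upper))
    print("Coverage:", interval_coverage(observed, lower, upper))
```

For a well-calibrated 90% prediction interval, coverage on an independent validation set should be close to 0.90; four hypothetical points are far too few to judge that for any real map.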