A new method for automatic lineament extraction. Case study: Turcianska Kotlina, Slovakia

Outline according to the Elsevier guidelines for Computers & Geosciences:
1. Introduction
2. Material and methods
3. Results
4. Discussion
5. Conclusions

1. Introduction
State the objectives of the work and provide an adequate background, avoiding a detailed literature survey or a summary of the results.

The aim of this paper is to present a new method for automated lineament extraction. The method is tested on a study area, and its results are compared with those of previous geomorphological research. The analysis is performed at two scales to demonstrate that the method is robust to changes of scale.

Other authors have used the DEM or various surfaces derived from it for automatic lineament delineation, especially shaded relief (Abdullah et al., 2010; Masoud and Koike, 2011; Jordan and Schott, 2005), second derivatives of the DEM (Wladis, 1999) and the DEM itself (Vaz, 2011; Mallast et al., 2011). The methods of (semi-)automated line extraction also differ. Most authors apply image pre-processing (edge enhancement, thresholding) followed by an edge-linking method such as the Hough transform. In some cases, the pre-processing is part of the extraction itself (closed software modules). Pradhan (2010) extracted lines manually from automatically pre-processed images with enhanced edges. Abdullah et al. (2010) use the LINE module of PCI Geomatica, which extracts linear features from raster images. Mallast et al. (2011) use modules of ERDAS Imagine and PCI Geomatica. Argialas and Mavrantza (2004) use the optimized Hough transform developed by Fitton and Cox (1998). Pinto et al. (2013) use the Hough transform and the LESSA software developed by Zlatopolsky (1992).

Many papers compare the extracted lines with known reference data, most frequently geological fault lines or expert-delineated lineaments.
Although the comparison methodology differs from paper to paper, all authors use subjective visual assessment. Some papers also apply more objective quantitative methods. Abdullah et al. (2010) compute simple statistics of the count and length of lineaments to compare different datasets. A raster comparison approach is implemented by Vaz (2012). Mallast et al. (2011) use the distance between lineaments and reference point data (wells) as a comparison metric to demonstrate correlation with hydrological data.

2. Data
The goal of this article is to compare automatically extracted lines with data from previous geomorphological research by Minár and Sládek (2009). The results were compared at two scales: 1:50 000 (Area A in Figure 1) and 1:10 000 (Area B in Figure 1).

Fig. 1: The study area is located in Turcianska kotlina, Slovakia. The two areas mapped at different scales partially overlap.

2.1. Scale 1:50 000
The Expert dataset is an expert lineament map delineated from a 1:50 000 topographic map. The Expert generalized dataset was created by simplifying the Expert dataset and emphasizing its main directions (Minár and Sládek, 2009). The Geology fault lines dataset contains the detected and inferred fault lines from the 1:50 000 geological map. The Auto 30 m and Auto 50 m datasets contain lines extracted by the algorithm described in this paper. Their length and count statistics are given in Table 1.

Tab. 1: Descriptive statistics of the 1:50 000 datasets

2.2. Scale 1:10 000
The datasets are analogous, except that the geology dataset is missing because detailed (1:10 000) geological maps do not sufficiently cover the study area. Table 2 summarizes the descriptive statistics of the 1:10 000 datasets.

Tab. 2: Descriptive statistics of the 1:10 000 datasets

3. Method of automated lineament extraction
Provide sufficient detail to allow the work to be reproduced. Methods already published should be indicated by a reference: only relevant modifications should be described.
This paper presents a new method of automated lineament extraction based on raster image analysis of surfaces derived from a digital elevation model (DEM), followed by post-processing of the results to obtain the final lineament map. The method consists of six steps (Figure 2): a) creation of the DEM, b) derivation of hillshades from the DEM, c) line extraction based on edge detection, d) noise removal, e) cluster line analysis and f) classification of lineaments.

Fig. 2: Workflow of the new method for automated lineament extraction

The results are then compared with geomorphological research data from the study area to evaluate the usefulness of the automatic method. The comparison method is described in Section 4.

3.1. Creation of the DEM
The parameters of the DEM (its source and spatial resolution) crucially influence the quality and scale of the results. The source of the DEM affects the spatial accuracy and noise of the results and directly limits the finest spatial resolution that can meaningfully be used. The spatial resolution should be chosen according to the scale of the analysis. In our study area, two DEMs interpolated from the contours of 1:10 000 and 1:50 000 topographic maps were used. The spatial resolution was set to 30 m.

After an appropriate DEM has been chosen, it must be pre-processed. Because part of the algorithm is based on hydrological principles (Section 3.6), cells with an undefined drainage direction must be filled using the Fill algorithm (ESRI, 2013).

3.2. Derived surfaces of the DEM
Authors who use shaded relief for lineament extraction report that the results depend on the illumination azimuth. Mallast et al. (2011) and Abdullah et al. (2010) avoid this azimuth bias by combining differently illuminated rasters into a single raster from which the lines are extracted. In this paper, this bias is instead treated as an advantage: the differently illuminated shaded reliefs are used to extract different results, which are processed separately.
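The multi-azimuth idea can be illustrated with a short Python sketch. It computes a generic Lambertian hillshade (the dot product of the unit surface normal and a unit vector pointing towards the light source) for every azimuth from 0° to 345° in 15° steps at a 30° altitude, the values given in the next paragraph. It is not the ESRI or PCI Geomatica implementation; the function names, the list-of-rows DEM representation (row 0 = north) and the edge handling are assumptions made for illustration.

```python
import math


def hillshade(dem, cell_size, azimuth_deg, altitude_deg=30.0):
    """Illumination in [0, 1] for every DEM cell and one light direction.

    Generic Lambertian shading (an illustration, not the ESRI/PCI code).
    `dem` is a list of rows; row 0 is the northern edge and columns grow eastwards.
    Slopes use central differences inside the grid and one-sided differences at edges.
    """
    rows, cols = len(dem), len(dem[0])
    az, alt = math.radians(azimuth_deg), math.radians(altitude_deg)
    # Unit vector towards the light as (east, north, up); azimuth clockwise from north.
    light = (math.sin(az) * math.cos(alt), math.cos(az) * math.cos(alt), math.sin(alt))
    shade = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        r_n, r_s = max(r - 1, 0), min(r + 1, rows - 1)  # northern / southern neighbour rows
        for c in range(cols):
            c_w, c_e = max(c - 1, 0), min(c + 1, cols - 1)  # western / eastern neighbours
            dz_dx = (dem[r][c_e] - dem[r][c_w]) / ((c_e - c_w) * cell_size) if c_e > c_w else 0.0
            # Row index grows southwards, so the northward slope is (north - south).
            dz_dy = (dem[r_n][c] - dem[r_s][c]) / ((r_s - r_n) * cell_size) if r_s > r_n else 0.0
            # Upward surface normal (-dz/dx, -dz/dy, 1), normalised in the dot product.
            norm = math.sqrt(dz_dx ** 2 + dz_dy ** 2 + 1.0)
            dot = (-dz_dx * light[0] - dz_dy * light[1] + light[2]) / norm
            shade[r][c] = max(dot, 0.0)  # slopes facing away from the light are fully shaded
    return shade


def hillshade_stack(dem, cell_size, step_deg=15, altitude_deg=30.0):
    """One hillshade per azimuth 0, step, ..., 360 - step (24 rasters for a 15-degree step)."""
    return {az: hillshade(dem, cell_size, az, altitude_deg) for az in range(0, 360, step_deg)}
```

For a 30 m DEM, `hillshade_stack(dem, 30.0)` returns 24 grids keyed by azimuth; in the method, each of them would be passed separately to the line extraction step.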
The parameters that influence the shaded relief are the spatial resolution, the altitude of illumination (height of the light source) and the azimuth of illumination (direction of the light source). The hillshade raster has the same resolution as the input DEM. Although the illumination altitude influences image contrast, changing its value has only a minimal effect on the results; a value of 30° was chosen for our study area. The illumination azimuth, in contrast, has a distinct impact on the results, so this parameter is varied from 0° to 345° in steps of 15°. Each of the resulting 24 hillshade rasters is an input to the line extraction module.

Edge-enhancing filters are commonly applied to rasters before edge detection. In this paper, the enhancing filters are part of the line extraction algorithm, the LINE module of the PCI Geomatica software.

3.3. Line extraction
Two approaches were tested for line extraction: the LINE module of PCI Geomatica and the variant of the Hough transform (HT) used by Fitton and Cox (1998). The results from PCI Geomatica corresponded much better with the expert lineament dataset than the HT results, so PCI Geomatica was chosen for the line extraction step. The workflow and parameter settings of the LINE module are described in detail in the PCI Geomatica help and in Abdullah et al. (2009) and Mallast et al. (2011). These authors used the line-linking and generalization options of the LINE module. In our tests, these options produced noisy and meaningless results (see Figure X for an example), so they were switched off. The other parameters were set by trial and error as follows: binary threshold GTHR = 10; length threshold LTHR = 10; radius of the Gaussian filter RADI = 10; no generalization, FTHR = 1; line linking switched off, ATHR = 0 and DTHR = 0.

3.4. Noise reduction
Most authors reduce noise using various techniques, e.g. length thresholding (citations) or morphological closing (dilation followed by erosion) of raster images (Mallast et al., 2011). In the present work, the data to be reduced consist of a huge number of vector lines, and the algorithm removes all non-relevant lines. Relevance is defined as follows: a line is relevant if it lies in a place where a sufficient number of other lines with similar length and orientation are located. This sufficient number of lines is called the frequency threshold, and it depends on the total number of line sets. (It is recommended to choose this parameter interactively, based on visual interpretation of the results after noise reduction.)

The relevance of each line is determined with a raster approach whose main principle is to remove isolated lines. To achieve this, all vector line sets are converted to binary rasters. The conversion is applied to buffers created around each line in order to introduce a spatial tolerance; the buffer size is chosen with respect to the raster spatial resolution and the character of the study area. The values of corresponding cells are summed to create a single raster, called the raster of relevance, in which high values represent areas with a high occurrence of lines (see Figure X). The information from the raster of relevance is then transferred to every line of all sets, and the frequency threshold is applied to every line: only lines with a relevance value higher than the threshold are preserved. The same logic is also used for the classification of the final lineaments (see Section 3.6). Although the raster approach is faster than processing the sets line by line, not every line is evaluated correctly (see examples). Noise removal is a necessary step before the cluster analysis, because it decreases the total number of lines and thus speeds up the cluster analysis.

3.5. Cluster line analysis
As Figure X shows, the lines form clusters.
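Before the clustering itself, the relevance filter of Section 3.4 can be sketched in Python. Several details are assumptions not fixed by the text: cell centres lie at ((col + 0.5) * cell, (row + 0.5) * cell) with the grid origin at (0, 0); a cell belongs to a buffer if its centre lies within the buffer distance of a line; and the value transferred to a line is the mean relevance of the distinct cells it crosses. The function names are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    t = 0.0 if seg_len2 == 0 else ((px - ax) * dx + (py - ay) * dy) / seg_len2
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def relevance_raster(line_sets, rows, cols, cell, buffer):
    """Raster of relevance: per cell, the number of line sets whose buffered lines cover it."""
    relevance = [[0] * cols for _ in range(rows)]
    for lines in line_sets:
        for r in range(rows):
            for c in range(cols):
                # Assumption: a cell is covered if its centre lies inside a line buffer.
                centre = ((c + 0.5) * cell, (r + 0.5) * cell)
                if any(point_segment_distance(centre, a, b) <= buffer for a, b in lines):
                    relevance[r][c] += 1  # this set's binary raster, added to the sum
    return relevance


def crossed_cells(line, rows, cols, cell):
    """Distinct (row, col) cells visited when the line is sampled every half cell."""
    (x1, y1), (x2, y2) = line
    n = max(1, int(math.hypot(x2 - x1, y2 - y1) / (cell / 2)))
    cells = []
    for i in range(n + 1):
        x, y = x1 + (x2 - x1) * i / n, y1 + (y2 - y1) * i / n
        rc = (min(max(int(y // cell), 0), rows - 1), min(max(int(x // cell), 0), cols - 1))
        if rc not in cells:
            cells.append(rc)
    return cells


def remove_noise(line_sets, rows, cols, cell, buffer, frequency_threshold):
    """Keep only lines whose relevance value exceeds the frequency threshold."""
    relevance = relevance_raster(line_sets, rows, cols, cell, buffer)
    filtered = []
    for lines in line_sets:
        kept = []
        for line in lines:
            # Assumption: the value transferred to a line is the mean over its cells.
            values = [relevance[r][c] for r, c in crossed_cells(line, rows, cols, cell)]
            if sum(values) / len(values) > frequency_threshold:
                kept.append(line)
        filtered.append(kept)
    return filtered
```

With 24 line sets (one per illumination azimuth), the frequency threshold in this sketch would be an integer between 0 and 23, and an isolated line in a single set would have a mean relevance close to 1.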
The purpose of the cluster line analysis is to recognize all clusters and to replace each bundle of lines with a single representative line. The bundles are easily recognized by eye, but a computer requires a cluster line algorithm. Mention cluster analysis examples from other authors (KIV). Because no existing analysis addresses this specific task, we developed our own cluster line algorithm.

Figure X. Sets of lines clearly form clusters

The basic principle is to trace every line and explore its local neighbourhood to identify similarly oriented lines. To facilitate the analysis, all line sets are merged into one layer, and the azimuth and length of each line are calculated. The merged set is sorted by line length in descending order, and the following workflow is applied:
1. Choose the first (longest) line from the set.
2. Create a buffer around the chosen line.
3. Select all lines that lie completely within the buffer.
4. Refine the selection to lines whose azimuth is within ±20° of the azimuth of the chosen line.
5. If the selection contains more than 4 lines, continue to step 6; otherwise continue to step 7.
6. Create a buffer around the selected lines (= cluster) with the following attributes: count of selected lines, average length and average azimuth.
7. Delete all selected lines from the set.
8. Repeat from step 1 until the set is empty.

The algorithm's variables (buffer size, count threshold, azimuth condition) were set based on an analysis of the line bundles (average width, count, azimuth dispersion). The cluster line analysis thus identifies the clusters, each of which is saved as a polygon feature with all necessary attributes. In post-processing, an average line is constructed for each cluster from the centroid of the polygon, the average azimuth and the average length. The set of average lines forms the final lineament layer.

3.6. Classification of lineaments
The extracted lines originate from discontinuities in the raster image.
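The eight-step cluster line analysis of Section 3.5 can be sketched in Python as follows; this is a hypothetical illustration, not the authors' code. Lines are segments ((x1, y1), (x2, y2)). Because the buffer around a segment is convex, a line lies completely within it exactly when both of its endpoints do. The ±20° condition and the "more than 4 lines" rule follow the text, while the buffer size is left to the caller. Azimuths are treated as undirected (0° to 180°) and averaged with the doubled-angle circular mean, so that, for example, 178° and 2° average to 0° rather than 90°. The representative line is centred on the mean of the segment midpoints, a simplification of the polygon centroid used in the text.

```python
import math


def azimuth(line):
    """Undirected azimuth in degrees, clockwise from north, in [0, 180)."""
    (x1, y1), (x2, y2) = line
    return math.degrees(math.atan2(x2 - x1, y2 - y1)) % 180.0


def length(line):
    (x1, y1), (x2, y2) = line
    return math.hypot(x2 - x1, y2 - y1)


def axial_difference(a, b):
    """Smallest angle between two undirected azimuths (0 to 90 degrees)."""
    d = abs(a - b) % 180.0
    return min(d, 180.0 - d)


def axial_mean(azimuths):
    """Mean of undirected azimuths using the doubled-angle circular mean."""
    s = sum(math.sin(math.radians(2.0 * a)) for a in azimuths)
    c = sum(math.cos(math.radians(2.0 * a)) for a in azimuths)
    return (math.degrees(math.atan2(s, c)) / 2.0) % 180.0


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    t = 0.0 if seg_len2 == 0 else ((px - ax) * dx + (py - ay) * dy) / seg_len2
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def cluster_lines(lines, buffer, max_angle=20.0, min_count=5):
    """Steps 1-8: greedy clustering of the merged line set, longest lines first."""
    remaining = sorted(lines, key=length, reverse=True)
    clusters = []
    while remaining:
        chosen = remaining[0]  # step 1 (it always selects itself, so the loop ends)
        selected = [
            ln for ln in remaining
            # steps 2-3: both endpoints inside the buffer => whole line inside
            if all(point_segment_distance(p, *chosen) <= buffer for p in ln)
            # step 4: similar orientation
            and axial_difference(azimuth(ln), azimuth(chosen)) <= max_angle
        ]
        if len(selected) >= min_count:  # step 5: "more than 4 lines"
            clusters.append(selected)   # step 6: keep the cluster
        remaining = [ln for ln in remaining if ln not in selected]  # step 7
    return clusters  # step 8: loop until the set is empty


def representative_line(cluster):
    """Average line: mean-midpoint centre (assumed), axial-mean azimuth, mean length."""
    mids = [((x1 + x2) / 2.0, (y1 + y2) / 2.0) for (x1, y1), (x2, y2) in cluster]
    cx = sum(m[0] for m in mids) / len(mids)
    cy = sum(m[1] for m in mids) / len(mids)
    az = math.radians(axial_mean([azimuth(ln) for ln in cluster]))
    half = sum(length(ln) for ln in cluster) / len(cluster) / 2.0
    dx, dy = math.sin(az) * half, math.cos(az) * half
    return (cx - dx, cy - dy), (cx + dx, cy + dy)
```

Applying `representative_line` to every cluster returned by `cluster_lines` yields the set of average lines that, in the method, forms the final lineament layer.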
The geomorphological meaning of the extracted lines follows from the use of shaded relief rasters as input to the analysis. To classify different geomorphological structures, positive and negative lineaments are defined: positive lineaments represent ridges and negative lineaments represent valleys. To interpret these types separately, it is necessary to distinguish between them (citations about the different geomorphological meaning of these types). Abdullah et al. (2010) state that a specific illumination azimuth can distinguish positive from negative lineaments. This hypothesis may hold only for specific locations; our tests showed that it is not generally valid. Other authors classified positive and negative lines in these ways … (citations). We therefore developed our own algorithm to classify ridge and valley lines. Its main principle is to assess the proximity of lineaments to the drainage network: the proximity is high for negative lineaments and low for positive lineaments.

Figure X. Classification of lineaments into positive and negative classes

Section 3.4 presented the algorithm that assesses the frequency of lines. The same algorithm is adapted here, with the flow accumulation raster used as input instead of the raster of relevance. For each line, the mean and median flow accumulation values are computed, and these two statistics are used to differentiate between lines on ridges and lines in valleys. The median is needed to avoid uncertainty in specific cases where the mean could be influenced by a few pixels with high accumulation values.

4. Comparative methods
In this paper, quantitative and qualitative methods are used to assess the automatically extracted lines. The quantitative methods statistically describe the datasets and their relations. The qualitative methods visually assess the datasets and interpret them based on the results of the quantitative methods.

4.1. Correlation method
A vector-based comparison is proposed. Its main principle is to imitate visual assessment; in other words, the method tries to determine whether a line from the test dataset has a corresponding line in the reference dataset. The proposed quantitative geometric method does not replace visual assessment, because more than geometric values are needed to evaluate the geomorphological quality of lineaments, but it offers a good comparison metric.

The proposed method computes the mutual correlation of two vector datasets, A and B. First, dataset A is compared to B, then dataset B is compared to A; both correlation indices are needed to evaluate the mutual correlation. For each line from dataset A, the algorithm finds similar lines in dataset B, where similarity is defined by spatial proximity and an azimuth tolerance. The correlation index is computed as the length ratio of the lines found, so it is always less than or equal to 1. The workflow of the comparison method is illustrated in Figure X.

The method is driven by two parameters: the search radius (size of the line buffer) and the azimuth tolerance. The search radius depends on the study area and the scale of the research; it expresses the maximum distance between two lines for them to be considered similar. The azimuth tolerance expresses the maximum deviation between line azimuths for them to be considered similar. In this study, a search radius of 250 m and an azimuth tolerance of 25° were used.

Figure X. The workflow of the comparison algorithm

4.2. Directional analysis
Morphometric statistics are widely used (citations). Statistics that support lineament interpretation include rose diagrams and length and azimuth distributions (citations). An algorithm computing the length and azimuth of each line from its coordinates was applied to every line. Software such as GEORIENT can then be used to plot rose diagrams from these data (citations for GEORIENT).
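The comparison and directional statistics of Section 4 can be sketched in Python as follows. Two details are assumptions: a line of A counts as found if at least one line of B lies within the search radius (minimum segment-to-segment distance) and within the azimuth tolerance, and the correlation index is the matched length of A divided by the total length of A, which keeps it between 0 and 1. The histogram uses 1° classes of undirected azimuths (0-179°), and the seven-value moving average described in the next paragraph is computed with wrap-around at 180°, which is our choice rather than the Excel setting. All function names are hypothetical.

```python
import math


def azimuth(line):
    """Undirected azimuth in degrees, clockwise from north, in [0, 180)."""
    (x1, y1), (x2, y2) = line
    return math.degrees(math.atan2(x2 - x1, y2 - y1)) % 180.0


def length(line):
    (x1, y1), (x2, y2) = line
    return math.hypot(x2 - x1, y2 - y1)


def axial_difference(a, b):
    d = abs(a - b) % 180.0
    return min(d, 180.0 - d)


def point_segment_distance(p, a, b):
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    t = 0.0 if seg_len2 == 0 else ((px - ax) * dx + (py - ay) * dy) / seg_len2
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def segment_distance(s, t):
    """Minimum distance between two segments (0 if they cross)."""
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    (p1, p2), (q1, q2) = s, t
    if cross(q1, q2, p1) * cross(q1, q2, p2) < 0 and cross(p1, p2, q1) * cross(p1, p2, q2) < 0:
        return 0.0  # proper crossing
    # Touching or collinear overlap gives a zero endpoint distance here.
    return min(point_segment_distance(p1, q1, q2), point_segment_distance(p2, q1, q2),
               point_segment_distance(q1, p1, p2), point_segment_distance(q2, p1, p2))


def correlation_index(a_lines, b_lines, search_radius=250.0, azimuth_tolerance=25.0):
    """Assumed definition: share of the total length of A that has a similar line in B."""
    matched = sum(
        length(a) for a in a_lines
        if any(segment_distance(a, b) <= search_radius
               and axial_difference(azimuth(a), azimuth(b)) <= azimuth_tolerance
               for b in b_lines)
    )
    total = sum(length(a) for a in a_lines)
    return matched / total if total > 0 else 0.0


def azimuth_histogram(lines):
    """Relative count of lines in 1-degree azimuth classes 0..179."""
    counts = [0] * 180
    for line in lines:
        counts[int(azimuth(line)) % 180] += 1  # % guards against float rounding to 180.0
    n = max(len(lines), 1)
    return [c / n for c in counts]


def moving_average(values, half_window=3):
    """Centred average of 2 * half_window + 1 values (3 previous, current, 3 next), wrapping."""
    n, width = len(values), 2 * half_window + 1
    return [sum(values[(i + k) % n] for k in range(-half_window, half_window + 1)) / width
            for i in range(n)]
```

Calling `correlation_index(A, B)` and `correlation_index(B, A)` gives the two directional indices; the two values generally differ when the datasets have different numbers or lengths of lines.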
Rose diagrams are usually plotted using 5° or 10° classes, i.e. the numbers of lines in the intervals 0-4°, 5-9°, ... are accumulated into classes before plotting. This approach has one disadvantage: changing the start of the intervals (1-5°, 6-10°, ...) produces differently looking rose diagrams, which could lead to different interpretations (see Figure X a and b). Figure X c shows a rose diagram that does not depend on the start of the intervals (the interval size is 1°), but this diagram is too detailed and difficult to interpret. The solution is to smooth the detailed data with a moving average line, which emphasizes the main trends. GEORIENT cannot plot trend lines, so a histogram chart in Excel was chosen instead. For each angle on the x-axis, the chart shows the relative count of lines on the y-axis. A 7-value moving average, computed from the three previous, the current and the three next values, is overlaid on the histogram (see Figure X d).

Fig. X: Methods of directional analysis. Top: rose diagrams of one dataset, a) 5° classes, start interval 0°, b) 5° classes, start interval 1°, c) 1° classes. Bottom: d) histogram with moving average.

5. Results
Results should be clear and concise.
Results of lineament extraction in the study area.

5.1. Scale 1:50 000
Comparison of methods
Quantitative comparison
Tab. X: Results of the statistical comparison of the 1:50 000 datasets
Interpretation
Figure X. Datasets compared at 1:50 000
Figure X. Results of the directional analysis of the 1:50 000 datasets with markers of the main directions

5.2. Scale 1:10 000
Tab. X: Results of the statistical comparison of the 1:10 000 datasets
Figure X. Datasets compared at 1:10 000
Figure X. Results of the directional analysis of the 1:10 000 datasets with markers of the main directions
Subjective qualitative comparison

6. Discussion
This should explore the significance of the results of the work, not repeat them. A combined Results and Discussion section is often appropriate.
Avoid extensive citations and discussion of published literature.
Influence of the parameters on the results; ways in which the user can influence the results.

7. Conclusions
The main conclusions of the study may be presented in a short Conclusions section, which may stand alone or form a subsection of a Discussion or Results and Discussion section.
Technical details here; the application will be presented in another article.

References
Minár, J., Sládek, J., 2009. Morphological network as an indicator of a morphotectonic field in the central Western Carpathians (Slovakia). Z. Geomorph. N.F., Berlin, pp. 23-29.