Research Project Report

Localized Forgery Detection in Hyperspectral Document Images

Paul Luo

1. Abstract:
This research aims to detect document forgery in handwritten notes based on ink spectral information. Several criteria that use only the local information of the target window were implemented to determine the number of inks. The results obtained with different anomaly detection methods are compared. The accuracy results show that the implemented methods help to determine the number of different inks in a window. Finally, the results are analysed and possible future improvements are discussed.

2. Introduction:
There has been significant growth in the need for fraud identification, that is, for assessing whether a certain document is forged. Forensic experts investigate either the writing style or the types of ink in a questioned document. In the latter case, if a piece of writing contains more than one type of ink, investigators can be certain that the document has been manipulated. However, when different inks of the same color look alike, investigators cannot tell them apart with the naked eye or with plain RGB analysis. In [1], Khan et al. proposed a hyperspectral document analysis method that recognizes different types of ink by analysing the spectral responses of the image in 33 different bands. By applying clustering algorithms to the information collected with this method, investigators can detect the presence of different types of ink in a handwritten note. This method can be applied to the whole image; in some scenarios, however, only a fraction of the handwritten note in question is extracted. In all cases, the number of inks is not known in advance. On the other hand, traditional clustering algorithms such as k-means need the number of expected clusters as an input parameter.
As a result, traditional clustering algorithms are not directly feasible in a real forgery detection scenario. In this report, we present a method based on k-means that determines the number of inks in a handwritten note and can be used for ink mismatch detection. Unlike the system described in [1], this work focuses on determining the number of different inks in a local region of interest. We introduce several techniques to improve the results; their effects vary.

3. Implemented methods:
Determining the number of different inks in a given local image region amounts to recognizing the number of clusters in a given dataset. But what are the key differences between a two-cluster dataset and a one-cluster dataset? The key to solving this problem is to identify the indicator for …

1). Schwarz’s criterion:
Schwarz’s criterion, as presented by Moore [2], is based on the k-means clustering method. It is a simple and general way to decide the number of centers in a given dataset. The criterion determines k, the number of clusters, from the distances between the data points and their corresponding cluster centers: it chooses the value of k that minimizes the sum of the mean center-to-point distance and a penalty term that grows with the number of clusters. The selected k is taken as the predicted number of different inks in the given region of interest. The following criterion (Eq. 1) is used for selecting the number of ink types.

$$ k = \arg\min_{k} \left( E_{kmeans} + \lambda m k \log N \right) $$ (Eq. 1)

where $E_{kmeans}$ is the average point-to-center distance in k-means, λ is a weighting factor, m is the length of each data point x, and N is the total number of data points. Of these parameters, the weighting factor λ needs to be estimated manually; the smaller λ is, the more likely a larger k is selected.
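To make the selection rule of Eq. 1 concrete, the following is a minimal Python sketch, not the code used in the experiments. It assumes that the window has already been converted into an N × m array of spectral vectors, reads log as the natural logarithm, and only compares k = 1 and k = 2. The function names `kmeans` and `schwarz_k`, the plain Lloyd-style k-means and its random seed are illustrative choices.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (illustrative); returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = X[labels == j].mean(axis=0)
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return centers, d.argmin(axis=1)


def schwarz_k(X, lam, k_values=(1, 2)):
    """Return the k that minimizes E_kmeans + lam * m * k * log(N) (Eq. 1)."""
    n, m = X.shape
    scores = {}
    for k in k_values:
        centers, labels = kmeans(X, k)
        # Average point-to-center distance for this k.
        e_kmeans = np.linalg.norm(X - centers[labels], axis=1).mean()
        # Natural logarithm is an assumption; the base is not stated in Eq. 1.
        scores[k] = e_kmeans + lam * m * k * np.log(n)
    best_k = min(scores, key=scores.get)
    return best_k, scores
```

With two well-separated synthetic clusters, a small `lam` lets the drop in average distance dominate and selects k = 2, while a large `lam` lets the penalty dominate and selects k = 1, which mirrors the dependence on λ described above.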
The validity of Schwarz’s criterion rests on the assumption that the average distance declines more sharply as k grows in a dataset of two clusters than in a dataset of a single cluster. However, this may not hold when the data in different clusters are highly similar. As a result, Schwarz’s criterion did not show promising results in our experiment: for a large value of λ, the area judged as k = 2 was too narrow compared to the ground truth, whereas for a small value of λ, the criterion produced a large area of confusion, as shown in Figures 1 to 3.

Figure 1. Ground truth
Figure 2. Schwarz’s result (λ too large)
Figure 3. Schwarz’s result (λ too small)

2). Point-to-point distance criterion:
As Schwarz’s criterion does not effectively solve the problem, we devised another criterion. It is also based on k-means, but point-to-point (P2P) distances are calculated instead of center-to-point distances. In this criterion, we presume that the given region consists of two different inks and apply the k-means algorithm with k = 2 to the dataset. K-means generates two clusters of data; we then calculate the average of the minimum distances between each data point and the opposing cluster. The criterion value is calculated with Eq. 2.

$$ dist = \operatorname{avg}_{p_1, p_2} \left( \min D_{p_1 \to C_2} + \min D_{p_2 \to C_1} \right) $$ (Eq. 2)

where $p_1$ and $p_2$ belong to the different clusters $C_1$ and $C_2$ respectively, and D is the distance. The outcome of this equation is the average minimum distance between the two clusters produced by k-means. Rather than directly predicting the number of ink types in the handwritten note, this criterion generates a value that is later used to determine k. The criterion shows promising results for most ink combinations, as shown in Figures 4 and 5, in which higher values of dist reflect a higher probability that the window contains two different inks.
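The P2P criterion can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the implementation used in the experiments: Eq. 2 is read as the mean nearest-neighbour distance from C1 to C2 plus the mean nearest-neighbour distance from C2 to C1 (how the two terms are combined is an assumption), D is taken to be the Euclidean distance, and the helper names `two_means` and `p2p_distance` as well as the deterministic farthest-point initialisation are hypothetical choices.

```python
import numpy as np


def two_means(X, n_iter=50):
    """Lloyd's k-means with k = 2 and a deterministic start (illustrative)."""
    # First seed: the point farthest from the data mean.
    first = X[np.linalg.norm(X - X.mean(axis=0), axis=1).argmax()]
    # Second seed: the point farthest from the first seed.
    second = X[np.linalg.norm(X - first, axis=1).argmax()]
    centers = np.stack([first, second]).astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in (0, 1):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def p2p_distance(X):
    """Criterion value of Eq. 2 for one window, under the stated assumptions."""
    labels = two_means(X)
    c1, c2 = X[labels == 0], X[labels == 1]
    if len(c1) == 0 or len(c2) == 0:
        return 0.0  # degenerate split, e.g. all pixels identical
    # Pairwise Euclidean distances between the two clusters.
    d = np.linalg.norm(c1[:, None, :] - c2[None, :, :], axis=2)
    # For every p1 the nearest point of C2, for every p2 the nearest point of C1.
    return d.min(axis=1).mean() + d.min(axis=0).mean()
```

A window would then be flagged as containing two inks when `p2p_distance(X)` exceeds a manually chosen threshold; the threshold itself is not part of this sketch.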
As can be observed in Figure 4 and Figure 5, the basic shape of the computed result is similar to that of the ground truth. All that is left to do is to set a threshold. However, due to the absence of global data, we cannot base our assumptions on values outside the local image region. As a result, the threshold must be set manually according to the observations of investigators.

Figure 4. Minimum pair distance
Figure 5. Ground truth

3). Feature selection:
Though the criterion described above shows acceptable results, it does not work well for certain ink combinations, as shown in Figure 6 and Figure 7. The reason may be that the difference between these inks is not significant enough to be detected with this criterion. To address this issue, we use the feature selection (FS) method introduced in [1], in which the spectral bands that differ most between ink types are selected to represent the data. With the aid of FS, the characteristics of each ink become more distinguishable, as shown in Figure 7 and Figure 8. Schwarz’s criterion can also benefit from FS.

Figure 6. Result without feature selection
Figure 7. Result with feature selection
Figure 8. Ground truth

The P2P distance criterion with FS shows promising results, yet it is worth noting that the edge area, where only a small fraction of the window belongs to the other type of ink, is generally hard to recognize because the difference is insignificant. In this scenario, anomaly detection algorithms are comparatively more suitable for recognizing the presence of a different type of ink. Three anomaly detection algorithms, LOF [3], COF [4] and INFLO [5], were combined with the previous methods, with varying results. INFLO shows distinguishable outcomes at the edge area, while the other two proved not to be feasible for this kind of data.

4). INFLO anomaly detection:
INFLO (INFLuenced Outlierness) is an anomaly detection algorithm introduced to handle the case where clusters of varying densities are in close proximity. Using this algorithm, investigators can calculate a score for each data point; if the score is higher than a certain threshold, the point is considered an anomaly, which in this case is a pixel written in another type of ink. The INFLO score is calculated with Eq. 3.

$$ INFLO_k(p) = \frac{den_{avg,k}(IS_k(p))}{den_k(p)} $$ (Eq. 3)

where $IS_k(p)$ is the influence space of the given data point p, and $den_{avg,k}(S)$ is defined in Eq. 4.

$$ den_{avg,k}(S) = \frac{\sum_{i \in S} den_k(i)}{|S|} $$ (Eq. 4)

where $den_k(i)$ is defined in Eq. 5.

$$ den_k(p) = \frac{1}{k\text{-}distance(p)} $$ (Eq. 5)

The INFLO score of an anomaly is higher than usual. A peak of INFLO scores is therefore expected around the edge area, where the small number of data points belonging to the other type of ink are considered anomalies, as shown in Figure 9. The accuracy would correspondingly improve if the peaks are properly recognized. The two images in Figure 9 show the maximum values of the last 20 scores and of the first 20 scores of each local image region. Comparing Figure 9 and Figure 10, the peaks in Figure 9 roughly correspond to the two edges in Figure 10.

Figure 9. Maximum of INFLO scores
Figure 10. Ground truth

4. Data and corresponding analysis:
Due to the absence of global information, the thresholds of the methods mentioned above need to be fixed manually. As a result, the same thresholds are used for different combinations of inks. Consequently, the thresholds may not be optimal, which leaves room for future improvement of the results. On the other hand, because the thresholds for blue and black ink differ greatly, the color of the ink in the window image needs to be determined in advance. This can be done by simple analysis of the RGB data.
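Since the INFLO threshold is among the manually fixed ones, it helps to see how the score of Eq. 3 to Eq. 5 is obtained. The sketch below is a minimal NumPy illustration, not the implementation used in the experiments. It assumes Euclidean distances, takes the influence space of p to be the union of its k nearest neighbours and its reverse k nearest neighbours, and requires k to be smaller than the number of points; the function name `inflo_scores` is hypothetical.

```python
import numpy as np


def inflo_scores(X, k):
    """INFLO score of every row of X (Eq. 3 to Eq. 5), illustrative sketch."""
    n = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)           # a point is not its own neighbour
    knn = np.argsort(d, axis=1)[:, :k]    # indices of the k nearest neighbours
    k_dist = d[np.arange(n), knn[:, -1]]  # k-distance(p)
    # Eq. 5; the lower bound guards against coincident points (an added safeguard).
    density = 1.0 / np.maximum(k_dist, 1e-12)
    # is_neighbour[p, q] is True when q is one of the k nearest neighbours of p.
    is_neighbour = np.zeros((n, n), dtype=bool)
    is_neighbour[np.arange(n)[:, None], knn] = True
    # Influence space: k nearest neighbours together with reverse neighbours.
    influence = is_neighbour | is_neighbour.T
    scores = np.empty(n)
    for p in range(n):
        # Eq. 4 (average density over IS_k(p)) divided by den_k(p), i.e. Eq. 3.
        scores[p] = density[influence[p]].mean() / density[p]
    return scores
```

Points inside a homogeneous cluster obtain scores close to 1, while an isolated point obtains a much higher score, so a window could be examined by thresholding the largest scores it contains.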
To assess the effectiveness of the different methods, we calculate the accuracy of each individual criterion. The accuracy is the number of window images in which the number of inks is correctly predicted, divided by the total number of window images, and ranges from 0 to 1. It is given by Eq. 6.

$$ Accuracy = \frac{\text{Correct Windows}}{\text{Total number of Windows}} $$ (Eq. 6)

1) Blue ink:
Figure 11 compares the accuracy of different methods for blue inks: the P2P distance criterion (Eq. 2) without FS, Schwarz’s criterion (Eq. 1) with FS, and the P2P distance criterion with FS. According to the figure, the P2P distance criterion with FS gives the most accurate results, whilst the other two methods are comparatively less reliable. Looking at individual ink combinations, C12, C14, C25, C35 and C45 show accuracy as high as around 97%, while C13 and C15 show accuracy around 85%. It is worth noting that for combinations C23, C24 and C34 the accuracy is around 80%, which is the accuracy obtained when investigators simply assume that there is only one type of ink in the window.

Figure 11. Comparison on accuracy

Figure 12 compares different methods used to enhance the performance of the basic detection method, the point-to-point distance criterion with feature selection. The three methods are, respectively, k-means alone, k-means combined with Schwarz’s criterion, and k-means combined with Schwarz’s criterion and INFLO anomaly detection. Combining methods means that a window image is considered to consist of two inks only if the data meets the condition of every method. Combining different criteria in this way conditionally integrates their outcomes so as to bring together the strengths of these methods. As shown in Figure 12, for most of the ink combinations, the INFLO method and Schwarz’s criterion improve the performance and prove to be useful.
However, because INFLO only works on the edge area, the improvement made by INFLO is small. In ink combinations 2 and 4, which are C13 and C15, the accuracy improves greatly. In ink combination 7, which is C25, the accuracy instead declines sharply after applying the additional methods. A possible reason is that the pixel values of C25 confuse Schwarz’s criterion, so that more misjudgements are made in C25, which results in the decline in accuracy. In ink combinations 5 and 6, which are C23 and C24, the already low accuracy decreases further. In conclusion, combining Schwarz’s criterion and the INFLO anomaly detection algorithm shows some improvement over the original point-to-point distance detection method. Selecting a better set of parameters may also improve the results.

Figure 12. Comparison on accuracy

2) Black ink:

5. Problems and future improvements:
The accuracy results presented indicate that the criteria can help investigators determine whether there is an ink mismatch in a document. However, as shown in the figures above, there are some problems in the system and considerable room for future improvement. Firstly, without the aid of global information, the thresholds must be fixed manually, so they may not be optimal; a better set of thresholds may improve the results significantly. One way to address this is to investigate local-global hybrid criteria. Secondly, the information extracted by the methods used in the experiments is rough and preliminary; a great improvement may be possible if the extraction technique is refined. Last but not least, an alternative may be to replace k-means with a more suitable clustering algorithm, because the capability of k-means is limited and a better algorithm might cluster the dataset more accurately.

References:
[1] Khan Z, Shafait F, Mian A.
Hyperspectral imaging for ink mismatch detection[C]//Document Analysis and Recognition (ICDAR), 2013 12th International Conference on. IEEE, 2013: 877-881.
[2] A. W. Moore. K-means and hierarchical clustering. Lectures in data mining, CMU, 2001.
[3] LOF
[4] COF
[5] INFLO