Determining the Effective Entropy of a Visual Hash System

Josh Montegna
May 22nd, 2015
Thesis Advisor: Professor David Roundy

TABLE OF CONTENTS

1. ABSTRACT
2. INTRODUCTION
2.1. BACKGROUND
2.2. THEORY
2.2.1 SSH and Public-Key Encryption
2.2.2 Theoretical Entropy and Effective Entropy
2.2.3 Numerical and Visual Hashes
2.2.4 Python and Data Creation
2.2.5 Bayesian Analysis and Effective Entropy Calculation
3. METHODS
3.1. SUMMARY
3.2 VISUAL HASH COMPARISON PROGRAM
3.3 DATA COLLECTION
3.4 BAYESIAN ANALYSIS AND COMPUTATIONAL ESTIMATE
4. RESULTS
5. DISCUSSION
6. CONCLUSION
7. REFERENCES

1. ABSTRACT

Fractal visual hashes have the potential to replace traditional hexadecimal hashes for SSH applications, with the goal of improving users' recognition of the identities of remote computers. Rather than rely on human users manually comparing two hexadecimal hashes as part of SSH's public-private key encryption process, users would compare fractal images generated from the original hexadecimal hashes to determine if the keys match. A visual hash comparison and differentiation game was developed to evaluate the ability of users to differentiate between two images visually. Users were presented with pairs of either flag hashes, t-flag hashes, identicon hashes, hexadecimal hashes, or fractal hashes and were asked to decide if the images were the same or different. The hashes were generated so that the hashes being compared were very similar in their visual characteristics, to increase the number of pairs that were hard to distinguish. The data collected was analyzed through Bayesian analysis, wherein the user input and the hash details were evaluated to find the most probable effective entropy and user error of the system. The fractal visual hash achieved the highest effective entropy among the tested hash types and provided a foundation for further research on visual hash systems and the validity of fractal hashes.

2. INTRODUCTION

2.1. BACKGROUND

In modern computing applications, accessing a remote computer over an unsecured network is becoming increasingly common, and with that the vulnerability to data theft increases as well. The most effective method to prevent data theft over an unsecured network is to use an encryption protocol on the data being sent over the network. One such encryption protocol is SSH, Secure Shell, which creates a secure network channel between two computers for data to go through. This protocol uses public-key cryptography to authenticate the remote computer over the network, a main component of which depends on users visually confirming that two strings of hexadecimal numbers do indeed match.

The string of numbers, a hash, has a unique correspondence to the much larger prime number used in the encryption of the data. These numerical hashes are often hard for a human user to match visually, as people usually have poor number matching abilities. However, people's image matching abilities are often quite strong, such that the task of recognizing whether two images match is often trivial. From this comes the concept of converting a numerical hash into an image, a visual hash, that allows users to easily recognize when two hashes are the same or different. The visual hash is created from the original numerical hash and is mathematically unique to it; however, some visual hashes may look very similar even though they have different initial hashes. In encryption, a system will have an entropy value, such as 128 bits, wherein there are $2^{128}$ unique numerical instances within that system. Were each of these instances, numerical hashes, to be converted into visual hashes, there would theoretically be $2^{128}$ mathematically unique procedurally generated images, but visually some of them may appear to users as identical.
This overlap of visual hashes reduces the entropy for encryption systems to the effective entropy of the visual hashes, or how many visually unique hashes exist. This reduction in entropy is significant, as cryptographic systems such as SSH need a high entropy value to be secure. To determine if a visual hash system is viable for SSH, the effective entropy of the visual hash system needs to be found. To create a large number of mathematically unique, procedurally generated images, fractals were chosen as the image form because they provide enough variables that can be changed.

2.2. THEORY

2.2.1 SSH AND PUBLIC-KEY ENCRYPTION

SSH allows a computer to connect to a remote computer over an unsecured network to share or transfer data in a secure manner. This is done by encrypting the data going between the computers such that if it were intercepted it would be locked, or unreadable, to people without the correct key. Public-key encryption works by creating two numerical keys for every user, a public key and a private key. The public key is used to encrypt the data and send it over the network, and the private key is used to decrypt the data at the other end of the network. The public key encrypts the data in such a way that only the private key can decrypt it, thus the data will be secure even if it is intercepted. With this process public keys are exchanged between computers so that the users can send secure data encrypted with the respective public keys; however, this creates the problem of verifying that a public key belongs to the correct user.

To verify that a remote computer is indeed the computer it claims to be, SSH requires that a user check that a remote computer's public key matches the one belonging to the intended remote computer [2]. To do this, a user must independently acquire the correct remote computer's public key and then visually compare it against the public key presented when connecting to the remote computer.
As this is a lengthy process, it only occurs the first time two computers are connected, as SSH remembers the corresponding keys from then on. Because public and private keys are extremely long numbers they cannot practically be written down and compared, thus key fingerprints are used to represent the keys. These fingerprints are much shorter hexadecimal strings that differ in length but typically correspond to the amount of entropy in a system. These fingerprints are referred to as hashes.

2.2.2 THEORETICAL ENTROPY AND EFFECTIVE ENTROPY

Entropy, in cryptography, is the number of possible states or instances that can exist within a system. In particular, the codes used to encrypt data are often created from systems that have very high entropy, giving the code high entropy and thus more security. The security comes from the fact that were someone or some computer to try to break the code, they would have to keep trying different states, or codes, until they came across the one specific code that works. Mathematically, an encryption system with $X$ bits will have $2^X$ possible unique states, assuming a binary system as is used with computers. This is known as the theoretical entropy and is the entropy seen by computers.

The effective entropy is the amount of entropy observed by a user using an encryption system. When a computer uses an encryption system it sees every instance as numerically unique, and thus the theoretical and effective entropy are the same. When a human user uses a visual hash encryption system (viewing the hashes), some of the visual hashes appear to be the same, reducing the number of unique states, and thus the effective entropy is lower than the theoretical entropy. This effective entropy must be at least the amount of entropy the user requires for visual hashes to be an acceptable substitute for numerical hashes.
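The relationship between bit count and state count can be illustrated with a short sketch. The function names are hypothetical, and the count of visually distinguishable states passed to the second function is an illustrative number, not a measured one.

```python
import math


def theoretical_states(bits: int) -> int:
    """Number of unique states in a binary system with the given bit count."""
    return 2 ** bits


def effective_entropy_bits(distinguishable_states: int) -> float:
    """Entropy in bits when only this many states can be told apart."""
    return math.log2(distinguishable_states)


# A 128-bit system has 2**128 numerically unique states.
print(theoretical_states(128))

# Hypothetical: if a viewer could tell apart only 2**20 of those images,
# the effective entropy would be 20 bits rather than 128.
print(effective_entropy_bits(2 ** 20))
```

The point of the sketch is only that effective entropy counts the states a human can distinguish, so it can never exceed the theoretical value.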
The problem of humans viewing a system and mistaking unique states for each other happens to all hashes, numeric and visual, and for visual hashes to be viable they must have a higher effective entropy than numerical hashes.

2.2.3 NUMERICAL AND VISUAL HASHES

The original hash is a numerical hash in hexadecimal form; however, any numbering system qualifies and all numeric hashes are closely related in form. As illustrated in Figure 2.1, the numeric hash is simply a string of numbers and letters.

Figure 2.1: An example of a 32 bit hexadecimal numeric hash, created randomly from a random number generator.

With visual hashes, however, there are multiple types, including identicons, fractals, flags, and t-flags. Identicons, created in 2007 by Don Park, are visual representations of numerical hashes that were designed to identify a user and protect their privacy simultaneously. Figure 2.2 is an example of a basic identicon created by a pattern of colored shapes arranged in a three by three grid.

Figure 2.2: A basic identicon featuring squares of a single color arranged on a predetermined grid size.

Flags as visual hashes are squares divided into 4 rectangles, each a different color, and T-flags are squares divided into 8 smaller rectangles, each with a different color [1] (Figures 2.3 and 2.4). These methods for visual hashes are rudimentary and support only small amounts of entropy if used in a system, specifically 24 bits of entropy for Flag and 32 for T-flag.

Figure 2.3: An example of a flag visual hash.

Figure 2.4: An example of a T-flag visual hash.

The fractal visual hash is a new way to graphically represent a numerical hash, incorporating a fractal generation algorithm that creates an image mathematically unique to the numeric hash [3]. The algorithm uses color, symmetry, geometry, and opacity to generate a complex visual representation of a number while minimizing the loss of entropy due to simplification or redundancy. This gives the fractal visual hash form a distinct advantage over the other visual hash forms, as it retains the desired entropy mathematically. The fractal-generating algorithm uses a strong random number generator that puts an upper bound of 512 bits of entropy on the system. Due to the inherent complexity of the fractal hashes, details and subtle differences between different fractals can be lost to the observer, making the images appear identical. Figure 2.5 shows an assortment of typical fractal hashes, demonstrating the diversity and complexity present in the form.

Figure 2.5: Examples of fractal visual hashes and the varying structure of the form.
With the five different forms of a hash (hexadecimal, fractal, identicon, flag, and t-flag) being tested, the users' ability to differentiate between unique images will be analyzed for each hash type, giving a comparison to judge whether any of the hashes are superior.

2.2.4 PYTHON AND DATA CREATION

The visual hash comparison program is written in the Python programming language and implemented graphically with the Kivy framework. In this hash comparison program, users are shown two unique or identical images side by side, or one after the other, and then asked whether the images were the same or different. These images, the visual hashes, are created by modules in the program that convert an arbitrary random string into whichever visual hash form is called upon. Once one image is created, the program picks another random number between 0 and 1, and this new random number determines the fraction of bits that can be changed in the second image. Each image has a certain number of bits that determine its visual properties, akin to DNA, and if a bit is changed, the whole image will change according to how that specific bit alters the structure. Each bit is an on and off switch for certain features in the visual hash image, and as such when features are turned on or off the image becomes very different. When the random number that dictates the fraction of bits to be changed is used, the program goes through that number of bits and randomly picks between 0 and 1 for each, turning bits on or off. This gives the possibility that no bits are changed and that the second image created is identical to the first, an important result that reveals whether users can even identify identical visual hashes at all. A second algorithm shifts the distribution of the random numbers used to pick the ratio of bits that can be changed.
This shift centers the vast majority of the random numbers around 0.01 with a semi-Gaussian distribution, creating images that are very closely related to each other, as only a few bits will change each time. When the images are closely related visually, the test to determine uniqueness is more relevant, as the users are forced to differentiate between two images that could be nearly identical, providing a better understanding of where the visual hash form breaks down and becomes unreliable. This point of unreliability corresponds to the confidence interval of the entropy of the system, with each visual hash form having a different breaking point, giving each a different amount of entropy. As each pair of images is tested, data is collected on the hash type, the user response, the images' uniqueness, the alikeness of the images, and the ratio value. This data is stored in a .csv file in the directory where the program is stored.

2.2.5 BAYESIAN ANALYSIS AND EFFECTIVE ENTROPY CALCULATION

Bayesian analysis differs from traditional statistical analysis, frequentist analysis, in that it uses evidential probabilities, wherein a hypothesis has a probability of occurring [4]. This differs from the frequentist view, in which a hypothesis to be tested has either a 100% chance of occurring or none at all. In Bayesian analysis, the probability of a hypothesis occurring is updated as data is collected, so the probability of the hypothesis given certain evidence is a function of the prior probability of that hypothesis before the current evidence was taken. After the probability of the hypothesis given the current evidence is computed, it becomes the new prior probability and the analysis starts again. This statistical loop uses Bayes' theorem (1), which states that the posterior probability, $P(H \mid E)$, is a consequence of two antecedents, a prior probability, $P(H)$, and a likelihood function, $P(E \mid H)$.
The likelihood function gives the compatibility of the evidence with the given hypothesis. The marginal likelihood, $P(E)$, is a normalization factor.

$$P(H \mid E) = \frac{P(E \mid H)}{P(E)}\, P(H) \qquad (1)$$

To understand Bayesian analysis better, consider a horse race between Horse A and Horse B. These two horses have raced 12 times before, with Horse A winning 5 of the races and Horse B winning 7. With the traditional frequentist analysis, the probability of Horse A winning a race today would be 5/12 and Horse B 7/12. However, it is raining today, and of the 12 prior races 4 took place in the rain. In those 4 rainy races Horse A won 3 and Horse B only 1. Thus, given the prior evidence that Horse A wins 75% of its races in the rain, the probability of it winning today is 3/4 rather than 5/12. This methodology uses prior evidence to refine the probability of the selected hypothesis occurring. In this example we are trying to find $P(A\ \text{wins} \mid \text{rains})$, the probability that Horse A wins given that it is raining. Here $P(\text{rains} \mid A\ \text{wins}) = 3/5$, because 3 of Horse A's 5 wins came in the rain.

$$P(A\ \text{wins} \mid \text{rains}) = \frac{P(\text{rains} \mid A\ \text{wins})\, P(A\ \text{wins})}{P(\text{rains})}$$

$$P(A\ \text{wins} \mid \text{rains}) = \frac{3/5 \cdot 5/12}{4/12} = \frac{3}{4}$$
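The horse race arithmetic can be checked with exact fractions. This is a minimal sketch of Bayes' theorem applied to the numbers in the example; the function and variable names are chosen here for illustration.

```python
from fractions import Fraction


def bayes_posterior(likelihood, prior, evidence):
    """Bayes' theorem: P(H|E) = P(E|H) * P(H) / P(E)."""
    return likelihood * prior / evidence


# Numbers taken from the horse race example.
p_a_wins = Fraction(5, 12)             # Horse A won 5 of 12 races
p_rains = Fraction(4, 12)              # 4 of 12 races were in the rain
p_rains_given_a_wins = Fraction(3, 5)  # 3 of Horse A's 5 wins were in the rain

posterior = bayes_posterior(p_rains_given_a_wins, p_a_wins, p_rains)
print(posterior)  # 3/4
```

Using `Fraction` rather than floating point keeps the result exact, so the printed value matches the hand calculation.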
By using Bayesian analysis in the program, a measure of the probability of the entropy can be found. When each new data point is added, the probability of the effective entropy of the system is updated, with the prior probability from prior data accounted for. To find the probability of the effective entropy, a mathematical relationship between the probability (P), the effective entropy (H), the ratio of bits coefficient (f), and the user error coefficient (A) is used. This equation, (2), gives the probability of the system existing given the entropy, bit ratio coefficient, and user error coefficient.

$$P(f, H, A) = (1 - 0.5 f)^{H} (1 - A) \qquad (2)$$

This system probability is used in Bayes' theorem together with the user input in equations (3) and (4), where the prior probability of the system is multiplied by the system probability and set to be the new prior probability. Equation (3) is used when the user inputs a 1, and equation (4) is used when the user inputs a 0. This updates the probability according to the user's inputs and accounts for how the probability changes when the user answers that the images do not match, an input of 0.

$$\text{Probability} = \text{Probability} \cdot P(f) \qquad (3)$$

$$\text{Probability} = 1 - \text{Probability} \cdot P(f) \qquad (4)$$

3. METHODS

3.1. SUMMARY

Using the visual hash comparison program created by Professor David Roundy, pairs of visual hashes were generated and presented to users, who then chose whether the images were the same or different. These decisions were collected along with technical data about the pair of visual hashes and analyzed through the Bayesian analysis program, which determined the most probable effective entropy given the system.

3.2 VISUAL HASH COMPARISON PROGRAM

The mechanism through which our hypothesis was tested and examined was the program that compared visual hashes to each other and recorded the users' ability to differentiate between the images.
This program is composed of two main program files, main.py and bayes-binary.py, with supplementary support files called by the two main files. The first program, main.py, contains the code that operates and controls the graphical portion of the program interface; specifically, it is modeled as a game that users play, though there is no incentive to succeed. In the game users can either view a pair of visual hashes at once and then determine if they are different, called the pairs game (Figure 3.1), or view one visual hash and then another immediately after the first hash leaves the screen, called the memory game. Users view 4 types of visual hashes when playing the game (flag, t-flag, identicon, and fractal) and one hexadecimal hash. These different hash types test the users' ability to differentiate not only between the fractal hashes but also between the other visual hash alternatives and the original hash, hexadecimal, providing context in which the effective entropy of fractal visual hashes can be viewed.

Figure 3.1: A screenshot of the pairs game where users view two visual hashes at once and decide if they are the same or different.

The pairs of visual hashes are not entirely random, though they start out that way. Initially one visual hash, of any kind, is generated from the algorithms, and then the second hash is created from the first. The second hash is created by taking the bit ratio coefficient f and attempting to change that percentage of bits in the original hash. As mentioned previously, f is biased towards extremely low ratios to give a small percentage of bits that can be changed, as this creates barely perceptible differences in the hashes, both mathematically and visually (Figure 3.2). This second hash, which is now a mutant of the first hash, is shown alongside the original hash for the pairs game and immediately following the original hash for the memory game.
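The mutation step can be sketched as follows. This is an illustration under stated assumptions and not the code in main.py: the function names are hypothetical, a clipped Gaussian centered at 0.01 stands in for the semi-Gaussian bias on f, and the candidate bit positions are assumed to be chosen at random.

```python
import random


def draw_bit_fraction(rng, center=0.01, spread=0.01):
    """Draw f biased toward small values (assumed clipped Gaussian)."""
    return min(1.0, max(0.0, rng.gauss(center, spread)))


def mutate_bits(bits, f, rng):
    """Re-randomize a fraction f of the bits.

    Each selected bit is set to 0 or 1 at random, so it may keep its old
    value; the mutant can therefore be identical to the original.
    """
    n_candidates = round(f * len(bits))
    positions = rng.sample(range(len(bits)), n_candidates)
    mutant = list(bits)
    for i in positions:
        mutant[i] = rng.randint(0, 1)
    return mutant


rng = random.Random(0)
original = [rng.randint(0, 1) for _ in range(128)]
f = draw_bit_fraction(rng)
mutant = mutate_bits(original, f, rng)
truly_different = mutant != original  # ground truth for this pair
```

Because each selected bit flips only half of the time on average, the expected fraction of bits that actually change is 0.5 f, which is where the factor of 0.5 in equation (2) comes from under these assumptions.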
Once the user has seen both of the hashes, they decide whether the images were the same or different by clicking the buttons titled "same" or "different". When the user decides that the images are different a 1 is recorded, and conversely a 0 is recorded when the images are deemed the same. This data point, along with others, is recorded locally in either pairs.csv or memory.csv, which is then processed by the analysis program, bayes-binary.py. The second main program, bayes-binary.py, is the analytical program that processes the data collected in the .csv file and generates the probability of the effective entropy and the corresponding figures.

Figure 3.2: The teal function is a joining of the green and red functions, which are different models for the distribution of bit ratio values. The teal function is used to bias the bit ratio coefficient f to small values.

3.3 DATA COLLECTION

When the user plays the game, information about their choices is recorded in the data .csv files. Specifically, the program records a time stamp, the time between plays, the hash type, the f value, the user input of either 1 or 0, the program input of either 1 or 0, and the alikeness coefficient. The time stamp is simply when that game was played, allowing the data to be organized by time if needed. The time between plays is actually the number of seconds since an arbitrary date, and each subsequent play adds more seconds to that count. This count is used because, unlike the time stamp, it is a usable string of numbers, and finding the amount of time between plays only requires subtracting one time count from another. This linear scale allows us to see how long each play takes and whether users are spamming the game by playing at a speed beyond reason. The hash type allows for indexing of the data into different hash types for analytical purposes later, a crucial component of the entropy comparison.
The bit fraction value f is recorded because it is required to find the entropy probability with the algorithms. The user input is the only user generated data point and is either a 1 or 0. The machine's input of either 1 or 0 is simply whether the images are in fact different or the same. This value is the mathematical part of the hash comparison and is what the user's input is judged against to determine if they selected correctly or incorrectly. The last data point is an alikeness coefficient, which attempts to give a sense of how alike two images are. This data point is a rough estimate and is generated from a simple color sampling of random points in the two images. All of the data is recorded, though not all of it is used: the time stamp, time count, and alikeness coefficient are all unused in the program. The intention is to eventually write code that incorporates these data points, but the code analyzing the other data points was initially of a higher priority.

3.4 BAYESIAN ANALYSIS AND COMPUTATIONAL ESTIMATE

When the data is collected and stored in the .csv files, the bayes-binary.py program is run, wherein the data points are extracted and analyzed to find the Bayesian probability of the effective entropy of each hash type. The entropy H and the user error coefficient A are not found from an explicit formula, but from taking the Bayesian probability of the system with different H and A values. An algorithm takes the f value and both the user and machine inputs, and uses those known values to find the Bayesian probability at values of H ranging from 0 to 400 (in 0.05 increments) and values of A ranging from 0 to 1 (in 0.001 increments). Where the probability is high, those values of H and A are very close to the real values of the entropy and user error for the system.
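A minimal sketch of such a grid search is given below. It is not the code in bayes-binary.py. It assumes a flat prior over H and A, treats equation (2) as the probability that a pair is judged the same and its complement as the probability that it is judged different, ignores the machine input, and sums log-probabilities instead of multiplying probabilities (a substitution made here to avoid numerical underflow). The trial data and the coarse grid are made up for illustration.

```python
import math


def p_same(f, H, A):
    """Equation (2): modeled probability that a pair is judged the same."""
    return (1.0 - 0.5 * f) ** H * (1.0 - A)


def log_posterior(trials, H, A):
    """Sum of log-likelihoods over (f, said_same) trials, flat prior assumed."""
    total = 0.0
    for f, said_same in trials:
        p = p_same(f, H, A)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep the logarithm finite
        total += math.log(p if said_same else 1.0 - p)
    return total


def grid_search(trials, h_values, a_values):
    """Return the (H, A) grid point with the highest posterior."""
    best = None
    for H in h_values:
        for A in a_values:
            score = log_posterior(trials, H, A)
            if best is None or score > best[0]:
                best = (score, H, A)
    return best[1], best[2]


# Hypothetical trials: (bit fraction f, whether the user answered "same").
trials = [(0.01, True), (0.02, True), (0.05, False), (0.10, False)]
h_values = [0.5 * k for k in range(801)]   # 0 to 400, coarser than 0.05
a_values = [0.01 * k for k in range(100)]  # 0 to 0.99, coarser than 0.001
H_best, A_best = grid_search(trials, h_values, a_values)
print(H_best, A_best)
```

Evaluating `log_posterior` at every grid point, rather than keeping only the maximum, gives the values from which a heat map over H and A could be drawn.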
This computational method is used because it is impossible to solve analytically for the entropy and user error coefficients, but it is possible to check where those values are very likely to lie. This process generates a "heat map" where H is the y-axis and A is the x-axis, and the probability of the system existing is shown through colors. Darker colors, blues, represent regions of very low probability, and brighter colors, red and yellow, represent regions of high probability (Figure 3.3).

Figure 3.3: An example of a heat map for the Bayesian probability of the effective entropy and user error coefficient.

4. RESULTS

Over 1,000 data points were collected by various users playing the pairs game. Unfortunately, many of the users could not play the memory game, as it was reported to be too slow to collect data. The most probable effective entropy of each hash type was found from the analysis of the data, along with the user error coefficient of each hash type. For each hash type an effective entropy/user error heat map was generated to show where the most probable value lay. Also generated was a plot of the probability of the effective entropy for each hash type. The fractal hash type had the highest effective entropy with 129.5 bits, with 128 bit hexadecimal numerical hashes having the second highest effective entropy with 91.5 bits, flag hashes having 24 bits, t-flags having 28 bits, and identicons having 16.5 bits of entropy. Table 4.1 shows how many times each hash was played, with the Same/Different split, and each hash type's effective entropy and user error coefficient.
Hash Type     Same  Different  Total  Effective Entropy (H)  User Error Coefficient (A)
Flag            21         47     68  24 ± 0                 0
T-Flag          33         68    101  28 ± 0                 0
Identicon       26         79    105  16.5 ± 0.67            0.041
Hexadecimal     62         57    119  91.5 ± 0               0
Fractal        370        238    608  129.5 ± 1.9            0.015

Table 4.1: The identicon hash had the lowest entropy, with flag and t-flag having marginally higher entropies. The hexadecimal hash had the second highest entropy level, but was still significantly lower than the fractal hash entropy level.

Each hash type had a range of entropy that fell within the region of higher probability. The distributions of these highly probable entropies are very localized, though there is variation from hash type to hash type, with the larger entropy hashes having a wider range of high probability bits (Figures 4.2-4.6).

Figure 4.2: The probability of each bit of entropy for the flag hash type. The entropy is localized around 24 bits of entropy, the highest probability effective entropy value.

Figure 4.3: The probability of each bit of entropy for the t-flag hash type. The entropy is localized around 28 bits of entropy, the highest probability effective entropy value.

Figure 4.4: The probability of each bit of entropy for the identicon hash type. The entropy is localized around 16.5 bits of entropy, the highest probability effective entropy value.

Figure 4.5: The probability of each bit of entropy for the hexadecimal hash type. The entropy is localized around 91.5 bits of entropy, the highest probability effective entropy value.

Figure 4.6: The probability of each bit of entropy for the fractal hash type. The entropy is localized around 129.5 bits of entropy, the highest probability effective entropy value.

The heat map of each hash type shows where in the H and A range the most probable values lie (Figures 4.7-4.11). The areas with a lower Bayesian credibility percentile represent a higher probability of existing for the effective entropy and user error coefficient.
Figure 4.7: The Bayesian probability heat map for the flag hash type. Areas with a lower credible percentile have a higher probability of existence.

Figure 4.8: The Bayesian probability heat map for the t-flag hash type. Areas with a lower credible percentile have a higher probability of existence.

Figure 4.9: The Bayesian probability heat map for the identicon hash type. Areas with a lower credible percentile have a higher probability of existence.

Figure 4.10: The Bayesian probability heat map for the hexadecimal hash type. Areas with a lower credible percentile have a higher probability of existence.

Figure 4.11: The Bayesian probability heat map for the fractal hash type. Areas with a lower credible percentile have a higher probability of existence.

5. DISCUSSION

The purpose of the visual hash experiments was to evaluate whether a fractal visual hash is a viable alternative to traditional visual hashes and, most importantly, to hexadecimal hashes. This evaluation is based on how much effective entropy the fractal visual hash method has compared to the other hash types, because if the effective entropy is too low then the methodology is incompatible with cryptographic processes. Also important to the viability of fractal hashes is the user error coefficient, which reflects how accurate users are when comparing two fractal hashes. This coefficient needs to be small enough not to impact the usability of the visual hash as a means of remote computer identification in a cryptographic process such as SSH. Table 4.1 shows that the fractal visual hash has the most entropy of any hash type tested, with 41.5% more effective entropy than the second highest effective entropy hash, hexadecimal. The comparison between fractal hashes and hexadecimal hashes is the most important because hexadecimal hashes are currently the standard hash used in SSH and other programs.
While the error coefficient of fractal hashes is higher than the error coefficient of hexadecimal hashes, the difference is only 0.015. It is important to recognize that the maximum entropy of the hexadecimal hash is 128 bits, and had it had the maximum entropy of the fractal hash, 512 bits, it might have had a higher effective entropy. However, that high a maximum entropy would lead to a significant increase in the number of digits that the user would have to analyze and compare.

The three lower entropy hashes, flag, t-flag, and identicon, were tested because they are the standard visual hashes currently employed in programs, and the fractal hash would need to have greater entropy than those hashes to be considered a viable replacement. The fractal hash has a much higher maximum entropy than the prior visual hashes, and its effective entropy is also higher, with 129.5 bits of effective entropy compared to 24 bits, 28 bits, and 16.5 bits for flag, t-flag, and identicon hashes respectively. In terms of having enough entropy, the fractal hash is a suitable replacement for the prior visual hashes.

This data is not conclusive, however, as only one of the two game types was working. The memory game was unresponsive and unable to collect data from enough willing users to generate an analysis. This game is an important test of the users' ability to identify unique visual hashes because in this game the hashes are not shown side by side but rather one after the other. This requires the user to remember what the visual hash looks like and apply that memory to a second visual hash, which is a harder test and a more realistic test of the ability of fractal hashes to be used in a cryptographic program. What the data from the pairs game shows is that people can distinguish between two images when given the opportunity to study the images simultaneously, which is not a common occurrence in real world applications.
Although over 1,000 data points were taken and analyzed, the number of users who provided that data was no larger than 12. Because of the extremely small number of users, the data and the subsequent analysis are not generalizable to a larger scale. To gain a more comprehensive picture of how well the fractal hash would work, a larger number of users is required. Putting the test online with data being sent back to a server would allow for more users, preferably over 1,000 individual users with over 25,000 data points. This amount of data would account for statistical anomalies and for some users being unusually good at visual differentiation.

6. CONCLUSION

The primary goal of this project was to evaluate the viability of fractal visual hashes by determining the effective entropy of the hash system and comparing it to the effective entropy of prior hash systems. Users of the game evaluated whether pairs of hashes were the same or different, leading to a Bayesian analysis of the probability of the effective entropy. The value of the effective entropy where the Bayesian probability is highest is the estimated effective entropy of the hash system. The fractal visual hash system was found to have an effective entropy of 129.5 bits, while hexadecimal had 91.5 bits of effective entropy. In comparison to the other visual hash systems, the fractal hash had an order of magnitude more entropy and an error rate of marginal difference. However, the data collection is biased by the memory game not working and by the small user pool size. The data is a preliminary result, but it does lead to the conclusion that the fractal visual hash method is sound and can work as a replacement. Future research will be needed to test a diverse user pool of significant size and to test the memory game portion.

7. REFERENCES

[1] Hsu-Chun Hsiao, Yue-Hsun Lin, Ahren Studer, Cassandra Studer, King-Hang Wang, Adrian Perrig, Hung-Min Sun, Bo-Yin Yang, Hiroaki Kikuchi. "A Study of User-Friendly Hash Comparison Schemes," Proceedings of Annual Computer Security Applications Conference (ACSAC), December 2009.

[2] Michael Farb, Yue-Hsun Lin, Tiffany Hyun-Jin Kim, Jonathan McCune, Adrian Perrig. "Easy-to-Use and Secure Public-Key Exchange," CMU, December 2011.

[3] Scott Draves, Erik Reckase. "The Fractal Flame Algorithm," September 2003.

[4] Stefan Herzog, Dirk Ostwald. "Experimental biology: Sometimes Bayesian statistics are better," Nature 494, no. 7435, February 7, 2013.