Core Technology Through Enterprise Launch: A Case Study of Handwritten Signature Verification

by Burl Amsbury
S.M., Electrical Engineering (1988)
S.B., Electrical Engineering (1988)
Massachusetts Institute of Technology

Submitted to the System Design and Management Program in Partial Fulfillment of the Requirements for the Degree of Master of Science in Engineering and Management at the Massachusetts Institute of Technology, June 2000

© 2000 Burl Amsbury. The author hereby grants to MIT permission to reproduce and to distribute publicly paper and electronic copies of this thesis document in whole or in part.

Author: Burl Amsbury, System Design and Management Program

Thesis Supervisor: Dr. Daniel Frey, Professor of Aeronautics & Astronautics

LFM/SDM Co-Director: Dr. Thomas A. Kochan, George M. Bunker Professor of Management

LFM/SDM Co-Director: Dr. Paul A. Lagace, Professor of Aeronautics & Astronautics and Engineering Systems

Submitted to the System Design and Management Program in June 2000 in Partial Fulfillment of the Requirements for the Degree of Master of Science in Engineering and Management

ABSTRACT

Tomorrow's design teams will be diverse and distributed. They will be integrated via the Internet, using engineering and management tools that allow them to transmit among themselves legally binding and safety-critical data and documents. It is essential that an Internet infrastructure that provides a high level of security be in place. Presently, the weak link in network security is the password. When documents are digitally signed using a private key as part of an asymmetric encryption algorithm, access to that key is typically gained with knowledge of a password only.

This thesis proposes a system whereby biometric data becomes part of the digital signature, tying the sender's identity to the digital signature in real time. A trusted third party, a Biometric Authority™, verifies the identity of the sender by comparing the biometric data contained in the digital signature with that in a proprietary database.

Since biometric data is, in the end, just a number, it would be possible to hack a biometric data collection system and falsify the biometric data just like a password or private key. For biometric data to really be secure, it should be data that varies from valid sample to valid sample. Handwritten signatures are the most practical example of this type of data. If two samples of handwritten signatures are identical, it is virtually certain (especially when signatures are nominally captured dynamically) that at least one of them is a forgery. To be considered valid, a signature should not be too different from an enrollment population of samples, nor should it be identical to any previous samples.

This thesis outlines technology that may be adapted to optimally choose the features of a dynamically captured handwritten signature in such a way that the identification error requirements and the network security requirements may be met. This algorithm follows from an application of Robust Design methodologies.

Thesis Supervisor: Daniel D. Frey, Professor of Aeronautics and Astronautics
Table of Contents

1 Introduction
  1.1 Problems with PKI
  1.2 Biometric Authority™ (BA)
2 Network Security
  2.1 Example Network Security Applications
  2.2 Public Key Infrastructure (PKI)
    2.2.1 Web of Trust
    2.2.2 Digital Certificates
    2.2.3 Certificate Authorities
  2.3 Digital Signatures
  2.4 Document Encryption and Decryption
3 Problems with Current Status
4 Proposed Solution
  4.1 Biometric Authority™
  4.2 Enrollment
  4.3 Identity Verification
  4.4 Information Flow
5 Technology
  5.1 Existing Biometric Technologies
    5.1.1 Handwritten Signature Verification (HSV)
    5.1.2 Voice Verification
    5.1.3 Face Geometry
    5.1.4 Finger Scan
    5.1.5 Hand Geometry
    5.1.6 Iris Recognition
    5.1.7 Retina Scan
    5.1.8 Keystroke Dynamics
    5.1.9 A Comparison of Biometric Technologies
  5.2 Robust Design Methods Applied to Pattern Recognition
    5.2.1 Function of the Image Recognition System
    5.2.2 Extracting Features Using Wavelets
    5.2.3 Classifying Images Using Mahalanobis Distances
    5.2.4 Robust Design Procedure
    5.2.5 Results
    5.2.6 Confirmation
    5.2.7 Summary
  5.3 A More Robust Classification Method
    5.3.1 Feature Extraction
    5.3.2 Feature Evaluation
    5.3.3 Summary
  5.4 Handwritten Signature Verification (HSV) Technology
    5.4.1 Challenges
    5.4.2 New Technology
6 Certificate Authority Business Model
  6.1 Entrust
  6.2 VeriSign
  6.3 Some Other Network Security Companies
    6.3.1 Pretty Good Privacy
    6.3.2 InterTrust Technologies
    6.3.3 SAFLINK Corporation
7 Proposed Biometric Authority™ Revenue Model
8 Summary
9 Bibliography
10 Acronyms

1 Introduction

"We need the killer application for PKI. Just what that will be, I don't know."[1]

As networks become ubiquitous in our increasingly "connected" economy, the security of the data being transmitted on them is becoming more and more of a concern.
Organizations are coming to rely on networks (extranets, intranets, virtual private networks (VPNs), the Internet) in the normal course of business. Networks are less an option and more a necessity for many organizations, and increasingly so. In particular, it is difficult to imagine a healthy organization today that does not have access to, and take advantage of, the Internet. But as soon as something becomes necessary, it also becomes a target for crime. And besides criminal misuse of data, many individuals (especially in the United States) are concerned with issues of personal privacy.

In general, linking a document indelibly to an individual sender affords the following advantages:[2]

* Authenticity. Assures the receiver that the sender is who he says he is.
* Integrity. Affords both sender and receiver the opportunity to verify that the document received is exactly the same as the one sent.
* Nonrepudiation. Denies the sender the ability to deny that he actually sent the document.
* Legal significance. It serves to remind the sender of the importance and legal implications of the document in question, thereby presumably reducing frivolous document transmission.

[1] Alex van Someren, president and CEO, nCipher, quoted in "Security Roadmap", SC Magazine, www.scmagazine.com, December 1999.
[2] Turban, Efraim; Lee, Jae; King, David; Chung, H. Michael, 2000, Electronic Commerce: A Managerial Perspective, Upper Saddle River, New Jersey: Prentice-Hall.

The existing Public Key Infrastructures (PKIs) meet these needs with some success. However, in reality a PKI provides a link between an originating document and the password, Personal Identification Number (PIN), or private key of the alleged sender. If the key is compromised (or, more commonly, the password used to access the encryption software that uses the key), the link is not valid. Biometric data can be used to add an additional layer of security by verifying the sender's identity with something he is (fingerprint, signature, iris pattern, etc.) rather than with something he knows (password or PIN) or has (smart card or token).

1.1 Problems with PKI

"New products will be developed by just-in-time collaborations of globally distributed teams linked seamlessly by Web-based tools and processes. The collaborations will be formed by means of a 'services marketplace' where lead firms will find the world's best 'knowledge purveyors': suppliers of information, components, and support services."[3]

A fundamental requirement for widespread acceptance of the model outlined in the vision statement above for MIT's Center for Innovation in Product Development (CIPD) is the ability of team members to believe beyond a reasonable doubt that their teammates are who they say they are. Other examples of enterprises that will require high levels of network security are: many forms of e-commerce; distance-learning ventures; and government agencies, such as the military forces, the Patent and Trademark Office (PTO), and voter registration administration.

[3] Vision statement on the back of every CIPD student's business card.

Today, network security is dealt with using a PKI, which depends on asymmetric encryption technology. Digital signatures are meant to prove to the receiver of a digitally signed document that the sender is who he claims to be. The document and the sender's private encryption key are used to create the digital signature. The private key has a counterpart, called a public key, which can be used to decrypt data encrypted with the private key, but cannot be used to encrypt as a replacement for the private key. The idea is that the receiver, using the sender's public key, decrypts the digital signature that could only have been encrypted by the owner of the associated private key.

The problem is that the signer could have been the current holder of the private key instead of the original owner. The private key could have been compromised; it is just a file stored on an (often portable) storage medium. To use the key in an attempt to masquerade as the true owner only requires knowledge of the owner's password.

"Security? You want the truth? Okay. Sure you're going to have hackers here and there. And viruses. But you want to know what's going to hurt more? Passwords. I mean it... Someone has to get rid of them. I don't know what with. I know it can't be smartcards. People lose those too."[4]

Passwords are not considered very secure. The weak link in today's network security infrastructure is the password. An analogy underlines the lack of true security provided by a system like the one described above: it is like accepting an already-signed check (one not signed in the presence of the vendor) from a customer who shows a social security card as proof of identity.

[4] An IT officer, speaking on condition of anonymity, quoted in "Security Roadmap", SC Magazine, www.scmagazine.com, December 1999.

To tie a person to his digital signature in real time, a biometric solution is needed. The biometric data used today include fingerprints, voiceprints, dynamically captured handwritten signatures, iris scans, retina scans, hand geometry, face geometry, and keystroke patterns. Incorporation of such data and devices into the existing infrastructure has been slow. Reasons for this include a lack of technology maturity, the cost of the hardware required, the fact that hardware is required at all (many people are not anxious to add another peripheral to their systems), and privacy issues. There are really two types of privacy concerns. One is with the ownership and privacy of the data itself. The second is the physical invasiveness of some of the technologies, particularly the retina and iris scans.

Another problem often not mentioned in the existing literature is the fact that biometric data is just a number and, as such, can be stolen like any other number or password. What is needed is biometric data that is consistently associated with a single person, but varies enough from capture to capture that an algorithm can tell it is unique to a person and unique to a particular capture.

1.2 Biometric Authority™ (BA)

I have argued that, in cases where a high level of security is desired, the system needs to provide for minimum delay between physical identity verification and digital signature validation. And I have argued that the physical identity verification must be done biometrically. These requirements may be answered with the creation of what we are calling a Biometric Authority™.
The function of the BA is to take as input a biometric data sample and return either a yes/no verification or a confidence level. The data will be sent to the BA by the entity that wishes to verify a digital signature (the receiver of a digitally signed document). The BA will compare and contrast this sample data with other samples it owns in a central database. These samples are initialized with enrollment samples provided by the sender at the time of enrollment. Enrollment is also when initial physical identity verification is done. The enrollee is asked to supply several samples of biometric data corresponding to his local system's capabilities. For example, the biometrics may be electronically captured handwritten signatures, thumbprints, or voice prints.

The Biometric Authority™ is envisioned as a new key piece of Internet security infrastructure. It would be implemented as an Application Server Program (ASP) and would likely prove most useful initially serving a business-to-business (B2B) role.

2 Network Security

"In terms of its security infrastructure, the World Wide Web is rotten at the core."[5]

2.1 Example Network Security Applications

Examples of networks that may require levels of security not reachable with the existing PKIs are:

* Virtual Private Networks (VPNs) that transport commercial design data among extended team members. It is important that design changes (to software code or hardware) be traceable at the time of transmission to the authorized designer. This can be a matter of legal import, both in matters of product liability and in matters of intellectual property.
* In addition, VPNs carry an increasing amount of proprietary information. In 1999, Fortune 1000 companies sustained losses of more than $45 billion from thefts of their proprietary information. Online copyright theft is rising to epidemic proportions, threatening the creative industries while inhibiting the development of electronic commerce. Losses due to Internet piracy are estimated at nearly $11 billion each year.[6]
* Matters of national security. So-called tactical networks that carry data in real time to members of our armed forces engaged in battle must be protected aggressively.
* Government agencies that deal with huge networks of personal private data. For example, the PTO does an increasing amount of business over networks. The Census Bureau and the Internal Revenue Service (IRS) are two more examples. The U.S. Postal Service may soon become involved in matters of Internet security as a certificate authority (CA).
* Medical information is more and more commonly shared among hospitals and other medical facilities over networks. There is considerable concern regarding the privacy of this data. Another similar example is insurance data.
* Electronic commerce, in business-to-consumer form and in business-to-business form, is a clear example of a situation that begs for better security. The FBI has reported that cases of computer-related security breaches have risen by almost 250 percent in the past two years. Dollar losses associated with various computer crimes and theft of intellectual property were estimated to be in the $250 billion range in 1997.[7]

[5] Randy Sandone, president and CEO of Argus Systems Group, quoted in "Computer Crime Spreads", SC Magazine, www.scmagazine.com, April 2000.
[6] Taken from a PricewaterhouseCoopers report, quoted in "Security Roadmap", SC Magazine, www.scmagazine.com, April 2000.
[7] Armstrong, Illena, April 2000, "Computer Crime Spreads", SC Magazine, www.scmagazine.com.

2.2 Public Key Infrastructure (PKI)

There are two broad categories of electronic encryption: symmetric and asymmetric. In symmetric schemes, the "key" or code number used to encrypt the data is the same as the key used to decrypt it. Therefore, in order to give someone the ability to decrypt a secured document, one must also give him or her the power to encrypt a document. Such a system is useful only when there are very few, mutually trusting key holders. Asymmetric schemes have a separate key for encrypting and decrypting. It is therefore possible to give someone the ability to decrypt a message without giving away the encryption key.

There are two main ways to use asymmetric keys. Both involve a private key and a public key. They differ in which is used to encrypt and which is used to decrypt. To create a digital signature, the sender uses his private key to encrypt a hashed version of the document being signed. The receiver uses the public key (available publicly, of course) to decrypt (verify) the signature. A public key can only decrypt data encrypted with the matching private key. The idea is that the private key can only be used by the entity to which it was issued, thereby serving as proof that the sender is who he claims to be. Note that the original document is sent "in the clear"; it is not encrypted. The purpose of a digital signature is merely to allow the receiver to verify the identity of the sender.

To send an encrypted document, theoretically one could encrypt it using the public key of the intended receiver. The only entity able to decrypt the document would then be the holder of the receiver's private key and associated password. In practice, it is not done quite like that (see Section 2.4), but the complementary nature of the two ways to use asymmetric key pairs holds true.

2.2.1 Web of Trust

When the receiver of a digitally signed document decrypts the signature and finds that the name of the person associated with the private key is the same as the name claimed by the sender, she is said to have verified the signature. In fact, though, she has verified reliably that the sender used a private key which at one point was issued to a single person (or entity). She has not really verified that the present holder of the key is the one to whom it was issued. That is the nature of the essential problem with PKI and the issue that this thesis addresses.

Moreover, how does the receiver even know that the private key in question was really issued to whom she thinks? In general, she did not see it happen. She has to go on the word of someone who did. The "word" of someone who did is typically just another message saying so. How does she know she can trust the word of this third party? By verifying the third party's digital signature. Clearly, this is circular reasoning, but the result, given enough players in the network, is what is known as a "web of trust." If everyone is careful to sign only the public keys of individuals that they physically saw receive the associated private key, then eventually the web of trust is inter-networked enough to provide reasonable assurance that the private key was really issued to the right person.

For some of its low-end personal security products, the company Pretty Good Privacy (PGP) relies on a web of trust made up of everyday citizens. No individual has fundamentally more or less authority or credibility than another. This web of trust, in one form or another, is the backbone of PKIs.

2.2.2 Digital Certificates

In higher-end schemes, particularly those relied upon in business-to-business transactions, the trusted third party may not be a peer. There are companies whose business it is to act as the trusted third party. They issue digital documents to applicants in exchange for solid (sometimes physical) identification and money. These documents (called "digital certificates") contain at least the recipient's name, public key, the expiration date of the certificate, and the digital signature of the issuer. The issuer in these cases is called a certificate authority (CA).

2.2.3 Certificate Authorities

A handful of commercial organizations have managed to win over enough certificate customers that the organization's assurance of a digital signer's identity is widely considered proof that the private key was issued to the proper person. (There may still be some question as to the identity of the current holder of the private key.) One example of such an organization is VeriSign. There are others, and together they form their own web of trust. For example, the certificates issued by VeriSign are digital documents that need to be digitally signed. VeriSign's digital signature may be verified using another digital certificate from, say, Entrust. For relatively lower levels of security, VeriSign may sign its own certificates. The receiver is then left to decide for himself what level of trust he places in VeriSign. Likewise, some large, well-known corporations (Ford Motor Company, for example) often sign their own certificates.

A hierarchy of CAs is envisioned and partly in place, but it is not clear who will act as higher authorities in such a hierarchy. Cross-certification is becoming popular for international electronic commerce.

2.3 Digital Signatures

A digital signature, in conjunction with a digital certificate, is meant to provide a means for the receiver of a document to authenticate the identity of the sender. It also provides assurance to the receiver that the sender cannot later claim that the document was unauthorized (renouncing or repudiating the document). This works because the sender's identity is in theory linked to the document by the signature. Figure 1 illustrates the idea.[8]

[8] Turban, Efraim; Lee, Jae; King, David; Chung, H. Michael, 2000, Electronic Commerce: A Managerial Perspective, Upper Saddle River, New Jersey: Prentice-Hall.

[Figure 1. Digital Signatures [Turban 2000]: the sender hashes the document and encrypts the hash value ("message digest") with his private key to form the digital signature; the receiver re-hashes the received document, decrypts the signature with the sender's public key, and compares the two hash values. If they are the same, the signature is valid.]

On the sender's end, the user supplies a document file, a certificate file, and a password to software that performs the following functions:

1) The document is "hashed" using a hash function. The result is a hash value or "message digest." The hash value is a number of fixed length. It is typically much shorter than the document, but long enough that it is extremely unlikely for two different documents to result in the same hash value.

2) The hash value is encrypted using the sender's private key, to which the software has access when supplied with a) the physical location of the key (for example, Drive 'C', Directory "Secret") and b) the user's password. The result of the encryption is the digital signature.

The original document, the digital signature, and a copy of the sender's digital certificate are sent to the intended receiver via a network connection. The receiver also gets the unencrypted original document (via email, for example). To verify that the document is actually from whom it purports to be, she enters into her verification software the received document file, the sender's certificate, and the signature. Her software performs these functions:

1) The document is hashed using the same hash function. Note that there is a requirement for compatibility between the two users' software.

2) The sender's public key, taken from his digital certificate, is used to decrypt the signature. The result of the decryption is also a hash value.

3) If the two hash values created are the same, then the sender's document is unquestionably linked to the sender's password and the file that contains his private key. The assumption is that the sender himself is in turn linked to the password and private key file.
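To make the two halves of this procedure concrete, the short sketch below walks through the hash, sign, and verify steps using the open-source Python cryptography package. The package choice, key size, and sample document are illustrative assumptions only, not part of any particular vendor's PKI product. Note that the library hashes the message internally when signing, which corresponds to "encrypting the hash value" in the description above.

    # Minimal sketch of the hash-sign-verify cycle described above.
    # Assumes the third-party "cryptography" package; key storage, digital
    # certificates, and password protection of the key file are omitted.
    import hashlib

    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import rsa, padding

    document = b"Design change order 1234: increase wall thickness to 2.5 mm."

    # Sender's key pair (in practice loaded from a protected key file).
    private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    public_key = private_key.public_key()

    # Step 1 (sender): hash the document into a fixed-length message digest.
    digest = hashlib.sha256(document).hexdigest()
    print("message digest:", digest)

    # Step 2 (sender): sign the document with the private key.  The library
    # hashes internally, which corresponds to encrypting the hash value.
    pss = padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                      salt_length=padding.PSS.MAX_LENGTH)
    signature = private_key.sign(document, pss, hashes.SHA256())

    # Receiver: re-hash the received document and check it against the
    # signature using the sender's public key (taken from his certificate).
    try:
        public_key.verify(signature, document, pss, hashes.SHA256())
        print("signature valid: document is linked to the sender's private key")
    except InvalidSignature:
        print("signature invalid: document or signature was altered")

A successful verification here ties the document only to the private key file and its password, which is precisely the limitation discussed in Section 3.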
A physical-world metaphor for the process is shown in Figure 2.

[Figure 2. An Analogy for Digital Signatures: a signed document is sent along with a signed copy of the sender's social security card, and the receiver compares the two signatures.]

In this metaphor, a document is signed (with a real signature), but not in the presence of the intended receiver. The document is sent to the receiver along with a signed copy of the sender's social security card, which is issued by a trusted authority (the United States government) and contains the sender's name and identifying code number (social security number). The receiver is left to compare the signature on the document to the one on the card to verify the identity of the sender. She may choose to interrogate the government to confirm that the social security number on the card is indeed linked to the name of the alleged sender. For many transactions, this level of security is acceptable. For others, it is not.

2.4 Document Encryption and Decryption

Document encryption is used to keep the contents of a document secret, unlike digital signatures, which are used to authenticate the identity of the sender but allow transmission of the document in the clear. Encryption is an additional layer of security. It would not make sense to use document encryption without including a digital signature. The basic scheme for encryption and decryption is shown in Figure 3 and Figure 4.

[Figure 3. Document Encryption (Sender) [Turban 2000]: the document, digital signature, and sender's certificate are encrypted with a one-time symmetric key; the symmetric key is encrypted with the receiver's public key to form the digital envelope.]

[Figure 4. Document Decryption (Receiver) [Turban 2000]: the receiver opens the digital envelope with her private key, recovers the one-time symmetric key, decrypts the document, and verifies the digital signature with the sender's public key.]

Refer to Figure 3, Document Encryption (Sender):

1) A digital signature is created in the same manner described previously.

2) The document, the digital signature, and the sender's digital certificate are concatenated and encrypted using a standard symmetric encryption key. The result is the encrypted document. A symmetric key is typically used only once and then discarded. Documents are encrypted this way because the encryption algorithms using symmetric keys are more efficient (faster) than the ones using asymmetric keys. For large documents, the time difference can be significant.

3) The symmetric key itself, a number much shorter than the document, is encrypted with the receiver's public key. The result is called a digital envelope. No one other than the holder of the receiver's private key and password can access the key that will be used to decrypt the document.

4) The digital envelope and the encrypted document are sent to the receiver.
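The digital envelope can be sketched in the same style. The fragment below (again assuming the Python cryptography package, with Fernet standing in for the one-time symmetric cipher) builds the envelope as in steps 2 through 4 and, anticipating the receiver-side steps described next, shows how the envelope is opened.

    # Minimal sketch of a digital envelope: symmetric encryption of the payload,
    # asymmetric encryption of the one-time symmetric key.  Signature and
    # certificate handling are omitted for brevity.
    from cryptography.fernet import Fernet
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import rsa, padding

    payload = b"document || digital signature || sender's certificate"

    # Receiver's key pair (the public half would come from her certificate).
    receiver_private = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    receiver_public = receiver_private.public_key()

    # Steps 2-3 (sender): encrypt the payload with a one-time symmetric key,
    # then wrap that short key with the receiver's public key.
    symmetric_key = Fernet.generate_key()
    encrypted_document = Fernet(symmetric_key).encrypt(payload)
    oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                        algorithm=hashes.SHA256(), label=None)
    digital_envelope = receiver_public.encrypt(symmetric_key, oaep)

    # Step 4: send (digital_envelope, encrypted_document) to the receiver.

    # Receiver side (described below): open the envelope, decrypt the payload.
    recovered_key = receiver_private.decrypt(digital_envelope, oaep)
    assert Fernet(recovered_key).decrypt(encrypted_document) == payload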
For an illustration of the logistics on the receiving end, refer to Figure 4, Document Decryption (Receiver):

1) The digital envelope is decrypted using the receiver's private key. The result is the one-time symmetric key.

2) The symmetric key is used to decrypt the encrypted document. The result of that is: a) the document, b) the sender's digital certificate, and c) the sender's digital signature.

3) The digital signature is verified in the normal way.

The advantage of dealing with a Biometric Authority™ is not affected by whether or not a document is encrypted. The BA adds a layer of security to the signature verification process whether or not the associated document was encrypted. Therefore, this thesis does not describe biometric authorization in the context of document encryption, only in the context of digital signatures. It is enough to say that the two concepts (document encryption and biometric authorization) are compatible.

3 Problems with Current Status

The receiver of a digitally signed document may be assured of the signature's validity by a trusted third party. The third party, whether it is a well-known certificate authority or another individual in the receiver's "web of trust", cannot in general vouch for the identity of the person using the sender's signature at the moment the document was signed. All that a forger really needs is a copy of the file containing the alleged sender's private key and the password with which to gain access. The file is no more or less difficult to obtain than any physical item, like a real key. The password is also not particularly secure. According to a survey conducted at a 1996 hackers conference, 72 percent of the hacker respondents said that passwords were the "easiest and most common hack" used.[9] This is because people are often not particularly careful with them: they write them down, use them for multiple purposes, and fail to change them often.

[9] "Body Parts", SC Magazine, www.scmagazine.com, February 2000.

In short, the weak links in PKIs are the password or PIN and the physical security of the encryption key. It is not the number of bits in the key or the type of algorithm used to encrypt. It comes down to the fact that there is a time delay between a) the physical linking of signature and person, and b) the validation process by which the signature is verified. The time delay may be as much as several months. During that time, the key and password have been exposed to compromise. The more that exposure time can be reduced, the more secure the system becomes. Ideally, the time would be a matter of seconds.

For systems that require a level of security such that access may be granted in exchange for something the person has and something the person knows, PKI is perfectly adequate. For systems that require more, real-time validation based on something the person is, is all that is really available. For that, biometrics are required.

4 Proposed Solution

"I think a lot of companies or people believe security is about encryption, and in fact, that's really the easiest part of security. What really drives the Internet is the ability to know who you're dealing with, ensure it's confidential, and then have people confirm transactions that they can't get out of."[10]

In the physical world, a picture ID with a signature on it is required as proof of identity. Two such IDs are required if the transaction is really important. We propose that an equivalent layer of security be added to the existing Internet security infrastructure. Figure 5 illustrates what such a layer would look like in the abstract.

[10] Entrust CEO John Ryan, quoted in "Entrust CEO John Ryan Discusses the Rapid Growth of the Internet Security Market on The IT Radio Network", Business Wire, www.businesswire.com, March 30, 2000.

[Figure 5. Proposed Biometric Authority™: the existing hash/sign/verify path is augmented with biometric data that the receiver forwards to the BA, which compares and contrasts it against its database and returns an authentication result.]

4.1 Biometric Authority™

"If you have some complex software, keep it from the bad guys with some simple software."[11]

In Figure 5, the existing procedure by which digital signatures are generated and verified (see Figure 2) is shown in faded colors. The Biometric Authority™ layer is bright. This layer operates as follows:

1) Biometric data (in this case, an electronically captured handwritten signature) is processed by appropriate hardware. The resulting features (possibly encrypted with the BA's public key) are added to the hash value associated with the document to be sent.

2) The receiver of the digital signature decrypts it as always. The result is a hash value used for comparison as before, plus the biometric data features from the sender.

3) The receiver transmits these features to the BA.

4) The BA checks the biometric data received against its database. The database contains several previous submissions from the same sender. (At the very least, these previous submissions would be the initial set of samples provided during the sender's enrollment.) The BA checks that a) the sample's important features closely match those of the samples provided, and b) the sample's features are not duplicates of previous samples.

5) The BA transmits to the receiver of the digitally signed document a message that communicates the BA's level of confidence that the sender is who he says he is.

[11] Treese, G. Winfield; Stewart, Lawrence C., 1998, Designing Systems for Internet Commerce, Reading, Massachusetts: Addison Wesley Longman.
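Steps 4 and 5 amount to a scoring routine. The sketch below is purely illustrative: the feature names, the replay tolerance, and the use of a simple normalized distance are assumptions made for this example, not the verification algorithm developed in Section 5.

    # Illustrative sketch of the Biometric Authority's two checks: the candidate
    # feature vector must be consistent with the enrolled samples (check a) but
    # must not be an exact replay of a previous sample (check b).
    import numpy as np

    def ba_confidence(candidate, enrolled, replay_tol=1e-6, scale=3.0):
        """Return a confidence in [0, 1], or 0.0 for a suspected replay."""
        enrolled = np.asarray(enrolled, dtype=float)
        candidate = np.asarray(candidate, dtype=float)

        # Check b: an (almost) exact match to a stored sample suggests a replay.
        if np.any(np.linalg.norm(enrolled - candidate, axis=1) < replay_tol):
            return 0.0

        # Check a: distance from the enrolled population, feature by feature,
        # measured in units of each feature's observed standard deviation.
        mean = enrolled.mean(axis=0)
        std = enrolled.std(axis=0) + 1e-9
        z = np.abs((candidate - mean) / std).mean()
        return float(np.clip(1.0 - z / scale, 0.0, 1.0))

    # Five enrollment samples of three hypothetical signature features
    # (e.g., total pen-down time, stroke count, mean pen speed).
    enrolled = [[1.8, 12, 4.1], [1.9, 12, 4.3], [1.7, 11, 4.0],
                [2.0, 12, 4.2], [1.8, 13, 4.4]]
    print(ba_confidence([1.85, 12, 4.2], enrolled))   # plausible: high score
    print(ba_confidence([1.8, 12, 4.1], enrolled))    # exact replay: 0.0
    print(ba_confidence([3.5, 20, 9.0], enrolled))    # dissimilar: low score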
A corresponding metaphor for our proposed security infrastructure is shown in Figure 6.

[Figure 6. An Analogy for the Biometric Authority™: the sender signs the document before a professional witness who holds an authenticated collection of John Doe signatures.]

In our metaphor, the sender signs his document as always, but does so in the presence of a trusted third-party witness. This professional witness has on file a collection of previous signatures by this person. The sender also forwards his signed document to the receiver. The witness (the Biometric Authority™) checks the signature against his database and informs the receiver of his conclusions regarding its validity.

4.2 Enrollment

Enrollment is the process by which the sender of a biometrically enhanced digital signature initially submits his biometric data. Several samples (on the order of five) are required. This data will constitute the database entries against which candidate signatures (transmitted from the receiver of a digital signature) are compared for validation. With each valid signature the BA sees, another sample is added to its database. The confidence score returned to the receiver is affected not only by the position of the new sample within the existing database population's feature space (see Section 5.2), but also by the size of the population for a given customer (sender).

4.3 Identity Verification

Perhaps the most important aspect of the enrollment process is the method of personal identification. We envision three possible choices for the customer, resulting in three corresponding security level ratings and at least three pricing structures.

Lowest. At this level, we use a method of identification the same as certificate authorities such as VeriSign commonly use today. This enrollment process could take place entirely online via our secure website. The user would supply credit card information, address, telephone numbers, the full names of family members, and perhaps their social security number. Using currently available methods (credit card purchase patterns, for example), the Biometric Authority™ would establish a relatively high confidence that the person is who he says he is. The biometric data would then be linked to his identity in the BA's database.

Medium. Here, we tie biometric data to a person's identity established as in the above case, but only in the physical presence of a trusted authority or enrollment officer. The enrollment officer will be an employee of the corporate client. The client must have previously established a position of trust. The corporation may choose and monitor its own enrollment officer, who will enroll other members of the corporation's staff. The corporation will be held responsible for the enrollment officer's professional conduct and trustworthiness.

Highest. This level of security is established in a way very similar to the medium-level case, with the exception that the enrollment officer is an employee of the Biometric Authority™. This official would be very highly paid, with an impeccable record of integrity. The costs, of course, will be passed on to the customers who choose this highest level of security. The enrollment may be done locally or at the BA headquarters.

4.4 Information Flow

Figure 7 shows the flow of goods and services between parties.

[Figure 7. Biometric Authority™ Information Flow: enrollment data, payments, public keys, and technical support flow between the customer, the receivers, and the BA.]

The customer supplies enrollment data, shown in black along with other less frequent transactions, to the BA. In exchange for payments, the BA supplies technical support and periodically updates its public key. The BA's public key may in some cases be used to encrypt biometric data at the sender's locale. Upon receipt, the BA will decrypt the data with its private key.

The normal, more frequent data flow is shown in dark blue. It consists of: biometric data contained in modified digital signatures flowing from the customer to its various receivers; biometric data from the receivers to the BA; and a verification score supplied to the receivers by the BA.

5 Technology

5.1 Existing Biometric Technologies

There are several biometric technologies on the market. They are in different stages of development and deployment, each with different advantages and disadvantages. This section outlines each of the options presently available.[12]

[12] International Biometric Group, www.biometricgroup.com, April 16, 2000.

5.1.1 Handwritten Signature Verification (HSV)

A number of hospitals, pharmacies, and insurance firms use dynamic signature verification (DSV) to authenticate electronic documents. Other applications include U.K. Employment Services and Ginnie Mae mortgage form processing. Although it is considered one of the less accurate biometrics, there are minimal public acceptance issues. The only hardware required is the graphics tablet, which is available in the $50 range. Software cost varies with application. Our research group is actively pursuing the accuracy problem (see Sections 5.2 and 5.3).

5.1.1.1 Advantages of HSV

Handwritten Signature Verification has several advantages that, we believe, outweigh its perceived accuracy problem. For one, the natural variations that occur between signatures by the same person, which are a cause of the accuracy problem, are also a potential source of security (see Section 1.1). The same is true of voiceprints and perhaps face geometry. Signatures have the added benefits of wide public acceptance and minimal intrusiveness. No one gives a signature unwillingly, and we all understand the legally binding nature of a signature. Face geometry may be analyzed without the person's permission; voiceprints have not historically been accepted (at least in the U.S.) as having legal standing compared to a written signature.

5.1.2 Voice Verification

The user states a given pass phrase and the system creates a template based on numerous characteristics, including cadence, pitch, and tone. Background noise greatly affects how well the system operates. Speaker verification is primarily used in applications where identity must be verified remotely over telephone lines. It is used to access voice mail systems at the University of Maryland, to deter cellular phone fraud by GTE, and to activate credit cards by telephone. It is also used to verify the identity of parolees under home incarceration. The software and hardware costs are minimal.

5.1.3 Face Geometry

This technology uses a standard off-the-shelf camera to acquire the image of a face from a few feet away. The systems are designed to compensate for glasses, hats, and beards.
There are a number of identification applications of face recognition, including driver's licenses in Illinois, welfare fraud prevention in Massachusetts, and INS pilot programs. Face recognition is used in InnoVentry's ATMs to verify the identity of check cashers. FaceVACS is an automated safe deposit system using face recognition. The US/Mexico border crossing project, SENTRI, used Visionics face recognition in conjunction with voice verification. India Oil Company uses Miros face recognition for a time and attendance application. There are also consumer applications that replace passwords for Windows NT and 98 login with face recognition. Public acceptance is relatively high and system cost is low, at $100 per site for the software.

5.1.4 Finger Scan

Fingerprint scanning is fast becoming a popular method of identity verification. There are a number of financial applications, including ATM verification at Purdue Employees Federal Credit Union and verification of customers at the teller window at Conavi Bank in Colombia. Compaq is shipping an Identicator finger scan unit with its computers. Woolworth retail stores in Australia use Identix finger scan to verify the identity of 80,000 employees. The Turkish parliament uses finger scan to verify government officials when they vote. There is some resistance to the idea of submitting a fingerprint, because some feel they are being treated as a criminal. The idea seems to be gaining acceptance, however. Units range in cost from a hundred to a few thousand dollars, including hardware and software, depending upon the configuration.

5.1.5 Hand Geometry

Hand geometry has been used for physical access and time and attendance at a wide variety of locations, including Citibank data centers, the 1996 Atlanta Olympics, and New York University dorms. Lotus Development Corporation uses hand geometry to verify parents when picking up children from daycare. The University of Georgia uses hand geometry to verify students when they use their meal card. The Immigration and Naturalization Service of the U.S. government has rolled out an unmanned kiosk to expedite frequent travelers through customs. The system can currently be found in eight airports, including San Francisco, New York, Newark, Toronto, and Miami. The finger/hand geometry systems do not raise many privacy issues and the technology is easy to use. They are still relatively expensive, at $1,500 per unit.

5.1.6 Iris Recognition

The iris is an excellent choice for identification: it is stable throughout one's life, it is not very susceptible to wear and injury, and it contains a pattern unique to the individual. A pilot project outfitted some automatic teller machines (ATMs) in England with iris recognition systems. A number of employee physical access and prisoner identification applications are in use, including correctional facilities in Sarasota, Florida and Lancaster, Pennsylvania. Iris scanning raises two key public acceptance issues: intrusiveness and ease of use. Technology is advancing such that both concerns are being addressed, but widespread use is still some years in the future. IriScan plans to release home and office products in the first half of 2000, and hopes to break the $500 price barrier.

5.1.7 Retina Scan

Retina scanning looks at the pattern of blood vessels in a person's eye. It is even less well developed than iris scanning and requires the user to place his or her eye very close to the scanner. This makes it more intrusive than other systems.

5.1.8 Keystroke Dynamics

Verification is based on the concept that the rhythm with which one types is distinctive. It is a behavioral verification system that works best for users who can "touch type". Currently NetNanny is working to commercialize this technology.

5.1.9 A Comparison of Biometric Technologies

Figure 8 is a chart that compares the above biometric technologies on four points: intrusiveness, accuracy, cost, and effort. Intrusiveness and effort are important factors in determining public acceptance. Accuracy and cost are factors important to the company using the technology, as is the higher-level concern of public acceptance.

[Figure 8. Comparison of Biometric Technologies: a Zephyr™ analysis chart rating keystroke dynamics, hand geometry, face geometry, dynamic signature verification, finger scan, voice verification, iris scan, and retina scan on intrusiveness, accuracy, cost, and effort. Reprinted with permission from International Biometric Group, a New York-based consulting firm.]

5.2 Robust Design Methods Applied to Pattern Recognition

The core technology upon which our signature recognition algorithm is based has its roots in pattern recognition. Handwritten signatures can be viewed as "noisy" patterns. The noise is mostly a result of the fact that people do not sign their name exactly the same way twice. Prof. Daniel Frey of MIT's Aeronautics & Astronautics Department has developed an approach to image recognition that uses robust design methods to help select the features best employed for classification. Historically, one of the biggest problems facing the development of handwritten signature verification (HSV) algorithms has been the need to reduce the feature set while reducing both false positive and false negative error rates.[13]

The remainder of Section 5.2 is an adaptation of Prof. Frey's paper "Application of Wavelets and Mahalanobis Distances to Robust Design of an Image Classification System".[14] This image recognition algorithm uses wavelet transforms to extract features from images. These features were used to construct Mahalanobis spaces for each type of image in a set. The system classifies noisy images by comparing the Mahalanobis distance to all of the image types in the set and selecting the image type with the smallest distance. The system was tested using gray-scale bitmaps of four famous portraits. Robust Design methods were employed to optimize the selection of image features used to construct the Mahalanobis space. The optimized system employs only 14 coefficients for classification and correctly classifies more than 99% of the noisy images presented to it.

5.2.1 Function of the Image Recognition System

This image recognition base case was originally developed as a case for use in teaching robust design and Mahalanobis distances.
The system classifies gray-scale[15] representations of fine art prints: given a bitmap, it should respond with the title of the artwork represented. For purposes of this study, Prof. Frey chose four well-known portraits: DaVinci's "Mona Lisa", Whistler's "Portrait of the Artist's Mother", Peale's "Thomas Jefferson", and Van Gogh's "Self Portrait with Bandaged Ear". The low-resolution bitmaps (32x32) used in the study are depicted in the top row of Figure 9.

[13] Gupta, Gopal and McCabe, Alan, 1997, "A Review of Dynamic Handwritten Signature Verification", James Cook University, Townsville, Queensland, Australia.
[14] Frey, Daniel D., 1999, "Application of Wavelets and Mahalanobis Distances to Robust Design of an Image Classification System", presented at ASI's 17th Annual Taguchi Methods Symposium, Cambridge, Massachusetts.
[15] In gray-scale images, a value of zero represents black while a value of 255 represents white. All the integers between are smoothly varying shades of gray between these extremes.

In practice, if one were given an image to identify, various types of noise would likely affect it. In the case of images, sources of noise might include lack of focus, white noise or "snow" introduced during transmission, and off-center framing. Further, it may be desirable to correctly identify the image without prior knowledge of whether the image is a negative or a print. To simulate such noise conditions, the following operations were performed, in the following order, on each image to be classified:

1. The position of the image in the "frame" was shifted by -2, -1, 0, 1, or 2 pixels with equal probability. The shift was made both horizontally and vertically, but the amounts of the shift in the x and y directions were probabilistically independent.

2. The images were transformed into a negative with probability 0.5.

3. The image was blurred by convolving the image with a pixel aperture whose size varies randomly among 3, 4, and 5 pixels square.

4. The image was superposed with "snow" by switching each pixel to white with probability 0.75.

Examples of the effects of these noises are depicted in Figure 9. The first row contains bitmaps of all four portraits without noise. Below each portrait are three noisy versions of the same portrait. The degree of noise is intended to be severe enough to make classification of the images difficult.

[Figure 9. The Images of Fine Art Prints Before and After Application of Noise Factors]
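For concreteness, the four noise operations can be sketched as follows, applied in order to a single 32x32 gray-scale array. NumPy and SciPy are assumed here for convenience; the case study does not prescribe an implementation, and the wrap-around shift is a simplification.

    # Sketch of the four noise factors applied, in order, to a 32x32 gray-scale
    # image (0 = black, 255 = white), mirroring the list above.
    import numpy as np
    from scipy.ndimage import uniform_filter

    rng = np.random.default_rng()

    def add_noise(image):
        img = image.astype(float)

        # 1. Shift the frame by -2..2 pixels, independently in x and y
        #    (np.roll wraps around at the edges, a simplification).
        dx, dy = rng.integers(-2, 3, size=2)
        img = np.roll(img, shift=(dy, dx), axis=(0, 1))

        # 2. Convert to a negative with probability 0.5.
        if rng.random() < 0.5:
            img = 255.0 - img

        # 3. Blur with a square aperture of 3, 4, or 5 pixels.
        img = uniform_filter(img, size=int(rng.integers(3, 6)))

        # 4. Add "snow": switch each pixel to white with probability 0.75.
        img[rng.random(img.shape) < 0.75] = 255.0
        return img

    portrait = rng.integers(0, 256, size=(32, 32))   # stand-in for a bitmap
    noisy = add_noise(portrait)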
Therefore, wavelets are useful in many applications in which it is convenient to analyze or process data hierarchically on the basis of scaling.

Given the power of wavelets in extracting key features of an image based on a hierarchy of scales, Prof. Frey chose to employ them in the Robust Design of an image recognition system. The next section describes the way that the matrix of wavelet coefficients was used to construct the Mahalanobis spaces used to classify images.

5.2.3 Classifying Images Using Mahalanobis Distances

The Mahalanobis distance is a statistical measure of the similarity of a set of features of an object to the set of features in a population of other objects. To compute the distance, one must construct a Mahalanobis space by computing the mean vector (μ) and covariance matrix (Σ) of the features in the training population. The Mahalanobis distance of any object within the space is a scalar that can be computed, given a vector of features f, by the formula

MD(f) = (1/n) (f - μ) Σ^-1 (f - μ)^T    (1)

where n is the number of elements in the feature vector f.

Objects can be classified using the Mahalanobis distance by constructing two or more Mahalanobis spaces. Classification is accomplished by computing the Mahalanobis distance in each space and selecting the shortest distance. In effect, this procedure allows one to determine which class of objects is statistically most similar to the object to be classified.

Prof. Frey applied Mahalanobis spaces and wavelets to image recognition by using wavelet coefficients as the elements of the feature vector f. For each of the four portrait types (Mona Lisa, Whistler's Mother, Jefferson, and Van Gogh) he took the following steps:

* Created a training population of 74 noisy images using the noise factors described in Section 5.2.1.
* Took the two-dimensional wavelet transform of each noisy image based on the Daubechies four-coefficient wavelet filter.
* Selected a subset of the wavelet coefficients (the approach for selecting this subset will be described in Section 5.2.4) and assembled them into a vector.
* Computed the mean vector and covariance matrix of the population.

Figure 10. The Mahalanobis Distances Applied to Classification of Two Fine Art Images (circles: individual Mona Lisas; X's: individual Whistler's Mothers; the diagonal line is the classification threshold)

To illustrate how Mahalanobis spaces constructed in this manner can be used for classification, consider the following example. The first 8x8 coefficients of the wavelet transform were used to construct the Mahalanobis space for the Mona Lisa and for Whistler's Mother. Then, a test population of forty noisy copies of both portraits was produced. The Mahalanobis distance of each portrait in the test population was computed in both spaces. Figure 10 graphically displays the results.

16 Williams, J. R. and K. Amaratunga, 1994, "Introduction to Wavelets in Engineering", International Journal of Numerical Methods Engineering, vol. 37, pp. 2365-2388.
17 Daubechies, I., 1992, Ten Lectures on Wavelets, CBMS-NSF Regional Conference Series, SIAM, Philadelphia.
The Mahalanobis distance of any noisy image to the Mona Lisa is plotted on the x-axis, while the distance to Whistler's Mother is plotted on the y-axis. Circles represent the images created by adding noise to the Mona Lisa, while the images derived from Whistler's Mother are represented by X's. The Mona Lisa points tend to lie in the upper left because they are more similar to the training population derived from Mona Lisas. There is, however, considerable scatter among the test population of Mona Lisas due to the degree of noise applied. A similar pattern is evident in the test population of Whistler's Mothers except that, as one would expect, they tend to lie in the lower right.

To classify an object, one may compute the Mahalanobis distance of that object within two or more spaces and select the class for which the distance is lowest. For the two-way classification task in Figure 10, this procedure implies that those points above the diagonal line are classified as Mona Lisa while those below the line are classified as Whistler's Mother. As Figure 10 shows, this will result in occasional mistaken classifications. In this case, the error rate is about 10%.

The classification procedure described above was expanded to a four-way classification system by making bitmaps of the other two portraits, constructing their Mahalanobis spaces, and generating test populations. For the same set of noise factors, the error rate rose to 37%. In general, a classification task becomes more difficult as one increases the number of possible classes to which the objects may belong.

In some cases, reducing the set of features in the vector f will reduce the error rate. Reducing the number of features also tends to increase the speed of the classification task and lowers the required size of the training population, since the covariance matrix cannot be inverted unless the training population is at least the square of the length of the feature vector. This suggests that some systematic and efficient means of selecting the features would be of significant value. Taguchi has shown that orthogonal array experiments can serve this purpose well, and several case studies have been published on the technique. The next section documents an adaptation of Taguchi's method to optimizing the image classification system.

5.2.4 Robust Design Procedure

This section documents the application of Robust Design methods to select a set of wavelet coefficients for more effective classification of noisy images. The P-diagram for the system is depicted in Figure 11. The signal factor is the image to be classified. As discussed in Section 5.2.1, the ideal function of the system is to provide the title matching the signal image despite the noise factors. The noise factors are the positioning errors, blurring, and snow discussed in Section 5.2.1. Rather than inducing noise through an outer array, we induced noise by performing 20 replications of each type of image. The reason for this choice is that the "snow" is actually 32² separate noise factors, so that the outer array would be excessively large.
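For illustration, the noise factors of Section 5.2.1 could be simulated on a 32x32 gray-scale bitmap roughly as in the sketch below. This is a minimal sketch assuming NumPy; the box-average implementation of the blur and the use of 255 for white are assumptions consistent with the gray-scale convention described earlier, not the original Mathcad implementation.

import numpy as np

def apply_noise(image, rng=None):
    """Apply the four noise factors: random position shift, possible
    negative image, blur over a 3-5 pixel square aperture, and 'snow'
    (individual pixels switched to white)."""
    rng = rng or np.random.default_rng()
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    # 1. Shift by -2..2 pixels, independently in the x and y directions.
    img = np.roll(img, shift=(int(rng.integers(-2, 3)), int(rng.integers(-2, 3))), axis=(0, 1))
    # 2. Transform into a negative with probability 0.5 (0 = black, 255 = white).
    if rng.random() < 0.5:
        img = 255.0 - img
    # 3. Blur with a k x k box aperture, k drawn from {3, 4, 5}.
    k = int(rng.integers(3, 6))
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    blurred = np.zeros_like(img)
    for i in range(k):
        for j in range(k):
            blurred += padded[i:i + h, j:j + w]
    img = blurred / (k * k)
    # 4. Superpose "snow": switch each pixel to white with probability 0.75.
    img[rng.random(img.shape) < 0.75] = 255.0
    return img

Each call produces one replication of the kind used in place of an outer array.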
As discussed in Section 5.2.3, the wavelet coefficients are the control factors that may be used to make the system more robust. The control factors each have two levels: level 1 implies the corresponding wavelet coefficient is "on" (included in the feature vector) and level 2 implies the corresponding wavelet coefficient is "off" (removed from the feature vector). In principle, there are 32² possible control factors to consider. To make the experiment size more manageable, we chose to investigate only the wavelet coefficients in the upper left 8x8 sub-matrix of the 32x32 matrix of wavelet coefficients. The number of control factors was thus reduced to 8², or 64. The number of features was further reduced to 63, to fit the experiment into the L64 (2^63) orthogonal array, by eliminating the wavelet coefficient in the lower right-hand corner of the matrix.

Figure 11. P-Diagram for the Image Classification System (signal: the image to be classified, i.e., Mona Lisa, Whistler's Mother, Jefferson, or Van Gogh; noise factors: error in positioning, negative/positive image, blurring, snow; control factors: the wavelet coefficients used in the feature vector, 1 = ON, 2 = OFF; response: the classification of the image)

An important step in our Robust Design process was the formulation of an appropriate signal-to-noise ratio (η) for the system. The image classification system is in many ways similar to a digital-digital (D-D) system.18 In a D-D system, two different values may be transmitted, 1's and 0's. The ideal function is that whenever a 1 is transmitted, it should be classified as a 1, and whenever a 0 is transmitted, it should be classified as a 0. The image classification system under consideration here is an extension of the D-D system concept: it can receive and must classify four distinct classes of objects. However, as Phadke notes, "if it is at all possible and convenient to observe the underlying continuous variable, we should prefer it."18 So, rather than employ the D-D signal-to-noise ratio, Prof. Frey sought to adapt a continuous signal-to-noise ratio (SNR) by observing an underlying variable, namely the Mahalanobis distance.

The Mahalanobis distance is often used in a larger-the-better SNR, with the Mahalanobis distance of known "abnormal" individuals as the response to be maximized. However, this application is different from most applications of the Mahalanobis Taguchi System (MTS). In MTS, the goal is usually to distinguish "normal" individuals from "abnormal" ones by using a single Mahalanobis space. The image recognition system requires four Mahalanobis spaces instead of one. Also, in the present context, classification is accomplished via the ratio of two Mahalanobis distances rather than by setting a threshold value of a single Mahalanobis distance. Therefore, some modifications of MTS were required to optimize the image classification system.

Referring to Figure 10, those Mona Lisa images that lie clearly above the diagonal "threshold" line are adequately distinguished from Whistler's Mother. If one constructs a line through the origin to any Mona Lisa point, the steeper the slope of the line, the clearer the distinction. In other words, if we test the system with images known to be the Mona Lisa, we prefer a high ratio of distance to Whistler's Mother to distance to Mona Lisa.
18 Phadke, Madhav J., 1989, Quality Engineering Using Robust Design, Englewood Cliffs, New Jersey: Prentice Hall.

Therefore, the signal-to-noise ratio for distinguishing Mona Lisas from Whistler's Mothers was defined as a larger-the-better SNR19 in which the ratio of Mahalanobis distances is the response to be maximized. The resulting SNR is

η_MW = -10 log10 [ (1/m) Σ_{i=1..m} ( D_MM,i / D_MW,i )² ]    (2)

where D_MW denotes the Mahalanobis distance of a test image known to be a Mona Lisa within the Mahalanobis space constructed using known Whistler's Mothers, D_MM denotes the distance of the same image within the Mona Lisa space, and m is the number of test images.

The four-way classification system proposed in this paper requires that every type of image must be easy to distinguish within the Mahalanobis space of every other type of image. Therefore, a signal-to-noise ratio for every pair of images must be evaluated. We chose to sum the η values for each possible pair to arrive at an overall signal-to-noise ratio

η = Σ_{i≠j} η_ij    (3)

where η_ij is the SNR for distinguishing portrait i within the Mahalanobis space of portrait j and where, for example, a subscript of J designates the portrait of Jefferson and a subscript of V designates the portrait of Van Gogh. The next section documents the results of executing the experimental plan described above.

5.2.5 Results

The experimental runs in the L64 were carried out on a Pentium II-based computer within the Mathcad programming environment. Table 1 contains the SNRs of each experimental run. There is a substantial difference among the SNRs for the systems tested; during the experiments, the SNR varied by over 50 dB. It is interesting to note that the first row of the L64, which leaves all the control factors on, is essentially the same system that generated the data depicted in Figure 10. The second row of the L64 turns off the last 32 wavelet coefficients and leaves on only the first 31 wavelet coefficients. This change already nets over a 50 dB improvement in performance and drops the error rate to 8%. However, even greater improvement is possible through Robust Design methods as discussed below.

19 Taguchi, G., 1987, System of Experimental Design, Dearborne, Michigan and White Plains, New York: ASI Press and UNIPUB-Kraus International Publications.

Exp   η      Exp   η      Exp   η      Exp   η      Exp   η      Exp   η      Exp   η
  1   19.5    11   46.4    21   41.9    31   34.1    41    6.5    51   40.0    61   51.6
  2   75.0    12   36.1    22   46.6    32   31.9    42   18.3    52   50.1    62   57.4
  3   55.1    13   31.6    23   37.1    33   28.3    43   27.3    53   58.4    63   51.6
  4   61.4    14   36.1    24   38.2    34   33.0    44   35.9    54   54.6    64   54.9
  5   38.0    15   35.3    25   28.0    35   37.8    45   44.4    55   61.1
  6   39.2    16   35.8    26   42.2    36   41.7    46   54.4    56   56.1
  7   41.4    17   54.4    27   26.2    37   46.9    47   40.5    57   -1.9
  8   46.7    18   60.6    28   26.8    38   44.0    48   51.5    58    7.4
  9   59.6    19   35.2    29   42.9    39   37.2    49   16.8    59   34.8
 10   60.2    20   41.7    30   35.9    40   45.4    50   28.2    60   32.6

Table 1. Signal to Noise Ratios (in dB) from the L64 Experiment
CF   Effect    CF   Effect    CF   Effect    CF   Effect    CF   Effect    CF   Effect    CF   Effect
 1    -2.58    11     8.66    21    -2.10    31    -2.73    41    -1.11    51    -5.85    61    -0.63
 2    -4.49    12    10.20    22    -2.08    32     1.98    42     4.08    52     3.57    62    -2.51
 3    -3.82    13     5.97    23    -5.92    33     3.50    43    -4.69    53    -5.11    63    -5.22
 4     5.26    14     7.24    24    -7.43    34    -1.88    44    -2.63    54    -4.67
 5    -8.32    15    -0.72    25    10.18    35     6.94    45    -4.11    55    -0.99
 6   -11.6     16    -1.81    26    -4.73    36    -9.07    46    -5.42    56    -2.97
 7    -5.87    17     7.89    27    -4.78    37    -7.26    47    -3.45    57    -5.13
 8    -2.83    18    -3.22    28     7.27    38    -4.73    48    -6.10    58    -5.23
 9    20.80    19    -4.30    29    -1.92    39    -9.98    49    -1.83    59    -1.05
10    11.93    20     0.20    30    -6.15    40    -6.79    50     2.86    60    -4.30

Table 2. Control Factor Effects

An analysis of means was performed on the data from Table 1 to compute the control factor effects in Table 2. Positive control factor effects indicate that turning on the corresponding wavelet coefficient had, on average, a positive effect on the system's signal-to-noise ratio. It is interesting to note that only 17 of the 63 control factors had a positive effect. The information was used to optimize the image recognition system as described in the next section.
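To make the analysis of means concrete, the control factor effects in Table 2 can be computed from the run matrix of the orthogonal array and the SNRs in Table 1 roughly as follows. This is a minimal sketch assuming NumPy; the variables design (a 64x63 array holding the level, 1 = on or 2 = off, of each control factor in each run) and snr (the 64 run SNRs) are hypothetical names, and the code is not the author's original Mathcad analysis.

import numpy as np

def control_factor_effects(design, snr):
    """ANOM: for each control factor, mean SNR of runs with the factor
    'on' (level 1) minus mean SNR of runs with it 'off' (level 2)."""
    design = np.asarray(design)   # shape (64, 63), entries 1 or 2
    snr = np.asarray(snr)         # shape (64,), SNR of each run in dB
    effects = []
    for j in range(design.shape[1]):
        on = snr[design[:, j] == 1].mean()
        off = snr[design[:, j] == 2].mean()
        effects.append(on - off)
    return np.array(effects)

# The optimization step of the next section keeps only the coefficients
# whose effect is positive:
# keep = np.where(control_factor_effects(design, snr) > 0)[0]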
5.2.6 Confirmation

We chose to optimize the image recognition system by retaining in the feature vector only those 14 wavelet coefficients with positive control factor effects. When the optimized system was tested, it achieved an SNR of 110 dB, higher than that of any of the systems tested in the L64. This system had an error rate of less than 2%.

D-D systems operate most efficiently when the error probabilities for transmission of 1 and 0 are equal.20 Therefore a leveling operation is usually required for optimal performance. Since this system is similar to a D-D system, it too can benefit from a leveling operation. The probabilities for incorrect identification of each image with each other image were made approximately equal by modifying the slope of the threshold lines. With these minor adjustments, the error rate of the image classifier system was cut to less than 1%.

20 Phadke, Madhav J., 1989, Quality Engineering Using Robust Design, Englewood Cliffs, New Jersey: Prentice Hall.

Table 3 shows how the retained features are distributed in the wavelet coefficient matrix. The control factors analyzed in the study correspond to elements in an 8x8 wavelet coefficient matrix. In Table 3, the coefficients that were retained in the optimized system are signified by an 'X'. One way to interpret Table 3 is that those wavelet coefficients left on are those that make these portraits distinct from one another in the presence of this particular type of noise. The finer scale features (lower rows and right-hand columns) tend not to be included in the feature vector. Some of the coarsest features (upper left corner) were also discarded from the vector. Most of the retained features have medium length scales. The specific pattern of retained features is critical to the success of the classification system, but is not intuitively obvious. The pattern is a function of the set of portraits to be classified and the noises applied to the images. Orthogonal array-based experimentation was effective in discovering the pattern and thereby improving the performance of the classification system.

Table 3. The Coefficients of the Optimized Matrix (X = ON, blank = OFF)

5.2.7 Summary

Robust design methods have proven effective in tuning an image recognition system for reliable performance in the presence of noise. Wavelet transforms were helpful for extracting features from images on the basis of scale, and Mahalanobis distances enabled those features to be used for classification in the presence of noise. Table 4 summarizes the results of applying this approach to a set of four famous portraits. Retention of a large number of wavelet coefficients in the feature vector tended to increase the required size of the training populations and degrade system performance, resulting in error rates in excess of 30%. Formulation of an appropriate SNR and use of orthogonal array-based experimentation to choose a subset of wavelet coefficients substantially improved the system performance, resulting in error rates of less than 1%.

                             Original System    Optimized System
# of Wavelet Coefficients         63                  14
η                               19 dB               102 dB
Error Rate                       37%                 < 1%

Table 4. Summary of Robust Design Results

5.3 A More Robust Classification Method

Two other members of Prof. Frey's research group, Fredrik Engelhardt and Jens Hacker, have invented a more effective way to choose classification features. It is called the Principal component Feature overlap Measure (PFM) and it outperforms MTS (see Section 5.2.4) significantly. In particular, Hacker and Engelhardt found that PFM achieves an 80 dB higher signal-to-noise ratio and, even more impressive, does so using 75% fewer features. The method by which these features are chosen is the main thrust of this section. It is adapted directly from "Robust Manufacturing Inspection and Classification with Machine Vision" by Hacker, Engelhardt, and Frey.21

21 Hacker, Jens; Engelhardt, Fredrik; Frey, Daniel D., 2000, "Robust Manufacturing Inspection and Classification with Machine Vision", presented at the 33rd CIRP International Seminar on Manufacturing Systems, Stockholm, Sweden.

5.3.1 Feature Extraction

The same academic example explained in Section 5.2.1 was used here. A wavelet transform was used in the same way to create a population of potentially useful features. Figure 12 shows how a feature vector was created from a picture using a smaller subset of the features generated by the coefficients of a wavelet transform. Figure 13 illustrates how this feature vector relates to the comparison of methods for selecting optimal features for classification.

Figure 12. Extracting a vector of features for each picture by using the wavelet transform (the original picture, subjected to simulated manufacturing variations, becomes a noisy picture; the wavelet transform of the noisy picture yields a reduced set of features, which are rearranged into a vector)
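As a concrete illustration of the preprocessing pipeline in Figure 12, the sketch below extracts a wavelet-based feature vector from a noisy bitmap. It is only an approximation of the procedure described above, assuming the PyWavelets (pywt) and NumPy packages; the wavelet family ('db4'), the decomposition depth, and the choice of the coarse approximation coefficients as the retained subset are illustrative assumptions rather than the authors' exact settings.

import numpy as np
import pywt

def feature_vector(image, wavelet="db4", level=2):
    """Wavelet-transform a 2-D gray-scale image and rearrange a reduced
    set of coefficients (here, the coarse approximation) into a vector."""
    coeffs = pywt.wavedec2(image, wavelet=wavelet, level=level)
    approx = coeffs[0]        # coarse-scale approximation coefficients
    return approx.ravel()     # rearranged into a feature vector f

# Example: a random 32x32 "noisy picture" stands in for a distorted portrait.
noisy = np.random.rand(32, 32)
f = feature_vector(noisy)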
Figure 13. Comparing two different ways of selecting the optimum features for classification (original picture, then feature vector, then classification: Mona Lisa? Van Gogh? Jefferson? Whistler's Mother?; data generation and preprocessing are common to both methods)

To summarize, bitmap pictures of paintings are distorted by applying noise factors. The noisy pictures are then preprocessed using a wavelet transform. A subset of features representing the wavelet approximation of the picture is then extracted into a vector, f. The feature vector forms the basis for the comparison of the MTS and PFM methods. See also Figure 13.

5.3.2 Feature Evaluation

Two methods for feature selection and extraction in a classifier system will be examined in the following passages. The first method is the Mahalanobis Taguchi System (MTS), which is often proposed within the Robust Design community. The second method is the Principal component Feature overlap Measure (PFM).

5.3.2.1 Mahalanobis Distance Classifier

Mathematically, the process of classification is an attempt to find the population of already-classified samples that most closely fits the new sample in question. It amounts to an attempt to find the population of example observations X = [x_1, ..., x_N] that are most similar to the sample vector f that needs classification. Similarity (closeness of fit) between vectors can be measured using any of several different metrics. The Mahalanobis classifier utilizes the Mahalanobis Distance (MD) metric22,23 to calculate a distance measure between a vector f and a population X. The Mahalanobis classifier takes into account errors associated with prediction measurements (such as noise) by using the feature covariance matrix of X to scale features according to their variances. The formula for the Mahalanobis distance is given by

MD_i(f) = (1/n) (f - μ_i) Σ_i^-1 (f - μ_i)^T    (4)

where f denotes the input vector, μ_i is the population mean vector, and Σ_i is the covariance matrix calculated from the n features stored as row elements in the example data vectors X_j. The index i = 1, 2, ..., k names the class of our k-class classifier. In this example problem, k = 4.

When dealing with k classes, every class c_i provides the system with a class mean μ_i and a feature covariance matrix Σ_i. For classification of f, the scalar Mahalanobis distance MD to each class is calculated; see Equation (4). As a result, we get k different MDs for our example vector f. The minimum selector then chooses the class c_i with the smallest distance by comparing the calculated Mahalanobis distances (see Figure 14).

Figure 14. The k-class Mahalanobis classifier architecture (input: an n-dimensional vector f of image features; output: the name of the image)

22 Schürmann, J., 1996, Pattern Classification: A Unified View of Statistical and Neural Approaches, New York, NY: John Wiley & Sons.
23 Duda, R. O., 1973, Pattern Classification and Scene Analysis, USA: John Wiley & Sons.

With fewer features, a smaller number of enrollment samples will be required.24 Moreover, picking a small number of optimal features from the feature pool that minimize both the classification error and the classification time can increase time efficiency.
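A minimal sketch of the k-class minimum-distance architecture in Figure 14, and of Equations (1) and (4), might look as follows. It assumes NumPy and a small training population per class; the class names, method names, and the pseudo-inverse fallback are illustrative choices, not part of the original system.

import numpy as np

class MahalanobisClassifier:
    """Builds one Mahalanobis space per class and classifies a feature
    vector f by the smallest scaled Mahalanobis distance (Equation 4)."""

    def fit(self, populations):
        # populations: dict mapping class name -> array of shape (N, n),
        # one training feature vector per row
        self.stats = {}
        for name, X in populations.items():
            X = np.asarray(X, dtype=float)
            mu = X.mean(axis=0)
            cov = np.cov(X, rowvar=False)
            self.stats[name] = (mu, np.linalg.pinv(cov))  # pinv guards against a singular covariance
        return self

    def distance(self, f, name):
        mu, cov_inv = self.stats[name]
        d = np.asarray(f, dtype=float) - mu
        return float(d @ cov_inv @ d) / len(d)   # (1/n)(f - mu) Sigma^-1 (f - mu)^T

    def classify(self, f):
        # minimum selector over the k Mahalanobis spaces
        return min(self.stats, key=lambda name: self.distance(f, name))

For the four-portrait example, populations would hold the feature vectors of the noisy training images for "Mona Lisa", "Whistler's Mother", "Jefferson", and "Van Gogh".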
A well-selected feature should be uncorrelated with other features and inexpensive to measure. A feature that takes values independent of its class is useless for classification. This means that good features should possess a large interclass mean distance and a small intraclass variance. To extract knowledge from the classifier it is also preferable to find features that are explainable in physical terms.25 Interpretation of the classification in terms of design parameters is then enhanced. However, high accuracy of the classifier and a classification system that requires low effort are the most important parameters when selecting the optimum features for the classifier.

5.3.2.2 Mahalanobis Taguchi System (MTS)

The Mahalanobis Taguchi System (MTS) combines Taguchi's approach to quality engineering26,27 with Mahalanobis' statistical measure of distance. MTS has, for instance, been applied to picture identification28, to medical studies for identification of patients in the risk zone of having a certain disease29, and to the robustness and fault awareness of a flight control system when some input data are missing.30

24 Schürmann, J., 1996, Pattern Classification: A Unified View of Statistical and Neural Approaches, New York, NY: John Wiley & Sons.
25 Kil, D. H. and Shin, F. B., 1996, Pattern Recognition and Prediction with Applications to Signal Characterization, Woodbury, NY, USA: AIP Press.
26 Taguchi, G., 1993, Taguchi on Robust Technology Development: Bringing Quality Engineering Upstream, New York: ASME Press.
27 Phadke, Madhav J., 1989, Quality Engineering Using Robust Design, Englewood Cliffs, New Jersey: Prentice Hall.
28 Nagao, M.; Yamamoto, M.; Suzuki, K.; and Ohuchi, A., 1999, "A robust face identification system using MTS", presented at ASI's 17th Annual Taguchi Methods Symposium, Cambridge, Massachusetts.
29 Taguchi, G. and Jugulum, R., 1999, "Role of S/N ratios in Multivariate Diagnosis", Quality Engineering, vol. 7.
30 Matsuda, R.; Ikeda, Y.; and Touhara, K., 1999, "Application of Mahalanobis Taguchi System to the fault diagnosis program", presented at ASI's 17th Annual Taguchi Methods Symposium, Cambridge, Massachusetts.

For MTS we have to define a measure for the performance of our classifier in each experiment. If we run an experiment that classifies a number of known test images, then the classification rate is not a sufficient measure, since it tells us nothing about the robustness of the system. The performance of each Taguchi experiment is therefore measured as a signal-to-noise ratio. Many different SNRs have been proposed. We will define the SNR for the presented classification task according to

η [dB] = -10 log10 [ (1/N) Σ_{i,j = 1..k, i≠j} Σ_m ( MD_j(f_m) / MD_i(f_m) )² ]    (5)

where the outer sum runs over all pairs of classes with i ≠ j, the inner sum runs over the test examples f_m belonging to class c_j, and N is the total number of terms in the double sum.

When we present one of our test pictures f to the system, the classification is accomplished via the comparison of Mahalanobis distances. Whenever we compare the MD of an image belonging to class c_j with its MDs calculated in spaces belonging to different classes c_i, we want our system to have a high ratio MD_i/MD_j. Obviously the ratio MD_i/MD_j has to be greater than 1 for the image to be classified correctly by the minimum distance selector. Figure 15 shows a comparison of the MDs of two classes when a population of noisy images is classified. In our k-class classifier, we have to compare the separation of each pair of classes.
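To make Equation (5) concrete, the following sketch computes the pairwise larger-the-better SNR from the Mahalanobis distances of a labeled test population. It is an illustrative reading of the equation, assuming NumPy and the MahalanobisClassifier sketched at the end of Section 5.3.2.1; the variable names are hypothetical.

import numpy as np

def pairwise_snr_db(classifier, test_sets):
    """Equation (5): -10*log10 of the mean squared ratio MD_j/MD_i over all
    ordered class pairs (i != j) and all test examples of the true class j."""
    ratios_sq = []
    for true_class, examples in test_sets.items():       # test_sets: class -> list of feature vectors
        for f in examples:
            md_true = classifier.distance(f, true_class)  # MD_j: distance in the true class's space
            for other in classifier.stats:
                if other == true_class:
                    continue
                md_other = classifier.distance(f, other)  # MD_i: distance in a competing class's space
                ratios_sq.append((md_true / md_other) ** 2)
    return -10.0 * np.log10(np.mean(ratios_sq))

A large value indicates that test images sit much closer to their own class's Mahalanobis space than to any competing space.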
31 Phadke, Madhav J., 1989, Quality Engineering Using Robust Design, Englewood Cliffs, New Jersey: Prentice Hall.

Figure 15. Comparison of the Mahalanobis distances of two classes (Mona Lisa vs. Van Gogh) for a population of noisy test images, classified by (a) the full feature classifier (n = 63 features), (b) MTS (n = 13 features), and (c) statistical feature selection and PCA, i.e. PFM (n = 3 features)

The classifier is optimized by performing all of the experiments and analyzing the results by means of ANOM (analysis of means). The aggregate η values, calculated from the SNRs in decibels (dB) and their means, evaluate the robustness of the classification system.32 This gives the contribution of each feature to the classification. Choosing only the features that contribute positively to the classification will then result in a higher overall SNR. A reduction in the number of features that also increases the performance of the overall system additionally reduces the classification effort.

32 Phadke, Madhav J., 1989, Quality Engineering Using Robust Design, Englewood Cliffs, New Jersey: Prentice Hall.

5.3.2.3 Principal component Feature overlap Measure (PFM)

A second, competing approach for feature selection is described in this section. The method is based on a principal component transformation of the wavelet-transformed picture data and a feature overlap measure using the feature probability mass functions. The parts of the PFM approach can separately be found in the statistical literature.33,34 The PFM method was assembled in order to find out if it is possible to make use of the statistical information provided by the sample data, which is mostly ignored when MTS is applied to a problem.

5.3.2.3.1 Feature Overlap Measure (FM)

We try to measure the effectiveness of each feature in a single-dimension feature space using the statistical data in the training samples X_i. By looking at the probability mass functions p(v_p|c_i), the probability of a feature f_p of class c_i having the value v, one can come up with a measure of the feature overlap among the various classes. Figure 16 gives an example of this idea. The probability mass functions (PMFs) for feature f11, p(v11|c_MonaLisa) and p(v11|c_VanGogh), in Figure 16a show good separation, whereas the PMFs for feature f8, p(v8|c_MonaLisa) and p(v8|c_VanGogh), in Figure 16b show strong overlap and are, hence, not useful for separating the two classes.

33 Kil, D. H. and Shin, F. B., 1996, Pattern Recognition and Prediction with Applications to Signal Characterization, Woodbury, NY, USA: AIP Press.
34 Johnson, R.A. and Wichern, W.W., 1998, Applied Multivariate Statistical Analysis, 4th edition, Upper Saddle River, NJ, USA: Prentice-Hall.

Figure 16. Probability mass functions of two different features, shown for two classes ((a): feature f11; (b): feature f8)
J is an objective function that measures the degree of feature space overlap in the feature distribution functions p(v_p|c_i):

J(p) = 1 - Π_{i=1..k} p(v_p | c_i)    (7)

The Multimodal Overlap Measure (MOM)35 value MOM_p, calculated for every feature f_p, gives a relative measure of the importance of each feature for the classification. Feature selection in the PFM scheme consists of calculating the feature rating and then picking the features that lie over a certain threshold. However, picking too many features will result in a decrease of the system performance. Selecting the threshold value can be automated by calculating the S/N ratio for different numbers of features, since the MOM ranking of the features defines their relative importance and thereby defines which features to select for each threshold value. We call this approach of feature selection by measuring the feature overlap FM.

35 Kil, D. H. and Shin, F. B., 1996, Pattern Recognition and Prediction with Applications to Signal Characterization, Woodbury, NY, USA: AIP Press.

The attractiveness of this single-dimension feature optimization approach also lies in its computational simplicity. We only need the features' probability mass functions (PMFs), which can easily be approximated by computing the histogram of each feature f_p for each class c_i. To further improve the results, and to overcome the shortcomings associated with potentially correlated features in the method proposed above, we chose to add a principal component transform of the data before performing the feature selection.
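A minimal sketch of this overlap-based feature rating follows. It assumes NumPy, histogram-approximated PMFs on a shared set of bins, and one plausible reading of Equation (7) in which the product of the class PMFs is accumulated over the bins to yield a single rating per feature; the bin count and that aggregation are illustrative assumptions, not the published algorithm.

import numpy as np

def feature_overlap_rating(class_samples, bins=20):
    """Rate each feature by a reading of Equation (7): 1 minus the summed
    product of the classes' histogram-estimated PMFs.
    class_samples: list of arrays, one per class, each of shape (N, n)."""
    n_features = class_samples[0].shape[1]
    ratings = []
    for p in range(n_features):
        values = [X[:, p] for X in class_samples]
        lo = min(v.min() for v in values)
        hi = max(v.max() for v in values)
        edges = np.linspace(lo, hi, bins + 1)
        pmfs = [np.histogram(v, bins=edges)[0] / len(v) for v in values]
        overlap = np.sum(np.prod(pmfs, axis=0))   # large when the class PMFs overlap
        ratings.append(1.0 - overlap)             # large when the classes are well separated
    return np.array(ratings)

# Features would then be ranked by this rating and only those above a
# threshold retained, with the threshold tuned via the SNR of Equation (5).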
5.3.2.3.2 Principal Component Transform

The biggest shortcoming associated with single-dimension feature optimization algorithms is that they fail to take potential feature correlation into account. We therefore used a principal component transform to decorrelate the features so that the optimization could be performed on each feature separately. The diagonal elements of a covariance matrix Σ measure the variance of each feature, while the degree of correlation between features is captured in the off-diagonal elements. Features can be decorrelated by forcing the off-diagonal terms to zero through an orthogonal transformation such as the Principal Component (PC) transformation.36

To find the PC transformation, the global covariance matrix Σ_x for the original training data sets X_system = [X_1 ... X_k] is estimated. Then the eigenvalue-eigenvector decomposition of the feature covariance matrix Σ_x is determined, that is,

Σ_x = Φ Λ Φ^T    (8)

where Λ is a diagonal matrix with the eigenvalues of Σ_x in decreasing order and Φ = [u_1, ..., u_n] is a normalized matrix whose columns are the corresponding eigenvectors u_i of Σ_x. With this choice of transformation, the covariance matrix of the transformed data is Σ_y = Λ, a diagonal matrix built up of the eigenvalues of the PC analysis. The transformation of the data can now be computed by

Y = Φ^T X    (9)

f_y = Φ^T f    (10)

36 Johnson, R.A. and Wichern, W.W., 1998, Applied Multivariate Statistical Analysis, 4th edition, Upper Saddle River, NJ, USA: Prentice-Hall.

Y_system = [Y_1 ... Y_k] is the PC-transformed data used for training the Mahalanobis classifier, and f_y is the input data to classify, which has to be transformed as well. The index y indicates that the data have been transformed into the PC system. We find that features are ranked based on their variances, or eigenvalues, in the transformed space when using the MOM feature extraction method. Features with large eigenvalues are important since they span the signal subspace, while those with small eigenvalues span the noise subspace and are not major contributors to class separation37 (see Figure 17).

Figure 17. Feature subspace (scatter plot) of two features for (a) the raw data and (b) the PC-transformed data. The features chosen for this projection had the highest ranking in the two optimization approaches, MTS and PFM.

In essence, the PC transformation is class independent and performs a coordinate rotation that aligns the directions of maximum variance with the new transformed coordinate axes. There is no guarantee that uncorrelated features are good discriminants between classes, because the features defined by the principal component transform are not optimal with regard to class separability38, but they were effective for the given problem.

37 Kil, D. H. and Shin, F. B., 1996, Pattern Recognition and Prediction with Applications to Signal Characterization, Woodbury, NY, USA: AIP Press.
38 Chen, C.H. and Wang, P.S.P., 1999, Handbook of Pattern Recognition & Computer Vision, second edition, Singapore: World Scientific.

5.3.2.4 Results

A direct visual comparison of the SNRs is shown in Figure 18. Figure 19 shows the number of features used.

Figure 18. Signal-to-noise ratios achieved with different classifier optimization schemes (full feature classifier, MTS, FM, and PFM, compared for the Daubechies db4 case and for Haar cases with ps = 0.75, 0.85, and 0.95)

Figure 19. Number of features used for optimal classification (same schemes and cases as Figure 18)

5.3.3 Summary

Reliable classification can be carried out based on both the PFM and the MTS method. The accuracy of the PFM approach is as good as or better than that of the MTS method. Robustness in the face of several noise sources is better when the PFM method is used, which is indicated by larger signal-to-noise ratios. The computations for classification can be carried out using a lower number of features (i.e., parameters) when working with the PFM method.

5.4 Handwritten Signature Verification (HSV) Technology

5.4.1 Challenges

As can be seen in Figure 8 of Section 5.1.9, handwritten signature verification has historically fallen short in accuracy. When accuracy is improved, signatures will be a very attractive biometric.
Broadly speaking, there are two ways to compare signatures: with statistical features and by comparing shape. Statistical features might include such things as total pen-down time, path length, tangent angles, velocity profiles, acceleration profiles, and others. The best techniques for signature verification are likely to take both statistical features and shapes into account.39

39 Gupta, Gopal and McCabe, Alan, 1997, "A Review of Dynamic Handwritten Signature Verification", James Cook University, Townsville, Queensland, Australia.

When verifying identity using any technique, there are two different types of errors. The false acceptance rate (FAR) is the percentage of invalid attempts (forgeries) that get accepted as authentic. The false rejection rate (FRR) is the percentage of valid attempts that get thwarted. There will generally be a trade-off between these two error types. If a system is optimized to reduce FRR, FAR will tend to be high, and vice versa. A sensible measure of accuracy would be the sum of FRR and FAR. Where in the FRR-FAR spectrum a system is best designed depends on the application. For a third-party identity verification application, we feel that the thresholds for acceptance or rejection should really be left to the receiver of our customer's digital signature. To do this, the BA simply returns a confidence score to the receiver, rather than an acceptance or rejection. Then, if the receiver is satisfied with a 95% confidence score, she may accept it. This leaves us to develop the technology to reduce the overall error rate as much as possible without concerning ourselves with the optimization of FAR vs. FRR.

5.4.2 New Technology

5.4.2.1 Signature Feature Extraction

Data from a signature capture device is in the form of a stream of time-tagged position and perhaps pressure data. Figure 20 shows an example of position data captured from the author's signature. From time and position, an algorithm can deduce velocity profiles. Our feature extraction algorithm partitions this data into "snippets." Each snippet is the data for, say, 1/20th of the total time spent signing. Typically, this would equate to one letter or a bit less.

Figure 20. Signature Data

When comparing two signatures (see Figure 21), it is often necessary to resample one so that the time scales are normalized. After resampling, the orientation and size are normalized so that each snippet is as close in size and position as possible to its counterpart from the other signature.

Figure 21. Comparing Two Signatures

The size/orientation scaling is done using a simultaneous optimization of the errors between position samples in corresponding snippets. Data from these normalized snippets are then used to populate a feature list from which the best features are chosen. In this way, we choose statistical features from snippets of a signature, which are shape-based. Thus, our feature extraction method uses both shape and statistical features.
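To make the snippet idea concrete, the sketch below resamples a time-tagged pen trajectory onto a common time base, splits it into 20 snippets, and computes a few simple statistical features per snippet. It assumes NumPy; the particular resampling rate, snippet count, and per-snippet statistics are illustrative choices and not the proprietary extraction algorithm described above.

import numpy as np

def snippet_features(t, x, y, n_snippets=20, samples=400):
    """Resample a signature (time-tagged x/y samples) onto a uniform time
    base, cut it into equal-duration snippets, and compute per-snippet
    statistics such as path length, mean speed, and mean tangent angle."""
    t = np.asarray(t, dtype=float)
    t = (t - t[0]) / (t[-1] - t[0])                 # normalize time to [0, 1]
    tu = np.linspace(0.0, 1.0, samples)
    xu, yu = np.interp(tu, t, x), np.interp(tu, t, y)
    features = []
    for chunk in np.array_split(np.arange(samples), n_snippets):
        dx, dy = np.diff(xu[chunk]), np.diff(yu[chunk])
        step = np.hypot(dx, dy)
        features.extend([step.sum(),                # path length of the snippet
                         step.mean(),               # mean speed per resampled step
                         np.arctan2(dy, dx).mean()])  # mean tangent angle
    return np.array(features)

The resulting feature list would then be pared down by a selection method such as the PFM technique of Section 5.3, as described in Section 5.4.2.2.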
5.4.2.2 Signature Feature Selection Feature selection is the process of choosing the best features from a list comprised of several dozen statistical features from several snippets of a signature (as outlined in Section 5.4.2.1). Selection is done using an adaptation of methods described in Sections 5.2 and 5.3. The result is expected to be a highly reliable signature verification algorithm, competing quite favorably with existing products on the market. 60 Core Technology Through Enterprise Launch: A Case Study of Handwritten Signature Verification Thesis, MIT's System Design and Management Program Burl Amsbury, June 2000 6 Certificate Authority Business Model IDC predicts the market for PKI products and certificate authority (CA) services will grow from $122.7 million in 1998 to $1.3 billion in 2003.40 The big-name certificate authorities offer several products and services that complement their CA business. With their identification and security suites of products, along with their recent flood of acquisitions, it seems that these companies are aiming to be the DMVs of the Internet. By far, the two biggest players are Entrust and VeriSign. 6.1 Entrust Entrust is the largest competitor in the CA market space, with 46% of the product market share and 35% of the overall market. 40 In 1999, they logged a net income of $6 million on $85 million in sales. Customers include J.P. Morgan, Schumberger, and the U.S. Postal Service. Nortel Networks, the Canadian telecom giant, owns 53% of the company.41 The products and services Entrust offers include: * Digital certificate issuing. e Software and support that allow the customer to issue certificates. e PKI systems for email, web browsers, and software code signing. - Systems that support the integration of wireless technology with PKI and network security. e VPNs. - File encryption and decryption. 40 "IDC Report Says Entrust Technologies is "the Big Gorilla" in PKI Software Products: Entrust Market-share Trend Increases Year Over Year", Business Wire, www.businesswire.com, March 1, 2000. 41 Hoover's Company Capsule Database - American Public Companies, Hoover's, 2000. 61 Core Technology Through Enterprise Launch: A Case Study of Handwritten Signature Verification Thesis, MIT's System Design and Management Program Burl Amsbury, June 2000 * Products for non-commercial (individual) encryption/decryption, digital signatures, and PKI. * Business-to-business (B2B) site security and site access control software. * Developer tools for security, PKI, and encryption. Enterprise Resource Planning (ERP) security. In March 2000, Entrust acquired CygnaCom Solutions for $16 million. CygnaCom sold computer security and PKI consulting services to the U.S. federal government. 6.2 VeriSign VeriSign provides essentially the same product and service line as Entrust. They also offer dedicated training seminars and courses. Their net income in 1999 was $4 million on $85 million in sales, turning a profit for the first time since inception in 1995. VeriSign became a public company with an IPO in January 1998. In 1999, VeriSign was the first commercial venture to win approval from the U.S. Commerce Department to expand sales of its more powerful encryption products to overseas companies. They have since made several corporate acquisitions. By April 2000, VeriSign had bought Thawte Consulting for $575 million, Signio for $745 million, and Network Solutions for $21 billion. Thawte was a provider of digital certificate products with a global customer base. 
Signio provided Internet payment systems and services. Network Solutions is an Internet domain name registrar. The contribution to profitability of the various businesses in which VeriSign (and Entrust) engage is not public information. However, VeriSign's website gives some idea of the pricing structure for the CA business. A personal digital ID, which works with certain email programs and allows digital signing, verification, and encryption/decryption, sells for $15 per year. Practically speaking, though, it is a free service since one can get a 30-day free trial from VeriSign or any competitor with a similar product, including Entrust and PGP. Identification of the customer is done via the Internet, although the process takes about three days indicating that identification is done 62 Core Technology Through Enterprise Launch: A Case Study of Handwritten Signature Verification Thesis, MIT's System Design and Management Program Bur Amsbury, June 2000 largely off-line. Information required to complete the process is the customer's name, email address, challenge phrase (password), and credit card information. Digital certificates for small-business web sites cost between $250 and $1200 per year. A complete enterprise digital certificate service costs between $10,000 and $500,000 per year. An "affiliate" (another certificate authority licensing VeriSign's certificate authority software) will pay an initial fee of $250,000 to $2 million, and royalties of 20% to 70%, on every certificate issued. 6.3 Some Other Network Security Companies 6.3.1 Pretty Good Privacy PGP is the company started by Phil Zimmerman, who invented the PKI (and was charged by our federal government with violation of U.S. export restrictions of cryptographic software). PGP distributes their version as freeware via MIT. The fact that it is free has led to its organic expansion all over the globe. In 1996, after three years, the government dropped its case against Zimmerman without indictment. PGP now also offers firewall solutions, intrusion detection and risk assessment solutions, VPNs, data security software, and payment processing solutions. 6.3.2 InterTrust Technologies InterTrust sells technologies that protect and manage the rights and interests in digital information of artists, authors, producers, publishers, distributors, traders and brokers, enterprises, governments and other institutions, and consumers. They also provide the utility services needed for security, interoperability, and trust of the global system. InterTrust technology is designed to protect digital information, apply rules persistently after information is distributed, and automate many of the commercial consequences of using the information. 6.3.3 SAFLINK Corporation SAFLINK provides enterprise solutions for biometric verification and authorization. Their software is designed to replace passwords with biometric data. Customers can 63 Core Technology Through Enterprise Launch: A Case Study of Handwritten Signature Verification Thesis, MIT's System Design and Management Program Burl Amsbury, June 2000 automate a corporate-wide system so that when an employee logs in using biometry, the appropriate access is provided to that person. SAFLINK products are concerned with identification and access control. They do not (yet) offer a solution for third-party verification of a digitally signed document sender's identity. Our Biometric AuthorityTM concept will fill that need. 
64 Core Technology Through Enterprise Launch: A Case Study of Handwritten Signature Verification Thesis, MIT's System Design and Management Program Buri Amsbury, June 2000 7 Proposed Biometric AuthorityTM Revenue Model Projected total sales of security hardware and software products: $5.3 billion in 2001, $8 billion in 2003. Prediction: 62% projected compound annual growth rate for sales of PKI products from 1998 to 2003.4 We envision that the Biometric AuthorityTM will be implemented as an Application Service Provider (ASP). As such, there are three base ways to charge customers. Any combination of these may also be considered. Customer's sample population size: This is a way to link the database storage overhead for a customer's biometric data to the price charged. The amount of storage required in the BA's database will be directly related to the amount of use that customer is getting from the BA's service. Every time the customer sends a biometrically enhanced digital signature to someone who has it verified by the BA, the BA stores another biometric data sample for that customer. Each billing period, the increase in storage space results in a higher cost, eventually reaching a maximum charge rate. * Per use: We would charge the customer a small amount each time a biometric sample is submitted to the BA from a receiver. * Flat rate: Monthly or quarterly charges would be billed. Just as many service providers do, a Biometric AuthorityTM would likely provide several payment options, each of which is some combination of the base methods listed above. Other considerations that will affect payment rates are the security level desired by the customer (see Section 4.3) and the length of the contract with the BA. Willson, Cheryl J., April 2000, "The global market for security products and services will top $15 billion by 2003", Red Herring. 42 65 Core Technology Through Enterprise Launch: A Case Study of Handwritten Signature Verification Thesis, MIT's System Design and Management Program Burl Amsbury, June 2000 Another very significant source of revenue would be licensing, as is the case for CAs. The original Biometric AuthorityTM would license certain intellectual property and software to other BAs, thereby creating a loose hierarchy of BAs. 66 Core Technology Through Enterprise Launch: A Case Study of Handwritten Signature Verification Thesis, MIT's System Design and Management Program Burl Amsbury, June 2000 8 Summary Q. I understand that I can now file my taxes electronically. How does that work? A. It's easy. You fill out some forms on your computer, then log on to the Internet. Within seconds, all of your personal financial information is in the hands of a 17-year-old hacker known as The DataBooger. A system is designed to answer the need for better security infrastructure. It makes use of what we call a Biometric AuthorityTM, which acts as a trusted third-party witness and centralized database of biometric samples. The sender would have enrolled in this service at an earlier time, supplying irrefutable identification and several biometric samples (signatures, for example). When digitally signing a document, the sender's real signature (supplied in real time) gets folded into the digital signature. The receiver sends the biometric data to the BA, which will be implemented as an application service provider. The BA then compares the data with samples in its database. 
If the sample is similar enough-and is not exactly the same as-the database samples, then the receiver is notified of a positive match. The signature verification algorithm makes use of newly developed robust design methodologies designed in collaboration with other members of Professor Frey's research group. These methodologies are adaptations of MTS applied to features extracted from signatures according to a unique snippet-matching algorithm. The resulting feature list is then pared down significantly using the new PFM technique. Future work will pursue a reduction to practice of the handwritten signature verification techniques developed to date. The algorithm software will be marketed commercially as part of the core competency of a Biometric AuthorityTM, which we intend to commercialize. The targeted customers for such a venture are certificate authorities. We believe that by giving people the option of eliminating anonymity on the Web, we offer security as well as privacy. 4 Dave Barry, April 16, 2000, "Unhappy returns", The Boston Globe Magazine. 67 Core Technology Through Enterprise Launch: A Case Study of Handwritten Signature Verification Thesis, MIT's System Design and Management Program Burl Amsbury, June 2000 9 Bibliography Armstrong, Illena, April 2000, "Computer Crime Spreads", SC Magazine, www.scmagazine.com. Chen, C.H. and Wang, P.S.P., 1999, Handbook of Pattern Recognition & Computer Vision, second edition, Singapore: World Scientific. Daubechies, I., 1992, Ten Lectures on Wavelets, CBMS-NSF Regional Conference Series, SIAM, Philadelphia. Duda, R. 0., 1973, Pattern classification and scene analysis, USA: John Wiley & Sons. Frey, Daniel D., 1999, "Application of Wavelets and Mahalanobis Distances to Robust Design of an Image Classification System", presented at ASI's 17th Annual Taguchi Methods Symposium, Cambridge, Massachusetts. Gupta, Gopal and McCabe, Alan, 1997, "A Review of Dynamic Handwritten Signature Verification", James Cook University, Townsville, Queensland, Australia. Hicker, Jens; Engelhardt, Fredrik; Frey, Daniel D., 2000, "Robust Manufacturing Inspection and Classification with Machine Vision", presented at the 33rd CIRP International Seminar on Manufacturing Systems, Stockholm, Sweden. Harrington, Ed, February 1999, "Digital Certificates: Proven Technology, Upcoming Challenges", SC Magazine, www.scmagazine.com. Johnson, R.A. and Wichern, W.W., 1998, Applied Multivariate Statistical Analysis, 4th edition, Upper Saddle River, NJ USA: Prentice-Hall. Kil, D. H. and Shin, F. B. 1996, Pattern Recognition and Prediction with Applications to Signal Characterization, Woodbury, NY, USA: AIP Press. Matsuda, R.; Ikeda, Y.; and Touhara, K., 1999, "Application of Mahalanobis Taguchi System to the fault diagnosis program", presented at ASI's 17th Annual Taguchi Methods Symposium, Cambridge, Massachusetts. Nagao, M.; Yamamoto, M.; Suzuki, K.; and Ohuchi, A., 1999, "A robust face identification system using MTS", presented at ASI's 17th Annual Taguchi Methods Symposium, Cambridge, Massachusetts. Neumann, Peter G., 1995, Computer Related Risks, Reading, Massachusetts: AddisonWesley. Phadke, Madhav J., 1989, Quality Engineering Using Robust Design, Englewood Cliffs, New Jersey: Prentice Hall. Shtirmann, J., 1996, Pattern Classification: A Unified View of Statistical and Neuronal Approaches, New York, NY: John Wiley & Sons. Taguchi, G., 1987, System of Experimental Design, Dearborne Michigan and White Plains New York: ASI Press and UNIPUB-Kraus International Publications. 
Taguchi, G., 1993, Taguchi on Robust Technology Development: Bringing Quality Engineering Upstream, New York: ASME Press. 68 Core Technology Through Enterprise Launch: A Case Study of Handwritten Signature Verification Thesis, MIT's System Design and Management Program Burl Amsbury, June 2000 Taguchi, G. and Jugulum, R., 1999, "Role of S/N ratios in Multivariate Diagnosis", Quality Engineering,vol. 7. Teshima, Shoichi, Tomonori Bando, and Dan Jin, 1998, "A Research of Defect Detection using the Mahalanobis-Taguchi System Method." American Supplier Institute Robust Engineering Conference Proceedings, pp. 167-180. Turban, Efraim; Lee, Jae; King, David; Chung, H. Michael, 2000, Electronic Commerce: A Managerial Perspective, Upper Saddle River, New Jersey: Prentice-Hall. Treese, G. Winfield; Stewart, Lawrence C., 1998, Designing Systems for Internet Commerce, Reading, Massachusetts: Addison Wesley Longman. Williams, J. R. and K. Amaratunga, 1994, "Introduction to Wavelets in Engineering." International Journal of Numerical Methods Engineering, vol. 37, pp. 2365-2388. Williams J. R. and K. Amaratunga, 1993, "Matrix and Image Decomposition Using Wavelets." Proceedings MAFELAP '93, Eighth International Conference on the Mathematics of Finite Elements, Brunel, England. Willson, Cheryl J., April 2000, "The global market for security products and services will top $15 billion by 2003", Red-Herring. "Entrust CEO John Ryan Discusses the Rapid Growth of the Internet Security Market on The IT Radio Network", Business Wire, www.businesswire.com, March 30, 2000. "Body Parts", SC Magazine, www.scmagazine.com, February 2000. "IDC Report Says Entrust Technologies is "the Big Gorilla" in PKI Software Products: Entrust Market-share Trend Increases Year Over Year", Business Wire, www.businesswire.com, March 1, 2000. "Security Roadmap", SC Magazine, www.scmagazine.com, December 1999. Hoover's Company Capsule Database - American Public Companies, Hoover's, 2000. International Biometric Group, www.biometricgroup.com, April 16, 2000. 69 Core Technology Through Enterprise Launch: A Case Study of Handwritten Signature Verification Thesis, MIT's System Design and Management Program Burl Amsbury, June 2000 10 Acronyms ANOM - ANalysis Of Means ASP - Application Service Provider ATM - Automatic Teller Machine B2B - Business-to-Business B2C - Business-to-Consumer BA - Biometric AuthorityTM CA - Certificate Authority CIPD - MIT's Center for Innovation in Product Development D-D - Digital-to-Digital DMV - Department of Motor Vehicles EC - Electronic Commerce FAR - False Acceptance Rate FM - Feature overlap Measure FRR - False Rejection Rate HSV - Handwritten Signature Verification ID - IDentification IDC - International Data Corporation IRS - Internal Revenue Service IT - Information Technology MD - Mahalanobis Distance MIT - Massachusetts Institute of Technology MOM - Multimodal Overlap Measure MTS - Mahalanobis-Taguchi System 70 Core Technology Through Enterprise Launch: A Case Study of Handwritten Signature Verification Thesis, MIT's System Design and Management Program Burl Amsbury, June 2000 PC - Principal Component PKI - Public Key Infrastructure PIN - Personal Identification Number PGP - Pretty Good Privacy PMF - Probability Mass Functions PFM - Principal component Feature overlap Measure PTO - U.S. Patent and Trademark Office SNR - Signal-to-Noise Ratio TCP/IP - Transmission Control Protocol/Internet Protocol VPN - Virtual Private Network 71