CIS4330: Professor Kirs
Artificial Intelligence

Slide 1
An Overview of Artificial Intelligence

Slide 2
TOPICS
• AI Background
  • How did we get here, and why?
• Natural Language Processing (NLP)
  • How do you deal with symbolic representations?
• Neural Networks
  • How can machines be made to emulate humans?

Slide 3
AI Background
• People have always been fascinated with giving machines human abilities
  • c. 270 BC: A Greek engineer named Ctesibius made organs and water clocks with movable figures.
  • Jacques de Vaucanson (1709-1782) created a mechanical duck that ate and drank with realistic motions of head and throat, produced the sound of quacking, and could pick up cornmeal and swallow, digest, and excrete it.
  • Mary Shelley's book Frankenstein (1818)
  • 1921: R.U.R. (Rossum's Universal Robots): a play by Karel Čapek
    • "Robot" comes from the Czech word "robota" (forced labor)
  • The movie Frankenstein (1931)
  • Science fiction writer Isaac Asimov first used the word "robotics" to describe the technology of robots and predicted the rise of a powerful robot industry (1941)
[Image: Robot from the 1921 play "R.U.R."]

Slide 4
AI Background
• People have always tried (unsuccessfully) to figure out how the brain works
• McCulloch and Pitts (1943) developed a (workable) mathematical model of how the brain (networks of neurons) functions (binary, since firing is an 'all-or-none' process)
  • Influenced John von Neumann (1945: stored programs)
  • Led to the use of Neural Networks (discussed later)
  • Encouraged the development of Perceptrons (learning systems)
• Turing Test (1950)
• Newell and Simon (1954)
  • Conceived of using computer programming languages to build theories of human symbolic behavior
  • Showed how a wide range of cognitive processes in problem solving and problem understanding can be explained in information-processing terms and modeled with computer programs

Slide 5
AI Background
• Arthur Samuel's Checkers Program (1955)
  • First 'learning' program
  • Performed a look-ahead search from each current position
  • Saved a description of each board position encountered during play, together with its backed-up value determined by the minimax procedure
  "If the program is now faced with a choice of board positions whose scores differ only by the ply number, it will automatically make the most advantageous choice, choosing a low-ply alternative if winning and a high-ply alternative if losing" (Samuel, 1959, p. 80).
• Dartmouth Workshop (1956)
  • Introduction of the term AI
  • First conference on robotics
• LISP (1958)
  • The first programming language dedicated to AI

Slide 6
AI Background
• Dendral (1965)
  • First (?) Expert System
  • Chemical analysis of organic compounds using mass spectroscopy
• Shakey the Robot (1970)
  • The first mobile robot using AI programming
• MYCIN (1975)
  • Once MYCIN determines the most likely cause of infection and accounts for the patient's allergies, it suggests a course of medication
  • Uses rules like, "If the infection is primary bacteremia, and the site of the culture is one of the sterile sites, and the suspected portal of entry of the organism is the gastrointestinal tract, then there is suggestive evidence that the identity of the organism is bacteroides."
  • Because physicians distrusted MYCIN, it was the first ES to provide explanations

Slide 7
AI Background
• LISP Machines (LISPM) (c. 1980)
  • A computer optimized to run Lisp efficiently and to provide a good environment for programming in it
• 1985: Over 100 US companies offered AI-oriented technologies for sale
• In 1986-87 the demand for AI systems decreased, and the industry lost almost half a billion dollars
?? Why the Change ???
  • The lack of application vs. theory
• 1991: Desert Storm
  • AI-based technologies were used in missile systems, heads-up displays, and other advancements
  • AI once again becomes a "Hot Topic"

Slide 8
AI Background
?? What are Computers Better at than Humans ???
Computer:
• Fast Calculations
• Short-Term Memory (RAM)
• Fast Recall
• Long-Term Memory
• Sequential Processing
• Ah ….. Fast Calculations (×7)
Human:
• Massive Parallelism
• Fault Tolerance
• Dealing with Ambiguity
• Adapting to Circumstances
• Creativity
• Learning
• Associations
• Procreating
-- Alright, that's pushing it!! You Win!! Humans are Superior to Computers!!
[Images: A 'Typical' Computer; A 'Typical' Human]

Slide 9
Natural Language Processing (NLP)
• Symbolic Manipulation
• Uses (existing and future):
  • Information Retrieval (IR)
  • Internet/Automated Search Engines/Web-Crawlers
  • Document Classification
  • Word-Processing Assistance (WP "Wizards")
  • Expert Systems
  • Indexing (Textbook)
  • Keyword Classification
  • E-Mail Routing
• Extensions:
  • Voice Response
  • Voice Recognition

Slide 10
Natural Language Processing (NLP)
• Underlying Problems
  • Document Retrieval Problems
    • Suppose you wish to get general information about E-Commerce
    • Retrieve all articles having the key word "E-Commerce"
    Articles retrieved:
      Source     Title
      WSJ        E-Commerce Stocks Down This Week
      MISQ       E-Commerce Strategies
      NewsWeek   E-Commerce: Who's Using It
      Elle       Buying Clothes at E-Commerce Sites
      CACM       Designing E-Commerce Webs
    [Figure legend: Useful Articles; Unrelated Articles]

Slide 11
Natural Language Processing (NLP)
• Underlying Problems
  • Document Retrieval Problems
    • Assume there are a total of 50 documents
    • Of those, assume only 11 are relevant
    • Recall: the percentage of all relevant articles that are retrieved
    Where are we now?? About 3 of the 11 available relevant articles (27%) are retrieved

Slide 12
Natural Language Processing (NLP)
• Underlying Problems
  • Document Retrieval Problems
    • Assume there are a total of 50 documents
    • Of those, only 11 are relevant
    • Recall: the percentage of all relevant articles that are retrieved
    • Precision: the percentage of retrieved articles that are useful (relevant)
    Where are we now??
    About 3 of 9 (33%) of the articles retrieved are relevant

Slide 13
Natural Language Processing (NLP)
• Underlying Problems
  • Document Retrieval Problems
?? How do Internet Search Engines Retrieve Documents ???
  • "Bag of words" approach
    • Count of simple occurrence frequencies (for listing order)
    • No attention paid to inter-word relationships
    • No attempt made to characterize documents
  • Problems:
    • Words are ambiguous
    • Words are used in different forms
    • Words are used synonymously
?? WHY ??? Can't the process be improved ???
-- Stay Tuned --

Slide 14
Natural Language Processing (NLP)
• Underlying Problems
  • Document Retrieval Problems
  • Document Analysis Problems
    • Identity Recognition:
      [Figure: "~~~ John Smith ~~~" vs. "~~~ Olusegun Obasanjo ~~~"]
      Where are we now?? Over 95% accuracy

Slide 15
Natural Language Processing (NLP)
• Underlying Problems
  • Document Retrieval Problems
  • Document Analysis Problems
    • Identity Recognition:
    • Associations (identifying co-referential items):
      [Figure: "~~~ John Smith ~~~ Olusegun Obasanjo ~~~ Mr. Obasanjo ~~~ The President of Nigeria ~~~"]
      Where are we now??
      About 85% accuracy

Slide 16
Natural Language Processing (NLP)
• Underlying Problems
  • Document Retrieval Problems
  • Document Analysis Problems
    • Identity Recognition:
    • Associations (identifying co-referential items):
    • Ambiguous Terminology:
      • The bridge of one's nose
      • The bridge of a pair of glasses
      • The bridge over a river
      • The bridge of a ship
      • A dental bridge
      • A guitar bridge
      • A game of bridge

Slide 17
Natural Language Processing (NLP)
• Underlying Problems
  • Document Retrieval Problems
  • Document Analysis Problems
    • Identity Recognition:
    • Associations (identifying co-referential items):
    • Ambiguous Terminology:
      • Need to disambiguate relative to:
        • Hand-constructed senses (heuristics)
        • English dictionaries
        • Bilingual dictionaries
        • Thesauruses

Slide 18
Natural Language Processing (NLP)
• Underlying Problems
  • Document Retrieval Problems
  • Document Analysis Problems
    • Identity Recognition:
    • Associations (identifying co-referential items):
    • Ambiguous Terminology:
    • Non-contributory Terminology
      • There is a need to parse terms/phrases to reduce searches
        • The Information Systems are used ….
          → Information System
        • The Information Systems → Information System
        • Information Systems → Information System
      • The problem is how to do it

Slide 19
Natural Language Processing (NLP)
• Underlying Problems
  • Document Retrieval Problems
  • Document Analysis Problems
    • Identity Recognition:
    • Associations (identifying co-referential items):
    • Ambiguous Terminology:
    • Non-contributory Terminology
      • Some words can readily be eliminated (Stop Words)
        • a • an • be • is • not • or • the • to • was • were
      • This can sometimes be problematic:
        • Search for "IS" (the common abbreviation for Information Systems)
        • Search for the phrase "to be or not to be" (from Hamlet)

Slide 20
Natural Language Processing (NLP)
• Underlying Problems
  • Document Retrieval Problems
  • Document Analysis Problems
    • Identity Recognition:
    • Associations (identifying co-referential items):
    • Ambiguous Terminology:
    • Non-contributory Terminology
      • Some words/phrases can readily be eliminated (Stop Words)
      • Prefix/Infix Removal (Stemming)
        • Prefix removal: megavolt → volt
        • Infix removal: un-bloody-likely → unlikely
        • Still problematic:
          • Isn't megavolt a relevant search term?
          • Does un-bloody-likely need additional parsing?

Slide 21
Natural Language Processing (NLP)
• Underlying Problems
  • Document Retrieval Problems
  • Document Analysis Problems
    • Identity Recognition:
    • Associations (identifying co-referential items):
    • Ambiguous Terminology:
    • Non-contributory Terminology
      • Some words/phrases can readily be eliminated (Stop Words)
      • Prefix/Infix Removal (Stemming)
      • Suffix Removal (Stemming)
        • If a word ends in "ies" but not "eies" or "aies":
          • Queries → Query
          • Berries → Berry
        • BUT, what about:
          • Series → Sery ???
          • Hierarchies → Hierarchy
          • Glossaries → Glossary
        • Replacement: "ies" → "y"

Slide 22
Natural Language Processing (NLP)
• Underlying Problems
  • Document Retrieval Problems
  • Document Analysis Problems
    • Identity Recognition:
    • Associations (identifying co-referential items):
    • Ambiguous Terminology:
    • Non-contributory Terminology
      • Some words/phrases can readily be eliminated (Stop Words)
      • Prefix/Infix Removal (Stemming)
      • Suffix Removal (Stemming)
        • If a word ends in "es" but not "aes", "ees", or "oes", replace "es" with "e":
          • Loves → Love
          • Cares → Care
          • Mandates → Mandate
          • Envelopes → Envelope
        • BUT, what about:
          • Cactuses → Cactuse ???

Slide 23
Natural Language Processing (NLP)
• Underlying Problems
  • Document Retrieval Problems
  • Document Analysis Problems
    • Identity Recognition:
    • Associations (identifying co-referential items):
    • Ambiguous Terminology:
    • Non-contributory Terminology
      • Some words/phrases can readily be eliminated (Stop Words)
      • Prefix/Infix Removal (Stemming)
      • Suffix Removal (Stemming)
        • If a word ends in "s" but not "us" or "ss", replace "s" with "" (eliminate):
          • Wants → Want
          • Walks → Walk
          • Bananas → Banana
          • Maniacs → Maniac
        • BUT, what about:
          • Has → Ha ???

Slide 24
Natural Language Processing (NLP)
• Underlying Problems
  • Document Retrieval Problems
  • Document Analysis Problems
    • Identity Recognition:
    • Associations (identifying co-referential items):
    • Ambiguous Terminology:
    • Non-contributory Terminology
      • Some words/phrases can readily be eliminated (Stop Words)
      • Prefix/Infix Removal (Stemming)
      • Suffix Removal (Stemming)
        • Additional considerations:
          • Words ending in "ed", "ing", "ational", "ation", "able", "ism", etc.
        • Additional problems:
          • Bed → B ? Be ?
          • Fling → Fl ?
          • Able → "" ?
          • Prism → Pr ?
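
The stop-word list from Slide 19 and the three suffix rules from Slides 21 to 23 can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and letting a word that hits one rule's exception fall through to the next rule is an assumption, since the slides do not say what should happen in that case.

```python
# Illustrative sketch of the stop-word list (Slide 19) and the three
# suffix rules (Slides 21-23). Function names are hypothetical.

STOP_WORDS = {"a", "an", "be", "is", "not", "or", "the", "to", "was", "were"}


def remove_stop_words(text):
    """Lowercase the text, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def strip_suffix(word):
    """Apply the suffix rules in slide order and return the lowercased stem.

    Assumption: a word excluded by one rule's exceptions falls through
    to the next rule; the slides do not specify this.
    """
    w = word.lower()
    # Rule 1: "ies" -> "y", unless the word ends in "eies" or "aies".
    if w.endswith("ies") and not w.endswith(("eies", "aies")):
        return w[:-3] + "y"
    # Rule 2: "es" -> "e", unless the word ends in "aes", "ees", or "oes".
    if w.endswith("es") and not w.endswith(("aes", "ees", "oes")):
        return w[:-2] + "e"
    # Rule 3: drop a final "s", unless the word ends in "us" or "ss".
    if w.endswith("s") and not w.endswith(("us", "ss")):
        return w[:-1]
    return w


if __name__ == "__main__":
    # The slides' good cases and problem cases side by side.
    for word in ["Queries", "Series", "Loves", "Cactuses", "Wants", "Has"]:
        print(word, "->", strip_suffix(word))
    # Both problem searches from Slide 19 lose their key terms.
    print(remove_stop_words("to be or not to be"))
    print(remove_stop_words("IS strategy"))
```

Running it reproduces the slides' problem cases: "Series", "Cactuses", and "Has" are stemmed to non-words, and the Hamlet phrase is reduced to an empty list.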

Slide 25
Natural Language Processing (NLP)
• Underlying Problems
  • Document Retrieval Problems
  • Document Analysis Problems
  • Document Classification Problems
    • The goal is to be able to classify any document:
      • Music
      • Sports
      • Business
      • Mud Slinging
    • Although it is too often:
      • Other

Slide 26
Natural Language Processing (NLP)
• Underlying Problems
  • Document Retrieval Problems
  • Document Analysis Problems
  • Document Classification Problems
    • Simple classifications ("Bag of Words", Webcrawlers)
      • Documents are scanned for words/phrases
      • Lists of most frequently occurring words/phrases are maintained
    • Problems:
      • Massive lists needed
      • Very slow:
        • How many websites, and documents at each site, are there?
        • How often are new sites added?
        • How often are documents added to existing sites?
        • How long does it take to determine frequencies?
      • Spamming: Looking for an article on Jennifer Aniston??
        SEX SEXY MONICA LEWINSKY JENNIFER LOPEZ CLAUDIA SCHIFFER CINDY CRAWFORD JENNIFER ANISTON GILLIAN ANDERSON MADONNA NIKI TAYLOR SEXY SEXY ELLE MACPHERSON KATE MOSS CAROL ALT TYRA BANKS FREDERIQUE KATHY IRELAND PAM ANDERSON KAREN MULDER VALERIA MAZZA SHALOM HARLOW AMBER VALLETTA LAETITA CASTA SEXY SEXY BETTIE PAGE HEIDI KLUM PATRICIA FORD DAISY FUENTES KELLY BROOK SEX SEXY MONICA LEWINSKY ……..
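
A minimal sketch of the "bag of words" ranking described above shows why spamming works: if documents are ordered purely by how often a term occurs, a page that just repeats the term outranks a real article. The function names and the three sample documents are hypothetical, and real crawlers are assumed to be far more elaborate.

```python
from collections import Counter


def term_frequencies(text):
    """Count how often each lowercased word occurs; word order is ignored."""
    return Counter(text.lower().split())


def rank_by_term(documents, term):
    """Order document names by raw occurrence count of `term`, highest first."""
    term = term.lower()
    scores = {name: term_frequencies(body)[term] for name, body in documents.items()}
    return sorted(scores, key=lambda name: scores[name], reverse=True)


# Hypothetical documents: "stuffed" repeats the query term without saying anything.
docs = {
    "article": "e-commerce strategies for small retailers",
    "stuffed": "e-commerce e-commerce e-commerce e-commerce buy now",
    "unrelated": "stocks fell this week",
}

if __name__ == "__main__":
    print(rank_by_term(docs, "E-Commerce"))  # ['stuffed', 'article', 'unrelated']
```

Because only counts matter, nothing in this scheme can tell the stuffed page from a genuinely relevant one.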

Slide 27
Natural Language Processing (NLP)
• Underlying Problems
  • Document Retrieval Problems
  • Document Analysis Problems
  • Document Classification Problems
    • Another approach is to analyze how frequently the words/terms of a given document occur in each category
    Category word lists:
      Music Words:                   Tempo, Symphony, Volume, ...
      Sports Words:                  Baseball Game, Points Scored, Teams, ...
      Business Words:                Profit, Stock Value, Assets, ...
      M-Sling (Mud Slinging) Words:  So's-your-old-man, You Stink, Liar Liar, ...
    Document word counts: Organization (24), Profit (16), Volume (12), ...
    Matches per category: Music 6, Sports 12, Business 42, M-Sling 22
    Result: Business Document
    Problem: establishing category lists

Slide 28
Natural Language Processing (NLP)
• Underlying Problems
  • Document Retrieval Problems
  • Document Analysis Problems
  • Document Classification Problems
?? Is it worth it ???
  • Research shows it is:
    • Hull, D.A. (1996): Stemming algorithms: A case study for detailed evaluation. Journal of the American Society for Information Science, 47(1): 70-84
  • Web search engines almost never use it
?? WHY ???
  • Time
  • Lack of consistency (so far)
  • Complexity
  • User expectations
  • Cost
  • Foreign language usage

Slide 29
????????????? Any Questions (Please !!!) ?????????????
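
Returning to the category word-list approach on Slide 27, the matching step might look like the sketch below. The category lists here are tiny, hypothetical stand-ins restricted to single words, so the totals it produces are illustrative and are not the slide's match counts; the "Other" fallback mirrors Slide 25.

```python
# Hypothetical, tiny category lists; real lists would be far larger
# and would also need to handle multi-word phrases.
CATEGORY_WORDS = {
    "Music": {"tempo", "symphony", "volume"},
    "Sports": {"baseball", "points", "teams"},
    "Business": {"profit", "stock", "assets"},
}


def category_matches(word_counts):
    """Sum the document's counts for the words on each category list."""
    return {
        category: sum(word_counts.get(word, 0) for word in words)
        for category, words in CATEGORY_WORDS.items()
    }


def classify(word_counts):
    """Pick the category with the most matches, or "Other" if none match."""
    matches = category_matches(word_counts)
    best = max(matches, key=matches.get)
    return best if matches[best] > 0 else "Other"


if __name__ == "__main__":
    # Word counts taken from the slide's sample document.
    document = {"organization": 24, "profit": 16, "volume": 12}
    print(category_matches(document))
    print(classify(document))
```

Note that "volume" counts toward Music even in a business document, which is one reason establishing good category lists is the hard part.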