Lecture Notes 2 - Expert Systems

CIS4330: Professor Kirs
Artificial Intelligence
An Overview of
Artificial Intelligence
Slide 1
Slide 2
TOPICS
• AI Background
  • How did we get here and Why?
• Natural Language Processing (NLP)
  • How do you deal with Symbolic Representations?
• Neural Networks
  • How can machines be made to emulate humans?
Slide 3
AI Background
• People have always been fascinated with giving machines human abilities
  • c. 270 BC: A Greek engineer named Ctesibius made organs and water clocks with movable figures.
  • Jacques de Vaucanson (1709-1782) created a mechanical duck that ate and drank with realistic motions of head and throat, produced the sound of quacking, and could pick up cornmeal and appear to swallow, digest, and excrete it.
  • Mary Shelley’s book Frankenstein (1818)
  • 1921: R.U.R. (Rossum's Universal Robots), a play by Karel Čapek
    • "Robot" comes from the Czech word "robota" (forced labor)
  • The movie Frankenstein (1931)
  • Science fiction writer Isaac Asimov first used the word "robotics" to describe the technology of robots and predicted the rise of a powerful robot industry (1941)
[Image: Robot from the 1921 play "R.U.R."]
Slide 4
AI Background
• People have always tried (unsuccessfully) to figure out how the brain works
  • McCulloch and Pitts (1943) developed a (workable) mathematical model of how the brain (networks of neurons) functions (binary, since firing is an ‘all-or-none’ process)
    • Influenced John von Neumann (1945: Stored Programs)
    • Led to the use of Neural Networks (discussed later)
    • Encouraged the development of Perceptrons (Learning Systems)
• Turing Test (1950)
• Newell and Simon (1954)
  • Conceived of using computer programming languages to build theories of human symbolic behavior
  • Showed how a wide range of cognitive processes in problem solving and problem understanding can be explained in information-processing terms and modeled with computer programs
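The binary, all-or-none unit attributed to McCulloch and Pitts on this slide can be sketched in a few lines. This is a minimal illustration only: the function names, weights, and thresholds below are assumptions chosen for the example, not taken from the lecture.

```python
def mp_neuron(inputs, weights, threshold):
    """Fire (return 1) if the weighted sum of binary inputs reaches the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0


def logical_and(a, b):
    # Both excitatory inputs must be active to reach the threshold of 2.
    return mp_neuron([a, b], [1, 1], threshold=2)


def logical_or(a, b):
    # A single active input is enough to reach the threshold of 1.
    return mp_neuron([a, b], [1, 1], threshold=1)


def logical_not(a):
    # One inhibitory input (weight -1): the unit fires only when the input is off.
    return mp_neuron([a], [-1], threshold=0)
```

Each unit either fires or does not, which is the "all-or-none" behavior; wiring such units together gives the networks of neurons the slide refers to.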
Slide 5
AI Background
• Arthur Samuel’s Checkers Program (1955)
  • First ‘Learning’ Program
  • Performed a look-ahead search from each current position
  • Saved a description of each board position encountered during play together with its backed-up value determined by the minimax procedure
  • “If the program is now faced with a choice of board positions whose scores differ only by the ply number, it will automatically make the most advantageous choice, choosing a low-ply alternative if winning and a high-ply alternative if losing" (Samuel, 1959, p. 80).
• Dartmouth Workshop (1956)
  • Introduction of the term AI
  • First conference devoted to AI
• LISP (1958)
  • The first programming language dedicated to AI
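The look-ahead search and the ply-based tie-break quoted from Samuel on this slide can be sketched as follows. This is a toy minimax over a hand-written game tree (nested lists with integer leaf scores, positive meaning good for the first player). It is not Samuel's program, and the tree encoding is an assumption made for illustration.

```python
def minimax(node, maximizing=True, ply=0):
    """Return (backed-up score, ply at which that score is reached)."""
    if isinstance(node, int):          # leaf: a scored board position
        return node, ply
    best_key, best = None, None
    for child in node:
        score, depth = minimax(child, not maximizing, ply + 1)
        # Score as seen by the side choosing at this node.
        value = score if maximizing else -score
        # Tie-break on ply: finish sooner when winning, hold out longer otherwise.
        key = (value, -depth if value > 0 else depth)
        if best_key is None or key > best_key:
            best_key, best = key, (score, depth)
    return best


# Two lines of play with the same score but different lengths.
print(minimax([[[5]], 5]))     # winning: the 1-ply alternative is chosen
print(minimax([[[-5]], -5]))   # losing: the 3-ply alternative is chosen
```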
Slide 6
AI Background
• Dendral (1965)
  • First (?) Expert System
  • Chemical analysis of organic compounds using mass spectroscopy
• Shakey the Robot (1970)
  • The first mobile robot using AI programming
• MYCIN (1975)
  • Once MYCIN has determined the most likely cause of infection and accounted for the patient's allergies, it suggests a course of medication
  • Uses rules like, "If the infection is primary bacteremia, and the site of the culture is one of the sterile sites, and the suspected portal of entry of the organism is the gastrointestinal tract, then there is suggestive evidence that the identity of the organism is bacteroides."
  • Because physicians distrusted MYCIN, it was the first ES to provide explanations
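The rule quoted on this slide, together with the idea of explaining a conclusion, can be sketched as a small data structure plus a matcher. The field names (`infection`, `culture_site`, `portal_of_entry`) and the dictionary layout are hypothetical and are not MYCIN's actual representation.

```python
# One IF-THEN rule, paraphrased from the slide (field names are illustrative).
RULE = {
    "if": {
        "infection": "primary bacteremia",
        "culture_site": "sterile site",
        "portal_of_entry": "gastrointestinal tract",
    },
    "then": ("organism", "bacteroides"),
    "strength": "suggestive evidence",
}


def apply_rule(rule, facts):
    """Return (conclusion, explanation) if every premise matches, else None."""
    for attribute, expected in rule["if"].items():
        if facts.get(attribute) != expected:
            return None
    attribute, value = rule["then"]
    premises = " and ".join(f"{a} is {v}" for a, v in rule["if"].items())
    explanation = (
        f"Because {premises}, there is {rule['strength']} "
        f"that {attribute} is {value}."
    )
    return (attribute, value), explanation


patient = {
    "infection": "primary bacteremia",
    "culture_site": "sterile site",
    "portal_of_entry": "gastrointestinal tract",
}
conclusion, why = apply_rule(RULE, patient)
print(conclusion)
print(why)
```

Keeping the premises as data is what makes the explanation possible: the system can report which conditions led to the conclusion.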
Slide 7
AI Background
• LISP Machines (LISPM) (c. 1980)
  • A computer optimized to run Lisp efficiently and to provide a good environment for programming in it
• 1985: Over 100 US companies offered AI-oriented technologies for sale
• In 1986-87 the demand for AI systems decreased, and the industry lost almost half a billion dollars
?? Why the Change ???
  • A lack of practical applications relative to theory
• 1991: Desert Storm
  • AI-based technologies were used in missile systems, heads-up displays, and other advancements.
  • AI once again becomes a “Hot Topic”
Slide 8
AI Background
?? What are Computers Better at than Humans ???
A ‘Typical’ Computer:
• Fast Calculations
• Short-Term Memory (RAM)
• Fast Recall
• Long-Term Memory
• Sequential Processing
• Ah ….. Fast Calculations (repeated)
A ‘Typical’ Human:
• Massive Parallelism
• Fault Tolerance
• Dealing with Ambiguity
• Adapting to Circumstances
• Creativity
• Learning
• Associations
• Procreating
-- Alright, that’s pushing it!! You Win!! Humans are Superior to Computers !!
Natural Language Processing (NLP)
• Symbolic Manipulation
• Uses (Existing and Future):
• Information Retrieval (IR)
• Internet/Automated Search Engines/Web-Crawlers
• Document Classification
• Word-Processing Assistance (WP “Wizards”)
• Expert Systems
• Indexing (Textbook)
• Keyword Classification
• E-Mail Routing
• Extensions:
• Voice Response
• Voice Recognition
Slide 9
Slide 10
Natural Language Processing (NLP)
• Underlying Problems
• Document Retrieval Problems
• Suppose you wish to get general information about E-Commerce
• Retrieve all articles having the Key Word “E-Commerce”
Articles Retrieved:
• WSJ: E-Commerce Stocks Down This Week
• MISQ: E-Commerce Strategies
• NewsWeek: E-Commerce: Who’s Using it
• Elle: Buying Clothes at E-Commerce Sites
• CACM: Designing E-Commerce Webs
[Figure legend: Useful Articles vs. Unrelated Articles]
Slide 11
Natural Language Processing (NLP)
• Underlying Problems
• Document Retrieval Problems
• Assume there are a total of 50 Documents
• Of those, assume only 11 are relevant
• Recall: The percentage of all relevant articles that were retrieved
Where are we now?? About 3 of the 11 (27%) relevant articles were retrieved
Slide 12
Natural Language Processing (NLP)
• Underlying Problems
• Document Retrieval Problems
• Assume there are a total of 50 Documents
• Of those, only 11 are relevant
• Recall: The percentage of all relevant articles that were retrieved
• Precision: The percentage of retrieved articles that are relevant (useful)
Where are we now?? About 3 of the 9 (33%) articles retrieved are relevant
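The two measures can be checked with the numbers from Slides 11 and 12 (11 relevant documents in total, 9 retrieved, 3 of those relevant). The function and parameter names below are illustrative choices, not part of the lecture.

```python
def recall(relevant_retrieved, relevant_total):
    """Share of all relevant documents that the search actually found."""
    return relevant_retrieved / relevant_total


def precision(relevant_retrieved, retrieved_total):
    """Share of the retrieved documents that are actually relevant."""
    return relevant_retrieved / retrieved_total


r = recall(3, 11)      # 3 relevant found out of 11 relevant available
p = precision(3, 9)    # 3 relevant among 9 retrieved
print(f"Recall: {r:.0%}")       # Recall: 27%
print(f"Precision: {p:.0%}")    # Precision: 33%
```

The two measures pull against each other: retrieving all 50 documents would give 100% recall but only 11/50 = 22% precision.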
Slide 13
Natural Language Processing (NLP)
• Underlying Problems
• Document Retrieval Problems
?? How do Internet Search Engines Retrieve Documents ???
• “Bag of words” Approach
• Simple counts of word occurrence frequencies (used for listing order)
• No attention paid to inter-word relationships
• No attempt made to characterize documents
• Problems:
• Words are ambiguous
• Words are used in different forms
• Words are used synonymously
?? WHY ??? Can’t the process be improved ???
-- Stay Tuned --
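A minimal sketch of the "bag of words" approach described on this slide: each document is reduced to word counts, and hits are listed by how often the query term occurs. The sample documents and the tokenizing pattern are assumptions made for illustration.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Reduce a text to word counts, ignoring word order and relationships."""
    return Counter(re.findall(r"[a-z0-9-]+", text.lower()))


def rank(documents, query):
    """List documents containing the query term, most occurrences first."""
    term = query.lower()
    hits = []
    for name, text in documents.items():
        count = bag_of_words(text)[term]
        if count > 0:
            hits.append((name, count))
    return sorted(hits, key=lambda item: item[1], reverse=True)


docs = {
    "a": "E-Commerce stocks are down; e-commerce investors worry",
    "b": "Buying clothes at e-commerce sites",
    "c": "Online retail strategies",   # same topic, different words
}
print(rank(docs, "E-Commerce"))
```

Document "c" is never retrieved even though it is on topic, which is the synonym problem listed above.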
Slide 14
Natural Language Processing (NLP)
• Underlying Problems
• Document Retrieval Problems
• Document Analysis Problems
• Identity Recognition:
vs.
~~~~~~~ John Smith ~~~~~~~ Olusegun Obasanjo ~~~~~~~~~~
Where are we now?? Over 95% Accuracy
Slide 15
Natural Language Processing (NLP)
• Underlying Problems
• Document Retrieval Problems
• Document Analysis Problems
• Identity Recognition:
• Associations (Identifying co-referential items):
~~~~~~~ John Smith ~~~~~~~ Olusegun Obasanjo ~~~~~~~~~~
~~ Mr. Obasanjo ~~~~~ The President of Nigeria ~~~~~~~~~~~~
Where are we now?? About 85% Accuracy
Slide 16
Natural Language Processing (NLP)
• Underlying Problems
• Document Retrieval Problems
• Document Analysis Problems
• Identity Recognition:
• Associations (Identifying co-referential items):
• Ambiguous Terminology:
• The bridge of one’s nose
• The bridge of a pair of glasses
• The bridge over a river
• The bridge of a ship
• A dental bridge
• A guitar bridge
• A game of bridge
Slide 17
Natural Language Processing (NLP)
• Underlying Problems
• Document Retrieval Problems
• Document Analysis Problems
• Identity Recognition:
• Associations (Identifying co-referential items):
• Ambiguous Terminology:
• Need to disambiguate relative to:
• Hand-constructed Senses (heuristics)
• English Dictionaries
• Bilingual Dictionaries
• Thesauruses
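One way to read "hand-constructed senses (heuristics)" is a small table of cue words per sense, with the sense chosen by overlap with the surrounding words. The sketch below makes that assumption; the sense labels and cue words are invented for the example and cover only a few of the "bridge" senses from the previous slide.

```python
# Hand-built cue words for a few senses of "bridge" (illustrative only).
SENSES = {
    "river crossing": {"river", "traffic", "span", "road"},
    "ship": {"ship", "captain", "deck", "helm"},
    "dental": {"dentist", "tooth", "teeth", "crown"},
    "card game": {"cards", "game", "bid", "trick", "partner"},
}


def disambiguate(sentence, senses=SENSES):
    """Pick the sense whose cue words overlap most with the sentence."""
    words = set(sentence.lower().split())
    scores = {sense: len(words & cues) for sense, cues in senses.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(disambiguate("the captain stood on the bridge of the ship"))
print(disambiguate("we played bridge with a new partner"))
print(disambiguate("a bridge"))   # no context, no decision
```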
Slide 18
Natural Language Processing (NLP)
• Underlying Problems
• Document Retrieval Problems
• Document Analysis Problems
• Identity Recognition:
• Associations (Identifying co-referential items):
• Ambiguous Terminology:
• Non-contributory Terminology
• There is a need to parse terms/phrases to reduce searches
• The Information Systems are used …. → Information System
• The Information Systems → Information System
• Information Systems → Information System
• The problem is How to do it
Slide 19
Natural Language Processing (NLP)
• Underlying Problems
• Document Retrieval Problems
• Document Analysis Problems
• Identity Recognition:
• Associations (Identifying co-referential items):
• Ambiguous Terminology:
• Non-contributory Terminology
• Some words can readily be eliminated (Stop Words)
• a
• an
• be
• is
• not
• or
• the
• to
• was
• were
• This can sometimes be problematic:
• Search for “IS” (the common abbreviation for Information Systems)
• Search for the phrase “to be or not to be” (from Hamlet)
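Stop-word removal, and the two problem cases on this slide, can be reproduced with the slide's own word list. The function name and the whitespace tokenization are assumptions made for the sketch.

```python
# Stop words exactly as listed on the slide.
STOP_WORDS = {"a", "an", "be", "is", "not", "or", "the", "to", "was", "were"}


def remove_stop_words(query):
    """Lowercase the query and drop every word that is a stop word."""
    return [word for word in query.lower().split() if word not in STOP_WORDS]


print(remove_stop_words("The Information Systems are used"))
print(remove_stop_words("IS"))                   # nothing left to search for
print(remove_stop_words("to be or not to be"))   # the whole phrase disappears
```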
Slide 20
Natural Language Processing (NLP)
• Underlying Problems
• Document Retrieval Problems
• Document Analysis Problems
• Identity Recognition:
• Associations (Identifying co-referential items):
• Ambiguous Terminology:
• Non-contributory Terminology
• Some words/phrases can readily be eliminated (Stop Words)
• Prefix/Infix Removal (Stemming)
• Prefix Removal: megavolt → volt
• Infix Removal: un-bloody-likely → unlikely
• Still Problematic:
• Isn’t megavolt a relevant search term?
• Does un-bloody-likely need additional parsing?
Slide 21
Natural Language Processing (NLP)
• Underlying Problems
• Document Retrieval Problems
• Document Analysis Problems
• Identity Recognition:
• Associations (Identifying co-referential items):
• Ambiguous Terminology:
• Non-contributory Terminology
• Some words/phrases can readily be eliminated (Stop Words)
• Prefix/Infix Removal (Stemming)
• Suffix Removal (Stemming)
• If a word ends in “ies” but not “eies”, “aies”: replace “ies” with “y”
• Queries → Query
• Berries → Berry
• Hierarchies → Hierarchy
• Glossaries → Glossary
• BUT, what about:
• Series → Sery ???
Slide 22
Natural Language Processing (NLP)
• Underlying Problems
• Document Retrieval Problems
• Document Analysis Problems
• Identity Recognition:
• Associations (Identifying co-referential items):
• Ambiguous Terminology:
• Non-contributory Terminology
• Some words/phrases can readily be eliminated (Stop Words)
• Prefix/Infix Removal (Stemming)
• Suffix Removal (Stemming)
• If a word ends in “es” but not “aes”, “ees”, “oes”: replace “es” with “e”
• Loves → Love
• Cares → Care
• Mandates → Mandate
• Envelopes → Envelope
• BUT, what about:
• Cactuses → Cactuse ???
Slide 23
Natural Language Processing (NLP)
• Underlying Problems
• Document Retrieval Problems
• Document Analysis Problems
• Identity Recognition:
• Associations (Identifying co-referential items):
• Ambiguous Terminology:
• Non-contributory Terminology
• Some words/phrases can readily be eliminated (Stop Words)
• Prefix/Infix Removal (Stemming)
• Suffix Removal (Stemming)
• If a word ends in “s” but not “us”, “ss”: replace “s” with “” (eliminate)
• Wants → Want
• Walks → Walk
• Bananas → Banana
• Maniacs → Maniac
• BUT, what about:
• Has → Ha ???
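The three suffix rules from Slides 21 to 23 can be combined into one small function. Applying them in this order and stopping at the first rule that matches is an assumption of the sketch; the slides do not say how the rules interact.

```python
def s_stem(word):
    """Apply the "ies", "es", and "s" rules in order; the first match wins."""
    w = word.lower()
    if w.endswith("ies") and not w.endswith(("eies", "aies")):
        return w[:-3] + "y"     # queries -> query
    if w.endswith("es") and not w.endswith(("aes", "ees", "oes")):
        return w[:-2] + "e"     # loves -> love
    if w.endswith("s") and not w.endswith(("us", "ss")):
        return w[:-1]           # wants -> want
    return w


for word in ["Queries", "Hierarchies", "Loves", "Envelopes", "Wants", "Bananas"]:
    print(word, "->", s_stem(word))

# The problem cases from the slides come out exactly as warned:
for word in ["Series", "Cactuses", "Has"]:
    print(word, "->", s_stem(word))
```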
Slide 24
Natural Language Processing (NLP)
• Underlying Problems
• Document Retrieval Problems
• Document Analysis Problems
• Identity Recognition:
• Associations (Identifying co-referential items):
• Ambiguous Terminology:
• Non-contributory Terminology
• Some words/phrases can readily be eliminated (Stop Words)
• Prefix/Infix Removal (Stemming)
• Suffix Removal (Stemming)
• Additional Considerations:
• Words ending in “ed”, “ing”, “ational”, “ation”, “able”, “ism”, etc.
• Additional Problems:
• Bed → B ? Be ?
• Fling → Fl ?
• Able → “” ?
• Prism → Pr ?
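Stripping the suffixes listed on this slide with no further checks reproduces the problem cases. The optional `min_stem` guard is an assumption added for illustration (a common heuristic, not something the slide proposes): it refuses to strip a suffix when too little of the word would remain.

```python
# Suffixes from the slide, longest first so "ational" is tried before "ation".
SUFFIXES = ("ational", "ation", "able", "ing", "ism", "ed")


def strip_suffix(word, min_stem=0):
    """Remove the first matching suffix if at least min_stem letters remain."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= min_stem:
            return w[: len(w) - len(suffix)]
    return w


for word in ["Bed", "Fling", "Able", "Prism"]:
    print(word, "->", repr(strip_suffix(word)), "| guarded:", strip_suffix(word, min_stem=3))

print(strip_suffix("Walking", min_stem=3))
```

The guard leaves the four problem words intact while still reducing "Walking" to "walk", but it is only a heuristic and will miss other cases.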
Slide 25
Natural Language Processing (NLP)
• Underlying Problems
• Document Retrieval Problems
• Document Analysis Problems
• Document Classification Problems
• The goal is to be able to classify any document:
Music
Sports
Business
Mud Slinging
• Although it is too often:
Other
Slide 26
Natural Language Processing (NLP)
• Underlying Problems
• Document Retrieval Problems
• Document Analysis Problems
• Document Classification Problems
• Simple Classifications (“Bag of Words” - Webcrawlers)
• Documents are scanned for words/phrases
• Lists of most frequently occurring words/phrases are maintained
• Problems:
• Massive Lists needed
• Very Slow: How many websites and documents at each site are
there? How often are new sites added? How often are documents
added to existing sites? How long to determine frequencies?
• Spamming: Looking for an article on Jennifer Aniston??
SEX SEXY MONICA LEWINSKY JENNIFER LOPEZ CLAUDIA SCHIFFER CINDY CRAWFORD JENNIFER ANISTON GILLIAN ANDERSON
MADONNA NIKI TAYLOR SEXY SEXY ELLE MACPHERSON KATE MOSS CAROL ALT TYRA BANKS FREDERIQUE KATHY IRELAND
PAM ANDERSON KAREN MULDER VALERIA MAZZA SHALOM HARLOW AMBER VALLETTA LAETITA CASTA SEXY SEXY BETTIE
PAGE HEIDI KLUM PATRICIA FORD DAISY FUENTES KELLY BROOK SEX SEXY MONICA LEWINSKY ……..
Slide 27
Natural Language Processing (NLP)
• Underlying Problems
• Document Retrieval Problems
• Document Analysis Problems
• Document Classification Problems
• Another approach is to analyze the frequency of words/terms for a given document occurring in each category
Word Counts (for the document):
• Organization (24)
• Profit (16)
• Volume (12)
•••••
Music Words (6 Matches):
• Tempo
• Symphony
• Volume
•••••
Sports Words (12 Matches):
• Baseball Game
• Points scored
• Teams
•••••
Business Words (42 Matches):
• Profit
• Stock Value
• Assets
•••••
M-Sling Words (22 Matches):
• So’s-your-old-man
• You Stink
• Liar, Liar
•••••
Classified as: Business Document
Problem: Establishing Category Lists
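The category-matching idea on this slide can be sketched as follows. The category lists contain only the few sample words shown on the slide, so the match totals differ from the slide's figures (which presumably come from fuller lists). The function name and the "Other" fallback (taken from Slide 25) are choices made for the sketch.

```python
# Sample category word lists (only the examples shown on the slide).
CATEGORY_WORDS = {
    "Music": {"tempo", "symphony", "volume"},
    "Sports": {"baseball game", "points scored", "teams"},
    "Business": {"profit", "stock value", "assets"},
    "Mud Slinging": {"so's-your-old-man", "you stink", "liar, liar"},
}


def classify(word_counts, categories=CATEGORY_WORDS):
    """Sum the counts of document terms found in each category list."""
    matches = {
        name: sum(count for term, count in word_counts.items() if term in words)
        for name, words in categories.items()
    }
    best = max(matches, key=matches.get)
    return (best if matches[best] > 0 else "Other"), matches


# Word counts for the sample document on the slide.
document = {"organization": 24, "profit": 16, "volume": 12}
label, matches = classify(document)
print(label)
print(matches)
```

"Volume" counts toward Music even though the document is about business, and the most frequent word, "organization", matches no list at all. Both effects show why establishing the category lists is the hard part.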
Slide 28
Natural Language Processing (NLP)
• Underlying Problems
• Document Retrieval Problems
• Document Analysis Problems
• Document Classification Problems
?? Is it worth it ???
• Research shows it is:
• Hull, D.A. (1996): Stemming algorithms: A case study for detailed evaluation. Journal of the American Society for Information Science, 47(1): 70-84
• Web Search engines almost never use it
?? WHY ???
• Time
• Lack of Consistency (So Far)
• Complexity
• User Expectations
• Cost
• Foreign Language Usage
?????????????
Any Questions
(Please !!!)
?????????????
Slide 29