Voice Recognition

Syllabus for AI
2nd Mst notes
Expert system from ES.pdf file from slides till page 8
Rest of the topics are listed below
Voice Recognition
AI continues to become an increasingly large part of our daily lives. Voice recognition is a key enabler
of the human-machine interface. Although we often talk (or curse) at our computers, the thought of
them talking back to us is somewhat disturbing. The ultimate computer with artificial intelligence
was the HAL computer in the science fiction classic 2001: A Space Odyssey.
Speech recognition (also known as automatic speech recognition or computer speech
recognition) converts spoken words to text. The term "voice recognition" is sometimes used to
refer to recognition systems that must be trained to a particular speaker—as is the case for
most desktop recognition software. Recognizing the speaker can simplify the task of translating
speech.
Speech recognition is the broader term: it refers to technology that can recognize speech
without being targeted at a single speaker, such as a call center system that can recognize
arbitrary voices.
Speech recognition applications include voice user interfaces such as voice dialing (e.g., "Call
home"), call routing (e.g., "I would like to make a collect call"), domotic appliance control, search
(e.g., find a podcast where particular words were spoken), simple data entry (e.g., entering a
credit card number), preparation of structured documents (e.g., a radiology report), speech-to-text
processing (e.g., word processors or emails), and aircraft (usually termed Direct Voice Input).
Applications
Health care
In the health care domain, even in the wake of improving speech recognition technologies,
medical transcriptionists (MTs) have not yet become obsolete. The services provided may be
redistributed rather than replaced. Speech recognition can be implemented in the front-end or
back-end of the medical documentation process.
Military
High-performance fighter aircraft
Substantial efforts have been devoted in the last decade to the test and evaluation of speech
recognition in fighter aircraft.
Helicopters
The problems of achieving high recognition accuracy under stress and noise pertain strongly to
the helicopter environment as well as to the fighter environment.
Training air traffic controllers
Training for air traffic controllers (ATC) represents an excellent application for speech recognition
systems. Many ATC training systems currently require a person to act as a "pseudo-pilot",
engaging in a voice dialog with the trainee controller, which simulates the dialog which the
controller would have to conduct with pilots in a real ATC situation.
Telephony and other domains
ASR is now commonplace in telephony and is becoming more widespread in computer gaming and
simulation. Despite the high level of integration with word processing in general personal
computing, however, ASR has not seen the expected increase in use for document production.
Further applications
• Automatic translation;
• Automotive speech recognition (e.g., Ford Sync);
• Robotics;
• Video games, with Tom Clancy's EndWar and Lifeline as working examples;
• Transcription (digital speech-to-text);
• Speech-to-text (transcription of speech into mobile text messages);
• Air traffic control speech recognition;
• Home automation;
• Interactive voice response.
Pattern Recognition
In machine learning, pattern recognition is the assignment of some sort of output value (or
label) to a given input value (or instance), according to some specific algorithm. An example of
pattern recognition is classification, which attempts to assign each input value to one of a given
set of classes (for example, determine whether a given email is "spam" or "non-spam"). However,
pattern recognition is a more general problem that encompasses other types of output as well.
Other examples are regression, which assigns a real-valued output to each input; sequence
labeling, which assigns a class to each member of a sequence of values (for example, part of
speech tagging, which assigns a part of speech to each word in an input sentence); and parsing,
which assigns a parse tree to an input sentence, describing the syntactic structure of the
sentence.
Pattern recognition algorithms generally aim to provide a reasonable answer for all possible
inputs and to do "fuzzy" matching of inputs. This is opposed to pattern matching algorithms,
which look for exact matches in the input with pre-existing patterns. A common example of a
pattern-matching algorithm is regular expression matching, which looks for patterns of a given
sort in textual data and is included in the search capabilities of many text editors and word
processors. In contrast to pattern recognition, pattern matching is generally not considered a type
of machine learning, although pattern-matching algorithms (especially with fairly general, carefully
tailored patterns) can sometimes succeed in providing similar-quality output to the sort provided
by pattern-recognition algorithms.
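The contrast between exact matching and "fuzzy" matching can be sketched in a few lines of Python. This is only an illustration: the five-digit pattern and the word list are hypothetical, and the string-similarity routine from the standard library stands in for fuzzy matching; it is not a learned pattern-recognition model.

```python
import re
from difflib import get_close_matches
from typing import Optional

# Exact pattern matching: the regular expression either matches or it does not.
FIVE_DIGITS = re.compile(r"\d{5}")  # hypothetical pattern: exactly five digits

def is_five_digit_code(text: str) -> bool:
    return FIVE_DIGITS.fullmatch(text) is not None

# "Fuzzy" matching: return the closest known word even if the input is misspelled.
KNOWN_WORDS = ["spam", "non-spam", "unknown"]  # hypothetical vocabulary

def closest_word(text: str) -> Optional[str]:
    matches = get_close_matches(text, KNOWN_WORDS, n=1, cutoff=0.6)
    return matches[0] if matches else None

print(is_five_digit_code("12345"))  # True
print(is_five_digit_code("1234a"))  # False: no partial credit for near misses
print(closest_word("spamm"))        # spam
```

The regular expression rejects anything that deviates from the pattern, while the fuzzy lookup still returns a reasonable answer for an input it has never seen verbatim.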
Pattern recognition is generally categorized according to the type of learning procedure used to
generate the output value. Supervised learning assumes that a set of training data (the training
set) has been provided, consisting of a set of instances that have been properly labeled by hand
with the correct output. A learning procedure then generates a model that attempts to meet two
sometimes conflicting objectives: Perform as well as possible on the training data, and generalize
as well as possible to new data (usually, this means being as simple as possible, for some
technical definition of "simple", in accordance with Occam's Razor). Unsupervised learning, on
the other hand, assumes training data that has not been hand-labeled, and attempts to find
inherent patterns in the data that can then be used to determine the correct output value for new
data instances. A combination of the two that has recently been explored is semi-supervised
learning, which uses a combination of labeled and unlabeled data (typically a small set of labeled
data combined with a large amount of unlabeled data). Note that in cases of unsupervised
learning, there may be no training data at all to speak of; in other words, the data to be labeled is
the training data.
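As a minimal sketch of supervised learning, the following classifier labels a new instance with the label of its closest hand-labeled training instance. The nearest-neighbour rule, the two-number feature vectors and the labels are all assumptions chosen for brevity; the notes do not prescribe a particular learning procedure.

```python
import math

def nearest_neighbour_classify(training_set, point):
    """Return the label of the training instance closest to `point`."""
    best_label, best_dist = None, math.inf
    for features, label in training_set:
        dist = math.dist(features, point)  # Euclidean distance
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Hypothetical hand-labeled training set: (features, correct output).
TRAINING_SET = [
    ((1.0, 1.0), "non-spam"),
    ((1.2, 0.8), "non-spam"),
    ((5.0, 5.0), "spam"),
    ((5.5, 4.5), "spam"),
]

print(nearest_neighbour_classify(TRAINING_SET, (0.9, 1.1)))  # non-spam
print(nearest_neighbour_classify(TRAINING_SET, (5.2, 5.1)))  # spam
```

Here the "model" is simply the stored training set, which fits the training data perfectly; how well it generalizes depends entirely on how representative those examples are.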
Uses
Within medical science, pattern recognition is the basis for computer-aided diagnosis (CAD)
systems. CAD describes a procedure that supports the doctor's interpretations and findings.
Typical applications are automatic speech recognition, classification of text into several
categories (e.g. spam/non-spam email messages), the automatic recognition of handwritten
postal codes on postal envelopes, or the automatic recognition of images of human faces. The
last two examples form the subtopic image analysis of pattern recognition that deals with digital
images as input to pattern recognition systems.
Features of AI
The general problem of simulating (or creating) intelligence has been broken down into a number
of specific sub-problems. These consist of particular traits or capabilities that researchers would
like an intelligent system to display. The traits described below have received the most attention.
Deduction, reasoning, problem solving
Early AI researchers developed algorithms that imitated the step-by-step reasoning that humans
were often assumed to use when they solve puzzles, play board games or make logical
deductions.[39] By the late 1980s and '90s, AI research had also developed highly successful
methods for dealing with uncertain or incomplete information, employing concepts from
probability and economics.
Human beings solve most of their problems using fast, intuitive judgments rather than the
conscious, step-by-step deduction that early AI research was able to model. AI has made some
progress at imitating this kind of "sub-symbolic" problem solving: embodied agent approaches
emphasize the importance of sensorimotor skills to higher reasoning; neural net research
attempts to simulate the structures inside human and animal brains that give rise to this skill.
Knowledge representation
Knowledge representation[43] and knowledge engineering[44] are central to AI research. Many of
the problems machines are expected to solve will require extensive knowledge about the world.
Among the things that AI needs to represent are: objects, properties, categories and relations
between objects;[45] situations, events, states and time;[46] causes and effects;[47] knowledge about
knowledge (what we know about what other people know);[48] and many other, less well
researched domains.
Among the most difficult problems in knowledge representation are:
1 Default reasoning and the qualification problem
Many of the things people know take the form of "working assumptions." For example, if
a bird comes up in conversation, people typically picture an animal that is fist sized,
sings, and flies. None of these things are true about all birds.
2 The breadth of commonsense knowledge
The number of atomic facts that the average person knows is astronomical. A major goal
is to have the computer understand enough concepts to be able to learn by reading from
sources like the internet, and thus be able to add to its own ontology.
3 The subsymbolic form of some commonsense knowledge
Much of what people know is not represented as "facts" or "statements" that they could
express verbally. For example, a chess master will avoid a particular chess position
because it "feels too exposed"[53] or an art critic can take one look at a statue and
instantly realize that it is a fake.[54]
Planning
Intelligent agents must be able to set goals and achieve them.[56] They need a way to visualize
the future (they must have a representation of the state of the world and be able to make
predictions about how their actions will change it) and be able to make choices that maximize the
utility (or "value") of the available choices.[57]
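One way to picture "choices that maximize the utility" is to score each action by its expected utility and pick the best. The sketch below assumes that each action has a known list of (probability, utility) outcomes; the action names and numbers are hypothetical, and weighting by probability is an assumption rather than something these notes specify.

```python
def expected_utility(outcomes):
    """Sum of probability * utility over the predicted outcomes of one action."""
    return sum(probability * utility for probability, utility in outcomes)

def best_action(actions):
    """Return the action whose predicted outcomes have the highest expected utility."""
    return max(actions, key=lambda name: expected_utility(actions[name]))

# Hypothetical predictions about how each action changes the world.
ACTIONS = {
    "take_highway": [(0.8, 10), (0.2, -5)],  # usually fast, sometimes jammed
    "take_back_road": [(1.0, 6)],            # always moderately good
}

print(best_action(ACTIONS))  # take_highway
```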
Learning
Machine learning[61] has been central to AI research from the beginning.[62] Unsupervised learning
is the ability to find patterns in a stream of input. Supervised learning includes both classification
and numerical regression. Classification is used to determine what category something belongs
in, after seeing a number of examples of things from several categories. Regression takes a set
of numerical input/output examples and attempts to discover a continuous function that would
generate the outputs from the inputs. In reinforcement learning[63] the agent is rewarded for good
responses and punished for bad ones. These can be analyzed in terms of decision theory, using
concepts like utility. The mathematical analysis of machine learning algorithms and their
performance is a branch of theoretical computer science known as computational learning theory.
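Regression as described above can be illustrated with the simplest case: fitting a straight line to numerical input/output examples by ordinary least squares. The choice of a linear function and the sample data are assumptions made for the sake of a short example.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept to the examples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical input/output examples generated by y = 2x + 1.
XS = [0, 1, 2, 3]
YS = [1, 3, 5, 7]

slope, intercept = fit_line(XS, YS)
print(slope, intercept)  # approximately 2 and 1
```

The fitted function can then be used to predict outputs for inputs that were not among the examples.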
Natural language processing
Natural language processing[64] gives machines the ability to read and understand the languages
that humans speak. Many researchers hope that a sufficiently powerful natural language
processing system would be able to acquire knowledge on its own, by reading the existing text
available over the internet. Some straightforward applications of natural language processing
include information retrieval (or text mining) and machine translation.[65]
Motion and manipulation
The field of robotics[66] is closely related to AI. Intelligence is required for robots to be able to
handle such tasks as object manipulation[67] and navigation, with sub-problems of localization
(knowing where you are), mapping (learning what is around you) and motion planning (figuring
out how to get there).[68]
Perception
Machine perception[69] is the ability to use input from sensors (such as cameras, microphones,
sonar and others more exotic) to deduce aspects of the world. Computer vision[70] is the ability to
analyze visual input. A few selected subproblems are speech recognition,[71] facial recognition
and object recognition.[72]
Social intelligence
Emotion and social skills[73] play two roles for an intelligent agent. First, it must be able to predict
the actions of others, by understanding their motives and emotional states. (This involves
elements of game theory, decision theory, as well as the ability to model human emotions and the
perceptual skills to detect emotions.) Second, for good human-computer interaction, an intelligent
machine needs to display emotions. At the very least it must appear polite and sensitive to
the humans it interacts with. At best, it should have normal emotions itself.
Creativity
A sub-field of AI addresses creativity both theoretically (from a philosophical and psychological
perspective) and practically (via specific implementations of systems that generate outputs that
can be considered creative, or systems that identify and assess creativity). A related area of
computational research is Artificial Intuition and Artificial Imagination.
General intelligence
Most researchers hope that their work will eventually be incorporated into a machine with general
intelligence (known as strong AI), combining all the skills above and exceeding human abilities at
most or all of them.[12] A few believe that anthropomorphic features like artificial consciousness or
an artificial brain may be required for such a project.[74]
Difference between AI and traditional programming
Artificial Intelligence
* Primarily symbolic processing
* Heuristic search (steps are implicit)
* Control structure usually separate from the domain knowledge
* Usually easy to modify, update and enlarge
* Some incorrect answers are tolerable
* Satisfactory answers usually acceptable
Conventional Programming
* Primarily numeric processing
* Algorithmic (steps are explicit)
* Information and control are integrated together
* Difficult to modify
* Correct answers are required
* Best possible solution usually sought
Travelling Salesman problem
The Travelling Salesman Problem (TSP) is an NP-hard problem in combinatorial optimization
studied in operations research and theoretical computer science. Given a list of cities and their
pairwise distances, the task is to find a shortest possible tour that visits each city exactly once.
“A salesman has a list of cities, each of which he must visit exactly once. There are direct roads
between each pair of cities on the list. Find the route the salesman should follow for the shortest
possible round trip that both starts and finishes at any one of the cities.”
Nearest neighbour heuristic:
1. Select a starting city.
2. Select the unvisited city closest to the current city and move to it.
3. Repeat step 2 until all cities have been visited, then return to the starting city.
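The nearest neighbour heuristic can be sketched as follows. The distance matrix is a hypothetical four-city example, and the function names are chosen here for illustration. The heuristic is fast but gives no guarantee that the tour it returns is the shortest one.

```python
def nearest_neighbour_tour(dist, start=0):
    """Build a tour by always moving to the closest unvisited city."""
    tour = [start]
    unvisited = set(range(len(dist))) - {start}
    while unvisited:
        current = tour[-1]
        nxt = min(unvisited, key=lambda city: dist[current][city])
        tour.append(nxt)
        unvisited.remove(nxt)
    tour.append(start)  # close the round trip
    return tour

def tour_length(dist, tour):
    return sum(dist[a][b] for a, b in zip(tour, tour[1:]))

# Hypothetical symmetric distances between four cities (0..3).
DIST = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]

tour = nearest_neighbour_tour(DIST)
print(tour, tour_length(DIST, tour))  # [0, 1, 3, 2, 0] 18
```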
TSP can be modeled as an undirected weighted graph, such that cities are the graph's vertices,
paths are the graph's edges, and a path's distance is the edge's length. A TSP tour becomes a
Hamiltonian cycle, and the optimal TSP tour is the shortest Hamiltonian cycle. Often, the model is
a complete graph (i.e., an edge connects each pair of vertices). If no path exists between two
cities, adding an arbitrarily long edge will complete the graph without affecting the optimal tour.
The most direct solution would be to try all permutations (ordered combinations) and see which
one is cheapest (using brute force search). The running time for this approach lies within a
polynomial factor of O(n!), the factorial of the number of cities, so this solution becomes
impractical even for only 20 cities. One of the earliest applications of dynamic programming is an
algorithm that solves the problem in time O(n^2 · 2^n).[14]
The dynamic programming solution requires exponential space. Using inclusion–exclusion, the
problem can be solved in time within a polynomial factor of 2^n and polynomial space.[15]
Improving these time bounds seems to be difficult. For example, it is an open problem if there
exists an exact algorithm for TSP that runs in time O(1.9999^n).[16]
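Both exact approaches mentioned above can be sketched compactly. The first function tries every permutation; the second is a dynamic programming formulation over subsets of cities, commonly known as the Held-Karp algorithm, which is assumed here to be the O(n^2 · 2^n) method the text refers to. The distance matrix is a hypothetical example, and both functions assume at least two cities with city 0 as the fixed start.

```python
from itertools import combinations, permutations

def brute_force_tsp(dist):
    """Length of the shortest round trip, found by trying every ordering."""
    n = len(dist)
    best = float("inf")
    for perm in permutations(range(1, n)):
        tour = (0,) + perm + (0,)
        best = min(best, sum(dist[a][b] for a, b in zip(tour, tour[1:])))
    return best

def held_karp_tsp(dist):
    """Same result via dynamic programming over subsets of cities."""
    n = len(dist)
    # best[(mask, j)]: cheapest path from city 0 through the cities in mask, ending at j.
    best = {(1 << j, j): dist[0][j] for j in range(1, n)}
    for size in range(2, n):
        for subset in combinations(range(1, n), size):
            mask = sum(1 << j for j in subset)
            for j in subset:
                prev = mask & ~(1 << j)
                best[(mask, j)] = min(
                    best[(prev, k)] + dist[k][j] for k in subset if k != j
                )
    full = (1 << n) - 2  # every city except city 0
    return min(best[(full, j)] + dist[j][0] for j in range(1, n))

# Hypothetical distances between four cities.
CITIES = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]

print(brute_force_tsp(CITIES), held_karp_tsp(CITIES))  # 18 18
```

The dictionary in the second function holds one entry per (subset, end city) pair, which is where the exponential space requirement comes from.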
Other approaches include:
• Various branch-and-bound algorithms, which can be used to process TSPs containing
40–60 cities.
• Progressive improvement algorithms which use techniques reminiscent of linear
programming. Works well for up to 200 cities.
• Implementations of branch-and-bound and problem-specific cut generation; this is the
method of choice for solving large instances. This approach holds the current record,
solving an instance with 85,900 cities.
Asymmetric TSP
In most cases, the distance between two nodes in the TSP network is the same in both
directions. The case where the distance from A to B is not equal to the distance from B to A is
called asymmetric TSP. A practical application of an asymmetric TSP is route optimisation using
street-level routing (asymmetric due to one-way streets, slip-roads and motorways).
Solving by conversion to Symmetric TSP
Solving an asymmetric TSP graph can be somewhat complex. One option is to turn an
asymmetric matrix of size N into a symmetric matrix of size 2N, doubling the number of nodes.
The following is a 3x3 matrix containing all possible path weights between the nodes A, B and C.
Asymmetric Path Weights
      A    B    C
A          1    2
B     6         3
C     5    4
To double the size, each of the nodes in the graph is duplicated, creating a second ghost node.
Using duplicate points with very low weights, such as −∞, provides a cheap route "linking" back to
the real node and allows symmetric evaluation to continue. The original 3×3 matrix shown
above is visible in the bottom left and the transpose of the original in the top-right. Both copies of the
matrix have had their diagonals replaced by the low-cost hop paths, represented by −∞.
Symmetric Path Weights
      A    B    C    A'   B'   C'
A                    −∞   6    5
B                    1    −∞   4
C                    2    3    −∞
A'    −∞   1    2
B'    6    −∞   3
C'    5    4    −∞
The original 3x3 matrix would produce two Hamiltonian cycles (a path that visits every node
once), namely A-B-C-A [score 9] and A-C-B-A [score 12]. Evaluating the 6x6 symmetric version
of the same problem now produces many paths, including A-A'-B-B'-C-C'-A, A-B'-C-A'-A,
A-A'-B-C'-A [all score 9 − ∞].
The important thing about each new sequence is that there will be an alternation between dashed
(A',B',C') and un-dashed nodes (A,B,C) and that the link to "jump" between any related pair (A-A')
is effectively free. A version of the algorithm could use any weight for the A-A' path, as long as
that weight is lower than all other path weights present in the graph. As the path weight to "jump"
must effectively be "free", the value zero (0) could be used to represent this cost — if zero is not
being used for another purpose already (such as designating invalid paths). In the two examples
above, non-existent paths between nodes are shown as a blank square.
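The conversion can be checked on the three-node example with a short sketch. Two assumptions are made so that ordinary arithmetic works: the "jump" weight is a large negative finite number (−1000, an arbitrary stand-in for −∞ that is lower than every real weight), and non-existent paths are represented by +∞ rather than a blank. The function names are chosen here for illustration.

```python
from itertools import permutations

INF = float("inf")  # marks a non-existent path

def to_symmetric(weights, jump_cost):
    """Turn an N x N asymmetric matrix into a 2N x 2N symmetric one."""
    n = len(weights)
    sym = [[INF] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        # Cheap link between node i and its ghost node n + i.
        sym[i][n + i] = sym[n + i][i] = jump_cost
        for j in range(n):
            if i != j:
                # Leaving ghost i' for real node j costs the original i -> j weight.
                sym[n + i][j] = sym[j][n + i] = weights[i][j]
    return sym

def shortest_tour_length(dist):
    """Brute-force length of the shortest round trip starting at node 0."""
    n = len(dist)
    return min(
        sum(dist[a][b] for a, b in zip((0,) + p, p + (0,)))
        for p in permutations(range(1, n))
    )

# The asymmetric weights for A, B, C from the table above.
WEIGHTS = [
    [INF, 1, 2],
    [6, INF, 3],
    [5, 4, INF],
]
JUMP = -1000

symmetric = to_symmetric(WEIGHTS, JUMP)
print(shortest_tour_length(WEIGHTS))                   # 9
print(shortest_tour_length(symmetric) - 3 * JUMP)      # 9
```

The best symmetric tour uses all three jump links, so adding back three times the jump cost recovers the score of the best asymmetric tour, A-B-C-A.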
Logic
Logic[106] is used for knowledge representation and problem solving, but it can be applied to other
problems as well. For example, the satplan algorithm uses logic for planning[107] and inductive
logic programming is a method for learning.[108]
Several different forms of logic are used in AI research.
Propositional or sentential logic[109] is the logic of statements which can be true or false.
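Because every propositional statement is either true or false, small problems can be solved by enumerating all truth assignments. The sketch below checks whether a set of premises entails a conclusion; representing statements as Python functions over a dictionary of truth values, and the "rain"/"wet" symbols, are assumptions made for illustration.

```python
from itertools import product

def entails(premises, conclusion, symbols):
    """True if the conclusion holds in every assignment that satisfies all premises."""
    for values in product([True, False], repeat=len(symbols)):
        model = dict(zip(symbols, values))
        if all(p(model) for p in premises) and not conclusion(model):
            return False  # found a counterexample
    return True

# Hypothetical statements: "if it rains, the ground is wet", "it rains", "the ground is wet".
def rain_implies_wet(m):
    return (not m["rain"]) or m["wet"]

def rain(m):
    return m["rain"]

def wet(m):
    return m["wet"]

SYMBOLS = ["rain", "wet"]
print(entails([rain_implies_wet, rain], wet, SYMBOLS))  # True (modus ponens)
print(entails([rain_implies_wet, wet], rain, SYMBOLS))  # False
```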
First-order logic[110] also allows the use of quantifiers and predicates, and can express facts about
objects, their properties, and their relations with each other.
Fuzzy logic[111] is a version of first-order logic which allows the truth of a statement to be
represented as a value between 0 and 1, rather than simply True (1) or False (0). Fuzzy systems
can be used for uncertain reasoning and have been widely used in modern industrial and
consumer product control systems.
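A small sketch of fuzzy truth values follows. The notes only say that truth is a value between 0 and 1; using min, max and 1 − x for AND, OR and NOT is one common choice of operators and is an assumption here, as are the example degrees of truth.

```python
def fuzzy_not(a):
    return 1.0 - a

def fuzzy_and(a, b):
    return min(a, b)

def fuzzy_or(a, b):
    return max(a, b)

# Hypothetical degrees of truth for "the room is warm" and "the room is humid".
warm = 0.7
humid = 0.4

print(fuzzy_and(warm, humid))  # 0.4
print(fuzzy_or(warm, humid))   # 0.7
print(fuzzy_not(warm))         # close to 0.3
```

With inputs restricted to exactly 0 and 1, these operators reduce to the ordinary True/False connectives.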
Subjective logic models uncertainty in a different and more explicit manner than fuzzy-logic: a
given binomial opinion satisfies belief + disbelief + uncertainty = 1 within a Beta distribution. By
this method, ignorance can be distinguished from probabilistic statements that an agent makes
with high confidence.
Default logics, non-monotonic logics and circumscription[51] are forms of logic designed to help
with default reasoning and the qualification problem.
Several extensions of logic have been designed to handle specific domains of knowledge, such
as: description logics;[45] situation calculus, event calculus and fluent calculus (for representing
events and time);[46] causal calculus;[47] belief calculus; and modal logics.[48]
Expert Systems:
An expert system is a computer application that performs a task that would otherwise be
performed by a human expert. For example, there are expert systems that can diagnose human
illnesses, make financial forecasts, and schedule routes for delivery vehicles. Some expert
systems are designed to take the place of human experts, while others are designed to aid them.
To design an expert system, one needs a knowledge engineer, an individual who studies how
human experts make decisions and translates the rules into terms that a computer can
understand.
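How rules captured by a knowledge engineer might be put "into terms that a computer can understand" can be sketched with a tiny rule engine. Forward chaining is assumed here as the inference strategy, and the rules and fact names are hypothetical and purely illustrative; real expert systems are considerably richer.

```python
def forward_chain(rules, facts):
    """Apply if-then rules repeatedly until no new conclusions can be added."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conclusion not in known and set(conditions) <= known:
                known.add(conclusion)
                changed = True
    return known

# Hypothetical rules: (conditions that must all hold, conclusion to add).
RULES = [
    (("fever", "cough"), "possible_flu"),
    (("possible_flu", "short_of_breath"), "refer_to_doctor"),
]

result = forward_chain(RULES, {"fever", "cough", "short_of_breath"})
print("refer_to_doctor" in result)  # True
```

Keeping the rules in a separate data structure from the inference loop means the knowledge can be modified or enlarged without touching the control code.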
Neural Networks:
Neural Networks are systems that simulate intelligence by attempting to reproduce the types of
physical connections that occur in animal brains.
A neural network is a type of artificial intelligence that attempts to imitate the way a human brain works. Rather than
using a digital model, in which all computations manipulate zeros and ones, a neural network
works by creating connections between processing elements, the computer equivalent of
neurons. The organization and weights of the connections determine the output.
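The statement that the organization and weights of the connections determine the output can be made concrete with a very small simulated network. The sigmoid activation, the two-layer layout and the hand-picked weights below are assumptions for illustration; in practice the weights would be learned from examples rather than written by hand.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights, bias):
    """One processing element: weighted sum of its inputs passed through a sigmoid."""
    return sigmoid(sum(i * w for i, w in zip(inputs, weights)) + bias)

def forward(inputs, layers):
    """Feed the inputs through each layer of (weights, bias) neurons in turn."""
    activations = list(inputs)
    for layer in layers:
        activations = [neuron(activations, w, b) for w, b in layer]
    return activations

# Hand-picked weights that make the network compute exclusive-or (XOR).
XOR_NETWORK = [
    [([20, 20], -10), ([-20, -20], 30)],  # hidden layer: roughly OR and NAND
    [([20, 20], -30)],                    # output layer: roughly AND
]

for a in (0, 1):
    for b in (0, 1):
        print(a, b, round(forward([a, b], XOR_NETWORK)[0]))
```

Changing any of the weights changes the function the network computes, even though the code that evaluates it stays the same.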
Neural networks are particularly effective for predicting events when the networks have a large
database of prior examples to draw on. Strictly speaking, a neural network implies a non-digital
computer, but neural networks can be simulated on digital computers.
Neural networks are an exciting technology with the potential to change the way we solve
"real-world" problems in science, engineering and economics.
Today, the hottest area of artificial intelligence is neural networks, which are proving successful in
a number of disciplines such as voice recognition and natural-language processing.
The pros:
1. Utility for a variety of pattern recognition tasks (handwriting, spoken word, face, etc.)
2. Simplicity of programming (reinforcement learning)
3. An elegant model of emergence in programming: the idea that large numbers of simple
programming elements are collectively capable of interesting, higher-level behaviors.
The cons:
1. A vague feeling that we haven’t really learned anything about these pattern recognition
skills
2. Neural nets seem to be a far less natural means for demonstrating “explicit rule learning”