Uploaded by visha9770

NLP-Unit I

Walchand Institute of Technology, Solapur
Honors in Artificial Intelligence and Machine Learning
T.Y.B.Tech. (Computer Science and Engineering), Semester VI
21CSU6HM3: NATURAL LANGUAGE PROCESSING
Teaching Scheme: Lecture: 3 Hours/Week; Practical: 2 Hours/Week
Examination Scheme: ESE – 60 Marks; ISE – 40 Marks; ICA – 25 Marks
Natural Language Processing (NLP)
Introduction
• Natural Language Processing (NLP) is about teaching machines to
understand human languages and extract meaning from text.
• This course is intended as a theoretical and methodological
introduction to the most widely used and effective current
techniques, strategies and toolkits for natural language
processing.
• This course also covers the basics of syntax, semantic analysis and
discourse analysis, and builds up to machine translation.
• Prerequisites: A basic course in an object-oriented programming
language, theory of computation and parsers.
COURSE OUTCOMES: At the end of the course, students will be
able to:
1. Understand the fundamentals of Natural Language Processing.
2. Analyze how words are formed morphologically and how
they are related to each other.
3. Develop strategies for language modeling, syntax and semantic
analysis.
4. Design, implement and analyze Natural Language
Processing algorithms for real-world applications.
Natural Language Processing
Section I
Unit I: Introduction (6)
Introduction to NLP, Machine Learning and NLP, Why is NLP hard?,
Programming Languages vs. Natural Languages, Are natural languages regular?,
Finite automata for NLP, Stages of NLP, Challenges (open problems) in NLP.
Basics of Text Processing: Tokenization, Stemming,
Lemmatization, Part-of-Speech Tagging.
Introduction to NLP
• Natural Language Processing (NLP) is a branch of computer
science and artificial intelligence that deals with the interaction
between computers and humans using natural language.
• The goal of NLP is to enable computers to understand,
interpret, and generate human language in a way that is both
effective and efficient.
Introduction to NLP
Natural Language Processing (NLP) is the study of the
computational treatment of natural (human) language.
In other words, teaching computers how to understand (and
generate) human language.
It is a field of Computer Science, Artificial Intelligence and
Computational Linguistics.
Natural language processing systems take strings of words
(sentences) as their input and produce structured
representations capturing the meaning of those strings as
their output. The nature of this output depends heavily on
the task at hand.
NLP has a wide range of applications:
Chatbots and virtual assistants: Chatbots and virtual assistants use
NLP to understand natural language and provide human-like
responses to queries and requests.
Sentiment analysis: NLP can be used to analyze social media
posts, customer reviews, and other types of text to determine the
sentiment of the writer.
Machine translation: NLP is used to translate text from one
language to another. Machine translation is used in various
applications, including Google Translate and Microsoft Translator.
Text summarization: NLP can be used to summarize lengthy
documents or articles, making it easier for people to digest large
amounts of information.
Named entity recognition: NLP can be used to automatically
identify and classify named entities such as people, places, and
organizations within a text.
Speech recognition: NLP can be used to recognize and transcribe
spoken language, enabling applications such as virtual assistants
and speech-to-text systems.
Text classification: NLP can be used to classify text into
different categories, such as spam or non-spam emails, or
positive and negative sentiment in customer reviews.
Question answering: NLP can be used to answer questions posed
by users, such as on search engines or in chatbots.
Automatic summarization: NLP can be used to automatically
summarize news articles or other long-form content, providing
users with a quick overview of the content.
Language modelling: NLP can be used to create models of how
language is used in a particular domain, such as legal or medical
language, and can be used to generate new text in that domain.
Natural Language Generation: Natural Language Generation
(NLG) is the process of generating human-like language from
structured data or machine representations. It is used in
applications such as chatbots and automated report generation.
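The language modelling idea can be illustrated with the simplest possible model, a bigram model. The Python sketch below is a minimal illustration under stated assumptions: a tiny made-up three-sentence corpus, lowercase whitespace tokenization, maximum-likelihood probabilities with no smoothing, and greedy generation. The names `train_bigram_model` and `generate` are hypothetical helpers, not part of any library.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Estimate P(next word | previous word) by maximum likelihood (no smoothing)."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # <s> and </s> mark sentence start and end; whitespace tokenization (assumption)
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    model = {}
    for prev, next_counts in counts.items():
        total = sum(next_counts.values())
        model[prev] = {word: c / total for word, c in next_counts.items()}
    return model


def generate(model, max_len=20):
    """Greedy generation: repeatedly pick the most probable next word."""
    word, output = "<s>", []
    for _ in range(max_len):
        candidates = model[word]
        word = max(candidates, key=candidates.get)  # ties go to the first word seen
        if word == "</s>":
            break
        output.append(word)
    return " ".join(output)


# Toy, made-up "medical domain" corpus (assumption for illustration only).
corpus = [
    "the patient has a fever",
    "the patient has a cough",
    "the doctor has a question",
]
model = train_bigram_model(corpus)
print(model["the"])     # P(patient | the) = 2/3, P(doctor | the) = 1/3
print(generate(model))  # the patient has a fever
```

Because generation is greedy, ties are broken by whichever word was seen first ("fever" after "a" here); sampling from the distribution instead would give varied output.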
Perspectivising NLP: Areas of AI and their inter-dependencies
[Figure: NLP shown among the other areas of AI: Search, Logic, Machine Learning, Vision, Knowledge Representation, Planning, Robotics and Expert Systems. Caption: AI is the forcing function for Computer Science.]
What is NLP?
A branch of AI with two goals:
• Science goal: Understand the way language operates.
• Engineering goal: Build systems that analyze and generate language, reducing the man-machine gap.
Machine Learning (ML) and NLP
• Machine Learning (ML) is a subfield of artificial intelligence
that focuses on building systems that can learn and improve
from experience without being explicitly programmed.
• NLP and ML are closely related because ML algorithms are
often used in NLP applications to automatically learn patterns
and relationships in language data.
Some common ML techniques used in NLP include:
1. Supervised learning: A machine learning model is trained on a
labeled dataset, where each example is labeled with the correct
output.
For example, a sentiment analysis model can be trained on a
dataset of customer reviews, where each review is labeled as
positive or negative. Once trained, the model can predict the
sentiment of new, unlabeled reviews.
2. Unsupervised learning: A machine learning model is trained
on an unlabeled dataset, and the goal is to learn patterns and
relationships in the data without any specific guidance.
For example, a clustering algorithm can be used to group similar
documents together based on their content.
3. Semi-supervised learning: A machine learning model is
trained on a combination of labeled and unlabeled data. This
approach can be useful when labeled data is scarce or expensive
to obtain.
4. Deep learning: A subfield of ML that uses neural
networks with multiple layers to learn complex patterns in data.
Deep learning has been particularly successful in NLP
applications, such as machine translation and text classification.
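The supervised sentiment example in item 1 can be sketched with a small Naive Bayes classifier. The text does not name an algorithm; Naive Bayes is swapped in here as one common, simple choice. The four labelled reviews are invented, tokenization is plain whitespace splitting, and `train_naive_bayes` and `predict` are hypothetical helper names.

```python
import math
from collections import Counter


def train_naive_bayes(labeled_docs):
    """Count documents per label and words per label from (text, label) pairs."""
    label_counts = Counter(label for _, label in labeled_docs)
    word_counts = {label: Counter() for label in label_counts}
    for text, label in labeled_docs:
        word_counts[label].update(text.lower().split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-probability for `text`."""
    label_counts, word_counts, vocab = model
    n_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in label_counts.items():
        score = math.log(doc_count / n_docs)  # log prior P(label)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are skipped
                # add-one (Laplace) smoothing so unseen (word, label) pairs are not zero
                score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented training data (assumption for illustration only).
reviews = [
    ("great product works well", "positive"),
    ("really great value", "positive"),
    ("terrible quality broke quickly", "negative"),
    ("waste of money terrible", "negative"),
]
model = train_naive_bayes(reviews)
print(predict(model, "great quality"))   # positive
print(predict(model, "terrible waste"))  # negative
```

Add-one smoothing keeps a word that never appeared with a label from forcing that label's probability to zero; scikit-learn's `MultinomialNB` implements the same idea with more options.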
• NLP and ML have led to significant advances in natural language understanding
and communication, and have numerous practical applications in industries such as
healthcare, finance, and marketing.
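The unsupervised clustering example in item 2 (grouping similar documents by content) can be sketched as a k-means-style loop over bag-of-words vectors. Several choices here are assumptions rather than anything the text prescribes: cosine similarity instead of the Euclidean distance of classic k-means, initial centroids taken from the first k documents, a fixed number of iterations, and four made-up documents. `cluster` and the other helpers are hypothetical names.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Represent a document by its word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors (dicts)."""
    dot = sum(count * b.get(word, 0) for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_vector(vectors):
    """Average several word-count vectors into a centroid."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {word: count / len(vectors) for word, count in total.items()}


def cluster(docs, k, iterations=10):
    """k-means-style loop: assign to the most similar centroid, then recompute centroids."""
    vectors = [bag_of_words(d) for d in docs]
    centroids = [dict(v) for v in vectors[:k]]  # naive initialisation: first k documents
    assignment = []
    for _ in range(iterations):
        assignment = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors]
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = mean_vector(members)
    return assignment


# Made-up, unlabeled documents (assumption for illustration only).
docs = [
    "the match ended with a late goal",
    "stock prices fell as markets closed",
    "the striker scored a goal in the match",
    "investors sold shares as prices fell",
]
labels = cluster(docs, k=2)
print(labels)  # [0, 1, 0, 1]
```

No labels are given: the two topics emerge only because the sports documents share words with each other, and the finance documents with each other, but not across topics. Cluster numbers are arbitrary, so `[0, 1, 0, 1]` just means documents 1 and 3 form one group and documents 2 and 4 another.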
Machine learning (ML) and natural language processing (NLP)
are two closely related fields that are often used together in
applications such as speech recognition, language translation, and
chatbots.
While there is overlap between the two fields, there are some key
differences:
1. Focus: Machine learning is a general term for the process of
teaching a computer system to recognize patterns in data and
make predictions based on those patterns.
NLP, on the other hand, focuses specifically on the processing and
analysis of human language.
2. Data: Machine learning algorithms can be applied to any type of
data, such as images, audio, numerical data, and text. NLP,
however, is specifically focused on analyzing and processing
human language data, most often text.
3. Techniques: Machine learning techniques can be used in NLP,
but there are also many specialized techniques and tasks that are specific
to NLP, such as part-of-speech tagging, named entity
recognition, and sentiment analysis.
4. Tools: There are many machine learning libraries and
frameworks that can be used for a wide range of applications,
such as TensorFlow and scikit-learn. For NLP, there are
specific tools and libraries such as NLTK, spaCy, and Gensim
that are designed for processing and analyzing natural
language.
5. Applications: While machine learning can be applied to a
wide range of applications, NLP is specifically focused on
applications related to human language, such as speech
recognition, language translation, and chatbots.
• In summary, while there is overlap between machine learning and NLP, NLP is a specialized field focused specifically on the processing and analysis of human language, while machine learning is a more general field that can be applied to a wide range of data types and applications.
• Machine Learning (ML) and Natural Language Processing (NLP) are both subfields of artificial intelligence (AI), but they differ in their goals and approaches.
• Machine learning is a broader field that involves using algorithms to analyze and learn patterns from data.
• Machine learning models can be used to make predictions or classify data based on patterns that they have learned from past examples.
• Natural Language Processing, on the other hand, focuses specifically on the interaction between computers and human language.
• NLP involves the development of algorithms and models that can understand, analyze, and generate human language.
• While both fields use algorithms to analyze data, the data that is analyzed in NLP is typically text-based, while machine learning can be applied to a wide range of data types, including images, audio, and numerical data.
• Another key difference is that NLP often requires models that capture more linguistic structure than models built for many other machine learning tasks.
• NLP models must be able to understand the meaning and context of language, as well as the grammar and syntax of sentences. This requires a deeper understanding of language and how it works, as well as the ability to deal with ambiguity and variability in human language.
Machine learning and NLP are complementary fields that are often used together to develop powerful AI applications. Machine learning provides the foundation for analyzing and understanding data, while NLP enables computers to interact with humans in a more natural and intuitive way.
Why NLP is complex
Natural language is extremely rich in form and structure, and very ambiguous. Two central questions are:
• How to represent meaning.
• Which structures map to which meaning structures.
One input can mean many different things, and ambiguity can occur at different levels:
• Lexical (word-level) ambiguity: different meanings of words.
• Syntactic ambiguity: different ways to parse the sentence.
• Interpreting partial information: how to interpret pronouns.
• Contextual information: the context of a sentence may affect its meaning.
 Attachment ambiguity in natural language processing
• Attachment ambiguity is a type of syntactic ambiguity.
Syntactic ambiguity
• It is a type of ambiguity where the doubt is about the syntactic structure of the
sentence. That is, there is a possibility that a sentence could be parsed in
many syntactical forms (a sentence may be interpreted in more than one
way). The doubt is about which one among different syntactical forms is
correct.
• For example, the phrase “old men and women” is ambiguous. Here, the
doubt is whether the adjective “old” applies to both men and
women or to men alone.
Attachment ambiguity
It arises from uncertainty about which part of the sentence a phrase or clause
should be attached to. It usually happens when a sentence contains one or more
prepositional phrases that could modify more than one earlier constituent.
Example 1
In the sentence “the boy saw the girl with the telescope”, the uncertainty is
about relating the prepositional phrase “with the telescope” to “the boy” or
to “the girl”. Depending on the attachment, the sentence has one of the
following meanings:
1. The boy saw the girl carrying a telescope.
2. The boy saw the girl through the telescope.
The first meaning arises if we attach the prepositional phrase to “the girl”,
whereas the second one arises if we attach it to “the boy” (more precisely,
to the verb phrase “saw the girl”).
Example 2
Consider the following sentence:
“Guna ate an ice cream with fruits from Chennai”
In this sentence, we have two prepositional phrases, “with fruits” and “from
Chennai”. The possible meanings are as follows:
1. Guna, who is from Chennai, ate an ice cream filled with fruits.
2. Guna ate an ice cream filled with fruits, and the ice cream was brought from Chennai.
3. Guna, who is from Chennai, ate the ice cream with the help of fruits.
4. Guna, with the help of fruits, ate the ice cream, which was brought from Chennai.
Here we get four possibilities from two prepositional phrases. Each one arises
from how we attach the prepositional phrases “with fruits” and “from Chennai” to
either “Guna” or “the ice cream”.
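The count of four readings follows from treating each prepositional phrase's attachment as an independent two-way choice. The short Python sketch below enumerates those choices; like the text, it considers only "Guna" and "the ice cream" as attachment sites and applies no further grammatical constraints.

```python
from itertools import product

phrases = ["with fruits", "from Chennai"]
sites = ["Guna", "the ice cream"]

# Each prepositional phrase independently attaches to one candidate site,
# so 2 phrases with 2 sites each give 2 ** 2 = 4 combinations.
readings = list(product(sites, repeat=len(phrases)))

for number, attachment in enumerate(readings, start=1):
    links = ", ".join(f'"{pp}" -> {site}' for pp, site in zip(phrases, attachment))
    print(f"Reading {number}: {links}")
```

Reading 1 corresponds to meaning 3 above, reading 2 to meaning 4, reading 3 to meaning 1 and reading 4 to meaning 2. With n such phrases and two candidate sites each, this naive count grows as 2^n.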
Prepositional Phrase (PP) Attachment Problem
V – NP1 – P – NP2
(Here P means preposition.)
Does the PP (P NP2) attach to NP1, or does it attach to V?
Parse Trees for a Structurally Ambiguous Sentence
Let the grammar be:
S → NP VP
NP → DT N | DT N PP
PP → P NP
VP → V NP PP | V NP
For the sentence
“I saw a boy with a telescope”
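Parse Tree 1 (PP attached to the noun phrase “a boy”)
[S [NP [N I]] [VP [V saw] [NP [Det a] [N boy] [PP [P with] [NP [Det a] [N telescope]]]]]]
Parse Tree 2 (PP attached to the verb phrase)
[S [NP [N I]] [VP [V saw] [NP [Det a] [N boy]] [PP [P with] [NP [Det a] [N telescope]]]]]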
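To check that the grammar really yields exactly these two trees, the Python sketch below enumerates every parse with a naive top-down search. It adds two assumptions not in the grammar as written: the rule NP → N, so that the pronoun "I" can form a noun phrase as in the trees, and a hypothetical toy lexicon assigning each word its part of speech. It labels determiners DT, as the grammar does, where the trees use Det.

```python
# Grammar rules from the text, plus NP -> N so that the pronoun "I" can be an NP
# (an assumption matching the parse trees, which show NP -> N -> I).
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["DT", "N"], ["DT", "N", "PP"], ["N"]],
    "PP": [["P", "NP"]],
    "VP": [["V", "NP", "PP"], ["V", "NP"]],
}
# Hypothetical toy lexicon covering only this sentence.
LEXICON = {"I": "N", "saw": "V", "a": "DT", "boy": "N", "with": "P", "telescope": "N"}


def parse(symbol, words, start):
    """Yield (tree, end) for every way `symbol` can cover words[start:end]."""
    if symbol not in GRAMMAR:  # part-of-speech symbol: must match exactly one word
        if start < len(words) and LEXICON.get(words[start]) == symbol:
            yield (symbol, words[start]), start + 1
        return
    for rhs in GRAMMAR[symbol]:
        for children, end in parse_sequence(rhs, words, start):
            yield (symbol, *children), end


def parse_sequence(symbols, words, start):
    """Yield (subtrees, end) for a sequence of grammar symbols."""
    if not symbols:
        yield [], start
        return
    for first, middle in parse(symbols[0], words, start):
        for rest, end in parse_sequence(symbols[1:], words, middle):
            yield [first] + rest, end


def all_parses(sentence):
    """Keep only the S parses that cover the whole sentence."""
    words = sentence.split()
    return [tree for tree, end in parse("S", words, 0) if end == len(words)]


def bracket(tree):
    """Render a tree as a bracketed string, e.g. (NP (DT a) (N boy))."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return f"({label} {children[0]})"
    return f"({label} " + " ".join(bracket(child) for child in children) + ")"


for tree in all_parses("I saw a boy with a telescope"):
    print(bracket(tree))
```

The first line printed is the verb-phrase attachment (Parse Tree 2) and the second the noun-phrase attachment (Parse Tree 1). Exhaustive enumeration like this can take exponential time on longer sentences; chart parsers such as CKY or Earley reuse shared sub-parses to avoid that.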
Lexical Knowledge Networks
• Lexical knowledge networks, also known as lexical semantic networks, are a type of
knowledge representation used in NLP and computational linguistics.
• They represent the relationships between words based on their
meaning or semantic content.
• In a lexical knowledge network, words are represented as
nodes, and the relationships between words are represented
as edges.
• The edges may be labeled with a specific relationship type,
such as "synonym", "antonym", "hypernym" (a word that is
more general than another word), or "hyponym" (a word that
is more specific than another word).
• There are several different types of lexical knowledge
networks, each with its own specific characteristics and
uses. Some of the most well-known lexical knowledge
networks include WordNet, FrameNet, and ConceptNet.
• WordNet is a lexical database of English words and their
relationships, developed at Princeton University's Cognitive
Science Laboratory.
• It is widely used in NLP and computational linguistics for tasks
such as word sense disambiguation, text classification, and
machine translation.
• WordNet groups English words into sets of synonyms, called
synsets, which are defined by a common sense or meaning.
• Each synset contains one or more words that are related in
meaning and can be used interchangeably in certain contexts.
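• For example, the first noun synset for "car" (car.n.01) contains the words
"car", "auto", "automobile", "machine", and "motorcar". "Vehicle" is not in this
synset; it is a more general term (a hypernym) higher up in WordNet's hierarchy.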
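The node-and-edge picture of lexical knowledge networks described above can be sketched as a small labelled graph in Python. The relations below are hand-written and simplified for illustration (loosely in the spirit of WordNet, not copied from it), and `build_network`, `related` and `hypernym_chain` are hypothetical helper names. NLTK ships an interface to the real WordNet database if actual data is needed.

```python
# Hand-written toy relations (assumption): (word, relation, word) triples.
EDGES = [
    ("car", "synonym", "automobile"),
    ("car", "hypernym", "motor vehicle"),
    ("truck", "hypernym", "motor vehicle"),
    ("motor vehicle", "hypernym", "vehicle"),
    ("hot", "antonym", "cold"),
]

# Synonymy and antonymy are symmetric; the inverse of hypernym is hyponym.
INVERSE = {"synonym": "synonym", "antonym": "antonym", "hypernym": "hyponym"}


def build_network(edges):
    """Adjacency list: each word maps to a list of (relation, other_word) edges."""
    network = {}
    for source, relation, target in edges:
        network.setdefault(source, []).append((relation, target))
        network.setdefault(target, []).append((INVERSE[relation], source))
    return network


def related(network, word, relation):
    """All words linked to `word` by an edge labelled `relation`."""
    return [other for rel, other in network.get(word, []) if rel == relation]


def hypernym_chain(network, word):
    """Follow hypernym edges upwards until a word has no more general term."""
    chain = [word]
    while True:
        parents = related(network, chain[-1], "hypernym")
        if not parents:
            return chain
        chain.append(parents[0])  # toy data: at most one hypernym each, no cycles


network = build_network(EDGES)
print(related(network, "car", "synonym"))            # ['automobile']
print(related(network, "motor vehicle", "hyponym"))  # ['car', 'truck']
print(hypernym_chain(network, "car"))                # ['car', 'motor vehicle', 'vehicle']
```

Storing each edge in both directions lets the same lookup answer "what is more general than X" and "what is more specific than X" without a second table.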