
Data Structure & Data Augmentation For AI Assistant

AI Assistant
Data Augmentation:
• Automated Augmentation:
▪ Using nlpaug's word-level augmenters (nlpaug.augmenter.word) to replace some words in the existing questions, generating new questions that are added to the training data.
➢ Nlpaug is a Python library for data augmentation in natural language processing (NLP). It provides various techniques for augmenting text data to improve the performance of NLP models.
➢ The augmentation source is WordNet.
➢ WordNet is a lexical database of the English language that relates words to one another through synonyms, hypernyms, and hyponyms.
➢ Synonyms are words or phrases that have similar or identical meanings.
➢ Hypernym: a word that represents a category or general concept. It is more abstract and covers a range of more specific terms. For example, “animal” is a hypernym of “bird”.
➢ Hyponym: a word that falls within a more general category represented by a hypernym. It is a more specific term that describes a particular instance or subtype of the broader concept. For example, “bird” is a hyponym of “animal”.
➢ Cosine similarity is used to compare each augmented question with its original question, and the augmented questions are then filtered based on the average similarity score of all the augmented questions.
▪ Modifying how the questions are phrased.
➢ Example: “What are the types of …” is rephrased as “Could you tell me the types of …”.
Manual Augmentation:
▪ Using ChatGPT to generate multiple variations of the questions given in the data.
Data Structure:
• Instead of manipulating the JSON file directly by writing tags, patterns, and responses, we made the process easier: the user puts questions and answers in a text file, and we then transform it into a JSON file to train the model.
• Each list of questions and answers must be separated by a space.
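A minimal sketch of the text-to-JSON conversion, under stated assumptions: the document does not give the exact text layout, so this assumes each list of questions and answers is a block separated from the next by an empty line, question lines start with `Q:`, answer lines start with `A:`, and tags are generated as `intent_1`, `intent_2`, and so on. The output uses the `tag` / `patterns` / `responses` fields that the section mentions, wrapped in a top-level `intents` list (also an assumption); the function names are hypothetical.

```python
import json
from typing import Dict, List


def _split_blocks(text: str) -> List[List[str]]:
    """Group non-empty lines into blocks; an empty line ends the current block."""
    blocks: List[List[str]] = []
    current: List[str] = []
    for line in text.splitlines():
        if line.strip():
            current.append(line.strip())
        elif current:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


def text_to_intents(text: str) -> Dict[str, List[dict]]:
    """Convert Q:/A: blocks into {"intents": [{"tag", "patterns", "responses"}, ...]}."""
    intents = []
    for i, block in enumerate(_split_blocks(text), start=1):
        patterns = [line[2:].strip() for line in block if line.upper().startswith("Q:")]
        responses = [line[2:].strip() for line in block if line.upper().startswith("A:")]
        if not patterns or not responses:
            raise ValueError(f"block {i} needs at least one 'Q:' line and one 'A:' line")
        intents.append({"tag": f"intent_{i}", "patterns": patterns, "responses": responses})
    return {"intents": intents}


def convert_file(txt_path: str, json_path: str) -> None:
    """Read the user's text file and write the intents JSON used for training."""
    with open(txt_path, encoding="utf-8") as src:
        data = text_to_intents(src.read())
    with open(json_path, "w", encoding="utf-8") as dst:
        json.dump(data, dst, ensure_ascii=False, indent=2)
```

Raising an error on a block without a question or an answer surfaces formatting mistakes in the text file before training, rather than silently producing an intent with no patterns or responses.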
Test Case of augmented data: