AI Assistant

Data Augmentation:
• Automated Augmentation:
  ▪ Used nlpaug (nlpaug.augmenter.word) to replace some words in the questions in the data, generating new questions that are added to the training data.
    Nlpaug is a Python library for data augmentation in natural language processing (NLP). It provides various techniques for augmenting text data to improve the performance of NLP models.
    The augmentation source is WordNet.
    WordNet is a lexical database of the English language that relates words to one another through synonyms, hypernyms, and hyponyms.
    Synonyms: words or phrases that have similar or identical meanings.
    Hypernym: a word that represents a category or general concept. It is more abstract and encompasses a range of more specific terms, for example: Animal.
    Hyponym: a word that falls within a more general category represented by a hypernym. It is a more specific term that describes a particular instance or subtype of the broader concept, for example: Bird.
    Cosine similarity was used to compare each augmented question with the original question, and the augmented questions were filtered based on the average similarity score of all the augmented questions.
  ▪ Modified the phrasing of the questions. Example: "What are the types of…" is modified to "Could you tell me the types of…".
• Manual Augmentation:
  ▪ Used ChatGPT to generate multiple variations of the questions given in the data.

Data Structure:
• Instead of editing the JSON file directly by writing tags, patterns, and responses, we made the process easier by letting the user put questions and answers in a text file, which we then transform into a JSON file to train the model.
• Each group of questions and answers must be separated by a space.

Test Case of augmented data:
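The cosine-similarity filtering step described under Automated Augmentation can be sketched as follows. This is a minimal sketch under stated assumptions, not the project's actual code: the source does not say which text representation was used for cosine similarity, so a plain bag-of-words count vector stands in here, and "filter based on the average score" is interpreted as keeping the augmented questions whose score is at or above the mean. The candidate list stands in for the output of the nlpaug WordNet augmenter (presumably something like `nlpaug.augmenter.word.SynonymAug(aug_src='wordnet')`); the helper names `bow_vector` and `filter_by_average_similarity` are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def bow_vector(text: str) -> Counter:
    """Lower-cased bag-of-words counts.

    Assumption: a stand-in for whatever representation the project used.
    """
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_by_average_similarity(original: str, candidates: List[str]) -> List[str]:
    """Keep augmented questions whose similarity to the original is >= the mean.

    Assumption: the mean score over all candidates is used as the threshold.
    """
    if not candidates:
        return []
    ref = bow_vector(original)
    scores = [cosine_similarity(ref, bow_vector(c)) for c in candidates]
    threshold = sum(scores) / len(scores)
    return [c for c, s in zip(candidates, scores) if s >= threshold]


if __name__ == "__main__":
    original = "What are the types of animals?"
    # Hypothetical augmented questions, standing in for nlpaug output.
    candidates = [
        "What are the kinds of animals?",
        "What are the types of beast?",
        "Which are the character of fauna?",
    ]
    print(filter_by_average_similarity(original, candidates))
```

With these inputs the first two candidates each share five of six words with the original (score 5/6), the third shares three (score 0.5), so the mean is about 0.72 and only the first two are kept. Using the mean as the threshold adapts to each question, at the cost of always discarding some candidates even when all of them are acceptable.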