Research abstract

A comprehensive exploration of semantic relation extraction via pre-trained CNNs
The paper begins by discussing the importance of relation extraction across a range of applications and its status as a research hotspot. It enumerates several representative methods that have been applied to relation classification, including the use of high-level lexical and syntactic features drawn from WordNet, part-of-speech tagging, morphological analysis, dependency parsing, and named entity recognition, among others. It then clearly establishes the problems with these earlier methods, including expensive computation and the implicit error propagation they introduce. The paper proposes a novel pre-trained CNN architecture, XM-CNN, which is applied to the SemEval-2010 Task 8 dataset with BERT as the backbone. The proposed solution uses the MT-DNN pre-training technique, through which XM-CNN acquires its input representations via multi-task learning. The difficult semantic relation extraction problem benefits from the associated labelled data obtained through MT-DNN thanks to this pre-trained input representation method. Replacing the [CLS] token with a [CXT] token helps capture sequential contextual semantic information. The primary focus is directed towards the semantic information of entities and their latent types, which aids in detecting subtler cues despite the heterogeneous structure of the input sentence. Furthermore, a bi-level attention mechanism is employed to capture both entity-aware and relation-aware pooling attention.
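To make the bi-level attention concrete, here is a minimal PyTorch sketch of entity-aware and relation-aware pooling over the encoder's hidden states. The function and parameter names, tensor shapes, and scoring functions are illustrative assumptions, not the paper's exact XM-CNN layers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def entity_aware_pooling(hidden, e1_repr, e2_repr):
    # hidden: (batch, seq_len, dim) contextual states from the BERT backbone.
    # e1_repr / e2_repr: (batch, dim) pooled representations of the two entities.
    # Score each token against both entities and pool with the resulting weights
    # (an assumed scoring scheme, shown only to illustrate the idea).
    scores = torch.bmm(hidden, e1_repr.unsqueeze(2)).squeeze(2) \
           + torch.bmm(hidden, e2_repr.unsqueeze(2)).squeeze(2)
    alpha = F.softmax(scores, dim=1)                          # entity-aware weights
    return torch.bmm(alpha.unsqueeze(1), hidden).squeeze(1)   # (batch, dim)

class RelationAwarePooling(nn.Module):
    """Second attention level: score tokens against learned relation queries."""
    def __init__(self, dim, num_relations):
        super().__init__()
        self.rel_queries = nn.Parameter(torch.randn(num_relations, dim) * 0.02)

    def forward(self, hidden):
        scores = torch.einsum("bsd,rd->bsr", hidden, self.rel_queries)
        alpha = F.softmax(scores, dim=1)                    # attention over tokens
        return torch.einsum("bsr,bsd->brd", alpha, hidden)  # (batch, num_rel, dim)
```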
The dataset used to evaluate XM-CNN is SemEval-2010 Task 8. It contains 10,717 annotated examples, with 8,000 training sentences and 2,717 testing examples, classified into ten relation classes: Cause-Effect, Product-Producer, Instrument-Agency, Content-Container, Entity-Origin, Component-Whole, Member-Collection, Entity-Destination, and Message-Topic, plus an Other class.
The results of this paper are compared against previous state-of-the-art models using the F1-score, based on the experiments carried out. In these tests, GloVe is used to train XM-CNN's word embeddings. The model implementation builds on MT-DNN-KD, which extends PyTorch's BERT implementation. The remaining weights are initialised at random from a zero-mean Gaussian distribution. The model's hyperparameters are tuned via cross-validation on the development set.
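For illustration, the zero-mean Gaussian initialisation of the newly added (non-pretrained) weights could look like the following PyTorch snippet; the standard deviation of 0.02 is an assumed value, since the summary does not state one.

```python
import torch.nn as nn

def init_new_weights(module, std=0.02):
    """Draw weights of freshly added layers from a zero-mean Gaussian."""
    if isinstance(module, nn.Linear):
        nn.init.normal_(module.weight, mean=0.0, std=std)
        if module.bias is not None:
            nn.init.zeros_(module.bias)

# Typically applied only to the task-specific head, leaving the
# pretrained BERT weights untouched, e.g.:
#   classifier_head.apply(init_new_weights)
```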
In general, when traditional features are used in models, richer feature sets can yield improved performance. However, they depend heavily on human ingenuity and pre-existing NLP expertise, both of which are difficult to scale. This is precisely the issue that the pre-training model used here excels at solving, which is backed up by the paper's experimental results. The attention modules' reinforcement mechanism derives more abstract, higher-level features from the two entity matrices, while the pooling layer's improvement mechanism effectively captures the importance of individual windows. These prove extremely useful in relation classification, since influential relational terms can be identified straightforwardly this way; an ablation study confirms their contribution. The improvement in F1-score is another positive observation, and the use of GloVe in place of the traditional word2vec model is another factor that leads to noticeable gains.
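As a concrete illustration of the GloVe substitution, here is a minimal sketch of loading pre-trained GloVe vectors into an embedding layer; the file path, vocabulary mapping, and 300-dimensional size are assumptions for the example.

```python
import numpy as np
import torch
import torch.nn as nn

def load_glove(path, vocab, dim=300):
    """Build an embedding layer from a GloVe text file (lines: word v1 ... vN).

    `path`, `vocab` (word -> index), and `dim` are illustrative assumptions."""
    matrix = np.random.normal(0.0, 0.02, (len(vocab), dim)).astype("float32")
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            word, values = parts[0], parts[1:]
            if word in vocab and len(values) == dim:
                matrix[vocab[word]] = np.asarray(values, dtype="float32")
    return nn.Embedding.from_pretrained(torch.from_numpy(matrix), freeze=False)
```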
All in all, the paper achieves its goal of improving CNN-based relation extraction models, reaching a state-of-the-art F1-score of 91.6%. Future scope lies in further improving the model's performance through the addition of knowledge graphs.
Joint Model of Entity Recognition and Relation Extraction with Self-attention
Mechanism
The paper begins by establishing the importance of entity classification/recognition (EC/NER) and relation extraction (RE) from medical texts that refer to biomedical entities, and how this technology can help with medical knowledge graph construction, medical question answering, medicine suggestion, and a variety of other healthcare and medicine applications.
To achieve these goals, most prior research used pipeline models. The EC/NER task is commonly carried out by conditional random fields (CRF), which are then coupled with support vector machines (SVM) to perform the RE task on the medical texts. These pipeline models have two flaws: (1) error propagation, in which faults in the EC/NER task are passed on to the RE task, and (2) ignorance of the intrinsic relationship or interaction between EC/NER and RE. With the growing popularity of deep learning as a way to overcome such challenges, a number of researchers have turned to joint models, which have achieved state-of-the-art performance on several datasets. However, this approach has its own problems: most models rely significantly on hand-crafted features and NLP tools, and they are unable to recognise all semantic orientations of a single entity in a sentence. Moreover, the majority of studies have focused on English datasets. These problems are precisely what this study sets out to address.
The researchers' proposed solution introduces their own dataset of Chinese medicine instructions and proposes a novel joint model to handle the two challenges listed above, using a BiLSTM to learn deep features, together with a self-attention mechanism and multi-head selection to extract all the relationships between entities. Self-attention is typically employed to learn word dependencies in a sentence and to capture the sentence's underlying structure. In contrast to convolutional neural networks (CNN) and recurrent neural networks (RNN), self-attention ignores word distance and computes word dependencies directly, which is particularly effective for both long-distance and local dependencies. More importantly, compared to an RNN, it offers significantly faster computation and fewer parameters.
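The distance-independence of self-attention is visible in the standard scaled dot-product formulation, sketched below; this is the generic mechanism rather than the paper's exact layer, and the weight matrices are assumed inputs.

```python
import math
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    # x: (seq_len, dim). Every token attends to every other token in a single
    # matrix product, so modelling a dependency costs the same regardless of
    # how far apart the two words are (unlike an RNN's step-by-step recurrence).
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / math.sqrt(k.size(-1))  # (seq_len, seq_len) pairwise scores
    return F.softmax(scores, dim=-1) @ v
```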
As mentioned above, the study uses Chinese medicine instructions as its new linguistic corpus, in order to study extraction and recognition on a dataset that has seen little exploration, since most deep learning studies focus primarily on English datasets.
The merit of the proposed solution is verified by experimental comparison with previously published methods. An F1-score of 93.25% on the NER task suggests that the annotation strategy and joint model are significantly effective. In further experimental comparisons with previous models, a 6% improvement is observed in both the EC and RE tasks. There is significant improvement on the NER tasks as well, indicating the model's improved effectiveness on the CoNLL-2004 dataset.
Future work for this model lies along the lines of data generalisation: training the model's learning abilities on a limited dataset, increasing the number of discontinuous entities that can be labelled from said dataset, and exploring more linguistic rules to arrive at a solution for discontinuous entities.
Improve relation extraction with dual attention-guided graph convolutional networks
The paper begins by familiarising the reader with the importance of relation extraction for answering knowledge queries and constructing knowledge graphs, and with its relevance as a supporting technology for information extraction. It then reviews existing models used for the same purpose, namely entity recognition and rule-based methods, pointing out that entity recognition has produced satisfactory results, whereas rule-based models suffer from poor generalizability. Two types of relation extraction models are currently available, sequence-based and dependency-based. Sequence-based
extraction models now available: sequence-based and dependency-based. Sequence-based
models work with word sequences, for example, by encoding words using recurrent neural
networks to acquire sentence information. Dependency-based models incorporate the sentence's dependency tree and efficiently employ its structural information to extract features. Dependency-based models, as opposed to sequence-based
models, can capture implicit nonlocal syntactic links. The information in the dependency tree,
on the other hand, isn't necessarily useful for entity relationship information. As a result, in
order to boost system performance even more, several pruning procedures have been
implemented to extract the dependency information. Using such pruning procedures, however,
runs the danger of erasing some crucial information about the entire tree. The model should
encompass the complete tree and utilise an end-to-end method to learn the strength of the
relationships between entities to avoid losing this vital information and to make greater use of
the hidden information in the tree. As a result, the task's key is to have the model learn from
the full tree in order to strike a balance between maintaining and rejecting data. Multihop
relational reasoning is required for multihop relational extraction, and hence removing the
multihop influence on the dependent route is critical. The study aims to solve these two
problems through the proposal of a model for the same that employs a dual attention
mechanism with reinforcement learning in a graph convolutional network. This approach
focuses on the representation of sentence words as nodes in a graph. The node representations
are influenced by the nodes around them. When a neural network is applied to a graph structure,
it can immediately extract node dependence information, reducing the multihop effect on a
dependent path. Such an approach captures rich semantic information from the available dataset while also enabling the model to learn the strength of node-to-node connections, in order to make greater use of the dependency tree's information. The
relationship classification's ambiguous information has a significant impact on the prediction
results. To maximise the representation of relationship information and increase the accuracy
of the relationship classification, distributional reinforcement learning is used to consider
uncertain information in the relationship classification. As a result, this research uses a graph neural network with an attention mechanism integrated with reinforcement learning; unlike a single-attention graph neural network, this method is a dual-attention graph neural network that adaptively combines local features with global dependencies.
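The neighbour-driven node update at the core of any GCN over a dependency graph can be sketched as follows; this follows the generic graph-convolution formulation, not the paper's exact dual-attention DAGCN layer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCNLayer(nn.Module):
    """One graph-convolution step: each word node aggregates its dependency-tree
    neighbours, so stacking k layers covers k-hop paths without walking the
    dependency path token by token."""
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, h, adj):
        # h: (num_nodes, dim) word-node states.
        # adj: (num_nodes, num_nodes) dependency adjacency with self-loops.
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)  # node degrees
        return F.relu(self.linear(adj @ h / deg))        # mean-aggregate, transform
```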
The researchers evaluate the proposed DAGCN model on the TACRED and SemEval datasets. Notable characteristics of the TACRED dataset include its 41 defined relationship types and a special unrelated class; the paper reports micro-averaged F1 scores on it. The SemEval dataset, while considerably smaller, is still widely used: it provides 8,000 samples, each annotated with the relationship between two given entities, and the paper reports macro-averaged F1 scores on this dataset.
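The micro/macro distinction between the two evaluation settings is easy to reproduce with scikit-learn; the labels below are toy values, not data from the paper.

```python
from sklearn.metrics import f1_score

y_true = [0, 1, 2, 2, 1, 0]  # toy gold relation labels
y_pred = [0, 1, 1, 2, 1, 0]  # toy predictions

# Micro-averaging pools all decisions into global counts (as reported for
# TACRED); macro-averaging takes the unweighted mean of per-class F1 scores
# (as reported for SemEval).
print(f1_score(y_true, y_pred, average="micro"))
print(f1_score(y_true, y_pred, average="macro"))
```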
The ablation study focused on the two primary attention modules and the classification reinforcement module. According to these findings, adding the attention modules results in significant improvements: F1 drops by 3.5 points without the location attention module, by 3.3 points without the relation attention module, and by 4.2 points if both are removed. The feedforward neural network has a smaller effect, with its removal reducing the F1 score to 68.1. With the reinforcement learning enhancement, the result increases by 3.1 F1. Overall, the two parallel attention modules and the classification
reinforcement module play critical roles in assisting the GCN in learning better aggregated
information and generating better graph structure representations.
In the future, there is room to enhance the DAGCN model further. When faced with a shortage of data, the present relation extraction approach may be rendered ineffective; consequently, unsupervised knowledge extraction will be a direction for future research.
Direction-sensitive relation extraction using Bi-SDP attention model
The paper begins by enumerating traditional relation extraction methodologies, namely feature-based and kernel-based methods, and brings into focus the problems associated with such approaches, including incorrect labelling, domain limitations, and unsatisfactory accuracy. It then discusses how this has led to the emergence of supervised neural-network-based methodologies for the same task. Such methods can automatically learn features from a sentence without the need for complicated feature engineering. However, they still possess a
few weaknesses. First, the relationship's direction is ignored. When there are no directional
prepositions, the relationship's orientation is uncertain. Second, the information from the
reverse SDP is underutilised. The reverse SDP, according to the authors, not only provides
significant suggestions for determining the direction of a link, but also provides more semantic
information for relation extraction. Third, redundancy in the input text is not properly removed.
The keyword information or syntactical information in a phrase, particularly the sub-sentence
between the supplied entities, can reflect the most relevant relationship, according to the prior
work's analysis. As a result, several studies aim to employ SDP's denoising method or maintain
the sub-sentence. These approaches, on the other hand, are prone to over-pruning. Fourth, the
number of RNN cells is not adequately managed. The cell number of the RNN-based model
can be efficiently lowered if the input sentence only contains a subset of the original sentence.
This study proposes a Bi-SDP-Att model for relation extraction that emphasises both relational and directional semantic words. To capture the crucial information in a phrase, the model pairs a BiLSTM with a unique trimming technique and the Bi-SDP attention mechanism. The pruning approach can keep
as much relevant information in a sentence as possible. The self-attention mechanism and
BiLSTM are used in the model's backbone structure to better capture the expression of words.
They have the ability to perform semantic fusion between words in a sentence. Through a pair
of CNNs, the Bi-SDP attention provides a set of parallel attention weights. One CNN focuses
on key relational semantic terms, while the other focuses on directional words, both of which
are utilised to capture sentence relation trigger words. Long-distance semantic information can
be learned using RNN-based approaches. However, such work was hampered by the vanishing or exploding gradients problem, in which gradients grow or shrink exponentially over long sequences. To address this issue, BiLSTM was introduced to the task, drawing on six different forms of data, including position features, POS tags, and named entity information. Following that, various RNN and attention mechanisms were incorporated into the task.
The early study Att-BiLSTM utilises the attention mechanism in Bi-LSTM for relation
extraction. Furthermore, this model employs raw text with location markers as input rather than
NLP tools or lexical resources. The SDP-based approaches, on the other hand, can extract the
most important words in a phrase from a syntax dependency tree. The SDP-LSTM model was
the first SDP-related work. The model used the shortest dependency path between two entities,
then used four LSTM channels to aggregate four sorts of information in the SDP. Instead of using the SDP features to classify the relationship directly, the authors' work employs the SDP to locate the crucial terms that trigger the relationship in the original phrase. This strategy helps prevent information loss. Instead of two independent subtrees, the SDP is regarded as a
single piece of text. To deal with Bi-SDP, they use a novel Bi-SDP attention model that takes
into account the direction and distinct convolution units.
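To make the SDP idea concrete, here is a sketch of extracting the shortest dependency path between two entity words, using spaCy for parsing and networkx for the path search; this illustrates the general SDP notion only, not the paper's specific pruning rules, and it assumes the en_core_web_sm model is installed.

```python
import networkx as nx
import spacy

nlp = spacy.load("en_core_web_sm")  # assumed available

def shortest_dependency_path(sentence, e1, e2):
    """Return the tokens on the shortest dependency path from e1 to e2."""
    doc = nlp(sentence)
    # Treat dependency arcs as undirected edges over token indices.
    graph = nx.Graph((tok.i, child.i) for tok in doc for child in tok.children)
    index = {tok.text: tok.i for tok in doc}  # naive lookup; assumes unique words
    path = nx.shortest_path(graph, source=index[e1], target=index[e2])
    return [doc[i].text for i in path]

# e.g. shortest_dependency_path("The fire was caused by a short circuit.",
#                               "fire", "circuit")
```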
The authors chose two typical datasets to verify the direction problem, since few datasets are annotated with directions. The most extensively used dataset for relation extraction is the SemEval-2010 Task 8 dataset, which is used here to evaluate the performance of the proposed method and to study the usefulness of the model. In addition, a further comparative experiment is run on a more sophisticated and demanding dataset, KBP37, to further demonstrate the usefulness of the method.
To compare the experimental outcomes, this paper selects a few typical models. Since it is difficult to account for the direction of links in sentences using CNN-based models, the proposed model is compared against RNN-based approaches. On SemEval-2010 Task 8, Bi-SDP-Att achieves an F1-score of 85.1 percent, beating all other RNN-based methods except Bi-LSTM+LET and DRNNs. DRNNs, however, require data augmentation, which might lead to overfitting; furthermore, compared to DRNNs, Bi-SDP-Att requires fewer additional features.
To compare the model to Bi-LSTM+LET, the open-source code of Bi-LSTM+LET is run in
the same experimental context and an F1-score of 84.2 percent is obtained. This research
assesses the Bi-SDP-Att in terms of F1-score including all relation types to compare the results
obtained on the KBP37 dataset. Because the texts in KBP37 are more complicated than those
in SemEval-2010 Task 8, KBP37's F1-score findings are lower than those in SemEval-2010
Task 8. Despite the task's difficulty, the model receives an F1-score of 64.39 percent,
outperforming current state-of-the-art performance. The proposed method Bi-SDP-Att
outperforms the standard CNN and RNN methods by more than 10%. This is because of the
application of the Bi-SDP attention mechanism to limit the impact of irrelevant input and
identify more valuable features. Parallel attention weights, when compared to previous
attention-based approaches, can make the proposed model more direction sensitive by
recognising directional words. The authors also investigated a novel pruning technique for reducing the length of the input sentence, which keeps the most important information in the original sentence while also reducing the number of BiLSTM cells. Experiments reveal that the model obtains highly competitive performance on the SemEval-2010 Task 8 dataset and
outperforms existing models on the KBP37 dataset, thanks to the Bi-SDP attention mechanism
and pruning technique. However, this method is sensitive to relationships with evident physical
direction, hence it is not ideal for extracting links with no apparent physical direction. The use
of external knowledge to improve extraction outcomes is where this model's future research
will focus.
Traditional Chinese medicine entity relation extraction based on CNN with segment
attention
The paper begins by highlighting the growing worldwide attention to traditional Chinese medicine (TCM) as an alternative to conventional medical practices. In this arena, entity relation mining has become a hot research topic. Although there are numerous successful methods for extracting the relationships of biomedical entities, there are few related publications on herb relation extraction in the PubMed literature. As a result, this research investigates the
problem of mining herb-related entity relations from PubMed literature and attempts to offer a
viable alternative. Its goal is to overcome the problem of knowledge isolation in TCM knowledge bases by using the proposed methodologies to extract herb-related entity relationships. In biomedicine, entity relation extraction from text has received a lot of attention.
Disease-specific, drug-protein, and chemical-protein relationships are among the entity
relations. Machine learning methods such as Support Vector Machine (SVM) and logistic
regression have replaced traditional rule-based or co-occurrence-based approaches in the
implementation process. Deep learning technology has been applied to the relation extraction challenge, avoiding finicky feature engineering and yielding superior outcomes. However, relevant
publications are rarely adapted to the TCM field. The authors suggest a novel architecture with
an upgraded layer for entity relation extraction from PubMed literature in order to extract
Chinese medicine-specific entity relations more consistently and effectively. The proposed approach has two stages, as sketched at the end of this section: the first stage uses a Convolutional Neural Network with a SEGment ATTention mechanism (SEGATT-CNN) to extract word-level features represented by word2vec; the second stage uses a machine learning classifier to combine the various
embedding features in order to produce the final relation classification. To assess the
models' performance, the authors use precision (P), recall (R), and the F-score (F); the F-score is the harmonic mean of precision and recall, F = 2PR / (P + R). In another experiment, they also report the Area Under the Curve (AUC) and accuracy. They used the designed SEGATT-CNN model
and the two-feature hybrid approach that combines the SEGATT-CNN model with an SVM classifier (SEGATT-CNN SVM) to test the performance of the suggested method. On this data
set, they compared their methods to relevant deep learning models. Although its F value is not the highest, the SEGATT-CNN model that uses only word embedding features produced good results. However, the proposed SEGATT-CNN SVM technique outperformed
all other current methods in terms of F-scores. In summary, after comparing the performance
of all models, the authors found that employing solely word2vec and TF-IDF embedding
features without human feature engineering yielded good results in terms of overall
performance. This strategy has an absolute advantage over numerous baseline methods and has
been proven to be effective in tackling this problem. Future research will focus on the use of
unsupervised approaches in this field and the development of a Traditional Chinese Medicine
knowledge base.
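For reference, the two-stage pattern described above (a CNN producing feature vectors that an SVM then classifies) can be sketched as follows; the tiny encoder and all names here are illustrative stand-ins, not the paper's actual SEGATT-CNN.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.svm import SVC

class TinyCNNEncoder(nn.Module):
    """Stand-in for the CNN stage: embeddings -> convolution -> pooled features."""
    def __init__(self, vocab_size=5000, emb_dim=100, n_filters=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, n_filters, kernel_size=3, padding=1)

    def forward(self, token_ids):                          # (batch, seq_len)
        x = self.emb(token_ids).transpose(1, 2)            # (batch, emb_dim, seq_len)
        return torch.relu(self.conv(x)).max(dim=2).values  # (batch, n_filters)

encoder = TinyCNNEncoder()
token_ids = torch.randint(0, 5000, (8, 20))  # toy batch of 8 tokenised sentences
labels = np.random.randint(0, 3, size=8)     # toy relation labels

with torch.no_grad():
    features = encoder(token_ids).numpy()    # stage 1: CNN feature extraction

svm = SVC(kernel="rbf").fit(features, labels)  # stage 2: SVM classification
print(svm.predict(features[:2]))
```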