Research abstract

A comprehensive exploration of semantic relation extraction via pre-trained CNNs
The paper begins by discussing the importance of relation extraction across a range of applications and its status as a research hotspot. It enumerates several representative methods that have been applied to relation classification, including the use of high-level lexical and syntactic features drawn from WordNet, part-of-speech tagging, morphological analysis, dependency parsing, and named entity recognition, among others. It then clearly establishes the problems with these earlier methods, including expensive computation and the implicit error propagation they introduce. The paper proposes a novel pre-trained CNN architecture, XM-CNN, which is applied to the SemEval-2010 Task 8 dataset with BERT as the backbone. The proposed solution uses the MT-DNN pre-training technique, through which XM-CNN acquires its input representations via multi-task learning. The difficult semantic relation extraction problem benefits from the associated labelled data obtained through MT-DNN thanks to this pre-trained input representation method. Replacing the [CLS] token with a [CXT] token helps capture sequential contextual semantic information. The primary focus is directed towards the semantic information of entities and their latent types, which aids in detecting subtler cues despite the heterogeneous structure of the input sentence. Furthermore, a bi-level attention mechanism is employed to capture both entity-aware and relation-aware pooling attention.
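To make the bi-level attention concrete, here is a minimal PyTorch sketch of entity-aware and relation-aware pooling over the encoder's hidden states. The function and parameter names, tensor shapes, and scoring functions are illustrative assumptions, not the paper's exact XM-CNN layers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def entity_aware_pooling(hidden, e1_repr, e2_repr):
    # hidden: (batch, seq_len, dim) contextual states from the BERT backbone.
    # e1_repr / e2_repr: (batch, dim) pooled representations of the two entities.
    # Score each token against both entities and pool with the resulting weights
    # (an assumed scoring scheme, shown only to illustrate the idea).
    scores = torch.bmm(hidden, e1_repr.unsqueeze(2)).squeeze(2) \
           + torch.bmm(hidden, e2_repr.unsqueeze(2)).squeeze(2)
    alpha = F.softmax(scores, dim=1)                          # entity-aware weights
    return torch.bmm(alpha.unsqueeze(1), hidden).squeeze(1)   # (batch, dim)

class RelationAwarePooling(nn.Module):
    """Second attention level: score tokens against learned relation queries."""
    def __init__(self, dim, num_relations):
        super().__init__()
        self.rel_queries = nn.Parameter(torch.randn(num_relations, dim) * 0.02)

    def forward(self, hidden):
        scores = torch.einsum("bsd,rd->bsr", hidden, self.rel_queries)
        alpha = F.softmax(scores, dim=1)                    # attention over tokens
        return torch.einsum("bsr,bsd->brd", alpha, hidden)  # (batch, num_rel, dim)
```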
The dataset used to evaluate XM-CNN is SemEval-2010 Task 8. It contains 10,717 annotated examples, with 8,000 training sentences and 2,717 testing examples, classified into ten relation classes: Cause-Effect, Product-Producer, Instrument-Agency, Content-Container, Entity-Origin, Component-Whole, Member-Collection, Entity-Destination, and Message-Topic, plus an Other class.
The results of this paper are compared against previous state-of-the-art models using the F1-score, based on the experiments carried out. In these tests, GloVe is used to train XM-CNN's word embeddings. The model implementation builds on MT-DNN-KD, which extends PyTorch's BERT implementation. The remaining weights are initialised at random from a zero-mean Gaussian distribution. The model's hyperparameters are tuned via cross-validation on the development set.
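For illustration, the zero-mean Gaussian initialisation of the newly added (non-pretrained) weights could look like the following PyTorch snippet; the standard deviation of 0.02 is an assumed value, since the summary does not state one.

```python
import torch.nn as nn

def init_new_weights(module, std=0.02):
    """Draw weights of freshly added layers from a zero-mean Gaussian."""
    if isinstance(module, nn.Linear):
        nn.init.normal_(module.weight, mean=0.0, std=std)
        if module.bias is not None:
            nn.init.zeros_(module.bias)

# Typically applied only to the task-specific head, leaving the
# pretrained BERT weights untouched, e.g.:
#   classifier_head.apply(init_new_weights)
```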
In general, when traditional features are used in models, richer feature sets can yield improved performance. However, they depend heavily on human ingenuity and pre-existing NLP expertise, both of which are difficult to scale. This is precisely the issue that the pre-training model used here excels at solving, which is backed up by the paper's experimental results. The attention modules' reinforcement mechanism derives more abstract, higher-level features from the two entity matrices, while the pooling layer's improvement mechanism effectively captures the importance of individual windows. These prove extremely useful in relation classification, since influential relational terms can be identified straightforwardly this way; an ablation study confirms their contribution. The improvement in F1-score is another positive observation, and the use of GloVe in place of the traditional word2vec model is another factor that leads to noticeable gains.
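As a concrete illustration of the GloVe substitution, here is a minimal sketch of loading pre-trained GloVe vectors into an embedding layer; the file path, vocabulary mapping, and 300-dimensional size are assumptions for the example.

```python
import numpy as np
import torch
import torch.nn as nn

def load_glove(path, vocab, dim=300):
    """Build an embedding layer from a GloVe text file (lines: word v1 ... vN).

    `path`, `vocab` (word -> index), and `dim` are illustrative assumptions."""
    matrix = np.random.normal(0.0, 0.02, (len(vocab), dim)).astype("float32")
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            word, values = parts[0], parts[1:]
            if word in vocab and len(values) == dim:
                matrix[vocab[word]] = np.asarray(values, dtype="float32")
    return nn.Embedding.from_pretrained(torch.from_numpy(matrix), freeze=False)
```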
All in all, the paper achieves its goal of improving CNN-based relation extraction models, reaching a state-of-the-art F1-score of 91.6%. Future scope lies in further improving the model's performance through the addition of knowledge graphs.
Joint Model of Entity Recognition and Relation Extraction with Self-attention
Mechanism
The paper begins by establishing the importance of entity classification/recognition (EC/NER) and relation extraction (RE) from medical texts that refer to biomedical entities, and how this technology can help with medical knowledge graph construction, medical question answering, medicine suggestion, and a variety of other healthcare and medicine applications.
To achieve these goals, most prior research used pipeline models. The EC/NER task is commonly carried out by conditional random fields (CRF), which are then coupled with support vector machines (SVM) to perform the RE task on the medical texts. These pipeline models have two flaws: (1) error propagation, in which faults in the EC/NER task are passed on to the RE task, and (2) ignorance of the intrinsic relationship or interaction between EC/NER and RE. With the growing popularity of deep learning as a way to overcome such challenges, a number of researchers have turned to joint models, which have achieved state-of-the-art performance on several datasets. However, this approach has its own problems: most models rely significantly on hand-crafted features and NLP tools, and they are unable to recognise all semantic orientations of a single entity in a sentence. Moreover, the majority of studies have focused on English datasets. These problems are precisely what this study sets out to address.
The researchers' proposed solution introduces their own dataset of Chinese medicine instructions and proposes a novel joint model to handle the two challenges listed above, using a BiLSTM to learn deep features, together with a self-attention mechanism and multi-head selection to extract all the relationships between entities. Self-attention is typically employed to learn word dependencies in a sentence and to capture the sentence's underlying structure. In contrast to convolutional neural networks (CNN) and recurrent neural networks (RNN), self-attention ignores word distance and computes word dependencies directly, which is particularly effective for both long-distance and local dependencies. More importantly, compared to an RNN, it offers significantly faster computation and fewer parameters.
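The distance-independence of self-attention is visible in the standard scaled dot-product formulation, sketched below; this is the generic mechanism rather than the paper's exact layer, and the weight matrices are assumed inputs.

```python
import math
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    # x: (seq_len, dim). Every token attends to every other token in a single
    # matrix product, so modelling a dependency costs the same regardless of
    # how far apart the two words are (unlike an RNN's step-by-step recurrence).
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / math.sqrt(k.size(-1))  # (seq_len, seq_len) pairwise scores
    return F.softmax(scores, dim=-1) @ v
```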
As mentioned above, the study uses Chinese medicine instructions as its new linguistic corpus, in order to study extraction and recognition on a dataset that has seen little exploration, since most deep learning studies focus primarily on English datasets.
The merit of the proposed solution is verified by experimental comparison with previously published methods. An F1-score of 93.25% on the NER task suggests that the annotation strategy and joint model are significantly effective. In further experimental comparisons with previous models, a 6% improvement is observed in both the EC and RE tasks. There is significant improvement on the NER tasks as well, indicating the model's improved effectiveness on the CoNLL-2004 dataset.
Future work for this model lies along the lines of data generalisation: training the model's learning abilities on a limited dataset, increasing the number of discontinuous entities that can be labelled from said dataset, and exploring more linguistic rules to arrive at a solution for discontinuous entities.
Improve relation extraction with dual attention-guided graph convolutional networks
The paper begins by familiarising the reader with the importance of relation extraction for answering knowledge queries and constructing knowledge graphs, and with its relevance as a supporting technology for information extraction. It then reviews existing models used for the same purpose, namely entity recognition and rule-based methods, pointing out that entity recognition has produced satisfactory results, whereas rule-based models suffer from poor generalizability. Two types of relation extraction models are currently available, sequence-based and dependency-based. Sequence-based
extraction models now available: sequence-based and dependency-based. Sequence-based
models work with word sequences, for example, by encoding words using recurrent neural
networks to acquire sentence information. Dependency-based models incorporate the sentence's dependency tree and efficiently employ its structural information to extract features. Dependency-based models, as opposed to sequence-based
models, can capture implicit nonlocal syntactic links. The information in the dependency tree,
on the other hand, isn't necessarily useful for entity relationship information. As a result, in
order to boost system performance even more, several pruning procedures have been
implemented to extract the dependency information. Using such pruning procedures, however,
runs the danger of erasing some crucial information about the entire tree. The model should
encompass the complete tree and utilise an end-to-end method to learn the strength of the
relationships between entities to avoid losing this vital information and to make greater use of
the hidden information in the tree. As a result, the task's key is to have the model learn from
the full tree in order to strike a balance between maintaining and rejecting data. Multihop
relational reasoning is required for multihop relational extraction, and hence removing the
multihop influence on the dependent route is critical. The study aims to solve these two
problems through the proposal of a model for the same that employs a dual attention
mechanism with reinforcement learning in a graph convolutional network. This approach
focuses on the representation of sentence words as nodes in a graph. The node representations
are influenced by the nodes around them. When a neural network is applied to a graph structure,
it can immediately extract node dependence information, reducing the multihop effect on a
dependent path. Such an approach captures rich semantic information from the available dataset while also enabling the model to learn the strength of node-to-node connections, in order to make greater use of the dependency tree's information. The
relationship classification's ambiguous information has a significant impact on the prediction
results. To maximise the representation of relationship information and increase the accuracy
of the relationship classification, distributional reinforcement learning is used to consider
uncertain information in the relationship classification. As a result, this research uses a graph neural network with an attention mechanism integrated with reinforcement learning; unlike a single-attention graph neural network, this method is a dual-attention graph neural network that adaptively combines local features with global dependencies.
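The neighbour-driven node update at the core of any GCN over a dependency graph can be sketched as follows; this follows the generic graph-convolution formulation, not the paper's exact dual-attention DAGCN layer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCNLayer(nn.Module):
    """One graph-convolution step: each word node aggregates its dependency-tree
    neighbours, so stacking k layers covers k-hop paths without walking the
    dependency path token by token."""
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, h, adj):
        # h: (num_nodes, dim) word-node states.
        # adj: (num_nodes, num_nodes) dependency adjacency with self-loops.
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)  # node degrees
        return F.relu(self.linear(adj @ h / deg))        # mean-aggregate, transform
```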
The researchers evaluate the proposed DAGCN model on the TACRED and SemEval datasets. Notable characteristics of the TACRED dataset include its 41 defined relationship types and a special unrelated class; the paper reports micro-averaged F1 scores on it. The SemEval dataset, while considerably smaller, is still widely used: it provides 8,000 samples, each annotated with the relationship between two given entities, and the paper reports macro-averaged F1 scores on this dataset.
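The micro/macro distinction between the two evaluation settings is easy to reproduce with scikit-learn; the labels below are toy values, not data from the paper.

```python
from sklearn.metrics import f1_score

y_true = [0, 1, 2, 2, 1, 0]  # toy gold relation labels
y_pred = [0, 1, 1, 2, 1, 0]  # toy predictions

# Micro-averaging pools all decisions into global counts (as reported for
# TACRED); macro-averaging takes the unweighted mean of per-class F1 scores
# (as reported for SemEval).
print(f1_score(y_true, y_pred, average="micro"))
print(f1_score(y_true, y_pred, average="macro"))
```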
The ablation study focused on the two primary attention modules and the classification reinforcement module. According to these findings, adding the attention modules results in significant improvements: F1 drops by 3.5 points without the location attention module, by 3.3 points without the relation attention module, and by 4.2 points if both are removed. The feedforward neural network has a smaller effect, with its removal reducing the F1 score to 68.1. With the reinforcement learning enhancement, the result increases by 3.1 F1. Overall, the two parallel attention modules and the classification
reinforcement module play critical roles in assisting the GCN in learning better aggregated
information and generating better graph structure representations.
In the future, there is room to enhance the DAGCN model further. When faced with a shortage of data, the present relation extraction approach may be rendered ineffective; consequently, unsupervised knowledge extraction will be a direction for future research.
Direction-sensitive relation extraction using Bi-SDP attention model
The paper begins by enumerating traditional relation extraction methodologies, namely feature-based and kernel-based methods, and brings into focus the problems associated with such approaches, including incorrect labelling, domain limitations, and unsatisfactory accuracy. It then discusses how this has led to the emergence of supervised neural-network-based methodologies for the same task. Such methods can automatically learn features from a sentence without the need for complicated feature engineering. However, they still possess a
few weaknesses. First, the relationship's direction is ignored. When there are no directional
prepositions, the relationship's orientation is uncertain. Second, the information from the
reverse SDP is underutilised. The reverse SDP, according to the authors, not only provides
significant suggestions for determining the direction of a link, but also provides more semantic
information for relation extraction. Third, redundancy in the input text is not properly removed.
The keyword information or syntactical information in a phrase, particularly the sub-sentence
between the supplied entities, can reflect the most relevant relationship, according to the prior
work's analysis. As a result, several studies aim to employ SDP's denoising method or maintain
the sub-sentence. These approaches, on the other hand, are prone to over-pruning. Fourth, the
number of RNN cells is not adequately managed. The cell number of the RNN-based model
can be efficiently lowered if the input sentence only contains a subset of the original sentence.
This study proposes a Bi-SDP-Att model for relation extraction that emphasises both relational and directional semantic words. To capture the crucial information in a phrase, the model pairs a BiLSTM with a unique trimming technique and the Bi-SDP attention mechanism. The pruning approach can keep
as much relevant information in a sentence as possible. The self-attention mechanism and
BiLSTM are used in the model's backbone structure to better capture the expression of words.
They have the ability to perform semantic fusion between words in a sentence. Through a pair
of CNNs, the Bi-SDP attention provides a set of parallel attention weights. One CNN focuses
on key relational semantic terms, while the other focuses on directional words, both of which
are utilised to capture sentence relation trigger words. Long-distance semantic information can
be learned using RNN-based approaches. However, such work was hampered by the vanishing or exploding gradients problem, in which gradients grow or shrink exponentially over long sequences. To address this issue, BiLSTM was introduced to the task, drawing on six different forms of data, including position features, POS tags, and named entity information. Following that, various RNN and attention mechanisms were incorporated into the task.
The early study Att-BiLSTM utilises the attention mechanism in Bi-LSTM for relation
extraction. Furthermore, this model employs raw text with location markers as input rather than
NLP tools or lexical resources. The SDP-based approaches, on the other hand, can extract the
most important words in a phrase from a syntax dependency tree. The SDP-LSTM model was
the first SDP-related work. The model used the shortest dependency path between two entities,
then used four LSTM channels to aggregate four sorts of information in the SDP. Instead of using the SDP features to classify the relationship directly, the authors' work employs the SDP to locate the crucial terms that trigger the relationship in the original phrase. This strategy helps prevent information loss. Instead of two independent subtrees, the SDP is regarded as a
single piece of text. To deal with Bi-SDP, they use a novel Bi-SDP attention model that takes
into account the direction and distinct convolution units.
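To make the SDP idea concrete, here is a sketch of extracting the shortest dependency path between two entity words, using spaCy for parsing and networkx for the path search; this illustrates the general SDP notion only, not the paper's specific pruning rules, and it assumes the en_core_web_sm model is installed.

```python
import networkx as nx
import spacy

nlp = spacy.load("en_core_web_sm")  # assumed available

def shortest_dependency_path(sentence, e1, e2):
    """Return the tokens on the shortest dependency path from e1 to e2."""
    doc = nlp(sentence)
    # Treat dependency arcs as undirected edges over token indices.
    graph = nx.Graph((tok.i, child.i) for tok in doc for child in tok.children)
    index = {tok.text: tok.i for tok in doc}  # naive lookup; assumes unique words
    path = nx.shortest_path(graph, source=index[e1], target=index[e2])
    return [doc[i].text for i in path]

# e.g. shortest_dependency_path("The fire was caused by a short circuit.",
#                               "fire", "circuit")
```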
The authors chose two typical datasets to verify the direction problem, since few datasets are annotated with directions. The most extensively used dataset for relation extraction is the SemEval-2010 Task 8 dataset, which is used here to evaluate the performance of the proposed method and to study the usefulness of the model. In addition, a further comparative experiment is run on a more sophisticated and demanding dataset, KBP37, to further demonstrate the usefulness of the method.
To compare the experimental outcomes, this paper selects a few typical models. Since it is difficult to account for the direction of links in sentences using CNN-based models, the proposed model is compared against RNN-based approaches. On SemEval-2010 Task 8, Bi-SDP-Att achieves an F1-score of 85.1 percent, beating all other RNN-based methods except Bi-LSTM+LET and DRNNs. DRNNs, however, require data augmentation, which might lead to overfitting; furthermore, compared to DRNNs, Bi-SDP-Att requires fewer additional features.
To compare the model to Bi-LSTM+LET, the open-source code of Bi-LSTM+LET is run in
the same experimental context and an F1-score of 84.2 percent is obtained. This research
assesses the Bi-SDP-Att in terms of F1-score including all relation types to compare the results
obtained on the KBP37 dataset. Because the texts in KBP37 are more complicated than those
in SemEval-2010 Task 8, KBP37's F1-score findings are lower than those in SemEval-2010
Task 8. Despite the task's difficulty, the model receives an F1-score of 64.39 percent,
outperforming current state-of-the-art performance. The proposed method Bi-SDP-Att
outperforms the standard CNN and RNN methods by more than 10%. This is because of the
application of the Bi-SDP attention mechanism to limit the impact of irrelevant input and
identify more valuable features. Parallel attention weights, when compared to previous
attention-based approaches, can make the proposed model more direction sensitive by
recognising directional words. The authors also investigated a novel pruning technique for reducing the length of the input sentence, which keeps the most important information in the original sentence while also reducing the number of BiLSTM cells. Experiments reveal that the model obtains highly competitive performance on the SemEval-2010 Task 8 dataset and
outperforms existing models on the KBP37 dataset, thanks to the Bi-SDP attention mechanism
and pruning technique. However, this method is sensitive to relationships with evident physical
direction, hence it is not ideal for extracting links with no apparent physical direction. The use
of external knowledge to improve extraction outcomes is where this model's future research
will focus.
Traditional Chinese medicine entity relation extraction based on CNN with segment
attention
The paper begins by highlighting the growing worldwide attention to traditional Chinese medicine (TCM) as an alternative to conventional medical practices. In this arena, entity relation mining has become a hot research topic. Although there are numerous successful methods for extracting the relationships of biomedical entities, there are few related publications on herb relation extraction in the PubMed literature. As a result, this research investigates the
problem of mining herb-related entity relations from PubMed literature and attempts to offer a
viable alternative. Its goal is to overcome the problem of knowledge isolation in TCM knowledge bases by using the proposed methodologies to extract herb-related entity relationships. In biomedicine, entity relation extraction from text has received a lot of attention.
Disease-specific, drug-protein, and chemical-protein relationships are among the entity
relations. Machine learning methods such as Support Vector Machine (SVM) and logistic
regression have replaced traditional rule-based or co-occurrence-based approaches in the
implementation process. Deep learning technology has been applied to the relation extraction challenge, avoiding finicky feature engineering and yielding superior outcomes. However, relevant
publications are rarely adapted to the TCM field. The authors suggest a novel architecture with
an upgraded layer for entity relation extraction from PubMed literature in order to extract
Chinese medicine-specific entity relations more consistently and effectively. The proposed approach has two stages, as sketched at the end of this section: the first stage uses a Convolutional Neural Network with a SEGment ATTention mechanism (SEGATT-CNN) to extract word-level features represented by word2vec; the second stage uses a machine learning classifier to combine the various
embedding features in order to produce the final relation classification. To assess the
models' performance, the authors use precision (P), recall (R), and the F-score (F); the F-score is the harmonic mean of precision and recall, F = 2PR / (P + R). In another experiment, they also report the Area Under the Curve (AUC) and accuracy. They used the designed SEGATT-CNN model
and the two-feature hybrid approach that combines the SEGATT-CNN model with an SVM classifier (SEGATT-CNN SVM) to test the performance of the suggested method. On this data
set, they compared their methods to relevant deep learning models. Although its F value is not the highest, the SEGATT-CNN model that uses only word embedding features produced good results. However, the proposed SEGATT-CNN SVM technique outperformed
all other current methods in terms of F-scores. In summary, after comparing the performance
of all models, the authors found that employing solely word2vec and TF-IDF embedding
features without human feature engineering yielded good results in terms of overall
performance. This strategy has an absolute advantage over numerous baseline methods and has
been proven to be effective in tackling this problem. Future research will focus on the use of
unsupervised approaches in this field and the development of a Traditional Chinese Medicine
knowledge base.
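For reference, the two-stage pattern described above (a CNN producing feature vectors that an SVM then classifies) can be sketched as follows; the tiny encoder and all names here are illustrative stand-ins, not the paper's actual SEGATT-CNN.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.svm import SVC

class TinyCNNEncoder(nn.Module):
    """Stand-in for the CNN stage: embeddings -> convolution -> pooled features."""
    def __init__(self, vocab_size=5000, emb_dim=100, n_filters=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, n_filters, kernel_size=3, padding=1)

    def forward(self, token_ids):                          # (batch, seq_len)
        x = self.emb(token_ids).transpose(1, 2)            # (batch, emb_dim, seq_len)
        return torch.relu(self.conv(x)).max(dim=2).values  # (batch, n_filters)

encoder = TinyCNNEncoder()
token_ids = torch.randint(0, 5000, (8, 20))  # toy batch of 8 tokenised sentences
labels = np.random.randint(0, 3, size=8)     # toy relation labels

with torch.no_grad():
    features = encoder(token_ids).numpy()    # stage 1: CNN feature extraction

svm = SVC(kernel="rbf").fit(features, labels)  # stage 2: SVM classification
print(svm.predict(features[:2]))
```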