Convolutional Neural Networks for
Sentence Classification
Model
Related Work

Use pre-trained word vectors for sentence-level classification tasks

Show that a simple CNN with little hyperparameter tuning and static
vectors achieves excellent results on multiple benchmarks.

Propose a simple modification to the architecture to allow for the
use of both task-specific and static vectors.
Word Embedding & Filter

Each token's word embedding typically has length d, so the size of the CNN
input matrix is not fixed: it depends on m, the number of tokens, i.e., on
the length of the sentence.

The convolution layer is essentially a feature-extraction layer; a
hyperparameter F can be set to specify how many feature extractors
(filters) to use.

For a given filter, imagine a k*d sliding window that starts at the first
token of the input matrix and keeps moving forward, where k is the filter's
window size and d is the word-embedding length.
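A minimal numpy sketch of the two points above, with toy dimensions: a sentence of m tokens becomes an m×d input matrix, and one filter slides a k×d window over it, yielding one feature value per position. All variable names and sizes here are illustrative, not taken from the slides.

```python
import numpy as np

m, d, k = 7, 5, 3              # sentence length, embedding size, filter window
X = np.random.randn(m, d)      # toy input matrix: one row per token embedding
W = np.random.randn(k, d)      # one filter: a k*d weight window
b = 0.0                        # filter bias

# Slide the window over token positions 0 .. m-k, one feature value per position.
features = np.array([np.sum(W * X[i:i + k]) + b for i in range(m - k + 1)])
features = np.maximum(features, 0)   # ReLU nonlinearity, as used in the paper
print(features.shape)                # (m - k + 1,) = (5,)
```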

Pooling

Max-pooling over time: of the feature values a given filter extracts, keep
only the highest-scoring one as the value retained by the pooling layer and
discard all the others; taking the maximum means keeping only the strongest
of these features and dropping the weaker ones of the same kind.

K-max pooling: keep the values whose scores are in the top K among all
feature values, preserving their original order; this positional information
is only the relative order among the features, not their absolute positions.

Chunk-max pooling: split the feature values produced by a given filter's
convolution layer into several segments, then take one maximum feature value
within each segment.
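A small numpy sketch of the three pooling variants applied to the feature values one filter produces; the toy values, K, and the segment count for chunk-max pooling are arbitrary choices for illustration.

```python
import numpy as np

features = np.array([0.2, 1.7, 0.3, 0.9, 1.1])   # feature values from one filter

# Max-pooling over time: keep only the single strongest feature value.
max_over_time = features.max()                    # 1.7

# K-max pooling: top-K values, kept in their original (relative) order.
K = 2
topk_idx = np.sort(np.argsort(features)[-K:])     # top-K indices, re-sorted by position
k_max = features[topk_idx]                        # [1.7, 1.1]

# Chunk-max pooling: split into segments and take one maximum per segment.
chunks = np.array_split(features, 2)              # two segments, chosen arbitrarily
chunk_max = np.array([c.max() for c in chunks])   # [1.7, 1.1]

print(max_over_time, k_max, chunk_max)
```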
Regularization
Employ dropout on the penultimate layer with a constraint on the
l2-norms of the weight vectors (Hinton et al., 2012).

Dropout prevents co-adaptation of hidden units by randomly dropping out
(i.e., setting to zero) a proportion of the hidden units during training.
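A minimal PyTorch sketch of both devices, assuming the constraint is applied as a max-norm rescaling after each gradient step (a common reading of the paper's l2 constraint); the layer sizes are placeholders.

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)            # randomly zeroes hidden units during training
fc = nn.Linear(300, 2)              # penultimate -> output layer (sizes are placeholders)
s = 3.0                             # l2-norm constraint on the weight vectors

h = torch.randn(50, 300)            # a batch of penultimate-layer activations
logits = fc(drop(h))

# After a gradient step: rescale each weight vector whose l2-norm exceeds s.
with torch.no_grad():
    norms = fc.weight.norm(p=2, dim=1, keepdim=True)
    fc.weight *= (s / norms).clamp(max=1.0)   # vectors already within s are untouched
```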
Datasets and Experimental Setup
Hyperparameters & Pre-trained Word Vectors

For all datasets we use: rectified linear units, filter
windows (h) of 3, 4, 5 with 100 feature maps
each, dropout rate (p) of 0.5, l2 constraint (s) of
3, and mini-batch size of 50.
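One way the stated setup could be wired in PyTorch; a sketch only, where the module name SentenceCNN, the vocabulary size, and the class count are placeholders, and training details (optimizer, early stopping) are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SentenceCNN(nn.Module):
    def __init__(self, vocab_size=10000, d=300, num_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, d)
        # Filter windows h = 3, 4, 5 with 100 feature maps each.
        self.convs = nn.ModuleList(
            [nn.Conv1d(d, 100, kernel_size=h) for h in (3, 4, 5)]
        )
        self.drop = nn.Dropout(p=0.5)            # dropout rate p = 0.5
        self.fc = nn.Linear(3 * 100, num_classes)

    def forward(self, tokens):                   # tokens: (batch, m) token ids
        x = self.emb(tokens).transpose(1, 2)     # (batch, d, m) for Conv1d
        # ReLU, then max-pooling over time for each filter window size.
        pooled = [F.relu(c(x)).max(dim=2).values for c in self.convs]
        return self.fc(self.drop(torch.cat(pooled, dim=1)))

model = SentenceCNN()
logits = model(torch.randint(0, 10000, (50, 40)))   # mini-batch size of 50
```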

Initializing word vectors with those obtained from
an unsupervised neural language model is a
popular method to improve performance in the
absence of a large supervised training set.
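A hedged sketch of this initialization using gensim; the word2vec file path and the tiny vocabulary are placeholders, and the uniform range for words missing from word2vec is a common implementation choice, not stated in the slides.

```python
import numpy as np
from gensim.models import KeyedVectors

# Placeholder path: the paper uses the 300-dim Google News word2vec vectors.
w2v = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True
)

vocab = ["movie", "was", "great"]                 # toy vocabulary
emb = np.random.uniform(-0.25, 0.25, (len(vocab), 300))
for i, word in enumerate(vocab):
    if word in w2v:                               # keep random init for unknown words
        emb[i] = w2v[word]
```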
Model Variations

CNN-rand: all word vectors are randomly initialized and then trained.

CNN-static: word vectors are pre-trained with word2vec and kept fixed
during training.

CNN-non-static: pre-trained word vectors are fine-tuned for each task.

CNN-multichannel: two embedding channels, both initialized from word2vec;
gradients are back-propagated through only one of them.
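A sketch of how the four variants differ only in the embedding setup, assuming PyTorch; `emb_matrix` stands in for the pre-trained word2vec matrix from the loading sketch above.

```python
import torch
import torch.nn as nn

emb_matrix = torch.randn(10000, 300)   # stand-in for pre-trained word2vec vectors

# CNN-rand: embeddings randomly initialized and trained (the nn.Embedding default).
rand = nn.Embedding(10000, 300)

# CNN-static: pre-trained vectors, kept fixed during training.
static = nn.Embedding.from_pretrained(emb_matrix, freeze=True)

# CNN-non-static: pre-trained vectors, fine-tuned for each task.
non_static = nn.Embedding.from_pretrained(emb_matrix, freeze=False)

# CNN-multichannel: both channels start from word2vec; gradients flow only
# through the non-static channel, and the model combines the two channels.
channels = [static, non_static]
```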
Results and Discussion

These results suggest that the pre-trained vectors are good,
'universal' feature extractors and can be utilized across datasets.
Fine-tuning the pre-trained vectors for each task gives still further
improvements (CNN-non-static).

Like the single-channel non-static model, the multichannel model is
able to fine-tune the non-static channel to make it more specific to
the task at hand.

We had hoped that the multichannel architecture would prevent overfitting
and thus work better than the single-channel model. The results, however,
are mixed.
Conclusion

We described a series of experiments with convolutional neural
networks built on top of word2vec. Despite little
tuning of hyperparameters, a simple CNN with
one layer of convolution performs remarkably well.

The results add to the well-established evidence that
unsupervised pre-training of word vectors is an
important ingredient in deep learning for NLP.