
地理信息服务领域的实体自动化识别

Automatic Entity Recognition of Geographic Information Service Document

  • Abstract: Automatic entity recognition in the field of geographic information services (GIServices) suffers from a lack of corpora, nested entities, and semantic sparsity. To address these problems, this paper designs a set of entity annotation specifications for geographic information service documents and constructs a corpus for the field. Building on the traditional entity recognition model BiLSTM-CRF, it introduces the BERT (Bidirectional Encoder Representations from Transformers) pre-trained model and a convolutional layer to form the BERT-1DCNN-BiLSTM-CRF model, which improves the accuracy of entity recognition in geographic information service documents. In the word embedding layer, the model replaces the traditional static language model with the BERT pre-trained model, which effectively addresses the inability to express richer sentence semantics caused by the lack of a large training corpus in this field. In addition, inter-character convolutional features are added after the BERT model, which strengthens the representation of local sentence features and reduces the interference of semantic sparsity. The experimental results show that the GIServices document entity recognition method combining the BERT and CNN models outperforms traditional deep learning methods, with a model accuracy of 0.8268. The method achieves automatic entity recognition of GIServices documents well, and it also demonstrates the effectiveness of BERT-based deep learning models for automatic entity recognition.
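The abstract states that convolutional features computed between neighboring characters are added after the BERT layer to strengthen local sentence features. The following is a minimal sketch of that idea only, not the authors' implementation: it applies a width-3 one-dimensional convolution with zero padding and ReLU to a sequence of per-character vectors. The function name `conv1d_same`, the two-dimensional toy vectors (standing in for BERT outputs), and the filter weights are all hypothetical and chosen so the arithmetic is easy to follow.

```python
from typing import List

Vector = List[float]


def conv1d_same(seq: List[Vector],
                filters: List[List[Vector]],
                biases: List[float]) -> List[Vector]:
    """1D convolution over a character sequence with zero padding and ReLU.

    seq[t][d]        : feature d of character t (stand-in for a BERT vector)
    filters[f][k][d] : weight of filter f at window offset k for feature d
    Assumes an odd window width so the output has one row per character.
    """
    width = len(filters[0])
    pad = width // 2
    dim = len(seq[0])
    zero = [0.0] * dim
    padded = [zero] * pad + seq + [zero] * pad
    out = []
    for t in range(len(seq)):
        row = []
        for f, b in zip(filters, biases):
            acc = b
            for k in range(width):
                for d in range(dim):
                    acc += f[k][d] * padded[t + k][d]
            row.append(max(0.0, acc))  # ReLU
        out.append(row)
    return out


# Hypothetical toy input: 4 characters, 2 features each.
chars = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 3.0]]

# Hypothetical filters of width 3:
#   filter 0 sums the features of the previous, current, and next character;
#   filter 1 subtracts the previous character's features from the next one's.
sum_filter = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
diff_filter = [[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]]

features = conv1d_same(chars, [sum_filter, diff_filter], [0.0, 0.0])
print(features)
# [[2.0, 1.0], [4.0, 1.0], [6.0, 2.0], [5.0, 0.0]]
```

Each output row depends on a character and its immediate neighbors, which is the sense in which such a layer captures local context. In a model of the kind the abstract describes, these rows would then feed the BiLSTM and CRF layers; how the paper combines them with the BERT vectors is not stated in the abstract.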

     
