As the volume of financial literature grows rapidly, financial text mining is becoming increasingly important. In recent years, extracting valuable information from financial documents, namely financial text mining, has gained significant popularity within research communities. Although deep learning-based financial text mining has made remarkable progress recently, it still suffers from a lack of task-specific labeled training data in the financial domain. To alleviate this issue, we present F-BERT, a pretrained financial text encoder: a domain-specific language model pretrained on large-scale financial corpora. Unlike the original BERT, the proposed F-BERT is trained continually on both a general corpus and a financial-domain corpus, and its four pretraining tasks are learned through lifelong learning, which enables F-BERT to continually capture language knowledge and semantic information. Experimental results demonstrate that the proposed F-BERT achieves strong performance on several financial text mining tasks, and extensive experiments further show its effectiveness and robustness. The source code and pretrained models of F-BERT are available online.
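To make the continual-pretraining idea concrete, the sketch below shows domain-adaptive pretraining of a general-domain BERT on financial text with a masked language modeling objective, using the HuggingFace transformers and datasets libraries. This is only an illustration under assumptions: the abstract does not specify F-BERT's four pretraining tasks or corpora, and the toy sentences stand in for a large-scale financial corpus.

```python
# Minimal sketch of domain-adaptive continual pretraining: start from
# general-domain BERT weights and continue pretraining on financial text
# with a standard masked language modeling (MLM) objective.
# Illustrative only; F-BERT's actual four pretraining tasks are not shown here.

from transformers import (
    AutoTokenizer,
    BertForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import Dataset

# Load a general-domain checkpoint to continue pretraining from.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# Toy stand-in for a large-scale financial corpus.
financial_sentences = [
    "The central bank raised interest rates by 25 basis points.",
    "Quarterly earnings beat analyst expectations on strong revenue growth.",
    "The company filed its annual 10-K report with the SEC.",
]
dataset = Dataset.from_dict({"text": financial_sentences})

def tokenize(batch):
    # Tokenize raw text; labels are created later by the MLM collator.
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# MLM objective: randomly mask 15% of tokens and train the model to predict them.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

args = TrainingArguments(
    output_dir="f-bert-sketch",
    num_train_epochs=1,
    per_device_train_batch_size=2,
    logging_steps=1,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```

In a lifelong-learning setup such as the one the abstract describes, this kind of pretraining loop would be repeated over additional tasks and corpora so the encoder keeps accumulating domain knowledge rather than being trained once and frozen.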
               