Four experiments are conducted to examine the feasibility of the proposed system. The first experiment compares two HMM variants: a multi-state model and a one-state model (the proposed model). The results show that the one-state model performs comparably to the multi-state model but is better suited to dealing with real-world unknown states. The second experiment shows that the proposed model (without the aid of online query) performs as well as other researchers' models on the Cora paper header dataset. The third experiment examines the system's performance on a small dataset of 43 real PDF research papers. The results show that the proposed system (with online query) performs well on bibliographical data extraction and even outperforms the free citation management tool Zotero 3.0. Finally, the fourth experiment uses a larger dataset of 103 papers to compare the system with Zotero 4.0, and the results show that the system significantly outperforms Zotero 4.0. The feasibility of the proposed model is thus justified.

What is innovative is that the system actually combines two HMM models: the main model is adapted from Freitag and McCallum (1999), and the authors add the word features of the Nymble HMM (Bikel et al., 1997) to it. The system also works without manually tagging datasets before training (the authors simply train on the Cora dataset and test on real-world PDF papers), which is significantly different from what other works have done so far.