Journal of Internet Computing and Services
ISSN 2287 - 1136 (Online) / ISSN 1598 - 0170 (Print)
https://jics.or.kr/

Automatic Information Extraction for Structured Web Documents


Bo-Hyun Yun, Journal of Internet Computing and Services, Vol. 6, No. 3, pp. 129-146, Jun. 2005

Abstract

This paper proposes a web information extraction system that automatically extracts predefined information from web documents (i.e., HTML documents) and integrates the extracted information. The system recognizes unlabeled entities with a probability-based entity recognition method and semiautomatically extends the existing domain knowledge using the extracted data. The system also extracts sub-linked information, that is, information on pages linked from the base page, and integrates similar results extracted from heterogeneous sources. Experimental results show that extracting sub-linked information and using probability-based entity recognition significantly improve precision compared with a system that uses only domain knowledge. Because the system can be adapted flexibly to each domain, it can also extract a wider variety of information precisely. Since both the semiautomatic expansion of domain knowledge and the probability-based entity recognition improve the quality of the extracted information, the system can maximize user satisfaction. This system can therefore satisfy the intellectual curiosity of users of movie sites, performance sites, and restaurant sites, and it can be used to build various comparison shopping malls and contribute to the revitalization of e-business.
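The abstract does not specify the recognition model, so the following is only an illustrative sketch of how probability-based recognition of unlabeled values and semiautomatic knowledge expansion might fit together. It assumes a multinomial naive Bayes classifier over word tokens, trained on the known values stored per entity type; this is a swapped-in technique, not necessarily the paper's method. The names `EntityRecognizer`, `expand`, `threshold`, and `confirm`, as well as the dining-site example data, are hypothetical. The `confirm` callback stands in for the human step that makes the expansion semiautomatic.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase alphanumeric tokens; punctuation and whitespace are dropped.
    return re.findall(r"[a-z0-9]+", text.lower())


class EntityRecognizer:
    """Hypothetical multinomial naive Bayes recognizer for unlabeled field values.

    `knowledge` maps an entity type to known example values (the domain knowledge).
    """

    def __init__(self, knowledge, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing constant
        # Copy the lists so expansion never mutates the caller's dictionary.
        self.knowledge = {etype: list(values) for etype, values in knowledge.items()}
        self._fit()

    def _fit(self):
        # Count tokens per entity type and estimate smoothed type priors.
        self.counts = {etype: Counter() for etype in self.knowledge}
        vocab = set()
        for etype, values in self.knowledge.items():
            for value in values:
                tokens = tokenize(value)
                self.counts[etype].update(tokens)
                vocab.update(tokens)
        self.totals = {etype: sum(c.values()) for etype, c in self.counts.items()}
        self.vocab_size = len(vocab) + 1  # one extra slot for unseen tokens
        n_values = sum(len(v) for v in self.knowledge.values())
        n_types = len(self.knowledge)
        self.priors = {
            etype: (len(values) + 1) / (n_values + n_types)
            for etype, values in self.knowledge.items()
        }

    def recognize(self, text):
        # Return (most probable entity type, its posterior probability).
        log_scores = {}
        for etype in self.knowledge:
            score = math.log(self.priors[etype])
            denom = self.totals[etype] + self.alpha * self.vocab_size
            for tok in tokenize(text):
                score += math.log((self.counts[etype][tok] + self.alpha) / denom)
            log_scores[etype] = score
        best = max(log_scores, key=log_scores.get)
        # Normalize with the log-sum-exp trick to avoid underflow.
        top = log_scores[best]
        z = sum(math.exp(s - top) for s in log_scores.values())
        return best, 1.0 / z

    def expand(self, texts, threshold=0.9, confirm=lambda text, etype: True):
        # Semiautomatic expansion: confident guesses are proposed, and `confirm`
        # (standing in for a human reviewer) decides which ones are kept.
        added = []
        for text in texts:
            etype, prob = self.recognize(text)
            if prob >= threshold and confirm(text, etype):
                self.knowledge[etype].append(text)
                added.append((text, etype))
        if added:
            self._fit()  # retrain on the enlarged domain knowledge
        return added


KB = {
    "cuisine": ["korean bbq", "italian pasta", "korean noodles"],
    "location": ["gangnam seoul", "jongno seoul", "haeundae busan"],
}

rec = EntityRecognizer(KB)
etype, prob = rec.recognize("mapo seoul")
print(etype, round(prob, 2))  # location 0.75
print(rec.expand(["mapo seoul", "korean bbq buffet"], threshold=0.8))  # [('korean bbq buffet', 'cuisine')]
```

With `threshold=0.8`, "mapo seoul" (posterior 0.75) is held back while "korean bbq buffet" (about 0.86) is added to the cuisine knowledge. Raising the threshold favors precision over coverage, and a reviewer-backed `confirm` keeps low-quality values out of the domain knowledge.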


Cite this article
[APA Style]
Yun, B. (2005). Automatic Information Extraction for Structured Web Documents. Journal of Internet Computing and Services, 6(3), 129-146.

[IEEE Style]
B. Yun, "Automatic Information Extraction for Structured Web Documents," Journal of Internet Computing and Services, vol. 6, no. 3, pp. 129-146, 2005.

[ACM Style]
Bo-Hyun Yun. 2005. Automatic Information Extraction for Structured Web Documents. Journal of Internet Computing and Services, 6, 3, (2005), 129-146.