Method of Web Page Text Extraction Based on Text Feature and Page Structure

HU Lulu, LIU Xiaoqin, SUN Kai. Method of Web Page Text Extraction Based on Text Feature and Page Structure[J]. Journal of Atmospheric and Environmental Optics, 2017, 12(3): 230

Paper Information

Received: Jan. 12, 2016

Published Online: Jun. 9, 2017

Author Email: Lulu HU (hllyyy@mail.ustc.edu.cn)

DOI: 10.3969/j.issn.1673-6141.2017.03.009
