Posts & Telecom Press (人民邮电出版社)
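Title: 文本挖掘(英文版) (Text Mining, English Edition)
ISBN: 978-7-115-20535-3
Original title: The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data
Original publisher: Cambridge University Press
Series: 图灵原版计算机科学系列 (Turing Original Edition Computer Science Series)
Category: Computers >> Artificial Intelligence
Authors: Ronen Feldman, James Sanger
Translator:
Publication date: 2009-08-12
Language: Simplified Chinese
Format: 16开 (16mo)
Pages: 424
Price: RMB 69.00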
 
    The information age has made it easy to store large amounts of data. The proliferation of documents available on the Web, on corporate intranets, on news wires, and elsewhere is overwhelming. However, although the amount of data available to us is constantly increasing, our ability to absorb and process this information remains constant. Search engines only exacerbate the problem by making more and more documents available in a matter of a few keystrokes.
    Text mining is a new and exciting research area that tries to solve the information overload problem by using techniques from data mining, machine learning, natural language processing (NLP), information retrieval (IR), and knowledge management. Text mining involves the preprocessing of document collections (text categorization, information extraction, term extraction), the storage of the intermediate representations, the techniques to analyze these intermediate representations (such as distribution analysis, clustering, trend analysis, and association rules), and visualization of the results.
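    The stages named above (preprocessing, an intermediate representation, and analysis of that representation) can be illustrated with a minimal sketch. This is not code from the book: the toy corpus, the stopword list, and every function name below are assumptions made for illustration, and the "association" step is a simple pairwise co-occurrence count rather than a full association-rule miner.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus and stopword list (assumptions, not from the book).
DOCS = [
    "gene expression regulates protein synthesis",
    "protein binding affects gene expression",
    "patent filings show technology trends",
]
STOPWORDS = {"the", "a", "of", "and", "show"}


def extract_terms(text):
    """Preprocessing: lowercase, split on whitespace, drop stopwords."""
    return [t for t in text.lower().split() if t not in STOPWORDS]


def build_representation(docs):
    """Intermediate representation: one set of terms per document."""
    return [set(extract_terms(d)) for d in docs]


def term_distribution(rep):
    """Distribution analysis: number of documents containing each term."""
    return Counter(t for terms in rep for t in terms)


def cooccurrence_rules(rep, min_support=2):
    """Association-style analysis: term pairs (a, b) that co-occur in at
    least min_support documents, mapped to the confidence of a -> b."""
    doc_freq = term_distribution(rep)
    pair_counts = Counter(
        pair for terms in rep for pair in combinations(sorted(terms), 2)
    )
    return {
        (a, b): count / doc_freq[a]
        for (a, b), count in pair_counts.items()
        if count >= min_support
    }


rep = build_representation(DOCS)
rules = cooccurrence_rules(rep)
for (a, b), confidence in sorted(rules.items()):
    print(f"{a} -> {b}: confidence {confidence:.2f}")
```

    Keeping the representation as plain term sets makes the analysis steps independent of how the terms were extracted, which mirrors the separation between preprocessing and analysis described in the paragraph above.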
    This book presents a general theory of text mining along with the main techniques behind it. We offer a generalized architecture for text mining and outline the algorithms and data structures typically used by text mining systems.
    The book is aimed at advanced undergraduate students, graduate students, academic researchers, and professional practitioners interested in complete coverage of the text mining field. We have included all the topics critical to people who plan to develop text mining systems or to use them. In particular, we have covered preprocessing techniques such as text categorization, text clustering, and information extraction, as well as analysis techniques such as association rules and link analysis.
    The book tries to blend together theory and practice; we have attempted to provide many real-life scenarios that show how the different techniques are used in practice. When writing the book we tried to make it as self-contained as possible and have compiled a comprehensive bibliography for each topic so that the reader can expand his or her knowledge accordingly.
    BOOK OVERVIEW
    The book starts with a gentle introduction to text mining that presents the basic definitions and prepares the reader for the next chapters. In the second chapter we describe the core text mining operations in detail while providing examples for each operation. The third chapter serves as an introduction to text mining preprocessing techniques. We provide a taxonomy of the operations and set the ground for Chapters IV through VII. Chapter IV offers a comprehensive description of the text categorization problem and outlines the major algorithms for performing text categorization.
    Chapter V introduces another important text preprocessing task called text clustering, and we again provide a concrete definition of the problem and outline the major algorithms for performing text clustering. Chapter VI addresses what is probably the most important text preprocessing technique for text mining – namely, information extraction. We describe the general problem of information extraction and supply the relevant definitions. Several examples of the output of information extraction in several domains are also presented.
    In Chapter VII, we discuss several state-of-the-art probabilistic models for information extraction, and Chapter VIII describes several preprocessing applications that either use the probabilistic models of Chapter VII or are based on hybrid approaches incorporating several models. The presentation layer of a typical text mining system is considered in Chapter IX. We focus mainly on aspects related to browsing large document collections and on issues related to query refinement.
    Chapter X surveys the common visualization techniques used either to visualize the document collection or the results obtained from the text mining operations. Chapter XI introduces the fascinating area of link analysis. We present link analysis as an analytical step based on the foundation of the text preprocessing techniques discussed in the previous chapters, most specifically information extraction. The chapter begins with basic definitions from graph theory and moves to common techniques for analyzing large networks of entities.
    Finally, in Chapter XII, three real-world applications of text mining are considered. We begin by describing an application for articles posted in BioWorld magazine. This application identifies major biological entities such as genes and proteins and enables visualization of relationships between those entities. We then proceed to the GeneWays application, which is based on analysis of PubMed articles. The next application is based on analysis of U.S. patents and enables monitoring trends and visualizing relationships between inventors, assignees, and technology terms.
    The appendix explains the DIAL language, which is a dedicated information extraction language. We outline the structure of the language and describe its exact syntax. We also offer several code examples that show how DIAL can be used to extract a variety of entities and relationships. A detailed bibliography concludes the book.
    ACKNOWLEDGMENTS
    This book would not have been possible without the help of many individuals. In addition to acknowledgments made throughout the book, we feel it important to take the time to offer special thanks to an important few. Among these we would like to mention especially Benjamin Rosenfeld, who devoted many hours to revising the categorization and clustering chapters. The people at ClearForest Corporation also provided help in obtaining screen shots of applications using ClearForest technologies – most notably in Chapter XII. In particular, we would like to mention the assistance we received from Rafi Vesserman, Yonatan Aumann, Jonathan Schler, Yair Liberzon, Felix Harmatz, and Yizhar Regev. Their support meant a great deal to us in the completion of this project.
    Adding to this list, we would also like to thank Ian Bonner and Kathy Bentaieb of Inxight Software for the screen shots used in Chapter X. Also, we would like to extend our appreciation to Andrey Rzhetsky for his personal screen shots of the GeneWays application.
    A book written on a subject such as text mining is inevitably a culmination of many years of work. As such, our gratitude is extended to both Haym Hirsh and Oren Etzioni, early collaborators in the field.
    In addition, we would like to thank Lauren Cowles of Cambridge University Press for reading our drafts and patiently making numerous comments on how to improve the structure of the book and its readability. Appreciation is also owed to Jessica Farris for help in keeping two very busy coauthors on track.
    Finally, it brings us great pleasure to thank those dearest to us – our children Yael, Hadar, Yair, Neta and Frithjof – for leaving us undisturbed in our rooms while we were writing. We hope that, now that the book is finished, we will have more time to devote to you and to enjoy your growth. We are also greatly indebted to our dear wives Hedva and Lauren for bearing with our long hours on the computer, doing research, and writing the endless drafts. Without your help, confidence, and support we would never have completed this book. Thank you for everything. We love you!
Copyright © 2005 北京图灵文化发展有限公司 (Beijing Turing Culture Development Co., Ltd.). All Rights Reserved