Automatic Text Summarization

Hardcover | English | 2014 | 9781848216686
Expected delivery time: approximately 9 working days

Summary

Textual information in the form of digital documents quickly accumulates into huge amounts of data. The majority of these documents are unstructured: they consist of unrestricted text that has not been organized into traditional databases. Processing these documents is therefore a cumbersome task, mostly due to the lack of standards, and it has thus become extremely difficult to implement automatic text analysis. Automatic Text Summarization (ATS), which condenses a text while maintaining its relevant information, can help to process this ever-increasing, difficult-to-handle mass of information.

This book examines the motivations for ATS and the different algorithms used to perform it. The author presents the recent state of the art before describing the main problems of ATS, as well as the difficulties and the solutions provided by the community. The book covers recent advances in ATS together with current applications and trends. The approaches described are statistical, linguistic and symbolic, and several examples are included to clarify the theoretical concepts.
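To give a flavour of the statistical, extraction-based methods the book surveys (for instance Luhn's frequency-based algorithm in Chapter 3), here is a minimal sketch of sentence extraction by word-frequency scoring. It is an illustration only, not the book's implementation: the sentence splitter, stopword list and length normalization are simplifying assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "it",
             "that", "this", "for", "on", "as", "are", "be", "by"}

def summarize(text: str, n_sentences: int = 2) -> str:
    """Return the n highest-scoring sentences, in their original order."""
    # Naive sentence segmentation on end-of-sentence punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # Corpus-wide term frequencies over content words.
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(w for w in words if w not in STOPWORDS)

    def score(sentence: str) -> float:
        # Sum of term frequencies, normalized by sentence length so
        # long sentences are not automatically favoured.
        tokens = [w for w in re.findall(r"[a-z']+", sentence.lower())
                  if w not in STOPWORDS]
        return sum(freq[w] for w in tokens) / (len(tokens) or 1)

    top = sorted(range(len(sentences)),
                 key=lambda i: score(sentences[i]),
                 reverse=True)[:n_sentences]
    return " ".join(sentences[i] for i in sorted(top))
```

Applied to a paragraph of news text, summarize() returns its two most term-dense sentences in source order, i.e. an extract in the sense discussed in Chapter 2, as opposed to the abstracts generated by the methods of Chapter 7.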

Specifications

ISBN13: 9781848216686
Language: English
Binding: hardcover
Number of pages: 376
Series: ISTE

Table of contents

FOREWORD BY A. ZAMORA AND R. SALVADOR xi
FOREWORD BY H. SAGGION xv
NOTATION xvii
INTRODUCTION xix
PART 1. FOUNDATIONS 1
CHAPTER 1. WHY SUMMARIZE TEXTS? 3
1.1. The need for automatic summarization 3
1.2. Definitions of text summarization 5
1.3. Categorizing automatic summaries 10
1.4. Applications of automatic text summarization 13
1.5. About automatic text summarization 15
1.6. Conclusion 21
CHAPTER 2. AUTOMATIC TEXT SUMMARIZATION: SOME IMPORTANT CONCEPTS 23
2.1. Processes before the process 23
2.1.1. Sentence-term matrix: the vector space model (VSM) 26
2.2. Extraction, abstraction or compression? 28
2.3. Extraction-based summarization 30
2.3.1. Surface-level algorithms 31
2.3.2. Intermediate-level algorithms 33
2.3.3. Deep parsing algorithms 34
2.4. Abstract summarization 35
2.4.1. FRUMP 35
2.4.2. Information extraction and abstract generation 38
2.5. Sentence compression and fusion 38
2.5.1. Sentence compression 38
2.5.2. Multisentence fusion 39
2.6. The limits of extraction 39
2.6.1. Cohesion and coherence 40
2.6.2. The HexTAC experiment 42
2.7. The evolution of text summarization tasks 43
2.7.1. Traditional tasks 43
2.7.2. Current and future problems 45
2.8. Evaluating summaries 50
2.9. Conclusion 51
CHAPTER 3. SINGLE-DOCUMENT SUMMARIZATION 53
3.1. Historical approaches 53
3.1.1. Luhn's "Automatic Creation of Literature Abstracts" 57
3.1.2. The Luhn algorithm 59
3.1.3. Edmundson's linear combination 61
3.1.4. Extracts by elimination 64
3.2. Machine learning approaches 66
3.2.1. Machine learning parameters 66
3.3. State-of-the-art approaches 69
3.4. Latent semantic analysis 73
3.4.1. Singular value decomposition (SVD) 73
3.4.2. Sentence weighting by SVD 74
3.5. Graph-based approaches 76
3.5.1. PAGERANK and SNA algorithms 77
3.5.2. Graphs and automatic text summarization 78
3.5.3. Constructing the graph 79
3.5.4. Sentence weighting 80
3.6. DIVTEX: a summarizer based on the divergence of probability distribution 83
3.7. CORTEX 85
3.7.1. Frequential measures 86
3.7.2. Hamming measures 87
3.7.3. Mixed measures 88
3.7.4. Decision algorithm 89
3.8. ARTEX 90
3.9. ENERTEX 93
3.9.1. Spins and neural networks 93
3.9.2. The textual energy similarity measure 95
3.9.3. Summarization by extraction and textual energy 97
3.10. Approaches using rhetorical analysis 102
3.11. Lexical chains 107
3.12. Conclusion 107
CHAPTER 4. GUIDED MULTI-DOCUMENT SUMMARIZATION 109
4.1. Introduction 109
4.2. The problems of multidocument summarization 110
4.3. DUC/TAC & INEX Tweet Contextualization 112
4.4. The taxonomy of MDS methods 115
4.4.1. Structure based 115
4.4.2. Vector space model based 116
4.4.3. Graph based 117
4.5. Some multi-document summarization systems and algorithms 117
4.5.1. SUMMONS 118
4.5.2. Maximal marginal relevance 119
4.5.3. A multidocument biography summarization system 120
4.5.4. Multi-document ENERTEX 121
4.5.5. MEAD 123
4.5.6. CATS 126
4.5.7. SUMUM and SUMMA 128
4.5.8. NEO-CORTEX 131
4.6. Update summarization 134
4.6.1. Update summarization pilot task at DUC 2007 134
4.6.2. Update summarization task at TAC 2008 and 2009 135
4.6.3. A minimization-maximization approach 138
4.6.4. The ICSI system at TAC 2008 and 2009 142
4.6.5. The CBSEAS system at TAC 145
4.7. Multidocument summarization by polytopes 146
4.8. Redundancy 148
4.9. Conclusion 149
PART 2. EMERGING SYSTEMS 151
CHAPTER 5. MULTI AND CROSS-LINGUAL SUMMARIZATION 153
5.1. Multilingualism, the web and automatic summarization 153
5.2. Automatic multilingual summarization 156
5.3. MEAD 159
5.4. SUMMARIST 159
5.5. COLUMBIA NEWSBLASTER 161
5.6. NEWSEXPLORER 163
5.7. GOOGLE NEWS 166
5.8. CAPS 166
5.9. Automatic cross-lingual summarization 168
5.9.1. The quality of machine translation 169
5.9.2. A graph-based cross-lingual summarizer 172
5.10. Conclusion 177
CHAPTER 6. SOURCE AND DOMAIN-SPECIFIC SUMMARIZATION 179
6.1. Genre, specialized documents and automatic summarization 179
6.2. Automatic summarization and organic chemistry 183
6.2.1. YACHS2 183
6.3. Automatic summarization and biomedicine 189
6.3.1. SUMMTERM 189
6.3.2. A linguistic-statistical approach 196
6.4. Summarizing court decisions 201
6.5. Opinion summarization 204
6.5.1. CBSEAS at TAC 2008 opinion task 204
6.6. Web summarization 206
6.6.1. Web page summarization 206
6.6.2. OCELOT and the statistical gist 207
6.6.3. Multitweet summarization 211
6.6.4. Email summarization 215
6.7. Conclusion 216
CHAPTER 7. TEXT ABSTRACTING 219
7.1. Abstraction-based automatic summarization 219
7.2. Systems using natural language generation 220
7.3. An abstract generator using information extraction 222
7.4. Guided summarization and a fully abstractive approach 223
7.5. Abstraction-based summarization via conceptual graphs 226
7.6. Multisentence fusion 227
7.6.1. Multisentence fusion via graphs 228
7.6.2. Graphs and keyphrase extraction: the TAKAHÉ system 231
7.7. Sentence compression 232
7.7.1. Symbolic approaches 235
7.7.2. Statistical approaches 236
7.7.3. A statistical-linguistic approach 238
7.8. Conclusion 241
CHAPTER 8. EVALUATING DOCUMENT SUMMARIES 243
8.1. How can summaries be evaluated? 243
8.2. Extrinsic evaluations 245
8.3. Intrinsic evaluations 246
8.3.1. The baseline summary 247
8.4. TIPSTER SUMMAC evaluation campaigns 248
8.4.1. Ad hoc task 249
8.4.2. Categorization task 249
8.4.3. Question-answering task 250
8.5. NTCIR evaluation campaigns 250
8.6. DUC/TAC evaluation campaigns 251
8.6.1. Manual evaluations 252
8.7. CLEF-INEX evaluation campaigns 254
8.8. Semi-automatic methods for evaluating summaries 256
8.8.1. Level of granularity: the sentence 256
8.8.2. Level of granularity: words 257
8.9. Automatic evaluation via information theory 263
8.9.1. Divergence of probability distribution 265
8.9.2. FRESA 266
8.10. Conclusion 271
CONCLUSION 275
APPENDIX 1. INFORMATION RETRIEVAL, NLP AND ATS 281
APPENDIX 2. AUTOMATIC TEXT SUMMARIZATION RESOURCES 305
BIBLIOGRAPHY 309
INDEX 343
