Abstract
Automatic text summarization is the process of reducing the size of a text document, to create a summary that retains the most important points of the original document. It can thus be applied to summarize the original document by decreasing the importance or removing part of the content. The contribution of this paper in this field is twofold. First we show that text summarization can improve the performance of classical text clustering algorithms, in particular by reducing noise coming from long documents that can negatively affect clustering results. Moreover, the clustering quality can be used to quantitatively evaluate different summarization methods. In this regards, we propose a new graph-based summarization technique for keyphrase extraction, and use the Classic4 and BBC NEWS datasets to evaluate the improvement in clustering quality obtained using text summarization.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsNotes
- 1.
\(f=l=1\), where f and l are the number of sentences in FS and LS, respectively.
- 2.
- 3.
- 4.
- 5.
- 6.
For the convergence of HITS, we stop iterating when for any vertex i in the graph the difference between the scores computed at two successive iterations fall below a given threshold:\(\frac{|x_{i}^{k+1}-x_{i}^{k}|}{x_{i}^{k}}<10^{-3}\) [10].
- 7.
References
Aggarwal, C.C., Yu, P.S.: Finding generalized projected clusters in high dimensional spaces, vol. 29. ACM (2000)
Blei, D.M.: Probabilistic topic models. Commun. ACM 55(4), 77–84 (2012)
Dash, M., Koot, P.W.: Feature selection for clustering. In: Encyclopedia of Database Systems, pp. 1119–1125. Springer, New York (2009)
Greene, D., Cunningham, P.: Practical solutions to the problem of diagonal dominance in kernel document clustering. In: Proceedings of the 23rd international conference on Machine learning, pp. 377–384. ACM (2006)
Kleinberg, J.M.: Authoritative sources in a hyperlinked environment. J. ACM (JACM) 46(5), 604–632 (1999)
Lin, C.-Y.: ROUGE: a package for automatic evaluation of summaries. In: Text summarization branches out, Proceedings of the ACL-04 workshop, vol. 8 (2004)
Litvak, M., Last, M.: Graph-based keyword extraction for single-document summarization. In: Proceedings of the workshop on Multi-source Multilingual Information Extraction and Summarization, pp. 17–24. Association for Computational Linguistics (2008)
Liu, T., Liu, S., Chen, Z., Ma, W.-Y.: An evaluation on feature selection for text clustering. ICML 3, 488–495 (2003)
Mihalcea, R.: Graph-based ranking algorithms for sentence extraction, applied to text summarization. In: Proceedings of the ACL 2004 on Interactive poster and demonstration sessions, p. 20. Association for Computational Linguistics (2004)
Mihalcea, R., Tarau, P.: TextRank: Bringing order into texts. Association for Computational Linguistics (2004)
Papineni, K., Roukos, S., Ward, T., Zhu, W.-J.: BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th annual meeting on association for computational linguistics, pp. 311–318. Association for Computational Linguistics (2002)
Silva, J.A., Faria, E.R., Barros, R.C., Hruschka, E.R., de Carvalho, A.C., Gama, J.: Data stream clustering: a survey. ACM Comput. Surv. (CSUR) 46(1), 13 (2013)
Wyse, N., Dubes, R., Jain, A.K.: A critical evaluation of intrinsic dimensionality algorithms. In: Pattern recognition in Practice, pp. 415–425 (1980)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Pourvali, M., Orlando, S., Gharagozloo, M. (2015). Improving Clustering Quality by Automatic Text Summarization. In: Zuccon, G., Geva, S., Joho, H., Scholer, F., Sun, A., Zhang, P. (eds) Information Retrieval Technology. AIRS 2015. Lecture Notes in Computer Science(), vol 9460. Springer, Cham. https://doi.org/10.1007/978-3-319-28940-3_23
Download citation
DOI: https://doi.org/10.1007/978-3-319-28940-3_23
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-28939-7
Online ISBN: 978-3-319-28940-3
eBook Packages: Computer ScienceComputer Science (R0)