Unsupervised News Video Segmentation by Combined Audio-Video Analysis

De Santo, M.; Percannella, G.; Sansone, C.; Vento, M.

doi:10.1007/11848035_37


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4105)

Included in the following conference series:

International Workshop on Multimedia Content Representation, Classification and Security


Abstract

Segmenting news video into stories is one of the key issues in handling news-based digital libraries efficiently. In this paper we present a novel unsupervised algorithm that combines audio and video information to automatically partition news videos into stories. The proposed algorithm is based on the detection of anchor shots within the video. In particular, a set of audio/video templates of anchorperson shots is first extracted in an unsupervised way; shots are then classified by comparing them to the templates using both video and audio similarity. Finally, a story is obtained by linking each anchor shot with all successive shots until another anchor shot, or the end of the news video, occurs. Audio similarity is evaluated by means of a new index and helps to achieve better performance in anchor shot detection than a purely video-based approach. The method has been tested on a large database and compared with other state-of-the-art algorithms, demonstrating its effectiveness relative to them.
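The final linking step described in the abstract can be illustrated with a minimal sketch. It assumes the anchor-shot classification has already produced one boolean flag per shot; the function name `link_shots_into_stories` and the handling of shots that precede the first anchor shot are hypothetical choices made for illustration and are not taken from the paper.

```python
from typing import List


def link_shots_into_stories(is_anchor: List[bool]) -> List[List[int]]:
    """Group shot indices into stories, each opened by an anchor shot.

    is_anchor[i] is True when shot i was classified as an anchor shot.
    """
    stories: List[List[int]] = []
    for shot_index, anchor in enumerate(is_anchor):
        if anchor or not stories:
            # An anchor shot opens a new story. Shots that occur before the
            # first anchor shot also open a group of their own; this is an
            # assumption, since the abstract does not cover that case.
            stories.append([shot_index])
        else:
            # A non-anchor shot is linked to the most recent story.
            stories[-1].append(shot_index)
    return stories
```

For example, calling the function on the flags `[True, False, False, True, False]` yields two stories, one covering shots 0 to 2 and one covering shots 3 and 4.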



Author information

Authors and Affiliations

Dip. di Ingegneria dell’Informazione ed Ingegneria Elettrica, Università degli Studi di Salerno, Via Ponte Don Melillo, I, I-84084, Fisciano (SA), Italy
M. De Santo, G. Percannella & M. Vento
Dipartimento di Informatica e Sistemistica, Università degli Studi di Napoli “Federico II”, Via Claudio 21, I-80125, Napoli, Italy
C. Sansone


Editor information

Editors and Affiliations

Multimedia Signal Processing and Pattern Recognition Lab., Dept. of Electronics and Communications Eng., Istanbul Technical University, 34469, Istanbul, Turkey
Bilge Gunsel
Department of Computer Science and Engineering, Michigan State University,
Anil K. Jain
College of Engineering, Koç University, 34450, Sarıyer, İstanbul, Turkey
A. Murat Tekalp
Department of Electrical and Electronics Engineering, Boğaziçi University, Istanbul, Turkey
Bülent Sankur

About this paper

Cite this paper

De Santo, M., Percannella, G., Sansone, C., Vento, M. (2006). Unsupervised News Video Segmentation by Combined Audio-Video Analysis. In: Gunsel, B., Jain, A.K., Tekalp, A.M., Sankur, B. (eds) Multimedia Content Representation, Classification and Security. MRCS 2006. Lecture Notes in Computer Science, vol 4105. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11848035_37


DOI: https://doi.org/10.1007/11848035_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-39392-4
Online ISBN: 978-3-540-39393-1
eBook Packages: Computer Science (R0)
