Segmentation of news videos based on audio-video information

  • Theoretical Advances
Pattern Analysis and Applications

Abstract

In this paper, we propose an innovative architecture that segments a news video into so-called “stories” using both its video and audio information. Segmenting news into stories is a key requirement for the efficient handling of news-based digital libraries. Although the relevance of this research problem is widely recognized in the scientific community, few established solutions exist in the field. In our approach, segmentation is performed in two steps. First, shots are classified by combining three different anchor shot detection algorithms that use video information only. Then, the shot classification is refined by a novel anchor shot detection method based on features extracted from the audio track. Tests on a large database confirm that the proposed system outperforms each individual video-based method as well as their combination.
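The two-step flow described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the paper: the `Shot` type, the majority vote used to combine the three video-based detectors, the rule that defers to the audio detector when the video detectors disagree, and the convention that a story starts at an anchor shot following a non-anchor shot.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Shot:
    """Hypothetical per-shot record; field names are illustrative only."""
    video_votes: Tuple[bool, bool, bool]  # outputs of three video-based anchor detectors
    audio_is_anchor: bool                 # output of an audio-based anchor detector


def classify_video(shot: Shot) -> bool:
    """Step 1 (assumed combiner): majority vote over the video-based detectors."""
    return sum(shot.video_votes) * 2 > len(shot.video_votes)


def refine_with_audio(shot: Shot) -> bool:
    """Step 2 (assumed rule): keep a unanimous video decision, else defer to audio."""
    if all(shot.video_votes) or not any(shot.video_votes):
        return classify_video(shot)
    return shot.audio_is_anchor


def segment_stories(anchor_labels: List[bool]) -> List[List[int]]:
    """Group shot indices into stories (assumed convention).

    A new story opens at an anchor shot that follows a non-anchor shot,
    or at the very first shot of the video.
    """
    stories: List[List[int]] = []
    for i, is_anchor in enumerate(anchor_labels):
        starts_story = is_anchor and (i == 0 or not anchor_labels[i - 1])
        if starts_story or not stories:
            stories.append([i])
        else:
            stories[-1].append(i)
    return stories


shots = [
    Shot((True, True, True), True),      # unanimous anchor
    Shot((False, False, False), False),  # unanimous report shot
    Shot((True, False, False), True),    # disagreement, audio says anchor
    Shot((True, True, False), False),    # disagreement, audio says report
    Shot((False, False, False), True),   # unanimous report, audio ignored
]
labels = [refine_with_audio(s) for s in shots]
stories = segment_stories(labels)
```

In this sketch the audio detector acts only as a tie-breaker; a reject-based or reliability-weighted combiner would replace `refine_with_audio` without changing the rest of the flow.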

Author information

Correspondence to Gennaro Percannella.

About this article

Cite this article

De Santo, M., Percannella, G., Sansone, C. et al. Segmentation of news videos based on audio-video information. Pattern Anal Applic 10, 135–145 (2007). https://doi.org/10.1007/s10044-006-0055-5

  • DOI: https://doi.org/10.1007/s10044-006-0055-5
