Natural Language Processing Across Time: An Empirical Investigation on Italian

  • Conference paper
Advances in Natural Language Processing (GoTAL 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5221)

Abstract

In this paper, we study how existing natural language processing tools for Italian perform on ancient texts. The first goal is to understand to what extent such tools can be used “as they are” for the automatic analysis of old literary works. Although NLP tools for Italian achieve good performance today, it is not clear whether they can be used successfully in the humanities to support the critical study of historical works. Our analysis shows how the tools’ performance varies systematically across different time periods and within literary movements. As a second goal, we want to verify whether simple customization methods can improve the tools’ performance on the old works.

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Pennacchiotti, M., Zanzotto, F.M. (2008). Natural Language Processing Across Time: An Empirical Investigation on Italian. In: Nordström, B., Ranta, A. (eds) Advances in Natural Language Processing. GoTAL 2008. Lecture Notes in Computer Science, vol 5221. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85287-2_36

  • DOI: https://doi.org/10.1007/978-3-540-85287-2_36

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-85286-5

  • Online ISBN: 978-3-540-85287-2

  • eBook Packages: Computer Science, Computer Science (R0)
