Natural Language Processing Across Time: An Empirical Investigation on Italian

Pennacchiotti, Marco; Zanzotto, Fabio Massimo

doi:10.1007/978-3-540-85287-2_36

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5221)

Included in the following conference series:

International Conference on Natural Language Processing

Abstract

In this paper, we study how existing natural language processing (NLP) tools for Italian perform on ancient texts. The first goal is to understand to what extent such tools can be used “as they are” for the automatic analysis of old literary works. Indeed, while NLP tools for Italian achieve good performance today, it is not clear whether they can be successfully used in the humanities to support the critical study of historical works. Our analysis shows how the tools’ performance varies systematically across different time periods and within literary movements. As a second goal, we want to verify whether simple customization methods can improve the tools’ performance on the old works.

Author information

Authors and Affiliations

Dept. of Computational Linguistics, Saarland University, Saarbrücken, Germany
Marco Pennacchiotti
DISP, Università di Roma Tor Vergata, Roma, Italy
Fabio Massimo Zanzotto

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Chalmers University of Technology, 41296, Göteborg, Sweden
Bengt Nordström & Aarne Ranta

Cite this paper

Pennacchiotti, M., Zanzotto, F.M. (2008). Natural Language Processing Across Time: An Empirical Investigation on Italian. In: Nordström, B., Ranta, A. (eds) Advances in Natural Language Processing. GoTAL 2008. Lecture Notes in Computer Science, vol 5221. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85287-2_36

DOI: https://doi.org/10.1007/978-3-540-85287-2_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85286-5
Online ISBN: 978-3-540-85287-2
eBook Packages: Computer Science, Computer Science (R0)